Discussion/Promoter definition
- See ML threads starting from [fantom5:00032] and [fantom5:00064]
Data format
proposal 1
An OSCtable format. Specifically:
- A tab delimited file
- At the beginning of the file, these lines should appear:
      ##ProtocolREF = ARBITRARY_NAME_TO_DISTINGUISH_METHODS
      ##Date = 2011-01-12
      ##InputFile = https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_008/f5pipeline/human.cell_line.hCAGE/*.ctss.bed.gz
      ##InputFile = https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_008/f5pipeline/human.primary_cell.hCAGE/*.ctss.bed.gz
      ...
      ##ContactName = YOUR_NAME
      ##ContactEmail = YOUR_MAIL
- The first 4 columns should be: chrom, start.0base, stop, strand
- The subsequent columns (in arbitrary order) should include the number of reads, named in the following form: counts.RNA_DESCRIPTION.CNhsXXXX.XXXX-XXXX
- The subsequent columns (in arbitrary order) should also include the tpm (tags per million), named in the following form: tpm.RNA_DESCRIPTION.CNhsXXXX.XXXX-XXXX
For example:
| chrom | start.0base | stop | strand | counts.Burkitt%27s%20lymphoma%20cell%20line%3aDAUDI.CNhs10739.10422-106C8 | counts.acute%20lymphoblastic%20leukemia%20%28B-ALL%29%20cell%20line%3aBALL-1.CNhs11251.10455-106G5 | counts.acute%20lymphoblastic%20leukemia%20%28B-ALL%29%20cell%20line%3aNALM-6.CNhs11282.10534-107G3 | tpm.Burkitt%27s%20lymphoma%20cell%20line%3aDAUDI.CNhs10739.10422-106C8 | tpm.acute%20lymphoblastic%20leukemia%20%28B-ALL%29%20cell%20line%3aBALL-1.CNhs11251.10455-106G5 | tpm.acute%20lymphoblastic%20leukemia%20%28B-ALL%29%20cell%20line%3aNALM-6.CNhs11282.10534-107G3 |
|---|---|---|---|---|---|---|---|---|---|
| chr1 | 10 | 20 | + | 8 | 14 | 103 | 0.8 | 1.4 | 10.3 |
| chr1 | 30 | 40 | - | 24 | 3 | 7 | 2.4 | 0.3 | 0.7 |
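As an illustration of how a file in this format might be read, here is a minimal Python sketch. It collects the `##Key = value` header lines and splits a column name into its parts, decoding the percent-encoded RNA description. The function names and the labels `library_id` and `sample_id` for the last two name components are assumptions made for this example, not part of the proposal.

```python
from urllib.parse import unquote


def parse_metadata(lines):
    """Collect leading '##Key = value' lines.

    Repeated keys (e.g. InputFile) accumulate in a list, in file order.
    Parsing stops at the first line that does not start with '##'.
    """
    meta = {}
    for line in lines:
        if not line.startswith("##"):
            break
        key, _, value = line[2:].partition("=")
        meta.setdefault(key.strip(), []).append(value.strip())
    return meta


def parse_column(name):
    """Split 'counts.RNA_DESCRIPTION.CNhsXXXX.XXXX-XXXX' into its parts.

    The labels 'library_id' and 'sample_id' are hypothetical names for the
    last two components; the proposal itself does not name them.
    """
    kind, rest = name.split(".", 1)                       # 'counts' or 'tpm'
    description, library_id, sample_id = rest.rsplit(".", 2)
    return kind, unquote(description), library_id, sample_id
```

For instance, `parse_column` applied to the first counts column of the example table yields the kind `counts` and the decoded description `Burkitt's lymphoma cell line:DAUDI`.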
proposal 2
- BED file http://genome.ucsc.edu/FAQ/FAQformat.html#format1
- The 'score' and 'name' fields can be arbitrary.
- Expression intensities will be added later for all submissions at once by a single script that uses only the chrom, start, stop, and strand information.
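The script that would add expression intensities is not specified here, so the following Python is only a sketch of the idea. It sums CTSS tag counts that fall inside each region on the same strand and converts the sums to tpm. The record layout `(chrom, pos, strand, count)`, the function names, and the library size in the usage note are assumptions for illustration. Coordinates are treated as 0-based and half-open, following the BED convention.

```python
from collections import defaultdict


def count_tags(regions, ctss):
    """Sum CTSS counts inside each region (0-based, half-open, strand-aware).

    regions: iterable of (chrom, start, stop, strand)
    ctss:    iterable of (chrom, pos, strand, count)  # assumed layout
    A simple scan per region; a real script would likely use an index.
    """
    by_strand = defaultdict(list)
    for chrom, pos, strand, count in ctss:
        by_strand[(chrom, strand)].append((pos, count))
    totals = []
    for chrom, start, stop, strand in regions:
        hits = by_strand[(chrom, strand)]
        totals.append(sum(c for p, c in hits if start <= p < stop))
    return totals


def to_tpm(counts, library_size):
    """Scale raw counts to tags per million for one library."""
    return [c * 1_000_000 / library_size for c in counts]
```

With a hypothetical library of 10 million tags, a region holding 8 tags would get a tpm of 0.8.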
Suggestions/requests
- Many analyses will require a single point coordinate for a TSS, rather than a range. Previously we have used the modal tag position. The modal tag may or may not end up being the right choice for FANTOM5, but the reference position probably won't be the 5' or 3' edge of the tag distribution. So a useful additional field in this format would be "refPos".
- Good point. How about treating this as a distinct clustering method? For example, one file for the 'tag cluster' method and one for the 'refPos' method. [kawaji]
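To make the modal tag position concrete, here is a minimal Python sketch that picks the position with the highest tag count within one cluster. The input layout (a mapping from 0-based position to tag count), the function name, and the tie-breaking rule (smallest position wins) are assumptions for this example; the discussion above does not settle any of them.

```python
def modal_position(ctss_counts):
    """Return the position carrying the most tags in one cluster.

    ctss_counts: dict mapping 0-based position -> tag count (assumed layout).
    Ties are broken by taking the smallest position, an arbitrary choice.
    """
    return min(ctss_counts, key=lambda pos: (-ctss_counts[pos], pos))
```

A strand-aware tie-break (for example, the most 5' position on the minus strand) may be preferable in practice.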