File release
Data overview
All the data produced in the FANTOM5 collaboration is shared among the FANTOM5 collaborators to encourage a wide range of analyses, subject to the FANTOM5 rules. Please note that all the data is confidential; do not share it with others before public release. The data will be updated routinely (mainly by adding transcriptome profiles). At certain stages, in accordance with the status of the main paper analysis and the project phases, we will make data freezes, each of which is a unit of data submission/publication to the public. Please read FANTOM5_overview for the project plan.
The shared directory is for preliminary analyses. The primary data is in LATEST_UPDATE, which is produced by the FANTOM5 production group (WP2). Contributed analyses performed separately from the production pipeline, and relevant external data sets to be shared commonly, are also maintained here. We will make a data freeze for paper writing when needed (not available yet).
Phase1 (snapshot) data freeze
Known issues
- The category is wrong for the following samples: CNhs13477, CNhs13478, CNhs13479, CNhs13496, CNhs13497, CNhs13498, CNhs13499, CNhs13500, CNhs13501, CNhs13508, CNhs13099. They are to be moved from primary cell to cell line.
- Typos (double spaces) in the following samples: CNhs12192, CNhs11820, CNhs10743, CNhs11848, CNhs11859, CNhs11870, CNhs11851, CNhs12805, CNhs11861, CNhs11873, CNhs11866, CNhs11890, CNhs12824, CNhs12837
- tech_rep info is to be added for the following samples: CNhs12770, CNhs13065
- The following samples are to be renamed: CNhs13552, CNhs13553
- The sample name is wrong for the following samples: CNhs12996, CNhs12997, CNhs12310, CNhs12316
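The double-space typos listed above can be detected mechanically. The following is a minimal sketch, assuming the sample names are available as a mapping from library ID to name; the function names and the example names in the usage note are hypothetical and not part of the production pipeline.

```python
import re

# Two or more consecutive space characters.
DOUBLE_SPACE = re.compile(r" {2,}")

def find_double_spaces(names):
    """Return the {library_id: name} entries whose name contains a double space."""
    return {lib: name for lib, name in names.items() if DOUBLE_SPACE.search(name)}

def collapse_spaces(name):
    """Replace each run of two or more spaces with a single space."""
    return DOUBLE_SPACE.sub(" ", name)
```

For example, `find_double_spaces({"X1": "liver,  adult", "X2": "liver, fetal"})` would report only the hypothetical entry `X1`, and `collapse_spaces` would turn its name into `"liver, adult"`.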
Data files
- primary data files - https://fantom5-collaboration.gsc.riken.jp/files/data/shared/FREEZE_PHASE1/
CAGE peaks (DPI clusters)
Note that we have two sets, depending on the threshold: one for expression analysis (called the robust set) and one for promoter identification (called the permissive set). See https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/111220-DPI/00readme.html
'Update' data set
Known issues in the latest update
- Some RNA-Seq files (BAM) contain "*" as a chromosome name. This may be caused by Trinity/cufflinks. Samtools can handle the "*" chromosome, but bedtools cannot. We are going to fix this issue.
- Archived issues
- CNhs12310, CNhs12316, CNhs14342, CNhs14346 need to be renamed.
- CNhs14298, CNhs14067, CNhs14068, CNhs10743, CNhs11848, CNhs12192, CNhs11859, CNhs11462, CNhs11873, CNhs11866 contain double spaces in the sample name.
- CNhs11614~CNhs11631 contain a wrong description ("after infection" should be "after stimulation").
- CNhs11163 and CNhs11164 are mapped to hg18, not hg19. These are 'negative control' type experiments (the libraries are loaded onto the flow cell without A-tailing), and we expect that they would not affect any analysis of real samples. We have decided to remove them from the dataset.
- CNhs12510 and CNhs12515 might be swapped. Further checking is required; see [fantom5-wp2:01409] and [fantom5-wp4:00526]
- See the Requests_and_fixes page, too
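Until the "*" chromosome issue noted above is fixed, one possible workaround is to drop the affected records before passing the alignments to bedtools. The following is a minimal sketch, not the planned fix: it assumes the "*" appears in the reference-name column (third field) of SAM text, for example as produced by `samtools view -h`, and the function name is hypothetical.

```python
def drop_star_reference(sam_lines):
    """Yield SAM lines, skipping alignment records whose reference name is "*".

    Header lines (starting with "@") are passed through unchanged.
    """
    for line in sam_lines:
        if line.startswith("@"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        # Field 3 (index 2) of a SAM alignment record is the reference name.
        if len(fields) > 2 and fields[2] == "*":
            continue
        yield line
```

The generator could be applied to a stream with `sys.stdout.writelines(drop_star_reference(sys.stdin))`. Note that this discards the affected records, so it only suits analyses that do not need them.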
Directories
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_020
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_019
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_018
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_017
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_016
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_015 - baseline for phase1 freeze
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_014
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_013
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_012
- an issue in Rat%20hepatocytes%20donor2.CNhs11302.11372-118A4.rn4.ctss.bed.gz (which included the line CTSS, chr20:55268303..55268304,-, although rn4 chr20 has only 55268282 bases) is corrected.
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_011
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_010
- strange symbols in the sample name of CNhs11316 are corrected.
- three libraries in wrong categories (tissue / primary cells) are corrected.
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_009 - recommended version for the Feb meeting
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_008
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_007
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_006
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_005
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_004
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_003
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_002
- https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_001
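The issue corrected in UPDATE_012 (a CTSS position beyond the end of rn4 chr20) is the kind of error that a simple range check can catch. The following is a minimal sketch, assuming the CTSS files are tab-separated BED with chromosome, start, and end in the first three columns, and that chromosome lengths are supplied as a dictionary; the function name is hypothetical.

```python
def out_of_range(bed_lines, chrom_sizes):
    """Return BED lines whose chromosome is unknown or whose interval
    falls outside [0, chromosome length]."""
    bad = []
    for line in bed_lines:
        # Skip blank lines and common non-data lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.rstrip("\n").split("\t")[:3]
        size = chrom_sizes.get(chrom)
        if size is None or int(start) < 0 or int(end) > size:
            bad.append(line.rstrip("\n"))
    return bad
```

With `chrom_sizes = {"chr20": 55268282}`, a line starting `chr20`, `55268303`, `55268304` would be reported, because its end position exceeds the chromosome length.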