File release

Revision as of 19:50, 8 February 2011

Data structure

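The shared directory (https://fantom5-collaboration.gsc.riken.jp/files/data/shared/) is for preliminary analyses. The primary data is in LATEST_UPDATE (https://fantom5-collaboration.gsc.riken.jp/files/data/shared/LATEST_UPDATE), which is produced by the FANTOM5 production group (WP2). The directory also holds contributed analyses performed separately from the production pipeline (https://fantom5-collaboration.gsc.riken.jp/files/data/shared/contrib/) and relevant external data sets shared for common use (https://fantom5-collaboration.gsc.riken.jp/files/data/shared/external/). A data freeze for paper writing (https://fantom5-collaboration.gsc.riken.jp/files/data/freeze/) will be made when needed. It is not available yet.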

This page is about LATEST_UPDATE.

Known issues

Issues that will be corrected in the next release

  • strange symbols in the sample name of CNhs11316
  • inconsistent sample names (t cell / T cell)
  • three libraries are in wrong categories (tissue / primary cells)
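Until the sample names are made consistent, you can compare names case-insensitively when grouping or joining libraries by sample name. The following is a minimal Python sketch of that workaround. The function name and the placeholder name "liver" are hypothetical and are not part of the release.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def group_case_insensitive(names: Iterable[str]) -> Dict[str, List[str]]:
    """Group sample names that differ only in letter case or surrounding spaces."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name in names:
        # casefold() is a stronger, Unicode-aware variant of lower()
        groups[name.strip().casefold()].append(name)
    return dict(groups)


# Hypothetical example: 'liver' is only a placeholder sample name.
print(group_case_insensitive(["t cell", "T cell", "liver"]))
# prints {'t cell': ['t cell', 'T cell'], 'liver': ['liver']}
```

The first spelling of each name is kept in its group, so the original labels can still be traced back to their libraries.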

Issues that will be corrected at some stage

  • Incomplete 'study' SDRF (processing after sequencing)
    • file names will be added at some stage
    • sex (male/female/unknown) information used in the mapping will be added at some stage
  • Genomic coordinates of some HeliScopeCAGE reads lie beyond the end of the chromosome
    • BAM files (delve mapping files) and CTSS files will be updated at some stage. Note that this affects only tens of reads.
  • Long HeliScopeCAGE reads (>64 bp) are mapped incorrectly. Note that most such long reads are bad reads (CTAG repeats, a.k.a. BAO).
  • '@PG' tags are missing from the BAM files of some libraries
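Until the BAM and CTSS files are updated, you can drop the few out-of-range coordinates locally. The following Python sketch rests on two assumptions that this page does not specify:

* CTSS files are tab-separated and BED-like (chromosome, 0-based start, end, ...).
* Chromosome lengths come from a UCSC-style chrom.sizes file (chromosome, tab, length).

All function names are hypothetical.

```python
"""Keep only CTSS entries that lie inside their chromosome.

Assumed layouts (not specified by the release):
  * CTSS file: tab-separated, BED-like: chrom, start (0-based), end, ...
  * chrom.sizes file: chrom<TAB>length, one chromosome per line
"""
import gzip
from typing import Dict, Iterable, Iterator, TextIO


def open_text(path: str) -> TextIO:
    """Open a plain or gzip-compressed file in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def load_chrom_sizes(lines: Iterable[str]) -> Dict[str, int]:
    """Parse 'chrom<TAB>length' lines into {chrom: length}, skipping blanks and comments."""
    sizes: Dict[str, int] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        chrom, length = line.split()[:2]
        sizes[chrom] = int(length)
    return sizes


def within_bounds(lines: Iterable[str], sizes: Dict[str, int]) -> Iterator[str]:
    """Yield lines whose [start, end) interval fits inside a known chromosome."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # too few columns to hold coordinates
        try:
            start, end = int(fields[1]), int(fields[2])
        except ValueError:
            continue  # header or track line
        length = sizes.get(fields[0])
        # unknown chromosomes and out-of-range intervals are dropped
        if length is not None and 0 <= start < end <= length:
            yield line


def filter_file(ctss_path: str, sizes_path: str, out_path: str) -> None:
    """Write the in-range CTSS lines of ctss_path to out_path (uncompressed)."""
    with open_text(sizes_path) as fh:
        sizes = load_chrom_sizes(fh)
    with open_text(ctss_path) as src, open(out_path, "w") as dst:
        dst.writelines(within_bounds(src, sizes))
```

For example, `filter_file("sample.ctss.bed.gz", "genome.chrom.sizes", "sample.filtered.ctss.bed")` writes only the in-range entries. The file names here are placeholders. The sketch filters lines rather than repairing them, because this page notes that only tens of reads are affected.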

Updates