Task assignments


Task1: Sample acquisition/provision:
Committed names: Al Forrest, Peter Klinken, Peter Heutink, Claudio Schneider, Kim Summers, Terry Meehan
Output requirements/formats: Sample list - text to Al ASAP
Milestones:
      1. List of missing cellular states on wiki – March 10
      2. Potential sources for missing states – March 10
      3. Acceptance of last snapshots for phase 1 – March 31

STATUS COMPLETED: Sample collection is completed. Phase 1 samples are completed; phase 2 samples will be completed by March 2011.


Task2: Sample Annotation:
Committed names: Terry Meehan, Win Hide, Tom Freeman, Al Forrest + sample providers
Output requirements/formats: Cell ontology mapping, Tissue ontology mapping
  Tom’s suggested Sample Annotation
     1. UniqueID: Riken tracking number
     2. Unique_sample_name: Adult_liver.r1 , Tcell_HPC-induced_10h (preferably short, informed by Cell_Ontology)
     3. Species: Hs., Mm., etc.
     4. Sample_Class: Adult_tissue (AT.), Foetal_tissue (FT.), Primary_cell (PC.), Cell_culture (CC.), Time_course (TC-PC.), (TC-CC) etc.
     5. Developmental stage: Adult, Foetal
     6. Pathology: Normal, disease
     7. Tissue: Liver, brain, heart etc
     8. Cell_Ontology (maybe more than one level, to be used in primary sample ordering): Mesenchymal etc, etc
     9. Cell_type: CO approved name e.g. Monocyte, Smooth_muscle, Intestinal_epithelium etc.
    10. Perturbation: LPS, HPC
    11. Time: 0, 1h, 2h, 3h etc
    12. Replicate: r1, r2, r3
    13. Collection_method: FACS_sorting etc. with short description
    14. Collection_method_reference: Pubmed_ID, web_address, protocol
    15. Source: Roslin_Institute
    16. Primary_contact: Joe_Bloggs
    17. Email: joe.bloggs@roslin.ed.ac.uk
    18. Tel: 0044 131 123 4567
    19. Unique Donor ID
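
The 19 fields suggested above could be held in a simple record and written out as tab-separated text. This is only a sketch: the Python field names are lower-cased versions of the labels in the list, and the tracking number and donor ID values are placeholders, not real identifiers.

```python
from dataclasses import dataclass, asdict, fields


@dataclass
class SampleAnnotation:
    # One attribute per field in the suggested sample annotation, in order.
    unique_id: str
    unique_sample_name: str
    species: str
    sample_class: str
    developmental_stage: str
    pathology: str
    tissue: str
    cell_ontology: str
    cell_type: str
    perturbation: str
    time: str
    replicate: str
    collection_method: str
    collection_method_reference: str
    source: str
    primary_contact: str
    email: str
    tel: str
    unique_donor_id: str


def to_tsv_row(sample):
    """Join all field values with tabs, keeping the field order of the list."""
    return "\t".join(str(value) for value in asdict(sample).values())


# Illustrative record; unique_id and unique_donor_id are placeholders.
example = SampleAnnotation(
    unique_id="RIKEN-0000",
    unique_sample_name="Adult_liver.r1",
    species="Hs.",
    sample_class="AT.",
    developmental_stage="Adult",
    pathology="Normal",
    tissue="Liver",
    cell_ontology="",
    cell_type="",
    perturbation="",
    time="",
    replicate="r1",
    collection_method="",
    collection_method_reference="",
    source="Roslin_Institute",
    primary_contact="Joe_Bloggs",
    email="joe.bloggs@roslin.ed.ac.uk",
    tel="0044 131 123 4567",
    unique_donor_id="donor_0000",
)

header = "\t".join(f.name for f in fields(SampleAnnotation))
row = to_tsv_row(example)
print(header)
print(row)
```

Fields that do not apply to a sample (for example perturbation and time for an untreated tissue) are left as empty strings here; whether to use an empty value or an explicit "NA" is a convention that would need to be agreed.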

Milestones:
     1. Annotation of Data freeze 1 samples (cell, tissue – minimum to compare replicates)
     2. Cell ontology – completion by March 15?
     3. Tissue ontology – March 15  

STATUS COMPLETED: Phase 1 annotation completed - Chris Mungall/Terry Meehan (FONSE) and Kawaji/Shimoji (FANTOM5 resource viewer https://fantom5-collaboration.gsc.riken.jp/resource_wiki/index.php/Main_Page)   


Task3: Mapping:
Committed names: Timo Lassmann, Geoff Faulkner
Output requirements/formats:
     BAM
     CTSS
Milestones:
     1. Rescuing assessment (March 5)
     2. Decision (March 7)
     3. Genome version agreement – comment on pseudoautosomal regions
     4. Mapping of Data freeze 1 (GENAS??)

STATUS COMPLETED: Mapped using Delve, >=Q20 reads used - Timo Lassmann
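
As a rough illustration of going from mapped reads to CTSS counts with a quality cutoff, the sketch below reads SAM-formatted text lines, keeps reads at or above a threshold, and counts read 5' ends per position and strand. It assumes that "Q20" refers to the SAM mapping quality (MAPQ) column, and it does not reproduce Delve itself; the function names and the example reads are made up for illustration.

```python
import re
from collections import Counter

CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")
REF_CONSUMING = set("MDN=X")  # CIGAR operations that advance along the reference


def reference_length(cigar):
    """Number of reference bases covered by an alignment, from its CIGAR string."""
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in REF_CONSUMING)


def ctss_counts(sam_lines, min_mapq=20):
    """Count read 5' ends per (chrom, 0-based position, strand).

    Assumption: the Q20 cutoff is applied to the SAM MAPQ column.
    """
    counts = Counter()
    for line in sam_lines:
        if line.startswith("@"):  # skip header lines
            continue
        f = line.rstrip("\n").split("\t")
        flag, chrom, pos, mapq, cigar = int(f[1]), f[2], int(f[3]), int(f[4]), f[5]
        if flag & 0x4 or mapq < min_mapq:  # unmapped or below the cutoff
            continue
        if flag & 0x10:
            # Reverse strand: the 5' end is the rightmost aligned base.
            five_prime, strand = pos + reference_length(cigar) - 2, "-"
        else:
            # Forward strand: the 5' end is the leftmost base (SAM POS is 1-based).
            five_prime, strand = pos - 1, "+"
        counts[(chrom, five_prime, strand)] += 1
    return counts


def sam_line(name, flag, chrom, pos, mapq, cigar):
    """Build a minimal 11-column SAM record for the example."""
    return "\t".join([name, str(flag), chrom, str(pos), str(mapq), cigar,
                      "*", "0", "0", "*", "*"])


reads = [
    sam_line("r1", 0, "chr1", 101, 30, "27M"),
    sam_line("r2", 0, "chr1", 101, 10, "27M"),   # dropped: MAPQ below 20
    sam_line("r3", 16, "chr1", 100, 40, "27M"),  # reverse strand
    sam_line("r4", 0, "chr1", 101, 20, "27M"),
]
ctss = ctss_counts(reads)
for (chrom, start, strand), n in sorted(ctss.items()):
    # BED-like line: chrom, start, end, name, count, strand
    print(chrom, start, start + 1, ".", n, strand, sep="\t")
```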


Task4: Tag clustering (Working Group 1 - Tag clustering)
Committed names: Piero Carninci, Cesare Furlanello, Piotr Balwierz, Martin Taylor, Martin Frith, Kawaji-san, Boris Lenhard, Albin Sandelin - clustering. David Hume, Ben Brown, Al Forrest - assessment
Output requirements/formats:
     1. Clusters defined as regions on a genome with strand, start, stop, peak and build (BED?)
     2. Intersect of the defined regions as an expression matrix/table across all samples (i.e. intersect of clusters with expression in all libraries)
     3. Possibly an intersected CTSS file of the same regions to allow study of independent peak regulation
     4. Peak rec
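
Output 2 above (clusters intersected with expression in all libraries) can be sketched as follows. The cluster coordinates are assumed to be BED-style (0-based start, end exclusive), and the per-library CTSS counts are assumed to be keyed by chromosome, position and strand. Cluster names, sample names and counts are invented, and the scan is deliberately naive; a real run over full libraries would want an indexed lookup.

```python
def cluster_expression(clusters, ctss_by_sample):
    """Sum CTSS counts falling inside each cluster, for every sample.

    clusters: list of (name, chrom, start, end, strand), BED-style coordinates.
    ctss_by_sample: {sample_name: {(chrom, pos, strand): count}}.
    Returns (sample_names, {cluster_name: [count per sample]}).
    """
    samples = sorted(ctss_by_sample)
    table = {}
    for name, chrom, start, end, strand in clusters:
        row = []
        for sample in samples:
            total = sum(
                count
                for (c, pos, s), count in ctss_by_sample[sample].items()
                if c == chrom and s == strand and start <= pos < end
            )
            row.append(total)
        table[name] = row
    return samples, table


# Invented example data.
clusters = [
    ("cluster_A", "chr1", 100, 110, "+"),
    ("cluster_B", "chr1", 100, 110, "-"),
]
ctss_by_sample = {
    "liver": {("chr1", 101, "+"): 5, ("chr1", 109, "+"): 2, ("chr1", 110, "+"): 9},
    "brain": {("chr1", 105, "-"): 4, ("chr1", 101, "+"): 1},
}

samples, table = cluster_expression(clusters, ctss_by_sample)
print("cluster", *samples, sep="\t")
for name, row in table.items():
    print(name, *row, sep="\t")
```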
 Tom’s suggestion
     Data Matrix Annotation
     To be provided by Riken as raw counts (.raw) and tags per million (.tpm) but ultimately data may be normalised by other methods (.xxx)
     1. UniqueID: Gene Level (MGD, HGNC ID), Transcript or promoter level (ABC1.1, ABC1.2 etc), ncRNA (Leonard’s ID)
     2. Class: Gene_promoter, ncRNA_promoter, other
     3. Chromosomal_location: e.g. alignment range, promoter peak
     4. Chromosome: Chr1
     5. Associated_seqs: refseq, ensembl_gene/transcript, ncRNA_ref
     6. Other_associations: KEGG, GO etc
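
The relation between the raw (.raw) and tags per million (.tpm) matrices can be sketched as below. One assumption to flag: the library size used for scaling is taken to be the column sum of the matrix itself, whereas the released files may be scaled by the total number of mapped tags per library. The feature IDs follow the ABC1.1 / ABC1.2 pattern used above and the counts are invented.

```python
def to_tpm(raw_counts):
    """Convert raw tag counts to tags per million, column by column.

    raw_counts: {feature_id: [count per sample]}.
    Assumption: library size = sum of the counts in that column.
    """
    n_samples = len(next(iter(raw_counts.values())))
    totals = [sum(row[i] for row in raw_counts.values()) for i in range(n_samples)]
    return {
        feature: [1e6 * c / t if t else 0.0 for c, t in zip(row, totals)]
        for feature, row in raw_counts.items()
    }


raw = {
    "ABC1.1": [30, 0],
    "ABC1.2": [70, 5],
}
tpm = to_tpm(raw)
for feature, row in tpm.items():
    print(feature, *row, sep="\t")
```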

Milestones:
    1. Agreement on format and training/tuning/assessment data and metrics (March 5)
    2. Competitive tracks available – March 25
    3. Assessment – April 5
    4. Run over paper 1 data freeze – mid April

STATUS COMPLETED: Kawaji DPI clusters decided, filters for permissive and robust sets

New related task - confirmation of independence


Task5: State-enriched (expression-weighted) motif predictions (ab initio and known): (Working Group 3 - Motifs and conservation)
Committed names: Vlad Bajic, Michiel de Hoon, Boris Lenhard, Kenneth Baillie, Timo Lassmann, Piotr Balwierz, Yulia Medvedeva
Output requirements/formats:
Milestones:
     1. Ranked list of motifs enriched in each state for release 010
     2. BED file (or similar) with actual predictions for release 010
     3. As above on FREEZE 1


Task5: Tag Cluster Annotation:
Committed names: Piero Carninci, Laurens Wilming, Timo Lassmann, Richard Baldarelli, Juha Kere, Leonard Lipovich (long ncRNA promoters, sense-antisense pair promoters, bidirectional promoters), Boris Lenhard (enhancers), Alison Meynert, Yulia Medvedeva (CpG islands, DNA methylation, Repeats)
Output requirements/formats:
Milestones:
     1. Agreement on annotations to use (now?) (I will supply the global human lncRNAome and sense-antisense coordinates for the annotation. - LL)
     2. Annotation of release 009 clusters using agreed strategy – available ASAP
     3. Annotation of data freeze 1 (ASAP after the clusters are provided)


Task6: Cross species promoter mapping:
Committed names: Martin Taylor, Colin Semple, Vlad Bajic, Peter Heutink, Max Burroughs, Soichi Ogishima, Leonard Lipovich (if we are doing non-conserved promoters)
Output requirements/formats:

  Martin Taylor's suggested format

     species1_tag_cluster_ID
     species1_genome_assembly_ID
     species1_chrom
     species1_refPos
     species1_strand
     species2_tag_cluster_ID
     species2_genome_assembly_ID
     species2_chrom
     species2_refPos
     species2_strand
     projection_method (a list of rule sets whose criteria were met*)
     projection_distance (a measure of confidence in the projection)
     projection_result (e.g. species1_rescue, species2_rescue....)

     *e.g. identical projected modal tag position, quantile overlap of
     projected tag cluster distributions, cluster coordinate overlap.
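
The suggested format above maps directly onto a 13-column tab-separated file. The sketch below keeps the column names exactly as listed; the cluster IDs, assembly names, positions, rule-set names and the semicolon used to join multiple rule sets are all placeholders for illustration.

```python
import csv
import io
from typing import NamedTuple


class Projection(NamedTuple):
    # Column names copied from the suggested format, in the same order.
    species1_tag_cluster_ID: str
    species1_genome_assembly_ID: str
    species1_chrom: str
    species1_refPos: int
    species1_strand: str
    species2_tag_cluster_ID: str
    species2_genome_assembly_ID: str
    species2_chrom: str
    species2_refPos: int
    species2_strand: str
    projection_method: str    # rule sets whose criteria were met
    projection_distance: float
    projection_result: str


def write_projections(records):
    """Serialise projection records as tab-separated text with a header row."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(Projection._fields)
    writer.writerows(records)
    return buf.getvalue()


# Placeholder record; none of these values come from real data.
record = Projection(
    "hs_cluster_0001", "assembly_A", "chr1", 1000, "+",
    "mm_cluster_0001", "assembly_B", "chr4", 2000, "+",
    "modal_position;coordinate_overlap", 0.0, "species1_rescue",
)
tsv_text = write_projections([record])
print(tsv_text, end="")
```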

Milestones:
     1. Prediction/mapping of human promoters using mouse data (April 15)
     2. Validation on the matched 10-30 human-mouse pairs (i.e. predict with mouse and check with actual human data). Assessment of strategy.
     3. Prediction of human counterpart promoters for the rare mouse cells that we have collected (e.g. intestinal stem cells, inner ear hair cells etc.).
     4. Do we need a preliminary count of nonconserved human promoters (those absent from the other 4 F5 species)? (LL)
    


Task7: Expression visualization (gene level AND TSScluster level):
Committed names: Tom Freeman, Kenneth Baillie, Carsten Daub, Win Hide, Boris Lenhard, Albin Sandelin
Output requirements/formats: potential figures for displaying relationship of samples based on expression clustering
Milestones:
     1. Gene level information, human x3 (tissue, cell line, primary cells) -> Biolayout webstart
     2. Distance matrix, genes and pathways that separate each state - Win Hide
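
One possible form of the sample distance matrix in milestone 2 is sketched below, using 1 minus the Pearson correlation between expression profiles. The choice of metric is an assumption (the milestone does not name one), and the sample names and values are invented.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length profiles (no zero-variance check)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def distance_matrix(profiles):
    """Return (names, matrix) with matrix[i][j] = 1 - Pearson correlation.

    profiles: {sample_name: [expression value per gene or TSS cluster]}.
    """
    names = sorted(profiles)
    matrix = [[1 - pearson(profiles[a], profiles[b]) for b in names] for a in names]
    return names, matrix


# Invented expression profiles over three features.
profiles = {
    "liver_r1": [1.0, 2.0, 3.0],
    "liver_r2": [2.0, 4.0, 6.0],
    "brain_r1": [3.0, 2.0, 1.0],
}
names, dist = distance_matrix(profiles)
for name, row in zip(names, dist):
    print(name, *(f"{d:.2f}" for d in row), sep="\t")
```

Replicates with proportional profiles get a distance near 0 and anti-correlated samples a distance near 2. A profile with no variation across features would cause a division by zero, so such samples would need filtering first.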


Task8: Promoter level expression analysis (differentially expressed genes/markers/transcription factors/ncRNAs):
Committed names: Piero Carninci, Al Forrest, Albin Sandelin, Vlad Bajic, Yulia Medvedeva, Hideya Kawaji, Ben Brown, Tom Freeman, Harukazu Suzuki, Colin Semple, David Hume, Cesare Furlanello, Kenneth Baillie and Jess Mar, Timothy Ravasi, Leonard Lipovich
Output requirements/formats:
Milestones:
1. Agreement on metric for specificity/enrichment – entropy Ravasi March 5
2. Ranked list of most specific TFs for each state
3. Ranked list of ncRNAs specific for each state (incl. curated lncRNAs that define specific steady states -LL)
4. Ranked list of all genes specific for each state
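
An entropy-based specificity ranking along the lines of milestone 1 might look like the sketch below. The exact metric to be agreed is not specified above, so this uses plain Shannon entropy of each gene's expression distribution across states as a stand-in: low entropy means expression concentrated in few states. Gene names and values are invented.

```python
import math


def expression_entropy(values):
    """Shannon entropy (bits) of an expression profile across states.

    0 means all expression is in one state; log2(n) means uniform expression.
    A gene with no expression anywhere is treated as maximally unspecific
    (a choice made for this sketch).
    """
    total = sum(values)
    if total == 0:
        return math.log2(len(values))
    probs = [v / total for v in values if v > 0]
    return -sum(p * math.log2(p) for p in probs)


def rank_by_specificity(expression):
    """Return gene names sorted from most specific (lowest entropy) to least."""
    return sorted(expression, key=lambda gene: expression_entropy(expression[gene]))


# Invented expression values across four states.
expression = {
    "TF_broad": [2.0, 2.0, 2.0, 2.0],
    "TF_specific": [8.0, 0.0, 0.0, 0.0],
    "TF_partial": [4.0, 4.0, 0.0, 0.0],
}
ranking = rank_by_specificity(expression)
for gene in ranking:
    print(gene, f"{expression_entropy(expression[gene]):.2f}", sep="\t")
```

This gives one ranking per gene across all states. A ranked list "for each state" would additionally need to record which state carries the expression, for example by pairing the entropy with the state of maximum expression.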


Task9: Expression data mining:
Committed names: Carlo, Tim
Output requirements/formats:
Milestones:

1. Explore the data set using maximum curvilinearity methods and see if it helps classify the layers

Boosting? SVMs?


Task10: Motif activity and TF expression integration (including deorphanization):
Committed names: Vlad Bajic, Michiel de Hoon, Piotr Balwierz, Yulia Medvedeva, Matthias Harbers, Al Forrest
Output requirements/formats:
Milestones:

1. Expanding Motifs
2. Core predicted set
3. Attempt at integrating list of sample enriched TFs and sample enriched motifs.
4. Prioritized orphan associations for validation


Task10: Sanity check:
Committed names: Al Forrest, Piero Carninci
Output requirements/formats:
Milestones:
1. Assessment of strategy above
2. OK or repeat from step XYZ


Task11: Data dissemination and nomenclature:
Committed names: Win Hide, David Hume, Piero Carninci, Richard Baldarelli, Vlad Bajic, Tom Freeman, Yoshihide Hayashizaki, John Quackenbush, Laurens, Terry Meehan, Hideya Kawaji, Timo Lassmann, Albin Sandelin
Output requirements/formats:
     ‘Promoter’ – dissemination
     ‘expression’ – dissemination
     ‘cell/sample’ – dissemination?
Milestones:
     1. Agreement on strategy
     2. Agreement on formats
     3. Agreement on third party data repositories (especially UCSC and Ensembl)
     4. Core promoters with accessions and link to our data nomenclature


Task12: ChipSeq Validation:
Committed names: RIKEN OSC, Tim Ravasi, Al Forrest, Matthias Harbers, WP9
Output requirements/formats:
Milestones:
     1. Target selection – considering cell type, predictions, chip grade antibody, impact
     2. Assessment of targets
     3. Motif finding
     4. Public chip-seq data


Task13: Public data integration:
Committed names: Vlad, David, Yulia, Louise, Thomas, Terry, Matthias

Milestones:

     1. Extract public Chip-seq data

     2. Extract public mouse KO

     3. Extract edges from literature mining (Vlad)

     4. Extract in-situ mapping from Allen brain atlas, eurexpress, emage

     5. Extract localization information from human protein atlas


Task14: KDCAGE Validation:
Committed names: RIKEN OSC, WP9 (intersection of chip-seq known and )
Output requirements/formats:
Milestones:

     1. Target selection – considering cell type, predictions, impact
     2. Assessment of targets
     3. Motif finding


Task15: In-situ validation: (likely very late in project)
Committed names: Juha? Peter H? Silivia,
Output requirements/formats:
Milestones:
   1. Target selection
   2. In-situ on a small set of human samples
   3. Likely very late in the project

Task16: Paper4 - Cross species network conservation:
Committed names: Al Forrest, Martin Taylor, Peter Heutink, Michiel de Hoon, Mamoon Rashid, Colin Semple, Vlad Bajic, Max Burroughs, Soichi Ogishima, Leonard Lipovich
Output requirements/formats:
Milestones:

     1. Gene level ortholog pairs (CDS matching)
     2. TSS cluster level ortholog pairs (genome matching)
     3. Ortholog expression correlations (use expression data from above group, and ortholog mappings from 1 and 2)
     4. State specific motif enrichment (conservation independent)
     5. TF state specific expression
     6. siRNA KD of SMC specific TFs in multiple species
     7. Potential chip-seq
     8. Availability of Macaque samples? Aortic SMC, hepatocytes, Bone marrow MSCs
     9. Macrophages across all species? Peripheral blood (PBMCs)
     10. Integration of cis-networks (bidirectional promoters; TF to lncRNA; antisense lncRNA to sense mRNA gene) with existing networks -LL
     11. Examples of specific non-conserved networks -LL