CAGE Cluster update
Revision as of 15:17, 21 October 2011
This page tracks updates on CAGE clusters for UPDATE_012.
CAGE clusters for the main paper
Please contact fantom5-wp4@gsc.riken.jp about the overall status, and the corresponding providers for individual questions.
'Promotome' set
Note that the set is not finalized yet. The current status is promotome v1.0 RC1, which includes only the CAGE clusters supported by independent resources (evidence).
- available data: expression and a human-readable (but computer-generated) description for each cluster. For the method used to generate the cluster descriptions, see 110926-promotome-v1.0-RC1.pdf
- directory: https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110804-dpi-clusters-expression
- the 'annotations' discussed by the cluster working group (see below) are not included yet: a rough plan has been shared and agreed, but the details are not yet settled.
Genomic coordinates
https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110720-dpi-clusters/
ZENBU configuration
TSS classifier
[Sebastian et al.]
- https://fantom5-collaboration.gsc.riken.jp/webdav/home/seb/update12/documentation_ud12.pdf
- https://fantom5-collaboration.gsc.riken.jp/webdav/home/seb/update12/rescue/documentation_ud12_update.pdf
[Timo]
Expression profiles to identify cellular states
participants: Cesare, Marco, Yishai, Piotr, Kawaji, Christine, Tom, Al
Removal of poor quality libraries
Christine, Tom, Al
Expression thresholding
For a shared set of expression profiles to identify cellular states, we would like to set an expression threshold that does not rely on any other evidence (such as DNase HS sites), since such evidence could miss rare cells. We will decide this by 28 October 2011. The candidates are:
- at least 10 reads in a library (although this is an arbitrary choice)
- a data-driven approach (Piotr will give it a try)
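The first candidate can be sketched as follows. This is only an illustration, assuming that "10 reads at least in a library" means a cluster is kept when its raw count reaches 10 in at least one library; the function name, cluster ids, and counts are hypothetical and not taken from the FANTOM5 files.

```python
def filter_clusters(counts, min_reads=10):
    """Keep clusters whose raw count reaches min_reads in at least one library.

    counts: dict mapping cluster id -> list of raw read counts, one per library.
    Assumption: the threshold applies to the maximum count across libraries.
    """
    return {cid: row for cid, row in counts.items() if max(row) >= min_reads}


# Toy data (hypothetical cluster ids and counts) for three libraries.
counts = {
    "cluster_a": [0, 12, 3],
    "cluster_b": [4, 9, 2],
    "cluster_c": [10, 0, 0],
}
kept = filter_clusters(counts)
print(sorted(kept))
```

Because the filter looks only at the maximum over libraries, a cluster expressed in a single rare cell type is retained as long as one library reaches the threshold.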
Normalization
Key question: does collapsing TSSs change the data structure?
Yishai and Cesare will compare the normalised data to check whether thresholding and collapsing the clusters affect the normalised structure. Stick with TSS.
Key question: which normalization scheme should be used?
Compare the power-law method (Piotr's or Yishai's) with edgeR TMM, edgeR RLE, and TPM. See also the 'Impact of Normalization' section below.
[Plan and progress]
- step0: set the target data
- DPI clusters with at least 10 reads in a library
- https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110804-dpi-clusters-expression/hg19-v0.0/tc.chrom_tpm.annotation.max_counts10.txt.gz
- step1: different normalization
- Piotr
- TPM
- TMM
- RLE
- step2: evaluation
- Systematic evaluation - Cesare, Marco, Yishai
- Biological evaluation - Christine, Tom, Al
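To make two of the step1 schemes concrete, here is a minimal sketch of TPM scaling and of median-of-ratios size factors, the idea behind RLE. It is an illustration under stated assumptions, not the script used in the project: the function names and the toy matrix are hypothetical, and edgeR reports its RLE factors relative to library size, which this sketch does not reproduce.

```python
import math
from statistics import median


def tpm(library_counts):
    """Scale the raw counts of one library to tags per million."""
    total = sum(library_counts)
    return [c * 1e6 / total for c in library_counts]


def rle_size_factors(matrix):
    """Median-of-ratios size factors, one per library.

    matrix: list of rows (clusters), each a list of raw counts per library.
    Only clusters with non-zero counts in every library enter the reference.
    """
    n_libs = len(matrix[0])
    usable = [row for row in matrix if all(c > 0 for c in row)]
    # Log of the geometric mean of each usable cluster across libraries.
    log_ref = [sum(math.log(c) for c in row) / n_libs for row in usable]
    factors = []
    for j in range(n_libs):
        log_ratios = [math.log(row[j]) - ref for row, ref in zip(usable, log_ref)]
        factors.append(math.exp(median(log_ratios)))
    return factors


# Toy matrix (hypothetical): library 2 is sequenced twice as deeply as library 1.
matrix = [[10, 20], [100, 200], [5, 10], [0, 7]]
factors = rle_size_factors(matrix)
print(factors[1] / factors[0])
print(tpm([1, 3]))
```

Dividing each library's counts by its size factor puts the libraries on a common scale; the last row of the toy matrix contains a zero and is therefore ignored when building the reference.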
Impact of Normalization on FANTOM5 hCAGE promoterome: Implementation Plan (0.1)
- We (Trento, Columbia – Piotr is invited) will start implementing the normalization check scheme, i.e. building a generic script to run on the DPI TSS tag cluster set(s) when they are ready. The script can also be run on any other TSS clustering scheme, or before clustering. Hopefully
- Run power-law fits on single libraries, compare with pooled data; check range of parameters, infer stability on pooled; compare with QC indicators on single libraries; consider; check for batch effect. Start from YB script; Piotr’s version as available.
- Match with normalization and QC methods from edgeR, also in script form. Keep it generic, automate.
- Apply on existing tag DPI clusters. Run also before and after other clustering. Prepare reporting scheme.
- Verify impact on a few examples of downstream analysis, e.g. quality/stability/accuracy of classifiers and networks
- Apply on DPI tag cluster set(s) as they are ready and report, to be used for main paper
- It may make sense to reuse all this material for the “peak clustering and normalization” satellite paper, to be submitted alongside the main paper or later
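The "power-law fit on a single library" step above could look roughly like the sketch below. It is not the YB script or Piotr's version; it assumes a plain least-squares fit on the log-log reverse cumulative distribution of cluster counts, and the function name and toy data are hypothetical. It needs at least two distinct non-zero count values.

```python
import math
from collections import Counter


def fit_power_law(counts):
    """Fit N(x) = r0 * x**(-alpha) to one library by least squares in log-log space.

    N(x) is the number of clusters with at least x reads.
    Returns (alpha, log10_r0).
    """
    freq = Counter(c for c in counts if c > 0)
    xs, ys = [], []
    remaining = sum(freq.values())
    for x in sorted(freq):
        xs.append(math.log10(x))
        ys.append(math.log10(remaining))  # clusters with count >= x
        remaining -= freq[x]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return -slope, mean_y - slope * mean_x


# Toy library (hypothetical) built so that N(x) = 1000 / x at x = 1, 10, 100, 1000.
library = [1] * 900 + [10] * 90 + [100] * 9 + [1000]
alpha, log10_r0 = fit_power_law(library)
print(alpha, log10_r0)
```

Running the same fit on each library and on the pooled data gives one (alpha, log10_r0) pair per fit, which is the kind of parameter range the plan proposes to compare against QC indicators.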
Cluster annotations discussed in the cluster working group
Method Description
Transcript model derived annotation protocol
Annotation Results
Human Decomposition-based Peak Identification (DPI) cluster
- https://fantom5-collaboration.gsc.riken.jp/webdav/home/nbertin/CAGE-Tag-Cluster-Annotation_Aug11/tc.decompose_smoothing_merged.hg19.annotations/
- tc.decompose_smoothing_merged.hg19.CpGislands.annotated.osc.gz
- tc.decompose_smoothing_merged.hg19.EST.annotated.osc.gz
- tc.decompose_smoothing_merged.hg19.Ensembl.non_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.hg19.Ensembl.protein_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.hg19.F5_human_lncRNAome.annotated.osc.gz
- tc.decompose_smoothing_merged.hg19.RefSeq.non_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.hg19.RefSeq.protein_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.hg19.TBP_JASPAR_CORE_MA0108.2.annotated.osc.gz
- tc.decompose_smoothing_merged.hg19.gencode-pseudo.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.hg19.gencode.non_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.hg19.gencode.protein_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.hg19.knownGene.non_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.hg19.knownGene.protein_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.hg19.mRNA.annotated.osc.gz
- tc.decompose_smoothing_merged.hg19.rmsk.annotated.repClass.repFamily.osc.gz
Mouse Decomposition-based Peak Identification (DPI) cluster
- https://fantom5-collaboration.gsc.riken.jp/webdav/home/nbertin/CAGE-Tag-Cluster-Annotation_Aug11/tc.decompose_smoothing_merged.mm9.annotations/
- tc.decompose_smoothing_merged.mm9.CpGislands.annotated.osc.gz
- tc.decompose_smoothing_merged.mm9.EST.annotated.osc.gz
- tc.decompose_smoothing_merged.mm9.Ensembl.non_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.mm9.Ensembl.protein_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.mm9.RefSeq.non_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.mm9.RefSeq.protein_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.mm9.TBP_JASPAR_CORE_MA0108.2.annotated.osc.gz
- tc.decompose_smoothing_merged.mm9.knownGene.non_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.mm9.knownGene.protein_coding.annotated.sym.osc.gz
- tc.decompose_smoothing_merged.mm9.mRNA.annotated.osc.gz
- tc.decompose_smoothing_merged.mm9.rmsk.annotated.repClass.repFamily.osc.gz
Other clusters on UPDATE_012
Please don't hesitate to use/produce other clusters for other purposes
- by Martin Frith https://fantom5-collaboration.gsc.riken.jp/webdav/home/mcfrith/110725-update012-pclu/
- by Michael Rehli https://fantom5-collaboration.gsc.riken.jp/webdav/home/rehli/
- by Cesare Furlanello, Marco Chierici, Davide Albanese, Marco Roncador - https://fantom5-collaboration.gsc.riken.jp/webdav/home/FBK/110905-clusters/
Related pages
- Clustering evaluation - see here for the clustering 'competition', which is performed on UPDATE_011
- Promoter definition