CAGE Cluster update: Difference between revisions

Please contact fantom5-wp4@gsc.riken.jp about the overall status, and the corresponding providers with individual questions.


=== 'Promotome' set ===

https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110804-dpi-clusters-expression
Note that the set is not yet finalized. The current release is promotome v1.0 RC1, which includes only CAGE clusters supported by independent evidence.
(promotome v1.0 RC1 has been available since 26 Sep 2011)


=== Genomic coordinates ===
[https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=od32NnsKIlWqzU3jHclziD human] and [https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=bippavM-Cxqj3r6PYf0Zz mouse]


=== TSS classifier ===


[Sebastian et al.]
* https://fantom5-collaboration.gsc.riken.jp/webdav/home/seb/update12/documentation_ud12.pdf
* https://fantom5-collaboration.gsc.riken.jp/webdav/home/seb/update12/rescue/documentation_ud12_update.pdf


[Timo]
* https://fantom5-collaboration.gsc.riken.jp/webdav/home/Lassmann/TSS_classification/

== Expression profiles to identify cellular states ==

participants: Cesare, Marco, Yishai, Piotr, Kawaji, Christine, Tom, Al

<br>

=== Removal of poor quality libraries ===

see [[Sample_QC]] for the current status


=== Cluster thresholding ===

To define a shared cluster set for the expression profiles used to identify cellular states, we want an expression threshold that does not rely on any other evidence (such as DNase HS sites), since requiring such evidence could miss clusters active only in rare cells. Piotr, Erik vN, Kawaji, and Al explored several parameters, and we agreed on the following (Nov 30, 2011):

* For TSS identification, we use "max counts at CTSS > 2" as a permissive threshold ( [https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110720-dpi-clusters/hg19/thresholding_test/in/ctssMaxCounts/tc.decompose_smoothing_merged_ctssMaxCounts2.bed.gz this threshold on UPDATE_012 DPI clusters] )
* For motif and expression analysis, we use "max counts at CTSS >10 AND max tpm at CTSS > 1" as a robust threshold. ( [https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110720-dpi-clusters/hg19/thresholding_test/in/ctssMaxCounts_ctssTpm/tc.decompose_smoothing_merged_ctssMaxCounts10___tc.decompose_smoothing_merged_ctssMaxTpm1.0.bed.gz this threshold on UPDATE_012 DPI clusters] )
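The two thresholds above can be read as simple per-cluster filters. The sketch below is only an illustration and rests on assumptions not stated on this page: "max counts at CTSS" is taken to mean the largest raw tag count at any single CTSS of the cluster in any library, "max tpm at CTSS" the largest such count after scaling by that library's total tag count, and the two maxima are evaluated independently. The function names are hypothetical and not part of the FANTOM5 pipeline.

```python
from typing import Dict, List, Tuple

# Hypothetical layout for one cluster: library name -> raw counts at each of its CTSSs.
ClusterCounts = Dict[str, List[int]]


def max_ctss_stats(ctss_counts: ClusterCounts, library_totals: Dict[str, int]) -> Tuple[int, float]:
    """Return (max raw count, max TPM) over all CTSSs of a cluster in all libraries.

    Assumption: TPM = count * 1e6 / total tags in that library.
    """
    max_count = 0
    max_tpm = 0.0
    for library, counts in ctss_counts.items():
        total = library_totals[library]
        for count in counts:
            max_count = max(max_count, count)
            max_tpm = max(max_tpm, count * 1e6 / total)
    return max_count, max_tpm


def passes_permissive(ctss_counts: ClusterCounts, library_totals: Dict[str, int]) -> bool:
    """Permissive threshold for TSS identification: max counts at CTSS > 2."""
    max_count, _ = max_ctss_stats(ctss_counts, library_totals)
    return max_count > 2


def passes_robust(ctss_counts: ClusterCounts, library_totals: Dict[str, int]) -> bool:
    """Robust threshold for motif/expression analysis: max counts > 10 AND max TPM > 1."""
    max_count, max_tpm = max_ctss_stats(ctss_counts, library_totals)
    return max_count > 10 and max_tpm > 1.0


if __name__ == "__main__":
    totals = {"libA": 5_000_000, "libB": 20_000_000}
    weak = {"libA": [3, 1], "libB": [0, 2]}
    strong = {"libA": [12, 4], "libB": [11]}
    print(passes_permissive(weak, totals), passes_robust(weak, totals))
    print(passes_permissive(strong, totals), passes_robust(strong, totals))
```

Note that under these assumptions a cluster with 11 tags in a 20-million-tag library passes the count criterion but not the TPM one (0.55 TPM), which is why the robust threshold combines both.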

See https://fantom5-collaboration.gsc.riken.jp/wiki/images/b/b4/111131-F5telecon-kawaji.pdf for details.

=== Normalization ===

question 1: '''Does collapsing TSSs change the data structure?''' - Yishai and Cesare will compare the normalised data to check whether thresholding and collapsing the clusters affect the normalised structure. Stick with TSS.

*results: Yes (at least in Yishai's normalization approach). Yishai compared "Normalization before collapsing" and "Normalization after collapsing", and the latter is better (that is, expression of ACTB and TUBB is relatively uncorrelated).
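Collapsing here means summing the per-library counts of all CTSSs assigned to the same cluster. The sketch below uses hypothetical function names and simply drops CTSSs outside any cluster (an assumption). It also illustrates why the order of operations only matters for some methods: a per-library scaling whose factor depends only on the library total, such as TPM, gives the same result before or after collapsing, so differences arise only when the normalization itself depends on the individual features, as rank- or distribution-based methods do.

```python
from typing import Dict, List

Table = Dict[str, List[float]]  # feature ID (CTSS or cluster) -> per-library values


def collapse_to_clusters(ctss_values: Table, ctss_to_cluster: Dict[str, str]) -> Table:
    """Sum the per-library values of all CTSSs that belong to the same cluster."""
    clusters: Table = {}
    for ctss, values in ctss_values.items():
        cluster = ctss_to_cluster.get(ctss)
        if cluster is None:
            continue  # assumption: unassigned CTSSs are dropped
        acc = clusters.setdefault(cluster, [0.0] * len(values))
        for j, value in enumerate(values):
            acc[j] += value
    return clusters


def to_tpm(table: Table, library_totals: List[float]) -> Table:
    """Scale each library (column) by the constant 1e6 / library total."""
    return {
        key: [v * 1e6 / library_totals[j] for j, v in enumerate(values)]
        for key, values in table.items()
    }


if __name__ == "__main__":
    ctss = {"c1": [1, 2], "c2": [3, 4], "c3": [5, 6]}
    mapping = {"c1": "A", "c2": "A", "c3": "B"}
    totals = [9, 12]
    print(to_tpm(collapse_to_clusters(ctss, mapping), totals))  # normalize after collapsing
    print(collapse_to_clusters(to_tpm(ctss, totals), mapping))  # normalize before collapsing
```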

<br> question 2: '''Key question: which normalization scheme should be used?''' - Compare the power-law method (Piotr's or Yishai's) with edgeR TMM, RLE, and TPM. See also the 'Impact of normalization' section below.

[Plan and progress]

*step0: set the target data
**DPI clusters with at least 10 reads in at least one library
**https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110804-dpi-clusters-expression/hg19-v0.0/tc.chrom_tpm.annotation.max_counts10.txt.gz

*step1: normalization results
**Piotr https://fantom5-collaboration.gsc.riken.jp/webdav/home/balwierz/NormalizedKawajiClusterExpression/tc.normalized.piotr.max_counts10.txt.gz
**TPM https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110804-dpi-clusters-expression/hg19-v0.0/tc.chrom_tpm.annotation.max_counts10.txt.gz
**RLE based TPM https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110804-dpi-clusters-expression/hg19-v0.0-edgeR-normalization/tc.max_counts10.rle_tpm.txt.gz
**TMM based TPM https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110804-dpi-clusters-expression/hg19-v0.0-edgeR-normalization/tc.max_counts10.tmm_tpm.txt.gz
**Yishai's method - Rank-invariant normalization (RINO): https://fantom5-collaboration.gsc.riken.jp/webdav/home/yishai/tc.rino.chrom_tpm.annotation.txt.gz
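The TPM and RLE-based TPM tables listed above can be illustrated with the plain-Python sketch below. It approximates the relative log expression (RLE) idea (each library's median ratio to the per-cluster geometric mean, as in DESeq and edgeR's `calcNormFactors(method = "RLE")`) and converts it to effective library sizes following the edgeR convention as we understand it. It is not the code that produced the files above, and TMM is not shown.

```python
import math
from statistics import median
from typing import List

Matrix = List[List[float]]  # rows = clusters, columns = libraries


def tpm(counts: Matrix, lib_sizes: List[float]) -> Matrix:
    """Plain TPM: count * 1e6 / library size, column by column."""
    return [[c * 1e6 / lib_sizes[j] for j, c in enumerate(row)] for row in counts]


def rle_factors(counts: Matrix) -> List[float]:
    """Per-library median of count / per-cluster geometric mean.

    Only clusters with a positive count in every library are used, because the
    geometric mean is zero otherwise. median() raises if no such cluster exists.
    """
    n_libs = len(counts[0])
    ratios: List[List[float]] = [[] for _ in range(n_libs)]
    for row in counts:
        if min(row) <= 0:
            continue
        geo_mean = math.exp(sum(math.log(c) for c in row) / n_libs)
        for j, c in enumerate(row):
            ratios[j].append(c / geo_mean)
    return [median(r) for r in ratios]


def rle_tpm(counts: Matrix, lib_sizes: List[float]) -> Matrix:
    """TPM computed against RLE-adjusted effective library sizes.

    Assumed edgeR-like convention: factor = RLE factor / library size, rescaled
    so the factors have geometric mean 1; effective size = size * factor.
    """
    factors = [f / s for f, s in zip(rle_factors(counts), lib_sizes)]
    geo = math.exp(sum(math.log(f) for f in factors) / len(factors))
    effective = [s * f / geo for s, f in zip(lib_sizes, factors)]
    return tpm(counts, effective)


if __name__ == "__main__":
    # Library 2 carries one huge cluster that inflates its total tag count.
    counts = [[10, 10], [20, 20], [30, 30], [0, 1000]]
    sizes = [60, 1060]
    print(tpm(counts, sizes)[0])      # shared clusters look lower in library 2
    print(rle_tpm(counts, sizes)[0])  # RLE-based TPM puts them back on par
```

The example shows the usual motivation for RLE-style factors: a few very highly expressed clusters in one library depress plain TPM values for every other cluster in that library.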

*Step2: Systematic evaluation - Cesare, Marco, Yishai
**Box plot - https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/111117-normalization-evaluation/boxplot/boxplot.4methods.png [https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/111117-normalization-evaluation/Yishai-boxplot.3methods.readme.txt (readme)]
**scatter plot and R^2 on a time course (Marco) - https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/111117-normalization-evaluation/Marco-timecourse-scatter-R2/ [https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/111117-normalization-evaluation/Marco-timecourse-scatter-R2.readme.txt (readme)]
**MA plots on 6 random pairs https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/111117-normalization-evaluation/Marco-MA/
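For reference, an MA plot compares two libraries cluster by cluster using M = log2(x/y) and A = ½·log2(x·y); after a good normalization the bulk of points should be centred around M = 0. The sketch below computes those coordinates and draws a reproducible set of random library pairs. The function names and the seeding scheme are hypothetical; this is not Marco's script, and plotting is left out.

```python
import itertools
import math
import random
from typing import List, Sequence, Tuple


def ma_values(x: Sequence[float], y: Sequence[float]) -> List[Tuple[float, float]]:
    """(M, A) for each cluster of two libraries: M = log2(x/y), A = 0.5*log2(x*y).

    Clusters with a zero in either library are skipped, since the logs are undefined.
    """
    points = []
    for xi, yi in zip(x, y):
        if xi > 0 and yi > 0:
            points.append((math.log2(xi / yi), 0.5 * math.log2(xi * yi)))
    return points


def random_library_pairs(n_libs: int, n_pairs: int = 6, seed: int = 0) -> List[Tuple[int, int]]:
    """Draw distinct (i, j) library index pairs with i < j, reproducibly via the seed."""
    all_pairs = list(itertools.combinations(range(n_libs), 2))
    return random.Random(seed).sample(all_pairs, n_pairs)


if __name__ == "__main__":
    expr = [[1.0, 2.0, 4.0], [4.0, 1.0, 4.0], [0.0, 5.0, 2.0]]  # rows = clusters, cols = libraries
    for i, j in random_library_pairs(3, n_pairs=2):
        col_i = [row[i] for row in expr]
        col_j = [row[j] for row in expr]
        print((i, j), ma_values(col_i, col_j))
```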

*Step3: Inspection from biological perspective - Christine, Tom, Al
** Pearson's correlation-based network https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/111117-normalization-evaluation/Norm_90P_summary_file.xlsx
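A Pearson correlation-based network, in its simplest form, links two clusters when the correlation of their expression profiles across libraries reaches a cutoff. The sketch below shows that idea only; the cutoff value is a hypothetical parameter, it does not reproduce the settings behind the spreadsheet above, and in practice one might correlate log-transformed values rather than raw ones.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length expression profiles."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # assumption: flat profiles are treated as uncorrelated
    return sxy / math.sqrt(sxx * syy)


def correlation_network(
    profiles: Dict[str, Sequence[float]], threshold: float
) -> List[Tuple[str, str, float]]:
    """Edges (cluster_a, cluster_b, r) for every pair with Pearson r >= threshold."""
    edges = []
    for a, b in itertools.combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            edges.append((a, b, r))
    return edges


if __name__ == "__main__":
    profiles = {"a": [1, 2, 3], "b": [2, 4, 6], "c": [3, 2, 1]}
    print(correlation_network(profiles, threshold=0.9))
```

Running the same network construction on each normalized table lets one check whether biologically expected co-expression survives a given normalization.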

=== Impact of Normalization on FANTOM5 hCAGE promoterome: Implementation Plan (0.1) ===

#We (Trento, Columbia – Piotr is invited) will start implementing the normalization check scheme. This means building a generic script to run on the DPI TSS tag cluster set(s) once they are ready; it can also be run on any other TSS clustering scheme, or before clustering. Hopefully
##Run power-law fits on single libraries and compare them with the pooled data; check the range of parameters and infer stability on the pooled data; compare with QC indicators on single libraries; consider; check for batch effects. Start from the YB script, then Piotr’s version as it becomes available.
##Match with the normalization and QC methods from edgeR, also in script form. Keep it generic and automated.
##Apply to the existing DPI tag clusters. Also run before and after other clustering. Prepare a reporting scheme.
##Verify the impact on a few examples of downstream analysis, e.g. the quality, stability, and accuracy of classifiers and networks
#Apply to the DPI tag cluster set(s) as they become ready and report, for use in the main paper
#It may make sense to reuse all this material for the “peak clustering and normalization” satellite paper, to be submitted alongside the main paper or later
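Step 1.1 of the plan fits a power law to each library. One common way to do this, sketched below under stated assumptions, is to take the reverse cumulative distribution N(≥t) of per-position tag counts and fit log N = c − α·log t by least squares over a chosen count range; the per-library α values can then be compared with the value from pooled data. The fitting range and function names are hypothetical, and this is neither the YB script nor Piotr's implementation.

```python
import math
from typing import List, Sequence, Tuple


def reverse_cumulative(counts: Sequence[int]) -> List[Tuple[int, int]]:
    """(t, N(>= t)) for each distinct positive count t in one library."""
    positive = sorted(c for c in counts if c > 0)
    n = len(positive)
    result = []
    for i, t in enumerate(positive):
        if i == 0 or t != positive[i - 1]:
            result.append((t, n - i))  # the n - i sorted values from index i are all >= t
    return result


def fit_power_law(counts: Sequence[int], t_min: int = 1, t_max: int = 10**9) -> Tuple[float, float]:
    """Least-squares fit of log N(>= t) = intercept - alpha * log t within [t_min, t_max].

    Returns (alpha, intercept), using natural logarithms.
    """
    pts = [(math.log(t), math.log(nt))
           for t, nt in reverse_cumulative(counts) if t_min <= t <= t_max]
    if len(pts) < 2:
        raise ValueError("need at least two distinct counts in the fitting range")
    mx = sum(x for x, _ in pts) / len(pts)
    my = sum(y for _, y in pts) / len(pts)
    sxy = sum((x - mx) * (y - my) for x, y in pts)
    sxx = sum((x - mx) ** 2 for x, _ in pts)
    slope = sxy / sxx
    return -slope, my - slope * mx


if __name__ == "__main__":
    library = [8, 4, 2, 2, 1, 1, 1, 1]  # toy counts with N(>= t) = 8 / t
    print(fit_power_law(library))
```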

=== Description of clusters ===
* Descriptions (with associations to gene models) were assigned according to [https://fantom5-collaboration.gsc.riken.jp/wiki/index.php/File:111208-telecon-kawaji.pdf this process]
** [https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110720-dpi-clusters/hg19/agreed_thresholds/ data directory]
** [https://fantom5-collaboration.gsc.riken.jp/ucsc/cgi-bin/hgTracks?db=hg19&org=human&hgt.customText=https://oscf5:0114collab@fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110720-dpi-clusters/hg19/ucsc_config.txt config on UCSC mirror]

== Cluster annotations discussed in [[Tag Cluster Annotation|the cluster working group]] ==

==== Method Description ====

[https://fantom5-collaboration.gsc.riken.jp/wiki/index.php/Transcript_model_derived_annotation_protocol#Promoter_annotations_of_.22UPDATE_012_Decomposition-based_Peak_Identification_.28DPI.29_cluster___.22 Transcript model derived annotation protocol]

==== Annotation Results ====


*[https://fantom5-collaboration.gsc.riken.jp/webdav/home/nbertin/CAGE-Tag-Cluster-Annotation_Aug11/ webdav repository]

====== Human Decomposition-based Peak Identification (DPI) cluster ======


*https://fantom5-collaboration.gsc.riken.jp/webdav/home/nbertin/CAGE-Tag-Cluster-Annotation_Aug11/tc.decompose_smoothing_merged.hg19.annotations/
**tc.decompose_smoothing_merged.hg19.CpGislands.annotated.osc.gz
**tc.decompose_smoothing_merged.hg19.EST.annotated.osc.gz
**tc.decompose_smoothing_merged.hg19.Ensembl.non_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.hg19.Ensembl.protein_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.hg19.F5_human_lncRNAome.annotated.osc.gz
**tc.decompose_smoothing_merged.hg19.RefSeq.non_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.hg19.RefSeq.protein_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.hg19.TBP_JASPAR_CORE_MA0108.2.annotated.osc.gz
**tc.decompose_smoothing_merged.hg19.gencode-pseudo.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.hg19.gencode.non_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.hg19.gencode.protein_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.hg19.knownGene.non_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.hg19.knownGene.protein_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.hg19.mRNA.annotated.osc.gz
**tc.decompose_smoothing_merged.hg19.rmsk.annotated.repClass.repFamily.osc.gz


====== Mouse Decomposition-based Peak Identification (DPI) cluster ======


*https://fantom5-collaboration.gsc.riken.jp/webdav/home/nbertin/CAGE-Tag-Cluster-Annotation_Aug11/tc.decompose_smoothing_merged.mm9.annotations/
**tc.decompose_smoothing_merged.mm9.CpGislands.annotated.osc.gz
**tc.decompose_smoothing_merged.mm9.EST.annotated.osc.gz
**tc.decompose_smoothing_merged.mm9.Ensembl.non_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.mm9.Ensembl.protein_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.mm9.RefSeq.non_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.mm9.RefSeq.protein_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.mm9.TBP_JASPAR_CORE_MA0108.2.annotated.osc.gz
**tc.decompose_smoothing_merged.mm9.knownGene.non_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.mm9.knownGene.protein_coding.annotated.sym.osc.gz
**tc.decompose_smoothing_merged.mm9.mRNA.annotated.osc.gz
**tc.decompose_smoothing_merged.mm9.rmsk.annotated.repClass.repFamily.osc.gz
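Each annotation file above relates the DPI clusters to one feature set (CpG islands, RefSeq models, repeats, and so on). The actual rules are described in the linked protocol; purely as an illustration of the simplest case, a per-cluster overlap annotation could look like the sketch below. It assumes BED-style 0-based, half-open coordinates, ignores strand, and uses hypothetical names.

```python
from typing import Dict, List, Tuple

# (chromosome, start, end) with BED-style 0-based, half-open coordinates (assumption).
Interval = Tuple[str, int, int]


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals on the same chromosome share at least one base."""
    return a[0] == b[0] and a[1] < b[2] and b[1] < a[2]


def annotate_clusters(
    clusters: Dict[str, Interval], features: Dict[str, Interval]
) -> Dict[str, List[str]]:
    """Map each cluster ID to the names of the features it overlaps (strand ignored)."""
    by_chrom: Dict[str, List[Tuple[str, Interval]]] = {}
    for name, iv in features.items():
        by_chrom.setdefault(iv[0], []).append((name, iv))
    return {
        cluster_id: [name for name, iv in by_chrom.get(civ[0], []) if overlaps(civ, iv)]
        for cluster_id, civ in clusters.items()
    }


if __name__ == "__main__":
    clusters = {"tc1": ("chr1", 100, 110), "tc2": ("chr1", 500, 510)}
    cpg_islands = {"cpg1": ("chr1", 105, 300)}
    print(annotate_clusters(clusters, cpg_islands))
```

The quadratic per-chromosome scan keeps the sketch short; for genome-wide tracks a sorted or interval-tree lookup would be the usual choice.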


<br>



== Other clusters on UPDATE_012 ==


*by Martin Frith https://fantom5-collaboration.gsc.riken.jp/webdav/home/mcfrith/110725-update012-pclu/
*by Michael Rehli https://fantom5-collaboration.gsc.riken.jp/webdav/home/rehli/
*by Cesare Furlanello, Marco Chierici, Davide Albanese, Marco Roncador - https://fantom5-collaboration.gsc.riken.jp/webdav/home/FBK/110905-clusters/



Latest revision as of 18:06, 15 December 2011

This page provides updated information about CAGE clusters on UPDATE_012.

CAGE clusters for the main paper


Genomic coordinates

https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110720-dpi-clusters/

ZENBU configuration

human and mouse



Other clusters on UPDATE_012

Please feel free to use or produce other clusters for other purposes.
