Clustering evaluation
Revision as of 15:12, 15 June 2011
Initial comparison of clustering methods
Nine clustering methods were compared on the following metrics:
- Length distribution
- Pairwise correlation
- Promoter ratio
A summary of the results is given in Cluster_comparison.pdf. Details follow below.
Input files
The following files, contributed by collaborators, were used for the analysis:
- Kawaji pooled decomposed smoothing - https://fantom5-collaboration.gsc.riken.jp/webdav/home/kawaji/110428-human-cage-cluster-UPDATE_011/pooled.tc.decompose_smoothing.bed9.gz
- OSC level2 - https://fantom5-collaboration.gsc.riken.jp/webdav/home/WP5/UPDATE_011_Tag_clusters/UPDATE_011_level2_hg19.osc.gz
- OSC level3 - https://fantom5-collaboration.gsc.riken.jp/webdav/home/WP5/UPDATE_011_Tag_clusters/UPDATE_011_level3_hg19.osc.gz
- TSC Balwierz - https://fantom5-collaboration.gsc.riken.jp/webdav/home/balwierz/TranscriptionStartClusters/TSC.hg19.bed
- FBKclust pooled - https://fantom5-collaboration.gsc.riken.jp/webdav/home/FBKclust/FBKclust_pooled_BED6.bed.gz
- Frith pclu - https://fantom5-collaboration.gsc.riken.jp/webdav/home/mcfrith/110425-pclu/hg19-pclu.bed
- Schmeier filtered - https://fantom5-collaboration.gsc.riken.jp/webdav/home/seb/filtered_hCAGE_clusters.bed.gz
- BASE - https://fantom5-collaboration.gsc.riken.jp/webdav/home/FA_Pro/FactorAnalysisResults_UPDATE011_up4M.bed
- Vanja - https://fantom5-collaboration.gsc.riken.jp/webdav/home/vhaberle/clusters/human.pooled/Pooled.all.CTSS.clusters.0.25.bed.gz
The following libraries were used for correlation and promoter ratio analysis:
CD4+
CNhs10853 CNhs11955 CNhs11998
CD14+
CNhs10852 CNhs11954 CNhs11997
Astrocyte-cerebellum
CNhs11321 CNhs12081 CNhs12117
Dendritic cells - monocyte immature derived (technical and donor replicates)
CNhs10855 and CNhs11062 (technical replicates for donor 1), CNhs12195, CNhs12000
THP-1 biological reps
CNhs10722 CNhs10723 CNhs10724
For these libraries and clusterings, tag count and TPM tables were created by Jessica Severin:
- https://fantom5-collaboration.gsc.riken.jp/webdav/home/severin/cluster_expression_tagcount/
- https://fantom5-collaboration.gsc.riken.jp/webdav/home/severin/cluster_expression_tpm/
Size distributions
A density plot of log10(cluster_length) is provided in the summary pdf.
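As a rough illustration of the quantity being plotted, the sketch below computes log10(cluster_length) from BED-style records. It assumes tab-separated BED lines with 0-based, half-open coordinates, so that length is end minus start; the function name and the example records are hypothetical, and the OSC-format inputs would need their own parser. The actual script used for the summary pdf is not described on this page.

```python
import math


def log10_cluster_lengths(bed_lines):
    """Return log10(end - start) for each BED record.

    Assumes tab-separated BED lines with 0-based, half-open coordinates.
    Blank lines and track/browser/comment lines are skipped.
    """
    values = []
    for line in bed_lines:
        line = line.strip()
        if not line or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split("\t")
        start, end = int(fields[1]), int(fields[2])
        if end > start:  # ignore malformed or empty intervals
            values.append(math.log10(end - start))
    return values


# Hypothetical records, for illustration only (not taken from the input files).
example = [
    "chr1\t100\t110\tc1\t0\t+",
    "chr1\t500\t501\tc2\t0\t-",
    "chr2\t1000\t2000\tc3\t0\t+",
]
lengths = log10_cluster_lengths(example)
```

The resulting values can be passed to any density or histogram routine to compare the length distributions of the clustering methods.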
Pairwise correlation
Pearson's correlation of log10(tag count + 1) was computed and plotted for all pairs of samples. Red circles in the scatter plots in the pdf indicate the same cell type. Higher resolution scatter plots are available in Png.zip. Tables of pairwise correlations are available in Csv.zip. Average correlation within cell type is reported in the summary pdf.
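The correlation described above can be sketched as follows. This is a minimal pure-Python version, assuming each sample is a list of tag counts over the same clusters in the same order; the function names, sample names, and counts are hypothetical and are not taken from the tag count tables.

```python
import math
from itertools import combinations


def log_transform(counts):
    """Apply log10(tag count + 1) to every count."""
    return [math.log10(c + 1) for c in counts]


def pearson(x, y):
    """Pearson's correlation coefficient of two equal-length sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return sxy / math.sqrt(sxx * syy)


def pairwise_correlations(table):
    """Correlate log10(count + 1) for every pair of samples.

    `table` maps a sample name to its tag counts per cluster.
    """
    logged = {name: log_transform(counts) for name, counts in table.items()}
    return {
        (a, b): pearson(logged[a], logged[b])
        for a, b in combinations(sorted(logged), 2)
    }


# Hypothetical counts, for illustration only.
table = {
    "sample_a": [0, 9, 99, 999],
    "sample_b": [0, 9, 99, 999],
    "sample_c": [999, 99, 9, 0],
}
correlations = pairwise_correlations(table)
```

Adding 1 before taking the logarithm keeps clusters with zero tags in the comparison, since log10(0) is undefined.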
Promoter ratio
The ratio of clusters falling within 500 bp of RefSeq promoters was computed. Additionally, the amount of expression falling within the same promoter regions was computed and averaged over cell type for each clustering method. The number of missed RefSeq promoters for each clustering method is reported in Missed_prom.zip.
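One possible reading of the promoter ratio and the missed-promoter count is sketched below. It assumes that each RefSeq promoter is represented by a single TSS position, that clusters use 0-based, half-open coordinates, that strand is ignored, and that a promoter counts as missed when no cluster lies within 500 bp of it. These choices, along with all names and example coordinates, are assumptions for illustration; the page does not specify them, and the expression-based measure is not covered here.

```python
from bisect import bisect_left
from collections import defaultdict

WINDOW = 500  # bp, as stated in the text


def index_tss(tss_list):
    """Group TSS positions by chromosome and sort them for binary search."""
    by_chrom = defaultdict(list)
    for chrom, pos in tss_list:
        by_chrom[chrom].append(pos)
    for positions in by_chrom.values():
        positions.sort()
    return by_chrom


def tss_near(cluster, tss_index, window=WINDOW):
    """Return TSS positions within `window` bp of a (chrom, start, end) cluster."""
    chrom, start, end = cluster
    positions = tss_index.get(chrom, [])
    lo = bisect_left(positions, start - window)
    hi = bisect_left(positions, end + window)  # half-open: pos < end + window
    return positions[lo:hi]


def promoter_summary(clusters, tss_list, window=WINDOW):
    """Return (fraction of clusters near a TSS, number of TSSs with no cluster nearby)."""
    tss_index = index_tss(tss_list)
    hit_tss = set()
    n_near = 0
    for cluster in clusters:
        near = tss_near(cluster, tss_index, window)
        if near:
            n_near += 1
            hit_tss.update((cluster[0], pos) for pos in near)
    ratio = n_near / len(clusters) if clusters else 0.0
    missed = len(set(tss_list)) - len(hit_tss)
    return ratio, missed


# Hypothetical coordinates, for illustration only.
clusters = [("chr1", 1000, 1020), ("chr1", 5000, 5010), ("chr2", 300, 310)]
tss_list = [("chr1", 1400), ("chr1", 9000), ("chr2", 2000)]
ratio, missed = promoter_summary(clusters, tss_list)
```

Sorting the TSS positions once per chromosome lets each cluster be checked with two binary searches rather than a scan over all promoters.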