Clustering evaluation: Difference between revisions

== Specific gene loci of interest ==

* B4GALT1
** example https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr9:33096459..33181534
** zoomed https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr9:33166097..33168728

* ABCA1
** example https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr9:107506494..107727223
** zoomed https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr9:107690015..107690795

* MAFB
** example https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr20:39313674..39318715
** zoomed https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr20:39317825..39317924

* FN1
** example https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr2:216206272..216319694
** zoomed https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr2:216300166..216301803

* DUSP1
** example https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr5:172194326..172198977
** zoomed https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr5:172198150..172198249

* CEBPA
** example https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr19:33790339..33793915
** zoomed https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr19:33793385..33793484

* HERC1
** example https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr15:63844483..64182479
** zoomed https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr15:64125991..64126298

* HMGA1
** example https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr6:34202218..34216365
** zoomed https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr6:34204186..34205105

* IGF2
** example (NOTE: IDR was run only on the CD14 samples, which do not express this gene, so nothing is called as reproducible) https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr11:2145224..2175954
** zoomed https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr11:2150419..2160905

* IRF8
** example https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr16:85926913..85962071
** zoomed https://fantom5-collaboration.gsc.riken.jp/zenbu/gLyphs/#config=ay1fPkxlF5Os9C_EzIiKF;loc=hg19::chr16:85932059..85933147

Revision as of 17:00, 23 June 2011

Initial comparison of clustering methods

Nine clustering methods have been compared on the following metrics:

  • Length distribution
  • Pairwise correlation
  • Promoter ratio

A summary of the results is given in File:Cluster comparison.pdf. Details follow below.


Input files

The following files, contributed by collaborators, were used for the analysis:


The following libraries were used for correlation and promoter ratio analysis:

CD4+

CNhs10853
CNhs11955
CNhs11998

CD14+

CNhs10852
CNhs11954
CNhs11997

Astrocyte-cerebellum

CNhs11321
CNhs12081
CNhs12117


Dendritic cells - monocyte immature derived (technical and donor replicates)

CNhs10855 and CNhs11062 are technical replicates for donor1
CNhs12195
CNhs12000


THP-1 biological replicates

CNhs10722
CNhs10723
CNhs10724


For these libraries and clusterings, tag count and TPM tables were created by Jessica Severin:


Size distributions

A density plot of log10(cluster_length) is provided in the summary pdf.

Additional plots of size distribution (and basic refgene annotation) for each clustering method can be found at https://fantom5-collaboration.gsc.riken.jp/webdav/home/m.lizio/update11_clusters_size_distribution_plots/
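As a rough illustration of the quantity being plotted, the sketch below computes log10(cluster_length) and bins it into a simple histogram, a text stand-in for the density plot. The cluster coordinates are made up, and the half-open (start, end) convention and the 0.5 bin width are assumptions, not details taken from the analysis.

```python
import math
from collections import Counter

def log10_lengths(clusters):
    """Return log10 of each cluster length; clusters are (start, end), end exclusive (assumed)."""
    return [math.log10(end - start) for start, end in clusters]

def histogram(values, bin_width=0.5):
    """Count values per bin, keyed by the lower edge of each bin."""
    bins = Counter(math.floor(v / bin_width) * bin_width for v in values)
    return dict(sorted(bins.items()))

# Hypothetical clusters with lengths 3, 30, 40, 300 and 3000 bp
clusters = [(100, 103), (200, 230), (500, 540), (1000, 1300), (5000, 8000)]
size_histogram = histogram(log10_lengths(clusters))

for lower_edge, count in size_histogram.items():
    print(f"{lower_edge:.1f}-{lower_edge + 0.5:.1f}: {'#' * count}")
```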

Pairwise correlation

Pearson's correlation of log10(tag count + 1) was computed and plotted for all pairs of samples. In the scatter plots in the summary pdf, red circles mark pairs of samples from the same cell type. Higher-resolution scatter plots are available in File:Png.zip, and tables of pairwise correlations in File:Csv.zip. The average correlation within each cell type is reported in the summary pdf.
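The computation can be sketched as follows. This is a minimal illustration, not the script used for the analysis: the tag counts are made up, the `tag_counts` and `cell_type` structures are hypothetical, and only four of the libraries listed above are used.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson's correlation of two equal-length sequences (undefined for constant input)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)

def pairwise_correlations(tag_counts):
    """Correlate log10(tag count + 1) for every pair of libraries."""
    logged = {lib: [math.log10(c + 1) for c in counts]
              for lib, counts in tag_counts.items()}
    return {(a, b): pearson(logged[a], logged[b])
            for a, b in combinations(sorted(logged), 2)}

def mean_within_cell_type(correlations, cell_type):
    """Average the correlations of pairs whose two libraries share a cell type."""
    by_type = {}
    for (a, b), r in correlations.items():
        if cell_type[a] == cell_type[b]:
            by_type.setdefault(cell_type[a], []).append(r)
    return {t: sum(rs) / len(rs) for t, rs in by_type.items()}

# Hypothetical tag counts: one value per cluster, same cluster order in every library
tag_counts = {
    "CNhs10852": [0, 9, 99, 999],
    "CNhs11954": [0, 9, 99, 999],
    "CNhs10853": [999, 99, 9, 0],
    "CNhs11955": [999, 99, 9, 0],
}
cell_type = {"CNhs10852": "CD14+", "CNhs11954": "CD14+",
             "CNhs10853": "CD4+", "CNhs11955": "CD4+"}

correlations = pairwise_correlations(tag_counts)
within = mean_within_cell_type(correlations, cell_type)
```

Adding 1 before taking the logarithm keeps clusters with zero tags in the comparison instead of dropping them.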


Promoter ratio

The ratio of clusters falling within 500 bp of RefSeq promoters was computed. Additionally, the amount of expression falling within the same promoter regions was computed and averaged over cell type for each clustering method. Lists of missed RefSeq promoters for each clustering method are reported in File:Missed prom.zip.
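A minimal sketch of these three quantities for a single library is given below. It assumes that a cluster counts as a promoter hit when any part of it lies within 500 bp of a promoter position on the same chromosome; the exact overlap rule used in the analysis is not stated here, and the clusters, promoter positions, and function names are hypothetical.

```python
def near_promoter(cluster, promoters, window=500):
    """True if the cluster lies within `window` bp of any promoter position (assumed rule)."""
    chrom, start, end, _expression = cluster
    return any(p_chrom == chrom and start - window <= pos <= end + window
               for p_chrom, pos in promoters)

def promoter_ratios(clusters, promoters, window=500):
    """Return (fraction of clusters near promoters, fraction of expression near promoters)."""
    hits = [c for c in clusters if near_promoter(c, promoters, window)]
    cluster_ratio = len(hits) / len(clusters)
    expression_ratio = sum(c[3] for c in hits) / sum(c[3] for c in clusters)
    return cluster_ratio, expression_ratio

def missed_promoters(clusters, promoters, window=500):
    """Promoters with no cluster within `window` bp."""
    return [p for p in promoters
            if not any(near_promoter(c, [p], window) for c in clusters)]

# Hypothetical clusters: (chromosome, start, end, expression)
clusters = [("chr1", 1000, 1050, 80.0),
            ("chr1", 5000, 5020, 15.0),
            ("chr2", 300, 340, 5.0)]
# Hypothetical promoter positions: (chromosome, position)
promoters = [("chr1", 1200), ("chr2", 10000)]

cluster_ratio, expression_ratio = promoter_ratios(clusters, promoters)
missed = missed_promoters(clusters, promoters)
```

In this toy input one of three clusters is near a promoter but it carries most of the expression, which is why the two ratios are reported separately.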


Irreproducible Discovery Rate (IDR)

To select signal in high-throughput experiments, one can use consistency between replicates, because genuine signals are expected to be reproducible. IDR is a statistical method that quantitatively measures the consistency between replicates and selects signals taking their reproducibility into account. For a proper description see: http://www.stat.berkeley.edu/tech-reports/790.pdf .

Below are the results from comparing the CD14 donor 2 and donor 3 replicates using different clustering methods:


[Figure: IDR results for CD14 donors 2 and 3 under each clustering method]

Specific gene loci of interest