Clustering evaluation

Revision as of 15:09, 15 June 2011 by Arner

Initial comparison of clustering methods

Nine clustering methods were compared on the following metrics:

  • Length distribution
  • Pairwise correlation
  • Promoter ratio

A summary of the results is given in File:Cluster comparison.pdf. Details follow below.


Input files

The following files, contributed by collaborators, were used for the analysis:


The following libraries were used for correlation and promoter ratio analysis:

CD4+

CNhs10853
CNhs11955
CNhs11998

CD14+

CNhs10852
CNhs11954
CNhs11997

Astrocyte-cerebellum

CNhs11321
CNhs12081
CNhs12117


Dendritic cells - monocyte immature derived (technical and donor replicates)

CNhs10855 and CNhs11062 are technical replicates for donor1
CNhs12195
CNhs12000


THP-1 biological replicates

CNhs10722
CNhs10723
CNhs10724


For these libraries and clusterings, tag count and TPM tables were created by Jessica Severin:


Size distributions

A density plot of log10(cluster_length) is provided in the summary PDF.
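
As a rough illustration of what such a density looks like numerically, the sketch below bins log10(cluster_length) into a normalised histogram. It is not the code used for the summary PDF: the function name `log10_length_density`, the `bin_width` parameter and the use of a histogram in place of a smoothed density estimate are all assumptions made for this example.

```python
import math
from collections import Counter


def log10_length_density(cluster_lengths, bin_width=0.25):
    """Return (bin_start, density) pairs for log10(cluster length).

    Hypothetical helper: a normalised histogram standing in for a
    smoothed density plot. Densities integrate to 1 over the bins.
    """
    # Lengths must be positive for the logarithm to be defined.
    logs = [math.log10(n) for n in cluster_lengths if n > 0]
    if not logs:
        return []
    # Assign each value to the bin whose lower edge is k * bin_width.
    counts = Counter(math.floor(x / bin_width) for x in logs)
    total = len(logs)
    return [(k * bin_width, counts[k] / (total * bin_width))
            for k in sorted(counts)]


if __name__ == "__main__":
    # Made-up cluster lengths in bp, for illustration only.
    for bin_start, density in log10_length_density([1, 4, 12, 15, 40, 300]):
        print(f"{bin_start:5.2f}  {density:.3f}")
```

Running one such histogram per clustering method and overlaying the results would give a crude equivalent of the comparison plot.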


Pairwise correlation

Pearson's correlation of log10(tag count + 1) was computed and plotted for all pairs of samples. Red circles in the scatter plots in the PDF indicate the same cell type. Higher-resolution scatter plots are available in File:Png.zip. Tables of pairwise correlations are available in File:Csv.zip. The average correlation within each cell type is reported in the summary PDF.
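
The calculation described above can be sketched as follows. This is a minimal illustration, not the original analysis script: the function names `log_pearson` and `pairwise_log_pearson` and the dictionary layout (library ID mapped to a list of per-cluster tag counts) are assumptions, and the counts in the usage example are invented.

```python
import math
from itertools import combinations


def log_pearson(counts_a, counts_b):
    """Pearson's r between log10(count + 1) of two samples."""
    if len(counts_a) != len(counts_b):
        raise ValueError("samples must cover the same clusters")
    x = [math.log10(c + 1) for c in counts_a]
    y = [math.log10(c + 1) for c in counts_b]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


def pairwise_log_pearson(table):
    """Correlate every pair of samples in {sample_id: [tag counts]}."""
    return {(a, b): log_pearson(table[a], table[b])
            for a, b in combinations(sorted(table), 2)}


if __name__ == "__main__":
    # Invented tag counts for three libraries over four clusters.
    table = {
        "CNhs10722": [0, 12, 150, 3],
        "CNhs10723": [1, 10, 170, 2],
        "CNhs10853": [40, 0, 5, 90],
    }
    for pair, r in pairwise_log_pearson(table).items():
        print(pair, round(r, 3))
```

Adding 1 before taking the logarithm keeps clusters with zero tags in the comparison, since log10(0) is undefined.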


Promoter ratio

The ratio of clusters falling within 500 bp of RefSeq promoters was computed. Additionally, the amount of expression falling within the same promoter regions was computed and averaged over cell type for each clustering method. The number of missed RefSeq promoters for each clustering method is reported in File:Missed prom.zip.
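
One way the three quantities above could be computed is sketched below. Several details are assumptions not stated on this page: each promoter is represented by a single position (for example the RefSeq transcription start site), a cluster counts as "within 500 bp" when that position lies no more than 500 bp outside the cluster interval, and strand is ignored. The function name `promoter_stats` and the tuple layouts are hypothetical.

```python
def promoter_stats(clusters, promoters, window=500):
    """Summarise how clusters relate to promoter positions.

    clusters:  list of (chrom, start, end, expression)
    promoters: list of (chrom, position)
    Assumption: strand is ignored and a promoter is a single position.
    """
    hit_clusters = 0
    hit_expression = 0.0
    total_expression = 0.0
    hit_promoters = set()
    for chrom, start, end, expression in clusters:
        total_expression += expression
        # Promoters on the same chromosome within `window` bp of the cluster.
        near = [(c, p) for c, p in promoters
                if c == chrom and start - window <= p <= end + window]
        if near:
            hit_clusters += 1
            hit_expression += expression
            hit_promoters.update(near)
    return {
        "cluster_ratio": hit_clusters / len(clusters) if clusters else 0.0,
        "expression_ratio": (hit_expression / total_expression
                             if total_expression else 0.0),
        # Promoters with no cluster nearby, in input order.
        "missed_promoters": [p for p in promoters if p not in hit_promoters],
    }


if __name__ == "__main__":
    # Invented coordinates and expression values, for illustration only.
    clusters = [("chr1", 1000, 1050, 8.0), ("chr1", 5000, 5020, 2.0)]
    promoters = [("chr1", 1200), ("chr2", 1200)]
    print(promoter_stats(clusters, promoters))
```

The linear scan over promoters is fine for a small example; for genome-scale tables an interval index or a sorted sweep would be the more practical choice.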