SampleClassification

Revision as of 03:09, 1 April 2011 by Hai Fang

1) Win: input data is gene expression; phylogenetic algorithm is neighbour-joining (Manhattan distance)
2) Robin: input data is level 2 promoter expression; phylogenetic technique is neighbour-joining (KL divergence distance)
3) Owen: input data is gene expression (presence/absence using a threshold); phylogenetic technique is maximum likelihood
4) Owen (preliminary result): input data is presence/absence of TF network edges based on Motif activity (same as FANTOM4 Nature Genetics paper); phylogenetic technique is maximum likelihood
5) Hai: input data is domain-level expression (converted from gene-level); phylogenetic technique is average linkage clustering
6) Kawaji-san: input data is gene expression; phylogenetic technique is average linkage clustering (Pearson correlation)
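The three distance measures named in the list above can be sketched as follows. This is only an illustration, not the code used by any of the contributors: the sample names and values are made up, the KL divergence is symmetrised and uses a small pseudo-count (both assumptions, since the list does not say how it was turned into a distance), and the Pearson measure is taken as 1 minus the correlation.

```python
import math

def manhattan(x, y):
    """Sum of absolute differences between two expression profiles."""
    return sum(abs(a - b) for a, b in zip(x, y))

def symmetric_kl(x, y, pseudo=1e-9):
    """Symmetrised KL divergence between profiles normalised to sum to 1.

    Assumption: symmetrisation and the pseudo-count are illustrative choices.
    """
    def normalise(values):
        shifted = [v + pseudo for v in values]
        total = sum(shifted)
        return [v / total for v in shifted]

    p, q = normalise(x), normalise(y)
    kl_pq = sum(a * math.log(a / b) for a, b in zip(p, q))
    kl_qp = sum(b * math.log(b / a) for a, b in zip(p, q))
    return kl_pq + kl_qp

def pearson_distance(x, y):
    """1 - Pearson correlation, so perfectly correlated profiles have distance 0."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return 1.0 - cov / (sd_x * sd_y)

# Hypothetical toy profiles (four genes per sample), not real data.
samples = {
    "macrophage_a": [10.0, 2.0, 0.5, 8.0],
    "macrophage_b": [9.0, 2.5, 0.4, 7.0],
    "brain_a": [0.5, 12.0, 9.0, 1.0],
}

for name, dist in [("Manhattan", manhattan),
                   ("symmetric KL", symmetric_kl),
                   ("1 - Pearson", pearson_distance)]:
    within = dist(samples["macrophage_a"], samples["macrophage_b"])
    between = dist(samples["macrophage_a"], samples["brain_a"])
    print(f"{name}: within={within:.3f} between={between:.3f}")
```

On this toy data all three measures place the two macrophage profiles closer to each other than to the brain profile, which is the kind of agreement one would hope for before feeding the distances into a tree-building step.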

Comparing these trees (attached and numbered), we can see that 1 and 6 use the same input data, and 2 is very similar (adding in non-coding). 3 takes the same data as 1 and 6 but applies a threshold to convert it to binary (discarding information), and 4 uses edge information instead of node information, and only for transcription factors. 5 uses the same data as 1 and 6, but converted from genes to domains.
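To illustrate why thresholding discards information, here is a minimal sketch of a presence/absence conversion. The threshold value and the expression profiles are hypothetical; the actual cutoff used for tree 3 is not stated on this page.

```python
def to_presence_absence(profile, threshold):
    """Convert expression values to 1 (present) or 0 (absent) at a cutoff."""
    return [1 if value >= threshold else 0 for value in profile]

# Hypothetical profiles: quite different in magnitude, same pattern above cutoff.
strong = [50.0, 0.0, 12.0]
weak = [6.0, 1.0, 5.5]
threshold = 5.0  # assumed cutoff, for illustration only

binary_strong = to_presence_absence(strong, threshold)
binary_weak = to_presence_absence(weak, threshold)

print(binary_strong, binary_weak)
```

Both profiles map to the same binary vector, so any difference in expression level between them is invisible to a method that only sees presence/absence.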

1 and 2 use the same phylogenetic algorithm, 3 and 4 use a different one, and 5 and 6 use a third. The information contained in the input data should make more difference than the phylogenetic algorithm applied to it.
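For reference, average linkage clustering (the third algorithm, used for trees 5 and 6) can be sketched in a few lines. This is a simplified illustration on a hypothetical distance table, not the implementation used for those trees; it returns only the tree topology as nested tuples and ignores branch lengths.

```python
def average_linkage(names, dist):
    """Agglomerative clustering with average linkage.

    names: list of sample names.
    dist:  dict mapping frozenset({a, b}) to the distance between a and b.
    Returns the tree topology as nested tuples of sample names.
    """
    # Each cluster is a pair: (subtree, list of member sample names).
    clusters = [(name, [name]) for name in names]

    def cluster_distance(c1, c2):
        # Average of all pairwise distances between members of the two clusters.
        pairs = [(a, b) for a in c1[1] for b in c2[1]]
        return sum(dist[frozenset((a, b))] for a, b in pairs) / len(pairs)

    while len(clusters) > 1:
        # Find the closest pair of clusters and merge them.
        i, j = min(
            ((i, j) for i in range(len(clusters))
                    for j in range(i + 1, len(clusters))),
            key=lambda ij: cluster_distance(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = ((clusters[i][0], clusters[j][0]),
                  clusters[i][1] + clusters[j][1])
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0][0]

# Hypothetical distances: small within a cell type, large between cell types.
names = ["mac1", "mac2", "brain1", "brain2"]
dist = {
    frozenset(("mac1", "mac2")): 1.0,
    frozenset(("brain1", "brain2")): 2.0,
    frozenset(("mac1", "brain1")): 10.0,
    frozenset(("mac1", "brain2")): 10.0,
    frozenset(("mac2", "brain1")): 10.0,
    frozenset(("mac2", "brain2")): 10.0,
}

tree = average_linkage(names, dist)
print(tree)
```

Unlike neighbour-joining, this procedure always produces a rooted tree and implicitly assumes similar rates of change across samples, which is one reason the same input data can yield different topologies under different algorithms.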

Comparing the trees, the results are in general very similar in 1, 2, 5 and 6. They all group obvious clades such as macrophage, brain and blood. The trees in 3 and 4 fail to separate primary cells from tissues, although they do separate the obvious clades as the others do. The tree in 5 is unique in that it separates organisms perfectly, as well as separating the sample types and obvious clades within organisms (where available).
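Whether a tree "groups an obvious clade" can be checked mechanically rather than by eye. The sketch below assumes a rooted tree written as nested tuples with hypothetical sample names; for unrooted trees (such as raw neighbour-joining output) one would compare bipartitions instead.

```python
def leaves(tree):
    """Return the set of leaf names under a node (a leaf is a plain string)."""
    if isinstance(tree, str):
        return frozenset([tree])
    result = frozenset()
    for child in tree:
        result |= leaves(child)
    return result

def clades(tree):
    """Return the leaf set of every node in the tree, including the root."""
    found = {leaves(tree)}
    if not isinstance(tree, str):
        for child in tree:
            found |= clades(child)
    return found

def forms_clade(tree, group):
    """True if the samples in `group` are exactly the leaves under one node."""
    return frozenset(group) in clades(tree)

# Hypothetical toy tree, not one of the six trees discussed above.
toy_tree = ((("macrophage_1", "macrophage_2"), "blood_1"),
            ("brain_1", "brain_2"))

print(forms_clade(toy_tree, ["macrophage_1", "macrophage_2"]))
print(forms_clade(toy_tree, ["brain_1", "brain_2"]))
print(forms_clade(toy_tree, ["macrophage_1", "brain_1"]))
```

Running the same check with groups defined by organism or by sample type (primary cell versus tissue) would give a simple, comparable score for each of the six trees.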