Working Group 2 - Expression visualization and clustering

Matrix Definition

Current primary data files associate tag counts with known genes and are available both as raw counts and as counts normalized to tags per million (TPM). Moving forward, the matrix will be broken down (expanded) to the level of tag counts per promoter (both level 2 and level 3 promoters, if this classification is preserved). This will require a consensus on the best way of defining and naming promoters. That consensus will be the output of Working Group 1 and will result in a matrix of tag counts per promoter being made available by RIKEN. The matrix should also include tag counts associated with non-coding RNAs.
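As a minimal sketch of the TPM normalization mentioned above, assume that each sample's raw tag counts are simply rescaled by that sample's total tag count (its library size), so that every sample column sums to one million. The normalization actually applied to the primary data files may differ. The plain-dict layout and promoter identifiers such as `p1@GENE_A` are hypothetical placeholders, since the promoter naming scheme is still to be agreed by Working Group 1.

```python
from typing import Dict, List


def tags_per_million(counts: Dict[str, List[int]]) -> Dict[str, List[float]]:
    """Convert a promoter-by-sample matrix of raw tag counts to TPM.

    counts maps a (hypothetical) promoter identifier to its raw tag counts,
    one entry per sample, with the same sample order for every promoter.
    """
    if not counts:
        return {}
    n_samples = len(next(iter(counts.values())))
    # Library size: total tags observed in each sample across all promoters.
    totals = [sum(row[j] for row in counts.values()) for j in range(n_samples)]
    return {
        promoter: [
            # Scale each count so that the sample column sums to 1,000,000;
            # a sample with no tags at all is left at zero.
            row[j] * 1_000_000 / totals[j] if totals[j] else 0.0
            for j in range(n_samples)
        ]
        for promoter, row in counts.items()
    }
```

For example, `tags_per_million({"p1@GENE_A": [10, 0], "p2@GENE_A": [30, 5], "p1@ncRNA_X": [60, 15]})` gives `[100000.0, 0.0]` for `p1@GENE_A`, because the first sample has 100 tags in total. Non-coding RNA promoters are handled exactly like any other row.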

Clustering

These data will be made available so that different approaches to normalization and clustering can be explored. Which clustering approach works best will be investigated, but the criteria for judging the ‘best’ clustering algorithm are open to debate (e.g. GO/pathway enrichment of the resulting clusters vs. assessment by a biologist). Once the data have been clustered by the optimal method(s), there is obvious value and interest in defining the biology of any clusters whose function is not immediately apparent. How, or by whom, this work will be done is not yet clear. Clustering may also help to establish the true origin or type of cells, i.e. inform their ontological description. Clustering should cover the samples as well as the promoters.
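To illustrate one option for clustering samples, the sketch below applies average-linkage agglomerative (hierarchical) clustering, using 1 - r (where r is the Pearson correlation between two samples' expression profiles) as the distance. This is only one of the many approaches that could be compared. The linkage rule, the distance measure, the number of clusters `k`, and the sample names are all illustrative assumptions, not choices made by the group.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant profile; treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def cluster_samples(profiles: Dict[str, List[float]], k: int) -> List[List[str]]:
    """Average-linkage agglomerative clustering using 1 - Pearson r as distance.

    profiles maps each sample name to its expression vector (e.g. TPM per
    promoter, with the same promoter order for every sample). Clusters are
    merged pairwise until k clusters remain.
    """
    if k < 1:
        raise ValueError("k must be at least 1")
    names = list(profiles)
    # Precompute the distance between every pair of samples.
    dist = {
        frozenset((a, b)): 1.0 - pearson(profiles[a], profiles[b])
        for a, b in combinations(names, 2)
    }
    clusters: List[List[str]] = [[name] for name in names]

    def linkage(c1: List[str], c2: List[str]) -> float:
        # Average linkage: mean distance over all cross-cluster sample pairs.
        total = sum(dist[frozenset((a, b))] for a in c1 for b in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > k:
        # Find and merge the closest pair of clusters.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)]
        clusters.append(merged)
    return clusters
```

With four hypothetical samples, two liver-like and two brain-like, `cluster_samples(profiles, 2)` groups each pair together. Whether such groupings are "best" would still have to be judged by the criteria discussed above, such as GO/pathway enrichment or expert review.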

Data Visualisation

It was felt that we should aim to build a sustainable and useful interface to the data and its analysis. This is related to WP9 and to the availability of the raw data and sample information.

Level 1 visualisation should include mapping of tag counts to the genome in ZENBU, and display of promoters as tracks on other genome browsers, e.g. Ensembl and UCSC.

Level 2 visualisation should support gene/promoter-level analysis of expression. GNF's BioGPS was discussed for this and was generally accepted as a good and useful tool.

Level 3 should include network visualisations of the datasets as a whole and of clusters thereof, together with inference analyses of the transcriptional networks that underpin the data. BioLayout Express3D was seen as a possible candidate for this, but other network visualisation tools might be more appropriate for other aspects.
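Network views of the kind described for Level 3 are often built by treating each promoter (or gene) as a node and connecting pairs whose expression profiles across samples are strongly correlated. The sketch below assumes that approach. The Pearson threshold of 0.9 is chosen purely for illustration, and the promoter identifiers are hypothetical. The function produces a weighted edge list that a network tool could load. It does not reproduce the input format or algorithms of BioLayout Express3D or any other specific tool.

```python
from itertools import combinations
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equal-length vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # undefined for a constant profile; treated as uncorrelated
    return sxy / sqrt(sxx * syy)


def correlation_edges(
    profiles: Dict[str, List[float]], threshold: float = 0.9
) -> List[Tuple[str, str, float]]:
    """Return (node_a, node_b, r) for every node pair with Pearson r >= threshold.

    profiles maps a (hypothetical) promoter identifier to its expression
    across samples; the default threshold is illustrative, not recommended.
    """
    edges = []
    # Sorting the identifiers makes the edge list order deterministic.
    for a, b in combinations(sorted(profiles), 2):
        r = pearson(profiles[a], profiles[b])
        if r >= threshold:
            edges.append((a, b, round(r, 3)))
    return edges
```

For example, promoters whose profiles are `[1, 2, 3, 4]` and `[2, 4, 6, 8]` are joined by an edge with r = 1.0, while an anti-correlated profile such as `[4, 3, 2, 1]` gets no edge at the default threshold. Clusters of the resulting graph could then be identified with a graph clustering method such as Markov clustering (MCL), which is not shown here.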

Tools for visualisation: