Long noncoding RNA main paper

Revision as of 12:43, 19 October 2011 by Burrough

Welcome to the long noncoding RNA main paper page. This page will be used to collect and link to ncRNA (including small RNA) resources and data files, and to list and discuss ongoing analyses for the paper. Please keep in mind that this paper has been under discussion in many forums and we are planning to move fast on it; if you propose a new analysis, please do so with the intention of carrying it out personally.

Data/resources

FANTOM5 lncRNA

Direct questions regarding short RNA and RNA-seq data to Max at burrough@gsc.riken.jp.

Objectives of paper

This paper aims to capture the complete breadth and diversity of long noncoding RNAs (lncRNAs) while leveraging the unique qualities of FANTOM5 to understand their cell-type restriction and evolutionary impact on the human genome. In addition, it uses genome organization and context to probe functional properties and to provide a comprehensive classification scheme for lncRNAs.

Tasks for the Paper

If you are interested in assisting with a task listed below, please add your name after the task in parentheses (you will be contacted shortly). Some people who have already expressed interest in a task have had their names added (feel free to remove yours).

Inclusion of potential novel noncoding RNAs in existing lncRNA set

Leonard Lipovich's lab has undertaken and completed the Herculean task of assembling and annotating the set of known lncRNAs, and has supplemented this with the set provided by GENCODE <link here>. To leverage FANTOM5 data, we need to identify novel lncRNAs based on hCAGE peaks from Kawaji-san's clustering combined with RNA-seq data. Specific tasks:

  1. selection of FANTOM5 samples for RNA-seq, sequencing, RNA-seq processing, and transcript assembly (Max, RIKEN OSC)
  2. collection and formatting of publicly available RNA-seq data to further assist in 'validation' (Max, open to recommendations)
  3. assessment of the coding potential of RNA-seq-derived transcript models using translational analysis (Ben Brown)
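The page does not say which translational analysis will be used. As one hypothetical in-silico proxy only, the sketch below flags transcripts whose longest ATG-initiated open reading frame (ORF) is short. It scans the three forward frames only, assuming stranded transcript models, and the 100-codon cutoff and the names `longest_orf_length` and `looks_noncoding` are assumptions for illustration, not the paper's method.

```python
# Hypothetical sketch: longest ORF as a crude proxy for coding potential.
# Only forward frames are scanned (transcript strand assumed known).

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_length(seq: str) -> int:
    """Length in codons (stop included) of the longest ATG-initiated ORF
    that ends at an in-frame stop codon, over the three forward frames."""
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA letters
    best = 0
    for frame in range(3):
        start = None  # position of the first ATG in the current open stretch
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                best = max(best, (i + 3 - start) // 3)
                start = None
    return best


def looks_noncoding(seq: str, min_codons: int = 100) -> bool:
    """Assumed rule: call a transcript noncoding if its longest ORF is
    shorter than min_codons (about 300 nt, a commonly used cutoff)."""
    return longest_orf_length(seq) < min_codons


print(longest_orf_length("ATGAAATAG"))
```

An ORF-length cutoff is only a first filter; a real coding-potential call would combine it with conservation or translation evidence.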

Analysis of lncRNA tissue specificity and timecourse behavior

These results will argue for the biological importance of lncRNAs. Specific tasks (Lukasz has volunteered to look into some or many of these):

  1. top expressed lncRNAs in the total dataset and in different tissues
  2. identification of 'housekeeping' lncRNAs versus tissue-specific lncRNAs
  3. clustering of primary cells/tissues with respect to their lncRNA expression profiles
  4. clustering of lncRNAs with respect to their primary cell/tissue expression profiles
  5. PCA and multidimensional scaling to find the tissues with the greatest expression differences or similarities
  6. identification of significant lncRNA expression differences across time-course data
  7. identification of lncRNAs present across multiple time-course datasets
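For the housekeeping vs. tissue-specific split (task 2), one common option is the tissue-specificity index tau (Yanai et al., 2005); the page does not name a metric, so this is a hedged sketch. It assumes non-negative per-sample expression values (for example hCAGE-derived TPM), a log2(x + 1) transform, and illustrative cutoffs of 0.8 and 0.2; `tau` and `classify` are hypothetical names.

```python
import math
from typing import Dict, List


def tau(expression: List[float]) -> float:
    """Tissue-specificity index tau for one lncRNA across n samples.

    0 means uniform expression (housekeeping-like); 1 means expression in a
    single sample (tissue-specific). Values are assumed non-negative.
    """
    n = len(expression)
    if n < 2:
        raise ValueError("tau needs at least two samples")
    # log2(x + 1) damps extreme values; the pseudocount of 1 is an assumption
    x = [math.log2(v + 1.0) for v in expression]
    peak = max(x)
    if peak == 0:
        raise ValueError("tau is undefined for an unexpressed transcript")
    return sum(1.0 - xi / peak for xi in x) / (n - 1)


def classify(
    profiles: Dict[str, List[float]],
    specific_cutoff: float = 0.8,      # illustrative cutoff, not from the paper
    housekeeping_cutoff: float = 0.2,  # illustrative cutoff, not from the paper
) -> Dict[str, str]:
    """Label each lncRNA as housekeeping, tissue-specific, or intermediate."""
    labels = {}
    for name, values in profiles.items():
        if max(values) <= 0:
            labels[name] = "not expressed"
            continue
        t = tau(values)
        if t >= specific_cutoff:
            labels[name] = "tissue-specific"
        elif t <= housekeeping_cutoff:
            labels[name] = "housekeeping"
        else:
            labels[name] = "intermediate"
    return labels


print(classify({"a": [5, 5, 5], "b": [0, 0, 50]}))
```

The same per-sample matrix feeding `classify` could also be the input to the clustering and PCA tasks.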

lncRNA conservation in matching mouse samples

Analyzing the presence/absence of lncRNA peaks in mouse and human. Results will likely argue for a specific role for lncRNAs in shaping the human transcriptome. Specific tasks:

  1. identification of human-specific lncRNAs
  2. analysis of sequence conservation: promoter regions (hCAGE strength) vs. the full length of the transcript; extension to rat, dog, and chicken?
  3. role of Alu-initiated TSS in lncRNA proliferation in humans
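A minimal sketch of the presence/absence call for human-specific lncRNAs (task 1), assuming each human lncRNA TSS has already been projected to mouse coordinates (for example by a liftOver step run upstream) and that mouse CAGE peaks are given as positions per chromosome. The 100 bp window, the treatment of TSSs that fail to map as human-specific, and the name `human_specific` are all assumptions; strand is ignored for brevity.

```python
from bisect import bisect_left
from typing import Dict, List, Optional, Tuple


def human_specific(
    human_tss_in_mouse: Dict[str, Optional[Tuple[str, int]]],
    mouse_peaks: Dict[str, List[int]],
    window: int = 100,  # assumed matching window in bp
) -> List[str]:
    """Return human lncRNA IDs with no mouse CAGE peak within `window` bp of
    their projected TSS. None means the TSS did not map to mouse; such
    lncRNAs are called human-specific here (an assumption)."""
    # sort peak positions once per chromosome so each lookup is a binary search
    sorted_peaks = {chrom: sorted(pos) for chrom, pos in mouse_peaks.items()}
    result = []
    for lnc_id, mapped in human_tss_in_mouse.items():
        if mapped is None:
            result.append(lnc_id)
            continue
        chrom, pos = mapped
        peaks = sorted_peaks.get(chrom, [])
        # first peak at or after pos - window; absent or past pos + window
        # means no mouse peak supports this TSS
        i = bisect_left(peaks, pos - window)
        if i == len(peaks) or peaks[i] > pos + window:
            result.append(lnc_id)
    return result


print(human_specific({"lncA": ("chr1", 1000), "lncB": None}, {"chr1": [1050]}))
```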

Global analysis of sense/antisense transcription in cis/trans lncRNA networks

Much of this is already underway in Leonard's lab, Vlad's lab, and by Nicolas at the OSC.

  1. identification of sense/antisense transcription involving at least one lncRNA (Nicolas)
  2. coexpression trends between lncRNAs and proximal genes
  3. ontology analysis of sense/antisense lncRNAs that are tissue-specific or human-specific
  4. construction of "directional" lncRNA networks
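Tasks 1 and 2 above (sense/antisense pairs involving at least one lncRNA, and their coexpression) can be sketched as follows. This assumes transcript models in 0-based, half-open, BED-like coordinates with a known strand and per-sample expression vectors; `Transcript`, `sense_antisense_pairs`, and `pearson` are hypothetical names, not existing pipeline code.

```python
import math
from dataclasses import dataclass
from itertools import combinations
from typing import List, Tuple


@dataclass(frozen=True)
class Transcript:
    name: str
    chrom: str
    start: int  # 0-based, inclusive (BED-like, assumed)
    end: int    # exclusive
    strand: str  # "+" or "-"
    is_lncrna: bool


def overlaps(a: Transcript, b: Transcript) -> bool:
    """True if two half-open intervals on the same chromosome overlap."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def sense_antisense_pairs(transcripts: List[Transcript]) -> List[Tuple[str, str]]:
    """All overlapping opposite-strand pairs with at least one lncRNA.
    Quadratic scan for clarity; a genome-wide run needs an interval tree."""
    pairs = []
    for a, b in combinations(transcripts, 2):
        if a.strand != b.strand and overlaps(a, b) and (a.is_lncrna or b.is_lncrna):
            pairs.append((a.name, b.name))
    return pairs


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two expression vectors over the same samples."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length vectors of length >= 2")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sx = math.sqrt(sum((xi - mx) ** 2 for xi in x))
    sy = math.sqrt(sum((yi - my) ** 2 for yi in y))
    if sx == 0 or sy == 0:
        return float("nan")  # constant profile: correlation undefined
    return cov / (sx * sy)


lnc = Transcript("lnc1", "chr1", 100, 500, "-", True)
gene = Transcript("GENE1", "chr1", 400, 900, "+", False)
print(sense_antisense_pairs([lnc, gene]), pearson([1, 2, 3], [2, 4, 6]))
```

Pearson correlation is one simple choice for coexpression; a rank-based measure would be less sensitive to a few highly expressed samples.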


Subclassification of lncRNAs

Begin to catalog the diversity of lncRNAs evident from the combination of RNA-seq and hCAGE data.