Long noncoding RNA main paper
Welcome to the long noncoding RNA main paper page. This page will be used to collect and link to ncRNA (including small RNA) resources and data files, and to list and discuss ongoing analyses for the paper. Please keep in mind that this paper has already been the subject of extensive discussion in many forums and that we need to move quickly on it; if you have a new analysis to propose, please do so with the intention of carrying it out personally.
Data/resources
- FANTOM5 lncRNAome (Lipovich lab): Media:F5_human_lncRNAome(Jia&Lipovich_Gencode)BED.zip
- FANTOM5 sense/antisense pairing (Lipovich lab): Media:F5_human_sense-antisense_pairs_hg19.zip
Forthcoming:
- pointer to short RNA data: gene expression tables and raw data
- pointer to RNA-seq data: gene expression, transcript assemblies, and raw data
Direct questions regarding short RNA or RNA-seq data to Max: burrough@gsc.riken.jp
Objectives of paper
This paper aims to capture the complete breadth and diversity of long noncoding RNAs (lncRNAs) while leveraging the unique qualities of FANTOM5 to understand their cellular restriction and evolutionary impact on the human genome. In addition, it uses genome organization and context to probe functional properties and provide a comprehensive classification scheme for lncRNAs.
Tasks for the Paper
If you are interested in assisting with a task below, please add your name after the task in parentheses. Names have already been added for people who have expressed interest or are currently involved in tasks.
The non-redundant lncRNAome and addition of novel candidates from orphan hCAGE peaks
Leonard Lipovich's lab has undertaken and completed the Herculean task of assembling and annotating the set of non-redundant, known lncRNAs and supplemented this with the set provided by Gencode <link here>. To leverage FANTOM5 data, we will identify novel lncRNAs based on hCAGE peaks from Kawaji-san's clustering combined with data from RNA-seq. Specific tasks:
- selection of FANTOM5 samples for RNA-seq, sequencing, RNA-seq processing, and transcript assembly (Max, RIKEN OSC)
- collection and formatting of publicly-available RNA-seq data for further assistance in 'validation' (Max, open to recommendations)
- based on RNA-seq-derived transcript definitions, use translational analysis to assess coding potential (Ben Brown)
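As a rough illustration of the coding-potential task above, the sketch below scans the three forward reading frames of a transcript sequence and reports the longest open reading frame (ATG to stop) in codons. This is only a simple proxy and is not necessarily the translational analysis that will be used for the paper; the function name `longest_orf_codons` is hypothetical, and any cutoff for calling a transcript "coding" would need to be chosen separately.

```python
# Minimal sketch: longest ORF (in codons) across the three forward frames.
# Assumption: a long ATG..stop ORF is used as a crude proxy for coding
# potential; the paper's actual method may differ.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def longest_orf_codons(seq):
    """Return the length, in codons, of the longest complete ORF.

    The count includes the ATG start codon and excludes the stop codon.
    ORFs without an in-frame stop codon are ignored.
    """
    seq = seq.upper().replace("U", "T")  # accept RNA or DNA input
    best = 0
    for frame in range(3):
        current = None  # None means we are not inside an ORF
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if current is None:
                if codon == "ATG":
                    current = 1
            elif codon in STOP_CODONS:
                best = max(best, current)
                current = None
            else:
                current += 1
    return best


if __name__ == "__main__":
    print(longest_orf_codons("ATGAAACCCTAG"))
```

Transcripts whose longest ORF falls below a chosen length could then be kept as lncRNA candidates, subject to the other lines of evidence listed above.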
Analysis of lncRNA tissue specificity and timecourse behavior
These results will essentially argue for the biological importance of lncRNAs. Specific tasks (Lukasz has volunteered to look into some or many of these):
- top expressed lncRNAs in the total dataset and in different tissues
- identification of 'house-keeping' lncRNAs versus tissue-specific lncRNAs
- clustering of primary cells/tissues with respect to their lncRNA expression profiles
- clustering of lncRNAs with respect to their primary cell/tissue expression profiles
- PCA and multidimensional scaling to find tissues with most expression differences / similarity
- identification of significant lncRNA expression differences across time course data
- identification of lncRNAs present across multiple time course datasets
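One possible way to separate 'house-keeping' from tissue-specific lncRNAs in the tasks above is a per-transcript specificity score. The sketch below uses the tau index, a commonly used specificity measure that is an assumption here rather than a method agreed for this paper. The function names and the 0.8 cutoff are hypothetical and only for illustration; the input is one expression value per tissue or primary cell type.

```python
# Minimal sketch: tau tissue-specificity index for one lncRNA.
# Assumption: expression values are non-negative and already normalized.
# tau = 0 means uniform expression; tau = 1 means a single-tissue signal.


def tau_index(expression):
    """Compute tau = sum(1 - x_i / max(x)) / (n - 1) over n tissues."""
    if len(expression) < 2:
        raise ValueError("need expression values for at least two tissues")
    peak = max(expression)
    if peak <= 0:
        return 0.0  # not expressed anywhere; treat as non-specific
    return sum(1 - x / peak for x in expression) / (len(expression) - 1)


def classify(expression, threshold=0.8):
    """Label a lncRNA using an arbitrary, illustrative tau cutoff."""
    if tau_index(expression) >= threshold:
        return "tissue-specific"
    return "house-keeping"


if __name__ == "__main__":
    print(tau_index([5, 5, 5, 5]), classify([5, 5, 5, 5]))
    print(tau_index([0, 0, 10, 0]), classify([0, 0, 10, 0]))
```

The same per-transcript scores could feed the clustering and PCA tasks as a filter, for example by restricting them to lncRNAs above a given tau.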
lncRNA conservation in matching mouse samples
Analyzing the presence/absence of lncRNA peaks in mouse and human. Results will likely argue for a specific role for lncRNAs in shaping the human transcriptome. Specific tasks:
- identification of human-specific lncRNAs
- analysis of sequence conservation: promoter regions (hCAGE strength) vs. the length of the transcript; extension to rat, dog, chicken?
- role of Alu-initiated TSS in lncRNA proliferation in humans
Global analysis of sense/antisense transcription in cis/trans lncRNA networks
Much of this is underway in Leonard's lab, in Vlad's lab, and by Nicolas at the OSC.
- identification of sense/antisense transcription involving at least one lncRNA (Nicolas)
- coexpression trends between lncRNAs and proximal genes
- ontology analysis of s/as lncRNAs specific to tissues and humans
- construction of "directional" lncRNA networks
- effects of short RNA on the networks (?)
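The first task in the list above, finding sense/antisense pairs that involve at least one lncRNA, can be sketched as an interval-overlap search over BED-style coordinates. This is only an illustration under simple assumptions (whole-transcript overlap on opposite strands of the same chromosome, 0-based half-open coordinates as in BED); it is not the procedure used to build the released pairing file, and the `Transcript` record and function name are hypothetical.

```python
# Minimal sketch: report opposite-strand overlapping transcript pairs
# in which at least one member is a lncRNA.
from typing import NamedTuple


class Transcript(NamedTuple):
    name: str
    chrom: str
    start: int       # 0-based, inclusive (BED convention)
    end: int         # exclusive (BED convention)
    strand: str      # "+" or "-"
    is_lncrna: bool


def sense_antisense_pairs(transcripts):
    """Return (name_a, name_b) for overlapping opposite-strand pairs."""
    by_chrom = {}
    for t in transcripts:
        by_chrom.setdefault(t.chrom, []).append(t)

    pairs = []
    for group in by_chrom.values():
        group.sort(key=lambda t: t.start)
        for i, a in enumerate(group):
            for b in group[i + 1:]:
                if b.start >= a.end:
                    break  # sorted by start, so no later overlap with a
                if a.strand != b.strand and (a.is_lncrna or b.is_lncrna):
                    pairs.append((a.name, b.name))
    return pairs


if __name__ == "__main__":
    demo = [
        Transcript("geneA", "chr1", 100, 500, "+", False),
        Transcript("lncAS", "chr1", 400, 900, "-", True),
        Transcript("geneB", "chr1", 1000, 2000, "+", False),
        Transcript("geneC", "chr1", 1500, 2500, "-", False),
    ]
    print(sense_antisense_pairs(demo))
```

In the demo, geneB and geneC overlap on opposite strands but are skipped because neither is a lncRNA. The resulting pairs could serve as the edges examined in the coexpression and network tasks.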
Network validation
Making lncRNA functional inferences through selective testing of the effects of perturbation on the different networks identified above.
- selection of candidate target networks (based on above)
- knockdown of lncRNAs, measuring influence of lncRNAs (WP6)
Subclassifying lncRNAs
Begin to classify lncRNAs via integration of various resources (FANTOM5 and non-FANTOM5)
- splicing architecture
- processing (products possibly supported by short RNA data)
- translational potential
- positioning relative to other genomic markers (e.g. short RNA, histone marks)