Long noncoding RNA main paper

Revision as of 16:28, 26 October 2011 by Burrough (talk | contribs)

Welcome to the FANTOM5 long noncoding RNA (lncRNA) main paper page. This page will be used to list tasks and discuss ongoing analyses for the paper. For information on ncRNA data resources, see this page. Please keep in mind that this paper has already been the subject of extensive discussion in many forums and that we need to move quickly on it. While we are always interested in exciting new analyses, if you have something new to propose, please do so with the intention of personally carrying out the analysis.

Paper objectives

This paper aims to capture the complete breadth and diversity of long noncoding RNAs (lncRNAs) while leveraging the unique qualities of FANTOM5 to understand their cellular restriction and evolutionary impact on the human genome. In addition, it uses genome organization and context to probe functional properties and provide a comprehensive classification scheme for lncRNAs.

Tasks for the paper

If you are interested in assisting with a task below, please add your name before the task in parentheses. Names have already been added for people who expressed interest or are currently involved in tasks, as discussed at the FANTOM5 Kouyou meeting.

Annotation/analysis of the non-redundant lncRNAome across FANTOM5 dataset

Leonard Lipovich's lab has undertaken and completed the Herculean task of assembling and annotating the set of non-redundant, known lncRNAs and supplemented this with the set provided by Gencode (see non-coding RNA resource page for more details). Preliminary viewing of the analysis in ZENBU suggests many lncRNAs are tissue-specific; this is an important point of order for the FANTOM5 data and this main paper.

Specific tasks:

  1. (Leonard) Final tweaks to the FANTOM5 lncRNAome
    1. (Leonard) Inclusion of latest lncRNAs from the Cabili paper
  2. (WP4, Nicolas, Leonard, ?) Obtain the list of hCAGE promoter peaks associating with lncRNAome from the final filtered and normalized clustering values
  3. (Lukasz, Nicolas?) Primary-cell specific expression
    1. Top-expressed lncRNAs in the total dataset and in different tissues (made available on the wiki as an excel file)
    2. Identification of "cell-type" specific lncRNAs
  4. Time course expression
    1. Significant differences in lncRNA expression across time points across all time courses
    2. lncRNA expression shared across multiple time courses
  5. Analysis of lncRNAs (done in comparison with analysis of coding RNA--i.e. main promoterome paper analysis)
    1. House-keeping vs tissue-specific lncRNAs (vs. coding RNAs)
    2. Clustering of primary cells/tissues with respect to their lncRNA expression profiles
    3. PCA and multidimensional scaling to find tissues with most lncRNA expression differences / similarity (vs. coding RNA)
  6. All of the above tasks could in theory be repeated for the sets of all cis- and trans-acting lncRNAs (see below)
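As a starting point for the house-keeping vs. tissue-specific split in task 5.1, one option is a per-transcript specificity score. The sketch below uses the tau index, which is a common choice but only an assumption here, since no metric has been agreed for this paper. The function name `tau_index` and the example values are hypothetical; the input is assumed to be a list of non-negative normalized expression values, one per tissue or primary cell.

```python
def tau_index(expression):
    """Tau specificity index for one transcript.

    expression: non-negative expression values, one per sample (assumed
    already normalized). Returns 0.0 for perfectly uniform expression
    and 1.0 when only a single sample expresses the transcript.
    """
    if len(expression) < 2:
        raise ValueError("need at least two samples")
    peak = max(expression)
    if peak <= 0:
        # Transcript not expressed anywhere: treat as non-specific.
        return 0.0
    # Sum of (1 - x/peak) over samples, scaled by the number of samples minus one.
    return sum(1 - x / peak for x in expression) / (len(expression) - 1)


# Hypothetical profiles across four samples.
housekeeping_like = [5.0, 5.0, 5.0, 5.0]
restricted = [0.0, 0.0, 10.0, 0.0]
print(tau_index(housekeeping_like), tau_index(restricted))
```

The same score can be computed for coding RNAs, which would give the lncRNA vs. coding comparison a common scale. Any cutoff separating "house-keeping" from "specific" would still need to be chosen from the data.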

General functional classification of lncRNAs

Here we are interested in defining relative percentages of lncRNAs likely to be acting in "cis" or "trans" and making general functional predictions for individual lncRNAs based on co-expression.

Specific tasks:

  1. (Nicolas, Leonard) Preliminary classification of lncRNAs into likely cis- and trans- acting
    1. (Nicolas) construction of "cis-acting" lncRNA chains
    2. (Leonard) annotation of the above into a curated set representing the "Chainome"
    3. (Nicolas) subtraction of above yields set of potential trans-acting lncRNAs
  2. cis-acting lncRNA analysis (each analysis potentially performed on both the complete set extracted by Nicolas and the chainome curated by Leonard's lab)
    1. (Timo, Robin, Nicolas) linking lncRNA expression to groups of locally-connected genes
    2. (Eivind, Finn, Tom, Nicolas) co-expression analysis to inform function of individual lncRNAs
    3. (Eivind, Helena, Max) overlaying small RNA information with ncRNA found in chains
      1. similar to above, search for potential effects on expression of lncRNA and coding RNA in presence/absence of small RNA and its orientation
  3. trans-acting lncRNA analysis
    1. (Eivind, Finn, Tom, Nicolas) co-expression analysis to inform function of individual lncRNAs
    2. (Nicolas) reverse, window-based homology analysis of trans-acting lncRNAs to determine potential sites of activity on the genome
      1. overlay this analysis with co-expression results
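The co-expression tasks above (2.2 and 3.1) could be prototyped as a simple correlation screen. The following is a minimal sketch, assuming Pearson correlation across samples and an arbitrary cutoff of |r| >= 0.8; neither the measure nor the threshold has been fixed for the paper, and the names `pearson`, `coexpressed_genes`, and the gene labels are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # A flat profile carries no co-expression signal.
        return 0.0
    return cov / sqrt(var_x * var_y)


def coexpressed_genes(lnc_profile, gene_profiles, min_abs_r=0.8):
    """Return (gene, r) pairs with |r| >= min_abs_r, strongest first."""
    scored = ((gene, pearson(lnc_profile, profile))
              for gene, profile in gene_profiles.items())
    hits = [(gene, r) for gene, r in scored if abs(r) >= min_abs_r]
    return sorted(hits, key=lambda hit: abs(hit[1]), reverse=True)


# Hypothetical expression values over four samples.
lnc = [1.0, 2.0, 3.0, 4.0]
genes = {
    "geneA": [2.0, 4.0, 6.0, 8.0],   # rises with the lncRNA
    "geneB": [4.0, 3.0, 2.0, 1.0],   # falls as the lncRNA rises
    "geneC": [1.0, 3.0, 1.0, 3.0],   # weakly related
}
print(coexpressed_genes(lnc, genes))
```

Keeping negative correlations in the hit list is deliberate, since a repressive lncRNA would show up as anti-correlated with its targets. For the cis-acting set, `gene_profiles` could be restricted to genes in the same chain; for the trans-acting set, it could be genome-wide.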

Motif enrichment in promoter regions of lncRNAs

  1. (Boris Jankovic) Comparisons of motif enrichment (difference in cis- vs. trans- ?)
  2. Location/orientation of binding motifs within promoters
  3. (Michiel) MARA analysis on lncRNAome

lncRNA conservation in matching mouse primary cells

Analyzing the presence/absence of lncRNA peaks in mouse and human under the assumption that lncRNAs play a specific role in shaping the human/primate transcriptome. Many of these analyses could also be extended to aortic smooth muscle cells in rat, dog, and chicken.

Specific tasks:

  1. conservation frequency of human-specific lncRNAs in matching mouse primary cells
    1. relative conservation of cis- and trans- acting
  2. analysis of sequence conservation; promoter regions vs. the length of the transcript.
  3. (Leonard talks to Nicolas) conservation of "chainome"
  4. (Yulia for global analysis, Leonard for annotation) frequency and conservation of Alu-initiated TSS in lncRNA in humans vs. mouse

Network validation

Probing lncRNA function through perturbation in identified networks.

Specific tasks:

  1. (Emily, Leonard) selection of candidate target networks
  2. (coordinated by Haru, WP6) knockdown of lncRNAs, measuring influence of lncRNAs

Identification of novel lncRNAs using RNA-seq

See the RNA-seq page for details.

  1. (Max) collection of usable public RNA-seq data
  2. (Max) integration with FANTOM5 RNA-seq

Structure features/subclassification of lncRNAs

The intention here is to provide a comprehensive classification of lncRNAs based on structural features with the help of RNA-seq and short RNA data (e.g. splicing architecture, evidence of processed intermediates, translational potential, positioning relative to other genome markers). However, much of this was done in a recent paper by Cabili, so we will have to see whether there is scope for something new in FANTOM5.

eRNA analysis

eRNA (enhancer RNA) is a class of lncRNA of particular interest. Analysis of eRNA is being headed up by Robin Andersson.

Specific tasks:

  1. (Robin) identification/classification, percentage of lncRNAs that are eRNAs, along with rationale
  2. basic statistics (e.g. length distribution, etc.)
  3. cell specificity
  4. exploring the relationship between eRNAs and their associated promoters
    1. expression correlation
    2. mutual information approach
    3. intersection with publicly available spatial genomic organization data
  5. (Miura-san, Robin, Nicolas) validation of eRNA interaction with promoter regions by intersection with existing HiC (?) data and/or more targeted validations
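For the mutual information approach in task 4.2, a rough sketch of one possible formulation is given below. It assumes expression is first reduced to on/off calls per sample with a fixed threshold, which is a simplification chosen only for illustration; the helper names `binarize` and `mutual_information`, the threshold, and the values are all hypothetical.

```python
from collections import Counter
from math import log2


def binarize(values, threshold):
    """Convert expression values to on/off calls (True if above threshold)."""
    return [value > threshold for value in values]


def mutual_information(xs, ys):
    """Mutual information in bits between two discrete, paired sequences."""
    n = len(xs)
    count_x = Counter(xs)
    count_y = Counter(ys)
    count_xy = Counter(zip(xs, ys))
    total = 0.0
    for (x, y), joint in count_xy.items():
        p_xy = joint / n
        p_x = count_x[x] / n
        p_y = count_y[y] / n
        total += p_xy * log2(p_xy / (p_x * p_y))
    return total


# Hypothetical eRNA and promoter expression across four samples.
erna = binarize([0.1, 8.0, 0.2, 9.5], threshold=1.0)
promoter = binarize([0.0, 12.0, 0.3, 7.0], threshold=1.0)
print(mutual_information(erna, promoter))
```

Unlike the expression correlation in task 4.1, this measure would also pick up non-linear dependence, at the cost of depending on how the values are discretized.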

miRNA promoters

Satellite paper based on Eivind and Kawaji-san's work

  1. definition of miRNA promoters based on DROSHA-KD, small RNA-seq and upstream hCAGE peaks

Timeline