Long noncoding RNA main paper

Revision as of 17:18, 31 October 2011 by Burrough (talk | contribs)

Welcome to the FANTOM5 long noncoding RNA (lncRNA) main paper page. This page will be used to list tasks and discuss ongoing analyses for the paper. For information on ncRNA data resources, see this page. Please keep in mind that this paper has already been discussed extensively in many forums, so we need to move quickly. While we are always interested in exciting new analyses, if you have something new to propose, please do so with the intention of personally carrying out the analysis.

Paper objectives

This paper aims to capture the complete breadth and diversity of long noncoding RNAs (lncRNAs) while leveraging the unique qualities of FANTOM5 to understand their cellular restriction and evolutionary impact on the human genome. For the purpose of this paper, it is important to note that our definition of lncRNAs is broader than others and includes bidirectional/nested/cis-antisense lncRNAs and unspliced single-exon lncRNA genes with hCAGE support. In addition, this paper will use genome organization and context to probe functional properties and provide a comprehensive classification scheme for lncRNAs.

Tasks for the paper

If you are interested in assisting with a task below, please add your name before the task in parentheses. Names have already been added for people expressing interest or currently involved in tasks as discussed at the FANTOM5 Kouyou meeting. There are still tasks with no one assigned; if you are interested, please put your name down. Conceivably, some of these tasks can and will be made into satellite papers which will be referred to by the main paper, but we are including them here at present.

Annotation/analysis of the non-redundant lncRNAome across FANTOM5 dataset

Leonard Lipovich's lab has undertaken and completed the Herculean task of assembling and annotating the set of non-redundant, known lncRNAs and supplemented this with the set provided by Gencode. Preliminary viewing of the analysis in ZENBU suggests many lncRNAs are tissue-specific; this is an important point for both the FANTOM5 data and this main paper.

Specific tasks:

  1. (Leonard) Final tweaks to the FANTOM5 lncRNAome
    1. (Leonard) Inclusion of latest lncRNAs from the Cabili paper (?)
  2. (WP4, Nicolas, Leonard) Obtain the list of hCAGE promoter peaks associating with lncRNAome from the final filtered and normalized clustering values
  3. (Lukasz) Primary-cell specific expression
    1. (Lukasz) Top-expressed lncRNAs in the total dataset and in different tissues (made available on the wiki to sample providers)
    2. (Lukasz) Identification of "cell-type" specific lncRNAs (made available on the wiki to sample providers)
  4. (Lukasz) Time course expression
    1. (Lukasz) Significant differences in lncRNA expression across time points across all time courses
    2. (Lukasz) lncRNA expression shared across multiple time courses
  5. (Lukasz) Analysis of lncRNAs (done in comparison with analysis of coding RNA--i.e. main promoterome paper analysis)
    1. (Lukasz) House-keeping vs tissue-specific lncRNAs (vs. coding RNAs)
    2. (Lukasz) Clustering of primary cells/tissues with respect to their lncRNA expression profiles
    3. (Lukasz) PCA and multidimensional scaling to find tissues with most lncRNA expression differences / similarity (vs. coding RNA)
  6. (Lukasz) All of the above tasks can be repeated to look for differences in cis- and trans-acting lncRNAs (see below)
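
As a starting point for the house-keeping vs. tissue-specific split above, here is a minimal sketch of one possible specificity score. The page does not fix a metric; the tau index used here is an assumption, and the function name and toy values are hypothetical.

```python
def tau_index(expression):
    """Tau specificity score for one lncRNA across samples.

    0.0 means equal expression in every sample (house-keeping-like),
    1.0 means expression in a single sample only (tissue-specific).
    `expression` is a list of non-negative normalized values,
    one per tissue or primary cell type (assumed input format).
    """
    if len(expression) < 2:
        raise ValueError("need at least two samples")
    peak = max(expression)
    if peak <= 0:
        # Not expressed anywhere: report 0.0 rather than dividing by zero.
        return 0.0
    return sum(1 - x / peak for x in expression) / (len(expression) - 1)


# Toy profiles (hypothetical values, not FANTOM5 data).
broad = [5, 5, 5, 5]
restricted = [0, 0, 10, 0]
print(tau_index(broad), tau_index(restricted))
```

The same function can be applied to coding promoters, so the two score distributions can be compared directly. Any cutoff for calling a lncRNA "tissue-specific" would still need to be agreed on.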

General functional classification of lncRNAs

Here we are interested in defining relative percentages of lncRNAs likely to be acting in "cis" or "trans" and making general functional predictions for individual lncRNAs based on co-expression.

Specific tasks:

  1. (Nicolas, Leonard) Preliminary classification of lncRNAs into likely cis- and trans- acting
    1. (Nicolas) construction of "cis-acting" lncRNA chains
    2. (Leonard) annotation of the above into a curated set representing the "Chainome"
    3. (Nicolas) subtraction of above yields set of potential trans-acting lncRNAs
  2. cis-acting lncRNA analysis (each analysis performed on both the complete set extracted by Nicolas and the chainome curated by Leonard's lab)
    1. (Timo, Robin, Nicolas) linking lncRNA expression to groups of locally-connected genes
    2. (Eivind, Finn, Tom, Nicolas) co-expression analysis to inform function of individual lncRNAs and effects of lncRNAs on chains
    3. (Eivind, Helena, Max) overlaying small RNA information with ncRNA found in chains
      1. similar to above, search for potential effects on expression of lncRNA and coding RNA in presence/absence of small RNA and its orientation
  3. trans-acting lncRNA analysis
    1. (Eivind, Finn, Tom, Nicolas) co-expression analysis to inform function of individual lncRNAs
    2. (Nicolas) reverse, window-based homology analysis of trans-acting lncRNAs to determine potential sites of activity on the genome
      1. overlay this analysis with co-expression results
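
The subtraction step in task 1 can be sketched as a plain set operation. This is only an illustration: the input format (a mapping from chain name to member IDs) and all identifiers are hypothetical, and it assumes that any lncRNA appearing in at least one chain counts as cis-acting.

```python
def split_cis_trans(all_lncrnas, chains):
    """Split a lncRNA set into candidate cis- and trans-acting subsets.

    all_lncrnas: iterable of lncRNA IDs (the full lncRNAome).
    chains: dict mapping chain name -> iterable of member IDs.
            Members may include coding genes; these are ignored.
    Returns (cis, trans) as two sets.
    """
    lncrnas = set(all_lncrnas)
    chain_members = set().union(*chains.values())
    cis = lncrnas & chain_members   # lncRNAs found in any chain
    trans = lncrnas - cis           # everything left over
    return cis, trans


# Toy example with made-up IDs.
cis, trans = split_cis_trans(
    ["lnc1", "lnc2", "lnc3", "lnc4"],
    {"chainA": ["lnc1", "geneX", "lnc2"]},
)
print(sorted(cis), sorted(trans))
```

Running it once with the complete chain set and once with the curated chainome gives the two versions of the cis/trans split that the downstream analyses are meant to use.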

Motif enrichment in promoter regions of lncRNAs

  1. (Boris Jankovic) Comparisons of motif enrichment (difference in cis- vs. trans- ?)
  2. Location/orientation of binding motifs within promoters
  3. (Michiel) MARA analysis on lncRNAome

lncRNA conservation in matching mouse primary cells

Analyzing the presence/absence of lncRNA peaks in mouse and human under the assumption that lncRNAs play a specific role in shaping the human/primate transcriptome. Many of these analyses could also be extended to aortic smooth muscle cells in rat, dog, and chicken.

Specific tasks:

  1. (?) conservation frequency of human-specific lncRNAs in matching mouse primary cells
    1. (?) relative conservation of cis- and trans- acting
  2. (?) analysis of sequence conservation; promoter regions vs. the length of the transcript.
  3. (Leonard talks to Nicolas) conservation of "chainome"
  4. (Yulia for global analysis, Leonard for annotation) frequency and conservation of Alu-initiated TSS in lncRNA in humans vs. mouse

Network validation

Probing lncRNA function through perturbation in identified networks.

Specific tasks:

  1. (Emily, Leonard) selection of candidate target networks
  2. (coordinated by Haru, WP6) knockdown of lncRNAs, measuring influence of lncRNAs

Identification of novel lncRNAs using RNA-seq/CAGE-scan

see RNA-seq page for details.

  1. (Max) collection of usable public RNA-seq data
  2. (Max) integration with FANTOM5 RNA-seq
  3. (Nicolas, Max) CAGE-scan integration
  4. (Laurens) annotation of set of novel lncRNAs

Structure features/subclassification of lncRNAs

The intention here is to provide a comprehensive classification of lncRNAs based on structural features with the help of RNA-seq and short RNA data (e.g. splicing architecture, evidence of processed intermediates, translational potential, positioning relative to other genome markers, etc.). However, quite a bit of this was performed in a recent paper by Cabili, so we will have to see if there is scope for something new in FANTOM5.

  1. (Martin Frith & colleagues?) structure of lncRNA with overlap to short RNA

eRNA analysis

eRNA (enhancer RNA) is a class of lncRNA of particular interest. Analysis of eRNA is being headed up by Robin Andersson ([robin@binf.ku.dk]).

Specific tasks:

  1. (Robin) identification/classification, percent lncRNAs that are eRNAs, along with rationale
  2. basic statistics (e.g. length distribution, etc.)
  3. cell specificity
  4. exploring the relationship between eRNAs and their associated promoters
    1. expression correlation
    2. mutual information approach
    3. intersection with publicly available spatial genomic organization data
  5. (Miura-san, Robin, Nicolas) validation of eRNA interaction with promoter regions by intersection with existing Hi-C (?) data and/or more targeted validations
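
For the expression correlation item above, a minimal sketch could look like the following. It assumes Pearson correlation on matched sample vectors, which is only one option (the mutual information approach is a listed alternative), and the function names and toy profiles are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have equal length >= 2")
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        # Correlation is undefined for a flat profile; 0.0 is our convention.
        return 0.0
    return cov / sqrt(var_x * var_y)


def rank_promoters(erna_profile, promoter_profiles):
    """Rank candidate promoters by correlation with one eRNA, best first."""
    scored = [(name, pearson(erna_profile, profile))
              for name, profile in promoter_profiles.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)


# Toy profiles across three samples (made-up values).
print(rank_promoters([1, 2, 3], {"promA": [2, 4, 6], "promB": [3, 2, 1]}))
```

In practice the candidate promoters would probably be limited to those within some genomic distance of the eRNA; that window is not specified here and would need to be chosen.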

lncRNA and human disease overlap

  1. (Kenny, Peter, Juha) overlaying GWAS data with lncRNA
    1. cis-/trans-enrichment, cell-specificity of affected lncRNAs, etc...
  2. (Leonard, Alka) Rett syndrome and cis-chain
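
The GWAS overlay in task 1 is essentially an interval lookup. Below is a rough sketch assuming 1-based inclusive coordinates on a shared genome build; the tuple layouts, the optional `flank` parameter, and all IDs are hypothetical.

```python
def snps_in_lncrnas(snps, lncrnas, flank=0):
    """Map each lncRNA to the GWAS SNPs falling inside it.

    snps: list of (snp_id, chrom, pos).
    lncrnas: list of (name, chrom, start, end), 1-based inclusive.
    flank: extra bases added on both sides of each lncRNA.
    Returns a dict: lncRNA name -> list of overlapping SNP IDs.
    """
    hits = {}
    for snp_id, chrom, pos in snps:
        for name, l_chrom, start, end in lncrnas:
            if chrom == l_chrom and start - flank <= pos <= end + flank:
                hits.setdefault(name, []).append(snp_id)
    return hits


# Toy data with made-up coordinates.
snps = [("snp1", "chr1", 150), ("snp2", "chr1", 900), ("snp3", "chr2", 150)]
lncrnas = [("lncA", "chr1", 100, 200), ("lncB", "chr2", 500, 600)]
print(snps_in_lncrnas(snps, lncrnas))
```

The nested loop is fine for a sketch but scales poorly; for the full lncRNAome a sorted-interval approach or a tool such as BEDTools would be the more realistic choice.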

miRNA promoters

Satellite paper based on Eivind and Kawaji-san's work

  1. (Eivind/Kawaji-san) definition of miRNA promoters based on DROSHA-KD, small RNA-seq and upstream hCAGE peaks


Timeline/order of analyses

Instead of wasting time assigning a bunch of meaningless dates to each task, I'll work out some of the dependencies which gives an idea of the prioritization. Then we'll follow up with groups assigned to the tasks as soon as they can be accomplished. The lists below are structured to imply dependency (indented tasks follow non-indented tasks...)

This is all dependent on finalization and normalization of the Kawaji-san promoterome clusters; however, everything listed below can begin using available data. After RNA-seq is used to confirm novel lncRNAs from FANTOM5, we may need to rerun a selected portion of the analyses on this set and possibly on the integrated set.

Network validation

Given the time this will require, we should get moving with what we have currently.

  • Current best targets from Emily and Leonard sent to the OSC (Max and Al).
    • Max and Al --> discussion with WP6.

Annotation/analysis of the non-redundant lncRNAome across FANTOM5 dataset

  • Leonard submits final tweaks to the lncRNAome
    • Lukasz/Nicolas (?) perform listed tasks
      • listed tasks are performed again on the set of predicted cis-acting and trans-acting lncRNAs, looking for differences

General functional classification of lncRNAs

  • Nicolas cis/trans differentiation
    • Leonard annotation of formal "Chainome"
      • Timo, Robin, Nicolas establishing locally-connected genes with lncRNA chains
      • Eivind, Finn, Tom, Nicolas co-expression analysis on chains and complete set
        • Eivind, Helena, Max overlaying small RNA information on chains and effects of small RNA on expression
  • Eivind, Finn, Tom, Nicolas co-expression analysis on trans-acting lncRNAs
  • Nicolas identifying complete space of physical interaction for trans-acting lncRNAs
    • Nicolas overlaying the above two

Motif enrichment in promoter regions of lncRNAs

  • Boris motif enrichment
  • Michiel MARA

lncRNA conservation in matching mouse primary cells

  • ? defines determinants of lncRNA conservation
    • ? basic statistics on human/mouse conservation (possibly dependent on cis-/trans- classification)
    • ? conservation in promoter region vs. remaining sequence
  • Leonard defines chain conservation
    • Nicolas looks at genome-wide conservation of chains

Identification of novel lncRNAs using RNA-seq/CAGE-scan

see RNA-seq page.

  • Laurens will receive the complete set of novel lncRNAs for further annotation

Structure features/subclassification of lncRNAs

  • take list of lncRNAs with overlapping short RNAs from above
    • ? identification of "precursor" structure from RNA-seq and short RNA
    • ? secondary structure predictions of lncRNAs in short RNA regions
      • ? integration of the two above

eRNA analysis

  • Robin percentage of lncRNAs that are eRNAs and rationale for choosing this
  • Robin basic statistics/cell specificity
  • Robin eRNA and affected promoter analysis
    • Robin/others computational validation with public datasets
  • Miura-san wet lab validation

lncRNA and human disease overlap

  • Kenny/others GWAS overlap with lncRNAome set
    • accompanying analysis
  • Leonard and Alka pursue Rett story