Long noncoding RNA main paper: Difference between revisions


Revision as of 13:41, 4 November 2011

Welcome to the FANTOM5 long noncoding RNA (lncRNA) main paper page. This page will be used to list tasks and discuss ongoing analyses for the paper. For information on ncRNA data resources, see the Noncoding RNA central page. Please keep in mind that this paper has already been the subject of extensive discussion in many forums and we need to move quickly on this paper. While we are always interested in exciting new analyses, if you have something new to introduce/propose please do so with the intention of personally carrying out the analysis.

Paper objectives

This paper aims to capture the complete breadth and diversity of the non-redundant set of long noncoding RNAs (lncRNAs) while leveraging the unique qualities of FANTOM5 to understand their cellular restriction and evolutionary impact on the human genome. For the purpose of this paper, it is important to note that our definition of lncRNAs is broader than others and includes bidirectional/nested/cis-antisense lncRNAs and unspliced single-exon lncRNA genes with hCAGE support. In addition, this paper will use genome organization and context coupled with hCAGE-measured expression of coding genes to probe functional properties and provide a comprehensive classification scheme for lncRNAs.

Tasks for the paper

If you are interested in assisting with a task below, please add your name before the task in parentheses. Names have already been added for people expressing interest or currently involved in tasks as discussed at the FANTOM5 Kouyou meeting. Some tasks still have no one assigned; if you are interested, please put your name down. Conceivably, some of these tasks will end up as satellite papers referred to by the main paper, but we are including them here at present.

Annotation/analysis of the non-redundant lncRNAome across FANTOM5 dataset

Leonard Lipovich's lab has undertaken and completed the Herculean task of assembling and annotating the set of non-redundant, known lncRNAs and supplemented this with the set provided by Gencode. Preliminary viewing of the analysis in ZENBU suggests many lncRNAs are tissue-specific; this is an important point of order for the FANTOM5 data and this main paper.

Specific tasks:

  1. (Leonard) Final tweaks to the FANTOM5 lncRNAome
    1. (Leonard) Inclusion of latest lncRNAs from the Cabili paper
  2. (WP4, Nicolas, Leonard) Obtain the list of hCAGE promoter peaks associating with lncRNAome from the final filtered and normalized clustering values
  3. (Lukasz) Primary-cell specific expression
    1. (Lukasz) Top-expressed lncRNAs in the total dataset and in different tissues (made available on the wiki to sample providers)
    2. (Lukasz) Identification of "cell-type" specific lncRNAs (made available on the wiki to sample providers)
  4. (Lukasz) Time course expression
    1. (Lukasz) Significant differences in lncRNA expression across time points across all time courses
    2. (Lukasz) lncRNA expression shared across multiple time courses
  5. (Lukasz) Analysis of lncRNAs (done in comparison with analysis of coding RNA--i.e. main promoterome paper analysis)
    1. (Lukasz) House-keeping vs tissue-specific lncRNAs (vs. coding RNAs)
    2. (Lukasz) Clustering of primary cells/tissues with respect to their lncRNA expression profiles
    3. (Lukasz) PCA and multidimensional scaling to find tissues with most lncRNA expression differences / similarity (vs. coding RNA)
  6. (Lukasz) All of the above tasks can be repeated to look for differences in cis- and trans-acting lncRNAs (see below)
  7. (?) tissue-specific differential promoter usage in lncRNAs; comparison to coding differential promoter usage
  8. (Max, Leonard, OSC) increasing the non-redundant lncRNAome by incorporating novel transcripts identified through FANTOM5 hCAGE and RNA-seq data
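
As a rough illustration of task 5.1 (house-keeping vs tissue-specific lncRNAs), one widely used summary statistic is the tau specificity index. This is a sketch under assumptions, not a method chosen for the paper: the function name `tau_index`, the example expression values, and the handling of transcripts with no expression are all hypothetical.

```python
def tau_index(expression):
    """Tau specificity index for one transcript across samples.

    0.0 means uniform expression (house-keeping-like);
    1.0 means expression in a single sample (tissue-specific-like).
    """
    n = len(expression)
    if n < 2:
        raise ValueError("need at least two samples")
    peak = max(expression)
    if peak <= 0:
        # Assumption: a transcript expressed nowhere is reported as non-specific.
        return 0.0
    return sum(1 - x / peak for x in expression) / (n - 1)


# Hypothetical normalized expression values across four tissues.
housekeeping_like = [5.0, 5.0, 5.0, 5.0]
tissue_specific_like = [10.0, 0.0, 0.0, 0.0]

print(tau_index(housekeeping_like))
print(tau_index(tissue_specific_like))
```

The same function could be applied to lncRNA and coding promoters separately so that the two distributions can be compared; any cutoff for calling a transcript "tissue-specific" would have to be decided on the real data.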

General functional classification of lncRNAs

Here we are interested in isolating lncRNAs likely to be acting in "cis" and making general functional predictions for individual lncRNAs based on co-expression. We are also interested in probing to what extent we can identify lncRNAs involved in "trans"-like regulation and lncRNAs which primarily function as precursors for small RNA biogenesis. While there is some overlap in the analyses, they are treated in separate sections below.

cis-acting lncRNAs

Specific tasks:

  1. (Nicolas, Leonard) Preliminary classification of cis-acting lncRNAs
    1. (Nicolas) overlapping sense-antisense co-expression at all lncRNA-mRNA sense-antisense pairs for latest data updates
    2. (Leonard) annotation of the above into a curated set representing the "Chainome"
  2. cis-acting lncRNA analysis (each analysis performed on both the complete set extracted by Nicolas and the chainome curated by Leonard's lab)
    1. (Timo, Robin, Nicolas) linking lncRNA expression to groups of locally-connected genes
    2. (Eivind, Finn, Tom, Nicolas) co-expression analysis to inform function of individual lncRNAs and effects of lncRNAs on chains including both co-expressed gene ontology and cell tree ontology (i.e. co-expression across related cell lineages gives clue to function)
      1. co-expression patterns to search for include lncRNA chains where all members are expressed, chains where the lncRNA is expressed and the mRNA is not, and vice versa.
    3. (Michiel) MARA analysis to see influence of cis-acting lncRNAs on transcriptional network (see motif enrichment section below)
    4. (Eivind, Helena, Max) overlaying small RNA information with ncRNA found in chains
      1. similar to above, search for potential effects on expression of chains in presence/absence of small RNA and its orientation
      2. (Eivind, Helena, Max, Martin, CBRC (see structure section below)) ncRNA serving as possible small RNA precursors
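
The co-expression patterns described in task 2.2 can be sketched for a single lncRNA-mRNA sense-antisense pair as follows. This is only an illustration under assumptions: Pearson correlation is one possible co-expression measure, and the function names, the expression threshold of 1.0, and the example profiles are hypothetical rather than agreed choices.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length profiles; None if either is constant."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("profiles must have the same length (at least 2 samples)")
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / math.sqrt(var_x * var_y)


def chain_pattern(lncrna, mrna, threshold=1.0):
    """Label each sample by which members of the pair are expressed."""
    labels = []
    for lnc_value, mrna_value in zip(lncrna, mrna):
        lnc_on = lnc_value >= threshold
        mrna_on = mrna_value >= threshold
        if lnc_on and mrna_on:
            labels.append("both")
        elif lnc_on:
            labels.append("lncRNA_only")
        elif mrna_on:
            labels.append("mRNA_only")
        else:
            labels.append("neither")
    return labels


# Hypothetical expression profiles over four samples.
lncrna_profile = [5.0, 5.0, 0.0, 0.0]
mrna_profile = [5.0, 0.0, 5.0, 0.0]
print(pearson(lncrna_profile, mrna_profile))
print(chain_pattern(lncrna_profile, mrna_profile))
```

Counting the labels per pair gives a simple way to separate chains where all members are expressed together from chains where only the lncRNA or only the mRNA is expressed.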

trans-acting lncRNAs

  1. (all) identification of potential trans-acting lncRNAs and their classes (classes: general trans-acting, Alu-element acting, and short RNA precursor transcripts)
    1. (Timo, Robin, Nicolas) linking lncRNA expression to groups of locally-connected genes
    2. (Eivind, Finn, Tom, Nicolas) co-expression analysis to inform function of individual lncRNAs including both co-expressed gene ontology and cell tree ontology (i.e. co-expression across related cell lineages gives clue to function)
    3. (Eivind, Helena, Max, Martin, CBRC (see structure section below)) ncRNA serving as possible small RNA precursors
    4. (Nicolas) reverse, window-based homology analysis of trans-acting lncRNAs to determine potential sites of activity on the genome
      1. overlay this analysis with co-expression results
    5. (Yulia) direct/inverse co-expression patterns of lncRNAs with known-gene mRNAs with Alu elements in 3'UTRs (based on Gong and Maquat 2011 Nature paper)
      1. requires construction of the set of mRNAs with Alu elements in the 3' UTR to specifically look at effects of expression in these lncRNAs/potential target mRNAs

Identification of novel lncRNAs using RNA-seq/CAGE-scan

see RNA-seq page for details.

  1. (Max) collection of usable public RNA-seq data
  2. (Max) integration with FANTOM5 RNA-seq
  3. (Nicolas, Max) CAGE-scan integration
    1. (Nicolas) Use CAGE-scan to look specifically for lncRNA TSSs inside repeats
    2. (Nicolas, Max) tabulation of all novel lncRNAs from this data and incorporation into non-redundant lncRNAome
  4. (Laurens) annotation of set of novel lncRNAs

Motif enrichment in promoter regions of lncRNAs

  1. (Boris Jankovic) Comparisons of motif enrichment (difference in cis- vs. trans- ?)
  2. Location/orientation of binding motifs within promoters
  3. (Michiel) MARA analysis on lncRNAome to identify possible candidates important to the transcriptional network

lncRNA conservation in matching mouse primary cells

Analyzing the presence/absence of lncRNA peaks in mouse and human under the assumption that lncRNAs play a specific role in shaping the human/primate transcriptome. Many of these analyses could also be extended to aortic smooth muscle cells in rat, dog, and chicken. There are three specific kinds of conservation that we are interested in analyzing: i) sequence (using UCSC TransMap and liftOver), ii) structure (location and presence/absence in the genome of lncRNAs; including chain conservation), iii) expression.

Specific tasks:

  1. (?) mapping human lncRNAs to mouse (TransMap/liftOver)
    1. (?) sequence conservation (promoter regions vs. full-length)
      1. (?) frequency
      2. (?) relative cis- vs. trans- conservation
      3. (?) promoter regions vs. full-length of transcripts
    2. (?) conservation of the structure of the genome location in human and matching mouse lncRNAs
      1. (?) presence/absence of lncRNAs
      2. (?) relative cis- vs. trans- conservation
      3. (?) conservation and partial conservation of the chainome
    3. (?) expression conservation
      1. (?) performed for conserved matches identified in 1.1 and also for conserved chains identified in 1.2
  2. (Yulia for global analysis, Leonard for annotation) frequency and conservation of Alu-initiated TSS in lncRNA in humans vs. mouse

Network validation

Probing lncRNA function through perturbation in identified networks.

Specific tasks:

  1. (Emily, Leonard) selection of candidate target networks involving transcription of transcription factors
  2. (coordinated by Haru, WP6) knockdown of lncRNAs, measuring local influence of lncRNAs to identify candidates for genome-wide perturbation experiments
    1. (WP6) probing transcriptional network perturbations through knockdown CAGE-seq
  3. (WP6) look into feasibility of overexpression experiments

Structure features/subclassification of lncRNAs

The intention here is to provide a classification of lncRNAs based on structural features (including computational secondary structure prediction and overlap with other genomic features) with the help of RNA-seq and short RNA data (e.g. splicing architecture, evidence of processed intermediates, translational potential, recapping, positioning relative to other genome markers, etc.)

  1. (Eivind, Max, Helen) overlap with short RNA (TSS-based and other processing products), see short RNA page
  2. (?) overlap with other genome features
  3. (?) RNA-seq and splicing frequency
  4. (Jia, Ben Brown) translational potential (small reading frames)
  5. (CBRC) secondary structure of lncRNA based on different classifications (cis- vs trans-, chain vs. non-chain, splicing vs. non-splicing, shortRNA containing-vs non-containing, etc.)

eRNA analysis

eRNA (enhancer RNA) is a class of lncRNA of particular interest. Analysis of eRNA is being headed up by Robin Andersson (robin@binf.ku.dk).

Specific tasks:

  1. (Robin) identification/classification, percent lncRNAs that are eRNAs, along with rationale
  2. basic statistics (e.g. length distribution, etc.)
  3. cell specificity
  4. exploring the relationship between eRNAs and the promoters they interact with
    1. expression correlation
    2. mutual information approach
    3. intersection with publicly available spatial genomic organization data
  5. (Miura-san, Robin, Nicolas) validation of eRNA interaction with promoter regions by intersecting with existing HiC (?) data and/or more targeted validations
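
To make the "mutual information approach" in task 4.2 concrete, here is a minimal sketch for one eRNA-promoter pair. It assumes expression is first binarized into expressed / not expressed per sample; the threshold, the function names, and the example values are hypothetical, and other discretizations or estimators may suit the real data better.

```python
import math
from collections import Counter


def binarize(values, threshold=1.0):
    """Turn expression values into 1 (expressed) or 0 (not expressed)."""
    return [1 if v >= threshold else 0 for v in values]


def mutual_information(xs, ys):
    """Mutual information in bits between two equal-length discrete sequences."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("sequences must be non-empty and of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    count_x = Counter(xs)
    count_y = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        # p(x,y) / (p(x) * p(y)) simplifies to count * n / (count_x * count_y).
        mi += (count / n) * math.log2(count * n / (count_x[x] * count_y[y]))
    return mi


# Hypothetical expression of an eRNA and a nearby promoter across four samples.
erna = binarize([3.0, 4.0, 0.0, 0.2])
promoter = binarize([8.0, 6.0, 0.1, 0.0])
print(mutual_information(erna, promoter))
```

Unlike the expression correlation in task 4.1, mutual information also picks up non-linear dependence, at the cost of needing a discretization step.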

lncRNA and human disease overlap

  1. (Kenny, Peter, Juha) overlaying GWAS data with lncRNA
    1. cis-/trans-enrichment, cell-specificity of affected lncRNAs, etc...
  2. (Leonard, Alka) Rett syndrome and cis-chain

miRNA promoters

Satellite paper based on Eivind and Kawaji-san's work

  1. (Eivind/Kawaji-san) definition of miRNA promoters based on DROSHA-KD, small RNA-seq and upstream hCAGE peaks


Timeline/order of analyses

Instead of wasting time assigning a bunch of meaningless dates to each task, I'll work out some of the dependencies, which gives an idea of the prioritization. Then we'll follow up with the groups assigned to each task as soon as it can be started. The lists below are structured to imply dependency (indented tasks follow non-indented tasks).

This is all dependent on finalization and normalization of the Kawaji-san promoterome clusters; however, everything listed below can begin using available data. After RNA-seq is used to confirm novel lncRNAs from FANTOM5, we may need to rerun a selected portion of the analyses on this set and possibly on the integrated set.
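
The indentation convention used in the lists below can also be written down as a dependency graph and sorted into a workable order. The sketch below does this with Python's standard `graphlib` module (Python 3.9 or later); the task labels are paraphrased from the lncRNAome annotation list and the dictionary itself is a hypothetical encoding, not an agreed schedule.

```python
from graphlib import TopologicalSorter

# Each task maps to the set of tasks it depends on
# (an indented task depends on the less-indented task above it).
dependencies = {
    "Leonard: final tweaks to the lncRNAome": set(),
    "Lukasz: perform listed tasks": {"Leonard: final tweaks to the lncRNAome"},
    "Repeat tasks on cis-/trans-acting classes": {"Lukasz: perform listed tasks"},
    "Add novel lncRNAs to the set": set(),
    "Potentially repeat analyses": {"Add novel lncRNAs to the set"},
}

# static_order() yields every task after all of its dependencies.
order = list(TopologicalSorter(dependencies).static_order())
for task in order:
    print(task)
```

Tasks with no dependencies, such as the two top-level items here, can start immediately and in parallel.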

Network validation

Given the time this will require, we should get moving with what we have currently.

  • Al and Max send the list of available experimental systems to Leonard
  • Current best targets from Emily and Leonard sent to the OSC (Max and Al).
    • Max and Al--> discussion with WP6.

Annotation/analysis of the non-redundant lncRNAome across FANTOM5 dataset

  • Leonard submits final tweaks to the lncRNAome
    • Lukasz performs the listed tasks
      • listed tasks are performed again on classified cis-acting and trans-acting lncRNAs, looking for differences
  • novel lncRNAs are added to the set
    • potentially repeat above analyses

General functional classification of lncRNAs

  • Nicolas cis-acting classification
    • Leonard annotation of formal "Chainome"
      • Timo, Robin, Nicolas establishing locally-connected genes with lncRNA chains
      • Eivind, Finn, Tom, Nicolas co-expression analysis on chains and complete set
        • Eivind, Helena, Max overlaying small RNA information on chains and effects of small RNA on expression
  • Eivind, Finn, Tom, Nicolas co-expression analysis on trans-acting lncRNAs
  • Nicolas identifying complete space of physical interaction for trans-acting lncRNAs
    • Nicolas overlaying the above two
  • construction of the Alu element-containing mRNA list
    • Yulia Alu elements role in trans-acting lncRNA gene-targeted expression

Motif enrichment in promoter regions of lncRNAs

  • Boris motif enrichment
  • Michiel MARA

lncRNA conservation in matching mouse primary cells

  • Al matched human/mouse samples
  • Leonard assists in defining determinants of lncRNA conservation
    • list of mouse/human sequence conserved sites
    • ? sequence conservation
    • ? genome structure conservation (includes chains)
      • expression conservation human/mouse based on conserved pairs identified in above two points
    • ? basic statistics on human/mouse conservation (possibly dependent on cis-/trans- classification)
  • ? compilation of conservation statistics (e.g. frequency, etc...)

Identification of novel lncRNAs using RNA-seq/CAGE-scan

see RNA-seq page.

  • Laurens will receive the complete set of novel lncRNAs for further annotation

Structure features/subclassification of lncRNAs

  • Eivind, Max, Helen: overlap with short RNA (see short RNA page)
  • CBRC general structural features of identified classes of lncRNAs
  • take list of lncRNAs with overlapping short RNAs from above
    • CBRC identification of "precursor" structure from RNA-seq and short RNA
    • CBRC secondary structure predictions of lncRNAs in short RNA regions
      • CBRC integration of the two above

eRNA analysis

  • Robin percentage of lncRNAs that are eRNAs and rationale for choosing this
  • Robin basic statistics/cell specificity
  • Robin eRNA and affected promoter analysis
    • Robin/others computational validation with public datasets
  • Miura-san wet lab validation

lncRNA and human disease overlap

  • Kenny/others GWAS overlap with lncRNAome set
    • accompanying analysis
  • Leonard and Alka pursue Rett story