Long noncoding RNA main paper
Revision as of 09:01, 7 November 2011
Welcome to the FANTOM5 long noncoding RNA (lncRNA) main paper page. This page will be used to list tasks and discuss ongoing analyses for the paper. For information on ncRNA data resources, see the Noncoding RNA central page. Please keep in mind that this paper has already been discussed extensively in many forums, and we need to move quickly. While we are always interested in exciting new analyses, if you have something new to propose, please do so with the intention of personally carrying out the analysis.
Paper objectives
This paper aims to capture, and functionally define, the complete breadth and diversity of the non-redundant set of long noncoding RNA (lncRNA) genes in the human genome, while leveraging the unique qualities of FANTOM5 (hCAGE, RNAseq, >1k human cell and tissue samples) to fully characterize the cell type specificity, timecourse responsiveness, regulatory network participation, and evolutionary impact of these genes. For the purpose of this paper, it is important to note that our definition of lncRNAs is broader than that used by others, and includes, in addition to lincRNAs (long intergenic ncRNAs), all bidirectional/nested/cis-antisense lncRNAs and unspliced single-exon lncRNA genes with hCAGE, RNAseq, and/or pre-F5 cDNA/EST support. In addition, this paper will use genome organization and context coupled with hCAGE-measured expression of coding genes to probe functional properties and provide a comprehensive classification scheme for lncRNAs.
Tasks for the paper
If you are interested in assisting with a task below, please add your name before the task in parentheses. Names have already been added for people who expressed interest or are currently involved in tasks, as discussed at the FANTOM5 Koyo meeting. Some tasks still have no one assigned; if you are interested, please put your name down. Conceivably, some of these tasks will end up as satellite papers referred to by the main paper, but we are including them here at present.
Annotation/analysis of the non-redundant lncRNAome across FANTOM5 dataset
Leonard Lipovich's lab has undertaken and completed the Herculean task of assembling and annotating the set of non-redundant, known lncRNAs and supplemented this with the set provided by Gencode. We are currently updating the non-redundant gene-centric human lncRNA catalog ("the F5 lncRNAome") with RNAs from Cabili et al Genes & Dev 2011. Preliminary viewing of the analysis in ZENBU suggests many lncRNAs are tissue-specific; this is an important point of order for the FANTOM5 data and this main paper.
Specific tasks:
- (Leonard, Hui) Generation of the definitive gene-centric non-redundant FANTOM5 lncRNAome: "the" human lncRNA catalog
- (Leonard, Hui) Inclusion of latest lncRNAs from the Cabili paper - done
- (Leonard, Hui) Inclusion of latest lncRNAs from the Gencode lncRNA Derrien et al paper - done
- (Max, Leonard, OSC) increasing the non-redundant lncRNAome by incorporating novel transcripts identified through FANTOM5 hCAGE and RNA-seq data - still To Do
- (WP4, Nicolas, Leonard) Obtain the list of hCAGE promoter peaks from UPDATE012 and UPDATE013 associating with lncRNAome from the final filtered and normalized clustering values (Question: Where is the final list of Clusters that we should use for this? - LL)
- (Lukasz, and Tom Freeman!) Primary-cell specific expression
- (Lukasz) Top-expressed lncRNAs in the total dataset and (Tom Freeman?) in different tissues (made available on the wiki to sample providers)
- (Tom Freeman) Identification of "cell-type" specific lncRNAs (made available on the wiki to sample providers)
- (Lukasz, Win Hide, and Emmanuel Dimont!) Time course expression
- (Lukasz) Significant differences in lncRNA expression across time points across all time courses
- (Lukasz) lncRNA expression shared across multiple time courses
- (Win, Emmanuel) lncRNAs that go from low to high (or from high to low) in any specific timecourse
- (Win, Emmanuel) complete SwitchEngine analysis of lncRNA expression in timecourses
- (Lukasz) Analysis of lncRNAs (done in comparison with analysis of coding RNA--i.e. main promoterome paper analysis)
- (Lukasz) House-keeping vs tissue-specific lncRNAs (vs. coding RNAs)
- (Lukasz) Clustering of primary cells/tissues with respect to their lncRNA expression profiles (Question: Lukasz, how is this different from Tom F's stuff? - LL)
- (Lukasz) PCA and multidimensional scaling to find tissues with most lncRNA expression differences / similarity (vs. coding RNA)
- (Lukasz) All of the above tasks can be repeated to look for differences in cis- and trans-acting lncRNAs (see below)
- (?) tissue-specific differential promoter usage in lncRNAs; comparison to coding differential promoter usage
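One possible way to approach the house-keeping vs. tissue-specific task above is a per-gene specificity score. The sketch below uses the tau index, which is a substitute chosen here for illustration and not a method agreed for this paper. The function names and the 0.15 / 0.85 cut-offs are hypothetical, and the input is assumed to be a list of non-negative normalized expression values (one per sample) for a single lncRNA.

```python
def tau_index(expr):
    """Tau specificity score: 0 = uniform across samples, 1 = one sample only.

    expr is a list of non-negative expression values, one per sample.
    Returns None when the gene is not expressed anywhere.
    """
    n = len(expr)
    if n < 2:
        raise ValueError("need at least two samples")
    peak = max(expr)
    if peak <= 0:
        return None
    return sum(1 - v / peak for v in expr) / (n - 1)


def classify(expr, low=0.15, high=0.85):
    """Label a profile; the cut-offs are arbitrary illustration values."""
    tau = tau_index(expr)
    if tau is None:
        return "not expressed"
    if tau >= high:
        return "tissue-specific"
    if tau <= low:
        return "house-keeping-like"
    return "intermediate"


# Toy profiles (invented numbers, not FANTOM5 data)
uniform_profile = [5.0, 5.0, 5.0, 5.0]
single_sample_profile = [0.0, 0.0, 10.0, 0.0]
```

The same score can be computed for coding genes so that the two distributions are compared on equal terms.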
General functional classification of lncRNAs
Here we define the subset of lncRNAs likely to be acting in "cis" based on their genomic proximity to their putative cis-target or co-regulated genes, and develop specific, network-level functional predictions for individual lncRNAs based on co-expression. We also attempt to identify all lncRNAs involved in "trans" regulation (i.e. regulation of genes that reside outside of the lncRNA-encoding locus), and lncRNAs which may function as precursors for small/er RNA biogenesis. While there is some overlap in the analyses, due to the fact that some RNAs are probably involved in more than one of these three functional modalities, they are treated in separate sections below.
cis-acting lncRNAs
Specific tasks:
- (Nicolas, Leonard) Preliminary classification of cis-acting lncRNAs
- (Nicolas) calculating (PCC) and ranking sense-antisense co-expression at all ~ 2,800 manually annotated lncRNA-mRNA sense-antisense ("SAS") pairs (from the 4,511-pair F5 SASome provided by Leonard) in latest data updates (012 and 013)
- (Leonard) annotation of the above into a curated set representing top-ranking co-regulated and anti-regulated lncRNA-mRNA SASpairs
- (Leonard) update of the FANTOM3 human "Chainome" (already obtained from Engstrom et al) to hg19; identification of all lncRNA-containing chains; addition of new lncRNA-containing chains from non-Engstrom sources (Emily) to update the F5 Chainome
- cis-acting lncRNA analysis (each analysis performed on both the complete set extracted by Nicolas and the chainome curated by Leonard's lab)
- (Timo, Robin, Nicolas) linking lncRNA expression to groups of locally-connected genes
- (Eivind, Finn, Tom, Nicolas) co-expression analysis to inform function of individual lncRNAs and effects of lncRNAs on chains including both co-expressed gene ontology and cell tree ontology (i.e. co-expression across related cell lineages gives clue to function)
- co-expression patterns to search for include lncRNA chains where all members are expressed, and chains where the lncRNA is expressed and the mRNA is not, and vice versa.
- (Michiel) MARA analysis to see influence of cis-acting lncRNAs on transcriptional network (see motif enrichment section below)
- (Eivind, Helena, Max) overlaying small RNA information with ncRNA found in chains
- similar to above, search for potential effects on expression of chains in presence/absence of small RNA and its orientation
- (Eivind, Helena, Max, Martin, Asai-sensei / CBRC (see structure section below)) ncRNA serving as possible small RNA precursors
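The PCC calculation and ranking of SAS pairs listed above could look roughly like the following. This is a minimal sketch, not the pipeline actually in use: it assumes each transcript has a vector of normalized expression values over the same ordered samples, and the identifiers (`lncA`, `geneA`, and so on), the function names, and the toy numbers are all invented for illustration.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length expression vectors."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length vectors with at least 2 samples")
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")  # undefined when either transcript is constant
    return sxy / math.sqrt(sxx * syy)


def rank_sas_pairs(pairs, expression):
    """Return (pcc, lncRNA, mRNA) tuples, most co-regulated first.

    Pairs with an undefined PCC are dropped; the most anti-regulated
    pairs end up at the bottom of the list.
    """
    scored = []
    for lnc, mrna in pairs:
        r = pearson(expression[lnc], expression[mrna])
        if not math.isnan(r):
            scored.append((r, lnc, mrna))
    return sorted(scored, reverse=True)


# Toy data (invented): one co-regulated, one anti-regulated, one flat mRNA
expression = {
    "lncA": [1.0, 2.0, 3.0, 4.0], "geneA": [2.0, 4.0, 6.0, 8.0],
    "lncB": [1.0, 2.0, 3.0, 4.0], "geneB": [8.0, 6.0, 4.0, 2.0],
    "lncC": [1.0, 3.0, 2.0, 4.0], "geneC": [5.0, 5.0, 5.0, 5.0],
}
pairs = [("lncA", "geneA"), ("lncB", "geneB"), ("lncC", "geneC")]
ranked = rank_sas_pairs(pairs, expression)
```

The top and bottom of `ranked` would then feed the curated co-regulated and anti-regulated sets.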
trans-acting lncRNAs
- (all) identification of potential trans-acting lncRNAs and their classes (classes: general trans-acting, Alu-element acting, and short RNA precursor transcripts)
- (Timo, Robin, Nicolas) linking lncRNA expression to groups of locally-connected genes
- (Eivind, Finn, Tom, Nicolas) co-expression analysis to inform function of individual lncRNAs including both co-expressed gene ontology and cell tree ontology (i.e. co-expression across related cell lineages gives clue to function)
- (Eivind, Helena, Max, Martin, CBRC (see structure section below)) ncRNA serving as possible small RNA precursors
- (Nicolas) reverse, window-based homology analysis of trans-acting lncRNAs to determine potential sites of activity on the genome
- overlay this analysis with co-expression results
- (Yulia) direct/inverse co-expression patterns of lncRNAs with known-gene mRNAs with Alu elements in 3'UTRs (based on Gong and Maquat 2011 Nature paper)
- requires construction of the set of mRNAs with Alu elements in the 3' UTR to specifically look at effects of expression in these lncRNAs/potential target mRNAs
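Constructing the set of mRNAs with Alu elements in the 3' UTR is essentially an interval-overlap problem. The sketch below is one simple way to do it under stated assumptions: coordinates are 0-based and half-open (BED-like), strand is ignored, and the tuple layouts, function name, and example coordinates are hypothetical. For genome-scale input a sorted or indexed overlap would be preferable to this nested loop.

```python
from collections import defaultdict


def mrnas_with_alu_in_utr(utrs, alus):
    """Return the set of mRNA ids whose 3' UTR overlaps at least one Alu.

    utrs: iterable of (mrna_id, chrom, start, end)
    alus: iterable of (chrom, start, end)
    Coordinates are 0-based, half-open; strand is ignored in this sketch.
    """
    alus_by_chrom = defaultdict(list)
    for chrom, start, end in alus:
        alus_by_chrom[chrom].append((start, end))

    hits = set()
    for mrna_id, chrom, start, end in utrs:
        for a_start, a_end in alus_by_chrom.get(chrom, ()):
            if a_start < end and start < a_end:  # intervals share >= 1 base
                hits.add(mrna_id)
                break
    return hits


# Toy coordinates (invented)
utrs = [
    ("mrna1", "chr1", 100, 200),
    ("mrna2", "chr1", 300, 400),
    ("mrna3", "chr2", 100, 200),
]
alus = [("chr1", 150, 250), ("chr2", 200, 500)]
alu_utr_mrnas = mrnas_with_alu_in_utr(utrs, alus)
```

Note that `mrna3` is not reported: its UTR ends exactly where the chr2 Alu starts, and adjacent half-open intervals do not overlap.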
RNA processing of the F5 lncRNAome: empirical evidence from hCAGE/RNAseq/CAGEscan integration
- LL to develop this section, stay tuned...
Identification of novel lncRNAs using RNA-seq/CAGE-scan
see RNA-seq page for details.
- (Max) collection of usable public RNA-seq data
- (Max) integration with FANTOM5 RNA-seq
- (Nicolas, Max) CAGE-scan integration
- (Nicolas) Use CAGE-scan to look specifically for lncRNA TSSs inside repeats
- (Nicolas, Max) tabulation of all novel lncRNAs from this data and incorporation into non-redundant lncRNAome
- (Laurens) annotation of set of novel lncRNAs
Motif enrichment in promoter regions of lncRNAs
- (Boris Jankovic) Comparisons of motif enrichment (difference in cis- vs. trans- ?)
- Location/orientation of binding motifs within promoters
- (Michiel) MARA analysis on lncRNAome to identify possible candidates important to the transcriptional network
lncRNA conservation in matching mouse primary cells
Analyzing the presence/absence of lncRNA peaks in mouse and human under the assumption that lncRNAs play a specific role in shaping the human/primate transcriptome. Many of these analyses could also be extended to aortic smooth muscle cells in rat, dog, and chicken. There are three specific kinds of conservation that we are interested in analyzing: i) sequence (using UCSC TransMap and liftOver), ii) structure (location and presence/absence in the genome of lncRNAs, including chain conservation), iii) expression.
Specific tasks:
- (?) mapping human lncRNAs to mouse (TransMap/liftOver)
- (?) sequence conservation (promoter regions vs. full-length)
- (?) frequency
- (?) relative cis- vs. trans- conservation
- (?) promoter regions vs. full-length of transcripts
- (?) conservation of the structure of the genome location in human and matching mouse lncRNAs
- (?) presence/absence of lncRNAs
- (?) relative cis- vs. trans- conservation
- (?) conservation and partial conservation of the chainome
- (?) expression conservation
- (?) performed for conserved matches identified in 1.1 and also for conserved chains identified in 1.2
- (?) sequence conservation (promoter regions vs. full-length)
- (Yulia for global analysis, Leonard for annotation) frequency and conservation of Alu-initiated TSS in lncRNA in humans vs. mouse
Network validation
Probing lncRNA function through perturbation in identified networks.
Specific tasks:
- (Emily, Leonard) selection of candidate target networks involving transcription of transcription factors
- (coordinated by Haru, WP6) knockdown of lncRNAs, measuring local influence of lncRNAs to identify candidates for genome-wide perturbation experiments
- (WP6) probing transcriptional network perturbations through knockdown CAGE-seq
- (WP6) look into feasibility of overexpression experiments
Structure features/subclassification of lncRNAs
The intention here is to provide a classification of lncRNAs based on structural features (including computational secondary structure prediction and overlap with other genomic features) with the help of RNA-seq and short RNA data (e.g. splicing architecture, evidence of processed intermediates, translational potential, recapping, positioning relative to other genome markers, etc.)
- (Eivind, Max, Helen) overlap with short RNA (TSS-based and other processing products), see short RNA page
- (?) overlap with other genome features
- (?) RNA-seq and splicing frequency
- (Jia, Ben Brown) translational potential (small reading frames)
- (CBRC) secondary structure of lncRNA based on different classifications (cis- vs trans-, chain vs. non-chain, splicing vs. non-splicing, shortRNA containing-vs non-containing, etc.)
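For the translational potential task, a first pass could simply enumerate small open reading frames in each transcript. The following is an illustrative sketch only: it scans the three forward frames of a spliced transcript sequence for ATG-to-stop ORFs, and the function name and the 10 to 100 codon bounds are hypothetical defaults, not values agreed for the paper. It does not assess coding likelihood (codon usage, conservation, ribosome evidence).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_small_orfs(seq, min_codons=10, max_codons=100):
    """Return (start, end, frame) for ATG..stop ORFs on the forward strand.

    start/end are 0-based, end exclusive and including the stop codon.
    ORF length is counted in codons, excluding the stop codon.
    """
    seq = seq.upper().replace("U", "T")
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens the ORF
            elif start is not None and codon in STOP_CODONS:
                n_codons = (i - start) // 3
                if min_codons <= n_codons <= max_codons:
                    orfs.append((start, i + 3, frame))
                start = None  # look for the next ORF in this frame
    return orfs


# Toy transcript (invented): ATG + 3 codons + stop, offset by 2 bases
toy_seq = "CC" + "ATG" + "GCT" * 3 + "TAA" + "GG"
toy_orfs = find_small_orfs(toy_seq, min_codons=3)
```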
eRNA analysis
eRNA (enhancer RNA) is a class of lncRNA of particular interest. Analysis of eRNA is being headed up by Robin Andersson (robin@binf.ku.dk).
Specific tasks:
- (Robin) identification/classification, percent lncRNAs that are eRNAs, along with rationale
- basic statistics (e.g. length distribution, etc.)
- cell specificity
- exploring relationship between eRNA and associated promoters interactions
- expression correlation
- mutual information approach
- intersection with publicly available spatial genomic organization data
- (Miura-san, Robin, Nicolas) validation of eRNA interaction with promoter regions by intersect with existing HiC (?) data and/or more targeted validations
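As a rough illustration of the mutual information approach listed above, the sketch below binarizes eRNA and promoter expression into on/off calls per sample and computes mutual information in bits. The binarization step, the threshold of 1.0, and the function names are assumptions made for this example; other discretizations or a continuous estimator may suit the data better.

```python
import math
from collections import Counter


def binarize(values, threshold=1.0):
    """Convert expression values to 0/1 calls; the threshold is arbitrary."""
    return [1 if v >= threshold else 0 for v in values]


def mutual_information(xs, ys):
    """Mutual information (in bits) between two equal-length discrete vectors."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("need two non-empty, equal-length vectors")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px = Counter(xs)
    py = Counter(ys)
    mi = 0.0
    for (x, y), count in joint.items():
        pxy = count / n
        mi += pxy * math.log2(pxy / ((px[x] / n) * (py[y] / n)))
    return mi


# Toy profiles across four samples (invented numbers)
erna = binarize([0.0, 0.2, 5.0, 7.0])
linked_promoter = binarize([0.1, 0.0, 9.0, 3.0])
unlinked_promoter = binarize([0.0, 4.0, 0.3, 6.0])
mi_linked = mutual_information(erna, linked_promoter)
mi_unlinked = mutual_information(erna, unlinked_promoter)
```

Unlike PCC, this score is also high when the promoter is consistently off whenever the eRNA is on, so it can complement the expression correlation task.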
lncRNA and human disease overlap
- (Kenny, Peter, Juha) overlaying GWAS data with lncRNA
- cis-/trans-enrichment, cell-specificity of affected lncRNAs, etc...
- (Leonard, Alka) Rett syndrome and cis-chain
miRNA promoters
Satellite paper based on Eivind and Kawaji-san's work
- (Eivind/Kawaji-san) definition of miRNA promoters based on DROSHA-KD, small RNA-seq and upstream hCAGE peaks
Timeline/order of analyses
Instead of wasting time assigning a bunch of meaningless dates to each task, I'll work out some of the dependencies, which gives an idea of the prioritization. We'll then follow up with the groups assigned to the tasks as soon as they can be accomplished. The lists below are structured to imply dependency (indented tasks follow non-indented tasks).
This is all dependent on finalization and normalization of the Kawaji-san promoterome clusters; however, everything listed below can begin using available data. After RNA-seq is used to confirm novel lncRNAs from FANTOM5, we may need to rerun a selected portion of the analyses on this set and possibly on the integrated set.
Network validation
Given the time this will require, we should get moving with what we have currently.
- Al and Max list of available experimental systems to Leonard
- Current best targets from Emily and Leonard sent to the OSC (Max and Al).
- Max and Al--> discussion with WP6.
Annotation/analysis of the non-redundant lncRNAome across FANTOM5 dataset
- Leonard submits final tweaks to the lncRNAome
- Lukasz performs listed tasks
- listed tasks are performed again on classified cis-acting and trans-acting lncRNAs, looking for differences
- Lukasz performs listed tasks
- novel lncRNAs are added to the set
- potentially repeat above analyses
General functional classification of lncRNAs
- Nicolas cis-acting classification
- Leonard annotation of formal "Chainome"
- Timo, Robin, Nicolas establishing locally-connected genes with lncRNA chains
- Eivind, Finn, Tom, Nicolas co-expression analysis on chains and complete set
- Eivind, Helena, Max overlaying small RNA information on chains and effects of small RNA on expression
- Leonard annotation of formal "Chainome"
- Eivind, Finn, Tom, Nicolas co-expression analysis on trans-acting lncRNAs
- Nicolas identifying complete space of physical interaction for trans-acting lncRNAs
- Nicolas overlaying the above two
- construction of the Alu element-containing mRNA list
- Yulia Alu elements role in trans-acting lncRNA gene-targeted expression
Motif enrichment in promoter regions of lncRNAs
- Boris motif enrichment
- Michiel MARA
lncRNA conservation in matching mouse primary cells
- Al matched human/mouse samples
- Leonard assists in defining determinants of lncRNA conservation
- list of mouse/human sequence conserved sites
- ? sequence conservation
- ? genome structure conservation (includes chains)
- expression conservation human/mouse based on conserved pairs identified in above two points
- ? basic statistics on human/mouse conservation (possibly dependent on cis-/trans- classification)
- ? compilation of conservation statistics (e.g. frequency, etc...)
Identification of novel lncRNAs using RNA-seq/CAGE-scan
see RNA-seq page.
- Laurens will receive the complete set of novel lncRNAs for further annotation
Structure features/subclassification of lncRNAs
- Eivind, Max, Helen overlap with short RNA (see short RNA page)
- CBRC general structural features of identified classes of lncRNAs
- take list of lncRNAs with overlapping short RNAs from above
- CBRC identification of "precursor" structure from RNA-seq and short RNA
- CBRC secondary structure predictions of lncRNAs in short RNA regions
- CBRC integration of the two above
eRNA analysis
- Robin percentage of lncRNAs that are eRNAs and rationale for choosing this
- Robin basic statistics/cell specificity
- Robin eRNA and affected promoter analysis
- Robin/others computational validation with public datasets
- Miura-san wet lab validation
lncRNA and human disease overlap
- Kenny/others GWAS overlap with lncRNAome set
- accompanying analysis
- Leonard and Alka pursue Rett story