Small RNA paper page: Difference between revisions

Line 1: Line 1:
HEY! welcome to the short RNA FANTOM paper page. This paper is being headed up by Helena Persson, Eiven Valen, Michiel de Hoon (?), and Max Burroughs. Anyone interested is free to join in, just shoot an email over to Max (burrough@gsc.riken.jp) with your ideas for analyses, contributions for the paper.
Hey! Welcome to the short RNA FANTOM paper page. As discussed at the FANTOM5 Koyo meeting, this paper is being headed up by Helena Persson, Eiven Valen, and Max Burroughs. Anyone interested is free to join in, just shoot an email over to Max (burrough@gsc.riken.jp) with your ideas for analyses, contributions for the paper.


=Overview=
=Overview=
The strength of the FANTOM5 short RNA dataset is 1) overall scope of the samples ('''not''' the depth in any sample) and 2) matching hCAGE expression data.
The paper is currently designed to match the strengths of the FANTOM5 short RNA dataset which include 1) the range of the samples ('''not''' the depth in any one sample) and 2) matching hCAGE expression data.


==Data==
==Data==
Contrary to what was said at the Kouyou meeting, we have ~300 total samples sequenced, not total number of primary cells. The ~300 number includes replicates. Here is the detailed breakdown of total "unique" samples:
Contrary to what was said at the Kouyou meeting, we have ~300 total samples sequenced, not total number of primary cells (the ~300 number includes replicates). Here is a breakdown of where the "unique" libraries are coming from in reference to colloquial FANTOM5 data types:




Line 32: Line 32:
|}
|}


So this is a bit deflating, perhaps, but still a lot of primary cell types. Note the lone human timecourse sample does not mean we have sequenced an entire timecourse, just a single time point (single replicate) from an embryonic stem cell. For cross-species analysis we have Universal tissue in all species except mouse and Aortic smooth muscle in all species (although mouse has only a single replicate). The Universal tissue is not of much interest I think the aortic smooth muscle could be interesting...
So this is a bit deflating, perhaps, but still a lot of primary cell types. Note the lone human timecourse sample does not mean we have sequenced an entire timecourse, just a single time point (single replicate) from an embryonic stem cell. For cross-species analysis we have Universal tissue in all species except mouse and Aortic smooth muscle in all species (although mouse has only a single replicate). I'm afraid the Universal tissue is not of much interest although the aortic smooth muscle could be quite interesting in terms of conservation of short RNA regulatory processes.


In the ~300 samples, there are 10 that failed completely but if recall all of these had other replicates so the total unique primary cell count should remain at 119.
In the ~300 samples, there are 10 that failed completely but if I recall correctly all of these had additional replicates so the total unique primary cell count should remain at 119.


==Data processing==
==Data processing==
Line 40: Line 40:
Re-extraction and mapping will hopefully be underway within 2 weeks, taking another week or two to finish. We'll provide both BWA and Delve mappings (BWA is used to "seed" the iterative Delve mapping, so its easy to keep the initial mapping in sam format). I'll keep updating.
Re-extraction and mapping will hopefully be underway within 2 weeks, taking another week or two to finish. We'll provide both BWA and Delve mappings (BWA is used to "seed" the iterative Delve mapping, so its easy to keep the initial mapping in sam format). I'll keep updating.
===Normalization===
===Normalization===
I will rerun the EdgeR analysis similar to last [https://fantom5-collaboration.gsc.riken.jp/wiki/index.php/Expression_normalization_method#small_RNA_normalization time] and generate an miRNA expression table as well as a table with total tags, total miRNA tags, and RLE/TMM normalization factors for all libraries. I will also try to generate a table containing dispersion values for each set of replicates. For exploratory work it probably makes sense to keep all libraries (except maybe the 10 mentioned above) but for differential expression analysis, we probably want to identify and remove "bad" replicates. I'll look into this as well...
I will rerun the EdgeR analysis similar to last [https://fantom5-collaboration.gsc.riken.jp/wiki/index.php/Expression_normalization_method#small_RNA_normalization time] and generate an miRNA expression table as well as a table with total tags, total miRNA tags, and RLE/TMM normalization factors for all libraries. I will also try to generate a table containing dispersion values for each set of replicates. For exploratory work (e.g. novel miRNA-hunting) it probably makes sense to keep all libraries (except maybe the 10 mentioned above) but for differential expression analysis, we probably want to identify and remove "bad" replicates. I'll look into this as well...


==Analysis for paper==
==Analysis for paper==
I think we can safely divide the paper into two general areas: miRNA-related analysis and other short-RNA analysis. The miRNA-related analysis will give us a "safety net"--something that can definitely be published in a strong journal without worrying about extensive validation. The other short-RNA analysis is probably more exciting but would probably require more validation and all of its portending issues.
I think we can safely divide the paper into two general areas: miRNA-related analysis and other short-RNA analysis. The miRNA-related analysis will give us a "safety net"--something that can definitely be published without worrying about extensive validation. The other short-RNA analysis is probably more exciting but might require more validation and all of its portending issues.


For differential expression, perhaps we start with a more naive approach and if/when something develops from the promoterome paper which adjusts for clusters of similar tissues, we run with this. I don't think much would change, however...
For differential expression, perhaps we start with a more naive approach and if/when something develops from the promoterome paper which adjusts for clusters of similar tissues, we switch to this. I don't think much would change, however...


===Basic statistics, annotation of tags===
===Basic statistics, annotation of tags===
Since I'm handling the processing I'll just compile this, if there are no objections. Supplementary Table 1 and Supplementary Figure 1. Should be fun visualizing 119 library annotations.
Since I'm handling the basic processing I'll just compile this into Supplementary Table 1 and Supplementary Figure 1, if there are no objections. Should be fun visualizing 119 library annotations.


===miRNA===
===miRNA===
Line 56: Line 56:
## miRNA target analysis
## miRNA target analysis
# (Helena) novel miRNA, tissue specificity of these
# (Helena) novel miRNA, tissue specificity of these
# (?) differential isomiR usage across tissues (?)
# (?) differential isomiR usage across tissues
# (Max) differences in post-transcriptional modifications across tissues
# (Max) differences in post-transcriptional modifications across tissues
## hCAGE co-expression analysis to identify novel factors involved in miRNA regulation
## hCAGE co-expression analysis to identify novel factors involved in miRNA regulation


===other small RNA===
===other small RNA===
Results of the first three (?) summarized in excel format and provided to sample providers
Results of the first three (?) points below can be summarized in excel format and provided to sample providers
# (Max) tag-based differential expression analysis of "novel" short RNA populations
# (Max?) tag-based differential expression analysis of "novel" short RNA populations
# (Eiven?) endo-siRNA from coding regions, discovery, relative location, etc.
# (Eiven?) endo-siRNA from coding regions, discovery, relative location, etc.
## short RNA anti-sense to coding regions along length of transcript, not just promoter
# sense/antisense short RNA overlapping lncRNA, specifically in Leonard's "chains" (this will likely be folded into main paper, but we can refer to it if something interesting comes out of it
# (Helena?) differential expression, novel short RNA derived from non-coding precursors
# (Helena?) differential expression, novel short RNA derived from non-coding precursors
# promoter RNA
# promoter RNA
## differential expression
## (?) differential expression
## affected promoter GO-term enrichment (variation across cell type?)
## affected promoter GO-term enrichment (variation across cell type?)
## effects of promoter RNA expression on gene expression in hCAGE across cell types
## effects of promoter RNA expression on gene expression in hCAGE across cell types
## teasing out possible differences in effects of sense/antisense promoter sequences
## teasing out possible differences in effects of sense/antisense promoter sequences
## conservation (or lack of conservation) of promoter RNA across species in same cell type
## conservation (or lack of conservation) of promoter RNA across species in same cell type
# short RNA anti-sense to coding regions along length of transcript, not just promoter (?)
# sense/antisense short RNA overlapping lncRNA, specifically in Leonard's "chains" (this will likely be folded into main paper, but we can refer to it if something interesting comes out of it)
#general conservation in small RNAs derived from larger noncoding precursors in animals?
#general conservation in small RNAs derived from larger noncoding precursors in animals?


===Miscellaneous===
===Miscellaneous===
*see if we can get some timecourses. this would give the paper another angle and Helena points out it would be useful to show important RNAs for wet-bench collaborators
*see if we can get some timecourses. this would give the paper another angle and Helena points out it would be useful to show important RNAs for wet-bench collaborators
*expression comparisons between short RNAs and hCAGE peaks, could be an additional form of validation for novel stuff
*expression comparisons between short RNA libraries and hCAGE mature miRNA peaks, could be an additional form of validation for novel stuff
*reference to miRNA promoter satellite paper by Eiven and Kawaji-san
*reference to miRNA promoter satellite paper by Eiven and Kawaji-san
*Helena might be able to provide (limited) validation for novel miRNAs
*Helena might be able to provide (limited) validation for novel miRNAs

Revision as of 10:43, 25 October 2011

Hey! Welcome to the short RNA FANTOM paper page. As discussed at the FANTOM5 Koyo meeting, this paper is being headed up by Helena Persson, Eiven Valen, and Max Burroughs. Anyone interested is free to join in; just send an email to Max (burrough@gsc.riken.jp) with your ideas for analyses or contributions to the paper.

Overview

The paper is currently designed to play to the strengths of the FANTOM5 short RNA dataset, which are 1) the range of samples (not the depth in any one sample) and 2) the matching hCAGE expression data.

Data

Contrary to what was said at the Kouyou meeting, the ~300 figure is the total number of samples sequenced, replicates included, not the number of primary cells. Here is a breakdown of where the "unique" libraries come from, using the colloquial FANTOM5 data types:


short RNA samples
species   primary cell   timecourse   tissue
human     119            1            4
mouse     1              0            0
rat       1              0            1
dog       1              0            1
chicken   1              0            1

So this is a bit deflating, perhaps, but it is still a lot of primary cell types. Note that the lone human timecourse sample does not mean we have sequenced an entire timecourse, just a single time point (single replicate) from an embryonic stem cell. For cross-species analysis we have Universal tissue in all species except mouse, and aortic smooth muscle in all species (although mouse has only a single replicate). I'm afraid the Universal tissue is not of much interest, although the aortic smooth muscle could be quite interesting in terms of conservation of short RNA regulatory processes.

Of the ~300 samples, 10 failed completely, but if I recall correctly all of these had additional replicates, so the total unique primary cell count should remain at 119.

Data processing

Mapping

Re-extraction and mapping will hopefully be underway within 2 weeks and take another week or two to finish. We'll provide both BWA and Delve mappings (BWA is used to "seed" the iterative Delve mapping, so it's easy to keep the initial mapping in SAM format). I'll keep updating.

Normalization

I will rerun the edgeR analysis as was done last time (https://fantom5-collaboration.gsc.riken.jp/wiki/index.php/Expression_normalization_method#small_RNA_normalization) and generate an miRNA expression table, along with a table of total tags, total miRNA tags, and RLE/TMM normalization factors for all libraries. I will also try to generate a table of dispersion values for each set of replicates. For exploratory work (e.g. novel miRNA hunting) it probably makes sense to keep all libraries (except maybe the 10 failed ones mentioned above), but for differential expression analysis we probably want to identify and remove "bad" replicates. I'll look into this as well...
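To make the per-library table concrete, here is a minimal sketch of how RLE normalization factors can be computed from a raw miRNA count matrix. It follows the usual RLE definition (median ratio of each library to the per-miRNA geometric mean, divided by library size, rescaled so the factors multiply to 1) and is only meant to illustrate what the numbers mean; it is not a replacement for the actual edgeR run. The function name and the input layout (one row per miRNA, one column per library) are assumptions made for this example.

```python
import math
from statistics import median


def rle_norm_factors(counts):
    """Return one RLE normalization factor per library.

    counts: list of rows, one per miRNA; each row holds the raw tag
    counts for that miRNA in every library (hypothetical layout).
    The returned factors are scaled so their geometric mean is 1.
    """
    n_libs = len(counts[0])
    # Total tags per library (the "total miRNA tags" column of the table).
    lib_sizes = [sum(row[j] for row in counts) for j in range(n_libs)]

    # The geometric mean is only usable for miRNAs seen in every library.
    usable = [row for row in counts if all(c > 0 for c in row)]
    if not usable:
        raise ValueError("no miRNA has a non-zero count in every library")
    geo_means = [
        math.exp(sum(math.log(c) for c in row) / n_libs) for row in usable
    ]

    # Size factor: median ratio of a library's counts to the geometric means.
    size_factors = [
        median(row[j] / g for row, g in zip(usable, geo_means))
        for j in range(n_libs)
    ]

    # Divide out library size, then rescale so the factors multiply to 1.
    raw = [s / n for s, n in zip(size_factors, lib_sizes)]
    scale = math.exp(sum(math.log(f) for f in raw) / n_libs)
    return [f / scale for f in raw]


if __name__ == "__main__":
    # Toy example: one highly expressed miRNA inflates the second library.
    toy = [[10, 10], [20, 20], [30, 30], [40, 1000]]
    print(rle_norm_factors(toy))
```

In the toy example the second library gets a factor below 1, because a single dominant miRNA inflates its total tag count without changing the rest of the profile.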

Analysis for paper

I think we can safely divide the paper into two general areas: miRNA-related analysis and other short-RNA analysis. The miRNA-related analysis will give us a "safety net"--something that can definitely be published without worrying about extensive validation. The other short-RNA analysis is probably more exciting but might require more validation, with all of its attendant issues.

For differential expression, perhaps we start with a more naive approach and, if/when the promoterome paper produces a method that adjusts for clusters of similar tissues, switch to that. I don't think much would change, however...

Basic statistics, annotation of tags

Since I'm handling the basic processing I'll just compile this into Supplementary Table 1 and Supplementary Figure 1, if there are no objections. Should be fun visualizing 119 library annotations.

miRNA

  1. (Eiven, Helena?) Functional analysis of miRNAs entailing:
    1. differential expression analysis (deliver excel file to sample providers)
    2. hCAGE co-expression analysis to reveal miRNA function
    3. miRNA target analysis
  2. (Helena) novel miRNA, tissue specificity of these
  3. (?) differential isomiR usage across tissues
  4. (Max) differences in post-transcriptional modifications across tissues
    1. hCAGE co-expression analysis to identify novel factors involved in miRNA regulation
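As a rough illustration of the hCAGE co-expression idea in the list above, the sketch below ranks-correlates one miRNA's expression profile against hCAGE peak profiles measured in the same samples and keeps the strongly correlated (or anti-correlated) peaks. Spearman correlation, the 0.8 cutoff, and all function and peak names are assumptions chosen for the example, not a decision about the method the paper will use.

```python
import math


def ranks(values):
    """Average 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    rx, ry = ranks(x), ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        return 0.0  # a constant profile carries no co-expression signal
    return cov / (sx * sy)


def coexpressed_peaks(mirna_profile, cage_profiles, threshold=0.8):
    """Return {peak name: rho} for peaks with |rho| >= threshold.

    cage_profiles maps a (hypothetical) peak name to its expression
    across the same samples, in the same order as mirna_profile.
    """
    hits = {}
    for name, profile in cage_profiles.items():
        rho = spearman(mirna_profile, profile)
        if abs(rho) >= threshold:
            hits[name] = rho
    return hits


if __name__ == "__main__":
    mirna = [1, 2, 3, 4, 5]
    cage = {
        "peak_a": [2, 4, 6, 8, 10],
        "peak_b": [5, 4, 3, 2, 1],
        "peak_c": [3, 1, 4, 1, 5],
    }
    print(coexpressed_peaks(mirna, cage))
```

Negative correlations are kept on purpose, since an anti-correlated peak may mark a target or a repressor. With 119 primary cell types, any real analysis would also need multiple-testing correction, which this sketch leaves out.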

other small RNA

Results of the first three (?) points below can be summarized in excel format and provided to sample providers

  1. (Max?) tag-based differential expression analysis of "novel" short RNA populations
  2. (Eiven?) endo-siRNA from coding regions, discovery, relative location, etc.
    1. short RNA anti-sense to coding regions along length of transcript, not just promoter
  3. sense/antisense short RNA overlapping lncRNA, specifically in Leonard's "chains" (this will likely be folded into main paper, but we can refer to it if something interesting comes out of it)
  4. (Helena?) differential expression, novel short RNA derived from non-coding precursors
  5. promoter RNA
    1. (?) differential expression
    2. affected promoter GO-term enrichment (variation across cell type?)
    3. effects of promoter RNA expression on gene expression in hCAGE across cell types
    4. teasing out possible differences in effects of sense/antisense promoter sequences
    5. conservation (or lack of conservation) of promoter RNA across species in same cell type
  6. general conservation in small RNAs derived from larger noncoding precursors in animals?

Miscellaneous

  • see if we can get some timecourses. This would give the paper another angle, and Helena points out it would be useful for showing important RNAs to wet-bench collaborators
  • expression comparisons between short RNA libraries and hCAGE mature miRNA peaks, could be an additional form of validation for novel stuff
  • reference to miRNA promoter satellite paper by Eiven and Kawaji-san
  • Helena might be able to provide (limited) validation for novel miRNAs