Small RNA paper page

Revision as of 17:32, 21 October 2011

Hey! Welcome to the short RNA FANTOM paper page. This paper is being led by Helena Persson, Eiven Valen, Michiel de Hoon (?), and Max Burroughs. Anyone interested is free to join in; just send Max an email (burrough@gsc.riken.jp) with your ideas for analyses or contributions to the paper.

Overview

The strengths of the FANTOM5 short RNA dataset are 1) the overall scope of the samples (not the sequencing depth of any individual sample) and 2) the matching hCAGE expression data.

Data

Contrary to what was said at the Kouyou meeting, ~300 is the total number of samples sequenced, not the total number of primary cell types. The ~300 figure includes replicates. Here is the detailed breakdown of the total "unique" samples:

short RNA samples

species   primary cell   timecourse   tissue
human     119            1            4
mouse     1              0            0
rat       1              0            1
dog       1              0            1
chicken   1              0            1

So this is a bit deflating, perhaps, but it is still a lot of primary cell types. Note that the lone human timecourse sample does not mean we have sequenced an entire timecourse, just a single time point (single replicate) from embryonic stem cells. For cross-species analysis we have Universal tissue in all species except mouse, and aortic smooth muscle in all species (although mouse has only a single replicate). The Universal tissue is not of much interest, but I think the aortic smooth muscle could be interesting...

Of the ~300 samples, 10 failed completely, but if I recall correctly all of these had other replicates, so the total unique primary cell count should remain at 119.

Data processing

Mapping

Re-extraction and mapping will hopefully be underway within 2 weeks and should take another week or two to finish. We'll provide both BWA and Delve mappings (BWA is used to "seed" the iterative Delve mapping, so it's easy to keep the initial mapping in SAM format). I'll keep this page updated.
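Since the initial BWA mapping will be kept in SAM format, a quick per-library sanity check is to count how many tags mapped. The sketch below is a minimal example under a few assumptions: plain-text SAM (not BAM), one record per read (i.e. tags not collapsed into a count stored in the read name), and a hypothetical helper name, `tally_sam`. It relies only on the SAM FLAG column: bit 0x4 marks an unmapped read, and secondary (0x100) and supplementary (0x800) records are skipped so each read is counted once. It says nothing about how Delve handles the same reads.

```python
from collections import Counter
from typing import Iterable

# SAM FLAG bits (from the SAM format specification)
FLAG_UNMAPPED = 0x4
FLAG_SECONDARY = 0x100
FLAG_SUPPLEMENTARY = 0x800


def tally_sam(lines: Iterable[str]) -> Counter:
    """Count primary SAM records as 'mapped' or 'unmapped' (hypothetical helper).

    Assumes plain-text SAM with one record per read; collapsed tags
    (a count encoded in the read name) would need to be weighted instead.
    """
    counts: Counter = Counter()
    for line in lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines (@HD, @SQ, @PG, ...)
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # column 2 is the bitwise FLAG
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # count each read once, via its primary record
        counts["unmapped" if flag & FLAG_UNMAPPED else "mapped"] += 1
    return counts
```

For example, passing an open file handle for a library's SAM file (the file itself is a placeholder here) returns a Counter with "mapped" and "unmapped" keys. For BAM files, a dedicated library such as pysam would be the more natural route.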

Normalization

I will rerun the edgeR analysis similarly to last time (https://fantom5-collaboration.gsc.riken.jp/wiki/index.php/Expression_normalization_method#small_RNA_normalization) and generate a miRNA expression table, as well as a table with total tags, total miRNA tags, and RLE/TMM normalization factors for all libraries. I will also try to generate a table containing dispersion values for each set of replicates. For exploratory work it probably makes sense to keep all libraries (except maybe the 10 failed ones mentioned above), but for differential expression analysis we probably want to identify and remove "bad" replicates. I'll look into this as well...
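For the per-replicate-set dispersion table, one quick and admittedly naive way to get a single number per set, and to see which replicate inflates it, is a method-of-moments negative binomial estimate, phi = (variance - mean) / mean^2, computed per tag on normalized counts and summarized by the median. This is an assumption-laden sketch, not what edgeR does (edgeR uses likelihood-based dispersion estimation with empirical Bayes moderation), and the function names and the `min_mean` cut-off are hypothetical. The leave-one-out helper recomputes the set's dispersion with each replicate dropped in turn; a replicate whose removal sharply lowers the value is a candidate "bad" replicate.

```python
from statistics import mean, median, variance
from typing import Dict, List


def moment_dispersion(values: List[float]) -> float:
    """Naive NB dispersion for one tag: (sample variance - mean) / mean^2, floored at 0.

    Needs at least two replicates (sample variance is undefined for one).
    """
    m = mean(values)
    if m == 0:
        return 0.0  # an all-zero tag carries no information
    return max(0.0, (variance(values) - m) / (m * m))


def group_dispersion(norm_counts: Dict[str, List[float]], min_mean: float = 10.0) -> float:
    """Median per-tag dispersion for one replicate set.

    norm_counts maps tag -> normalized counts, one value per replicate.
    min_mean is a hypothetical cut-off dropping low-count tags, whose
    moment estimates are very noisy.
    """
    per_tag = [moment_dispersion(v) for v in norm_counts.values() if mean(v) >= min_mean]
    if not per_tag:
        raise ValueError("no tags pass the min_mean cut-off")
    return median(per_tag)


def leave_one_out(norm_counts: Dict[str, List[float]], min_mean: float = 10.0) -> List[float]:
    """Group dispersion recomputed with each replicate dropped in turn."""
    n_reps = len(next(iter(norm_counts.values())))
    if n_reps < 3:
        raise ValueError("need at least 3 replicates to leave one out")
    result = []
    for drop in range(n_reps):
        reduced = {
            tag: [c for i, c in enumerate(vals) if i != drop]
            for tag, vals in norm_counts.items()
        }
        result.append(group_dispersion(reduced, min_mean))
    return result
```

With three replicates where the third looks off, e.g. `{"a": [100, 102, 300], "b": [50, 49, 150], "c": [200, 198, 20]}`, the lowest leave-one-out value is the one obtained by dropping the third replicate. With only two replicates there is nothing to leave out, so the helper refuses to run.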

Analysis for paper

I think we can safely divide the paper into two general areas: miRNA-related analysis and other short-RNA analysis. The miRNA-related analysis will give us a "safety net": something that can definitely be published in a strong journal without extensive validation. The other short-RNA analysis is probably more exciting but would require more validation, with all the issues that entails.

Brainstorming:

  • novel miRNA prediction, tissue specific: Helena
  • endo siRNA: Eiven?
  • coding/noncoding overlapping short RNA and mechanistic implications?: Eiven
  • cell-specific short RNA derived from known precursors: ?
  • cell-specific novel short RNA populations. I suspect these two are closely related: "novel" populations will likely be derived from known precursors that are significantly differentially processed in certain cell lines. I think it's better to look for differential expression of all tags instead of looking specifically at known noncoding classes. --Max
  • 'differential' post-transcriptional modifications: Max
  • differential expression of tiRNAs and how this affects hCAGE expression across the panel
  • one analysis we will need to do for the lncRNA paper is to overlay lncRNA-intersecting short RNAs to see if/how this influences the chains
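For the lncRNA overlay, the first step is simply finding which short RNA tags intersect which lncRNA transcripts. Below is a minimal same-strand sketch, assuming BED-style 0-based half-open coordinates and plain tuples of (chrom, start, end, strand, name); the helper names are hypothetical. It sorts intervals by start and uses a binary search to skip every interval that starts at or after the tag's end, then checks the remaining ones directly, which is fine for a sketch but not tuned for genome scale.

```python
from bisect import bisect_left
from collections import defaultdict
from typing import Dict, List, Tuple

# (chrom, start, end, strand, name) with BED-style 0-based, half-open coordinates
Feature = Tuple[str, int, int, str, str]
Index = Dict[Tuple[str, str], List[Tuple[int, int, str]]]


def index_by_strand(features: List[Feature]) -> Index:
    """Group lncRNA intervals by (chrom, strand), sorted by start."""
    idx: Index = defaultdict(list)
    for chrom, start, end, strand, name in features:
        idx[(chrom, strand)].append((start, end, name))
    for ivs in idx.values():
        ivs.sort()
    return idx


def overlapping(tag: Feature, idx: Index) -> List[str]:
    """Names of same-strand intervals sharing at least one base with the tag."""
    chrom, start, end, strand, _ = tag
    ivs = idx.get((chrom, strand), [])
    # Intervals starting at or after the tag's end cannot overlap it, so only
    # the prefix of the start-sorted list needs checking.
    hi = bisect_left(ivs, (end,))
    return [name for s, e, name in ivs[:hi] if e > start]
```

For whole libraries, `bedtools intersect` with `-s` (same strand) does the same job; looking up the opposite strand instead (or `-S` in bedtools) would give antisense overlaps.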

Notes:

  • I vote for RLE normalization. It performs slightly better than TMM or tpmm and it's easy; see this page for details. I can play around and see whether RLE performs better on all tags (with a cut-off) or on just miRNA loci. (Max)
  • see if we can get some timecourses. This would give the paper another angle, and Helena points out it would be useful for showing important RNAs to wet-bench collaborators
  • eventually it might be nice to account for the relatedness of samples; I'm thinking we can wait to see what the promoterome paper does and then adopt that, but until then just use some naive differential expression technique?
  • defining 'clusters' for promoter-based RNAs, endo-siRNAs and other novel populations could be a bit tricky
  • might be nice to do expression comparisons between short RNAs and hCAGE peaks; this could be an additional form of validation for novel short RNAs
  • Helena might be able to provide (limited) validation for novel miRNAs
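To compare RLE on all tags (with a cut-off) against RLE on miRNA loci only, the same calculation can simply be run on two subsets of tags. The sketch below computes median-of-ratios (RLE) scaling factors: for each tag, the ratio of its count in a library to its geometric mean across libraries; a library's factor is the median of those ratios over tags that are nonzero in every library. It is an illustration under assumptions, not edgeR's code: edgeR's `calcNormFactors(method = "RLE")` starts from the same quantity but reports factors relative to library size, so its values are not expected to match these directly. The `min_count` cut-off and `keep` subset parameters are hypothetical.

```python
import math
from statistics import median
from typing import Dict, Iterable, List, Optional


def rle_factors(
    counts: Dict[str, List[float]],
    min_count: float = 0,
    keep: Optional[Iterable[str]] = None,
) -> List[float]:
    """Median-of-ratios (RLE) scaling factors, one per library.

    counts: tag -> raw counts per library (same library order everywhere).
    min_count: hypothetical cut-off (>= 0); a tag is used only if its count
        exceeds min_count in every library, which also guarantees log() > 0 input.
    keep: optional subset of tag IDs (e.g. miRNA loci) to restrict the calculation to.
    """
    keep_set = set(keep) if keep is not None else None
    rows = [
        vals
        for tag, vals in counts.items()
        if (keep_set is None or tag in keep_set) and min(vals) > min_count
    ]
    if not rows:
        raise ValueError("no tags left after filtering")
    n_libs = len(rows[0])
    # log geometric mean of each tag across libraries
    log_gms = [sum(math.log(v) for v in row) / n_libs for row in rows]
    factors = []
    for j in range(n_libs):
        ratios = [row[j] / math.exp(lg) for row, lg in zip(rows, log_gms)]
        factors.append(median(ratios))
    return factors
```

Comparing, say, `rle_factors(counts, min_count=10)` with `rle_factors(counts, keep=mirna_ids)`, where `mirna_ids` is a hypothetical list of miRNA locus IDs, shows directly how much the choice of tag set moves each library's factor.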