MiRNA


FANTOM5 mature microRNA expression analysis


miRNA expression

Mature miRNA expression profiles for miRBase release 19 are provided for human, mouse, dog and rat samples (aligned reads are found in https://fantom5-collaboration.gsc.riken.jp/files/data/shared/UPDATE_020/f5pipeline/). UCSC liftOver was used to convert the mouse genomic coordinates from GRCm38 (used in miRBase 19) to NCBI37 (used for sequence data alignments).

Files named *_miRNAexpression.txt contain tables for individual samples and are sorted into folders for each species and sample type (primary cell, tissue etc.). These tables have ten columns:

1. Mature miRNA name.

2. Counts for exact matches to the annotated mature miRNA. If the sequence aligns to more than one position in the genome, all must be annotated as miRNAs with the same mature sequence.

3. Counts for approximate matches, i.e. sequences deviating by a maximum of 4 nt from the annotated mature miRNA, including exact matches. If the sequence aligns to more than one position in the genome, all must be annotated as miRNAs with the same mature sequence.

4. Counts for non-specific matches, including exact and approximate matches. For non-specific matches, the genomic positions must still correspond to no more than one miRNA, but the other positions may carry a different annotation or none at all.

5. Counts per million aligned reads for exact matches.

6. Counts per million aligned reads for approximate matches.

7. Counts per million aligned reads for non-specific matches.

8. Counts per million miRNA reads for exact matches.

9. Counts per million miRNA reads for approximate matches.

10. Counts per million miRNA reads for non-specific matches.
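As an illustration of the ten-column layout described above, here is a minimal Python sketch for reading one row of a *_miRNAexpression.txt table. It assumes the files are tab-delimited and that every column after the name is numeric; neither is stated on this page, so check a real file first. The class name, field names and the example values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MirnaRow:
    """One row of a *_miRNAexpression.txt table (field names are hypothetical)."""
    name: str                   # column 1: mature miRNA name
    exact: float                # column 2: counts, exact matches
    approximate: float          # column 3: counts, approximate matches
    nonspecific: float          # column 4: counts, non-specific matches
    exact_cpm_aligned: float    # column 5: per million aligned reads
    approx_cpm_aligned: float   # column 6
    nonspec_cpm_aligned: float  # column 7
    exact_cpm_mirna: float      # column 8: per million miRNA reads
    approx_cpm_mirna: float     # column 9
    nonspec_cpm_mirna: float    # column 10


def parse_row(line: str) -> MirnaRow:
    """Parse one line, assuming tab-separated fields (an assumption)."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 10:
        raise ValueError(f"expected 10 columns, got {len(fields)}")
    return MirnaRow(fields[0], *(float(x) for x in fields[1:]))


# Made-up values, for illustration only.
example = "hsa-miR-21-5p\t90\t100\t120\t90\t100\t120\t180\t200\t240\n"
row = parse_row(example)
```

Because approximate matches include exact matches, and non-specific matches include both, the counts in a row should satisfy exact <= approximate <= non-specific, which makes a cheap sanity check when loading the files.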

I recommend using the approximate matches for most purposes. Unadjusted counts should be used with DESeq and edgeR, which perform their own normalisation. Counts per million aligned reads provide some adjustment for library size, library content and sequence quality. Counts per million miRNA reads are sometimes more appropriate, e.g. if different size ranges have been excised for the samples that are being compared.
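The difference between the two normalisations can be shown with a small worked example. This is a sketch of the usual counts-per-million arithmetic, not the original scripts; which read total they used as the denominator is not stated here, so the function and variable names and all numbers below are assumptions.

```python
def counts_per_million(count: float, total_reads: float) -> float:
    """Scale a raw count to counts per million of the given read total."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return count * 1_000_000 / total_reads


# Made-up numbers for one miRNA in one library.
approximate_count = 250
aligned_reads = 2_000_000   # assumed denominator for "per million aligned reads"
mirna_reads = 500_000       # assumed denominator for "per million miRNA reads"

cpm_aligned = counts_per_million(approximate_count, aligned_reads)
cpm_mirna = counts_per_million(approximate_count, mirna_reads)
print(cpm_aligned, cpm_mirna)
```

With these numbers the same raw count gives 125 per million aligned reads but 500 per million miRNA reads, because only a quarter of the aligned reads are miRNAs. Two libraries with different excised size ranges can differ a lot in that fraction, which is why the miRNA-based value can be the fairer comparison.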

Three tables summarising expression for all samples in a folder are also provided:

1. Counts for approximate matches to the annotated mature miRNA (maximum 4 nt deviation of the ends of the sequence).

2. Counts per million aligned reads for approximate matches.

3. Counts per million miRNA reads for approximate matches.

QC

Some basic alignment statistics are provided in the files named *_alignmentStatistics.txt:

1. Total reads. Take care with samples that have very few reads; the following samples appear to have failed completely:

acute%20myeloid%20leukemia%20%28FAB%20M5%29%20cell%20line%3aTHP-1%20%28nuclear%20fraction%29.SRhi10010.14299-155B6.hg19.CGATGT
Neural%20stem%20cells%2c%20donor1.SRhi10011.11275-116H6.hg19.ATGTCA
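The sample identifiers are percent-encoded, which makes them hard to read. The sketch below decodes one with Python's standard `urllib.parse.unquote` and splits off the four dot-separated suffixes. The labels given to those suffixes (library, sample ID, genome, barcode) are guesses based on their appearance, not documented field names.

```python
from urllib.parse import unquote


def parse_sample_id(raw: str) -> dict:
    """Split an identifier into its decoded name and four trailing fields.

    The keys 'library', 'sample_id', 'genome' and 'barcode' are
    hypothetical labels; this page does not define the fields.
    """
    # Split from the right so any dot inside the sample name is preserved.
    encoded_name, library, sample_id, genome, barcode = raw.rsplit(".", 4)
    return {
        "name": unquote(encoded_name),
        "library": library,
        "sample_id": sample_id,
        "genome": genome,
        "barcode": barcode,
    }


sample = parse_sample_id(
    "Neural%20stem%20cells%2c%20donor1.SRhi10011.11275-116H6.hg19.ATGTCA"
)
print(sample["name"])  # Neural stem cells, donor1
```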

2. Aligned reads. A low fraction of aligning reads suggests poor sequence quality or failed sequencing library preparation.

3. Used reads. Excludes unaligned reads and reads with more genomic matches than the cut-off used for bwa alignment.

4. Mature miRNA. Reads matching known mature miRNAs. A low content could indicate problems with RNA quality or sequencing library preparation. It may, however, also be a consequence of sample biology (e.g. for samples representing the nuclear RNA fraction, where a high miRNA content may instead indicate poor purity). In addition to the samples that failed sequencing and the nuclear fractions, the following primary cell samples have a very low miRNA content (at or below the 5th percentile):

CD19%2b%20B%20Cells%2c%20donor1.SRhi10066.11544-120B5.hg19.TAGCTT
CD19%2b%20B%20Cells%2c%20donor2.SRhi10066.11624-122B4.hg19.GGCTAC
CD19%2b%20B%20Cells%2c%20donor3.SRhi10066.11705-123B4.hg19.CTTGTA
Corneal%20Epithelial%20Cells%2c%20donor2.SRhi10014.11606-120I4.hg19.CTTGTA
Endothelial%20Cells%20-%20Aortic%2c%20donor3.SRhi10011.11412-118E8.hg19.GGTAGC
Esophageal%20Epithelial%20Cells%2c%20donor1.SRhi10011.11507-119G4.hg19.TGACCA
Fibroblast%20-%20Mammary%2c%20donor3.SRhi10002.11701-123A9.hg19.GTTTCG
Gingival%20epithelial%20cells%2c%20donor2%20%28GEA14%29.SRhi10011.11302-117B6.hg19.ACAGTG
Keratocytes%2c%20donor2.SRhi10011.11607-120I5.hg19.AGTTCC
Renal%20Proximal%20Tubular%20Epithelial%20Cell%2c%20donor1.SRhi10014.11515-119H3.hg19.CCGTCC
Small%20Airway%20Epithelial%20Cells%2c%20donor3.SRhi10014.11406-118E2.hg19.GGTAGC

Most tissue samples also have a low miRNA content; the reason for this is not known.

5. Precursor miRNA. Reads matching miRNA precursors, including mature miRNAs.
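A percentile cut-off like the one used above for low miRNA content can be sketched as follows. This is an illustration only: it assumes miRNA content is measured as mature miRNA reads divided by aligned reads and uses a nearest-rank percentile, neither of which is confirmed as the method behind the list on this page. All names and numbers are hypothetical.

```python
import math


def nearest_rank_percentile(values, pct: float) -> float:
    """Return the nearest-rank percentile of a non-empty collection."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("no values given")
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


def flag_low_mirna(stats: dict, pct: float = 5.0) -> list:
    """List samples whose miRNA fraction is at or below the pct-th percentile.

    stats maps sample name -> (aligned_reads, mature_mirna_reads).
    Samples without aligned reads are skipped, since no fraction is defined.
    """
    fractions = {
        name: mature / aligned
        for name, (aligned, mature) in stats.items()
        if aligned > 0
    }
    cutoff = nearest_rank_percentile(fractions.values(), pct)
    return sorted(name for name, frac in fractions.items() if frac <= cutoff)


# Made-up statistics: 20 samples with miRNA fractions 1%, 2%, ..., 20%.
stats = {f"s{i}": (1000, i * 10) for i in range(1, 21)}
low = flag_low_mirna(stats)
print(low)
```

A cut-off of this kind only ranks samples against each other, so it says nothing about why a sample is low. As noted above, nuclear fractions and tissue samples may have a low miRNA content for biological rather than technical reasons.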


These results were generated using scripts developed by Helena Persson, Dept of Biosciences and Nutrition, Karolinska Institute (helena.persson@ki.se).