Small RNA data description

Introduction

This page is intended to provide an overview of the current state of small RNA sequencing in the FANTOM5 project. It addresses the raw material and methods used in sequencing small RNA, methods used to map the sequences, data normalization strategies, and some quality control measures. For discussion on the F5 small RNA paper, please see this [ page]. For an overview of all noncoding RNA data and resources in FANTOM5, see this [ page].

Sequencing Methods

Small RNA for the FANTOM5 project was sequenced using the Illumina TruSeq kit, with 24 distinct samples multiplexed in a single Illumina HiSeq2000 lane. Each lane also included a control (the standard mouse whole body, embryo E17.5 internal control used for all of FANTOM5), which was sequenced with the same barcode in every lane.

Source RNA

To date, small RNA sequencing has focused on the FANTOM5 panel of primary cells. Primary cell RNA was selected for small RNA-seq when it met two criteria: 1) the sequences were available at the time the sequencing order was made, and 2) the preparation of the original RNA sample before delivery to RIKEN OSC did not exclude small RNAs.

The list of all sequenced samples is attached [ here]. As previously announced, additional time course samples will be selected for small RNA-seq and we will update the sample list when we get more information.

The following is a chart of the organism and F5 classification of all sequenced samples. We have sequenced roughly 360 samples in total. The number of replicates per cell type varies from 1 to 6, and the total number of distinct samples is 131.
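The relationship between sequenced libraries, distinct samples, and replicates can be tallied directly from a sample list. The following is a minimal sketch, assuming the list has been reduced to one sample name per sequenced library; the function name and the example names are hypothetical and are not taken from the attached sample list.

```python
from collections import Counter


def replicate_summary(sample_names):
    """Summarize replicate counts given one sample name per sequenced library."""
    counts = Counter(sample_names)
    return {
        "libraries": sum(counts.values()),      # total sequenced libraries
        "distinct_samples": len(counts),        # unique sample names
        "min_replicates": min(counts.values()),
        "max_replicates": max(counts.values()),
    }


# Hypothetical names, for illustration only.
example_names = [
    "cell type A", "cell type A", "cell type A",
    "cell type B",
    "cell type C", "cell type C",
]
summary = replicate_summary(example_names)
print(summary)
```

Run against the full sample list, the same tally would be expected to give the totals quoted above.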

--add chart here--

Mapping

Sequences were extracted and then mapped with the bwa alignment program (PMID: [ ]). The resulting bwa alignments were then used as "seeds" for alignment via the Delve short read alignment program developed by Timo Lassmann at RIKEN OSC. For more information see the following [ page]. Briefly, Delve assigns a probability representing the likelihood that an alignment is derived from a given location on the genome. For anyone preferring to use the original bwa alignments, these are provided in the final .bam alignment files as secondary mappings.
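For readers who want to separate the two sets of alignments, the following is a minimal sketch of how that could be done. It assumes the original bwa alignments are marked with the standard SAM "secondary alignment" flag bit (0x100); this has not been checked against the FANTOM5 files. It also assumes the .bam file has already been converted to SAM text. The function name and the example records are hypothetical.

```python
SECONDARY_FLAG = 0x100  # SAM flag bit meaning "secondary alignment"


def split_primary_secondary(sam_lines):
    """Split SAM text records into (primary, secondary) by the 0x100 flag bit."""
    primary, secondary = [], []
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and header lines
        fields = line.rstrip("\n").split("\t")
        flag = int(fields[1])  # second SAM column is the bitwise FLAG
        if flag & SECONDARY_FLAG:
            secondary.append(fields)
        else:
            primary.append(fields)
    return primary, secondary


# Hypothetical SAM records, for illustration only.
example = [
    "@HD\tVN:1.0\tSO:unsorted",
    "read1\t0\tchr1\t100\t37\t22M\t*\t0\t0\tACGTACGTACGTACGTACGTAC\t*",
    "read1\t256\tchr2\t500\t0\t22M\t*\t0\t0\tACGTACGTACGTACGTACGTAC\t*",
    "read2\t16\tchr3\t42\t30\t22M\t*\t0\t0\tTTGCATGCATGCATGCATGCAT\t*",
]
primary, secondary = split_primary_secondary(example)
print(len(primary), "primary;", len(secondary), "secondary")
```

If the files do use this flag, `samtools view -f 256` selects the same secondary records directly from a .bam file, and `samtools view -F 256` excludes them.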

Links to the raw bam files for all libraries.

Normalization

take this from previous writings.

Filtering

Describe process. List of all ids and samples. List of all ids and samples to use in differential expression comparisons.

Variance in Data

Describe dispersion values in biological versus technical replicates.