CAGEscan mapping protocol

From Wiki
Revision as of 11:00, 31 March 2011 by Plessy (Be more verbose on each step of the CAGEscan pipeline (to be continued).)

Sample splitting and linker removal

The input consists of paired-end FASTQ files from the Illumina sequencers: one file of 5′ reads and one file of 3′ reads.
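Each step below works on one 5′ read and its 3′ mate at a time. A minimal Python sketch of reading the two files in lockstep follows. It assumes four-line FASTQ records and assumes that mates appear in the same order in both files. The names `FastqRecord`, `read_fastq` and `read_pairs` are hypothetical.

```python
from itertools import zip_longest
from typing import Iterable, Iterator, NamedTuple, Tuple


class FastqRecord(NamedTuple):
    name: str  # header line without the leading "@"
    seq: str
    qual: str


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text laid out as four lines per record."""
    it = iter(lines)
    while True:
        header = next(it, "")
        if not header:
            return  # end of input
        seq = next(it, "").rstrip("\r\n")
        plus = next(it, "")
        qual = next(it, "").rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record: {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def read_pairs(
    five: Iterable[str], three: Iterable[str]
) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Walk the 5' and 3' files in lockstep (assumes mates share the same order)."""
    missing = object()
    for r5, r3 in zip_longest(read_fastq(five), read_fastq(three), fillvalue=missing):
        if r5 is missing or r3 is missing:
            raise ValueError("5' and 3' files contain different numbers of reads")
        yield r5, r3
```

Open file objects iterate line by line, so they can be passed directly. For example: `with open(path_5p) as f5, open(path_3p) as f3: for r5, r3 in read_pairs(f5, f3): ...`. Here `path_5p` and `path_3p` are placeholders.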

5′

The first 9 bases of the 5′ reads are trimmed: the first 6 are the index sequence (“barcode”) and the next 3 are the linker (GGG).
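A minimal sketch of the 5′ trimming follows. The names `trim_5p` and `Trimmed5p` are hypothetical. The barcode is returned rather than discarded because sample splitting needs it. This page does not say whether reads whose bases 7 to 9 are not GGG should be discarded. Treat that check as an assumption; it is optional here.

```python
from typing import NamedTuple, Optional

BARCODE_LEN = 6  # index sequence ("barcode")
LINKER = "GGG"  # linker that follows the barcode
TRIM_5P = BARCODE_LEN + len(LINKER)  # 9 bases removed in total


class Trimmed5p(NamedTuple):
    barcode: str
    seq: str
    qual: str


def trim_5p(seq: str, qual: str, check_linker: bool = True) -> Optional[Trimmed5p]:
    """Split a 5' read into its barcode and the part after the linker.

    Returns None if nothing would remain after trimming, or (assumption,
    only when check_linker is True) if bases 7-9 are not the GGG linker.
    """
    if len(seq) <= TRIM_5P:
        return None
    if check_linker and seq[BARCODE_LEN:TRIM_5P] != LINKER:
        return None
    # Trim the quality string by the same amount so the record stays valid FASTQ.
    return Trimmed5p(seq[:BARCODE_LEN], seq[TRIM_5P:], qual[TRIM_5P:])
```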

3′

The first 6 bases of the 3′ reads are trimmed because they derive from the random part (N6) of the reverse-transcription primer. They therefore may not reflect the RNA sequence accurately, since reverse transcriptase tolerates mismatches even at the last two bases of the primer. See Mizuno et al., 1999 for an example of priming over mismatches.
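The 3′ trimming is a fixed-length cut. In the sketch below, the function name `trim_3p` is hypothetical. It shows that the quality string is cut by the same 6 bases as the sequence.

```python
from typing import Tuple

N6_LEN = 6  # bases templated by the random (N6) part of the RT primer


def trim_3p(seq: str, qual: str) -> Tuple[str, str]:
    """Drop the first N6_LEN bases of a 3' read and of its quality string."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return seq[N6_LEN:], qual[N6_LEN:]
```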

Output

Pairs of FASTQ files (5′ and 3′), one pair per RNA sample, in which all the reads originate from that sample and the barcode, linker and N6-derived bases have been trimmed.
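To produce this output, trimmed pairs are grouped by the sample that their barcode belongs to. The sketch below makes several assumptions:

- It uses a hypothetical barcode-to-sample table (`samples`).
- It sends pairs with an unknown barcode to an "unassigned" group. This page does not describe how unknown barcodes are actually handled.
- The names `split_by_sample` and `format_fastq` are hypothetical.

Both mates are appended to their sample together, so each sample's 5′ and 3′ files stay in matching order.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, quality); hypothetical layout


def format_fastq(read: Read) -> str:
    name, seq, qual = read
    return f"@{name}\n{seq}\n+\n{qual}\n"


def split_by_sample(
    pairs: Iterable[Tuple[str, Read, Read]],
    samples: Dict[str, str],
) -> Dict[str, Tuple[List[str], List[str]]]:
    """Group trimmed read pairs by sample.

    `pairs` yields (barcode, trimmed 5' read, trimmed 3' read).
    `samples` maps barcode -> sample name (hypothetical sample sheet).
    Returns, per sample, the FASTQ records for the 5' and 3' files.
    """
    out: Dict[str, Tuple[List[str], List[str]]] = defaultdict(lambda: ([], []))
    for barcode, r5, r3 in pairs:
        sample = samples.get(barcode, "unassigned")  # assumption for unknown barcodes
        five, three = out[sample]
        five.append(format_fastq(r5))
        three.append(format_fastq(r3))  # appended together, so mates stay aligned
    return dict(out)
```

For a full sequencing lane, writing each record straight to per-sample file handles would avoid holding everything in memory. The in-memory grouping only keeps the sketch short.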

Final Output

Mapped paired-end tags in BAM format