CAGEscan mapping protocol
Sample splitting and linker removal
Input is 5′ and 3′ paired-end FASTQ files from the Illumina sequencers.
5′
The first 9 bases of the 5′ reads are trimmed: the first 6 are the index sequence (“barcode”) and the next 3 are the linker (GGG).
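As an illustration only, the 5′ trimming could be sketched in Python as follows. The function name `trim_five_prime` and the constants are hypothetical and not part of the protocol's actual tooling; the quality string is assumed to be cut at the same position as the sequence so the two stay aligned.

```python
BARCODE_LEN = 6  # index sequence ("barcode")
LINKER_LEN = 3   # linker, expected to read GGG


def trim_five_prime(seq, qual):
    """Split a 5' read into (barcode, linker, trimmed_seq, trimmed_qual).

    Hypothetical helper: the first 6 bases are returned as the barcode,
    the next 3 as the linker, and the rest as the trimmed read.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    cut = BARCODE_LEN + LINKER_LEN
    barcode = seq[:BARCODE_LEN]
    linker = seq[BARCODE_LEN:cut]
    return barcode, linker, seq[cut:], qual[cut:]


# Example with a made-up read: barcode ACAGTG, linker GGG, insert TTCATCGA.
barcode, linker, seq, qual = trim_five_prime("ACAGTGGGGTTCATCGA", "I" * 17)
```

The barcode is returned rather than discarded because it is what assigns the read pair to its sample.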
3′
The first 6 bases of the 3′ reads are trimmed because they derive from the random part (N6) of the reverse-transcription primer and therefore may not reflect the RNA sequence accurately: reverse transcriptase tolerates mismatches, even on the last two bases. See Mizuno et al., 1999 for an example of priming over mismatches.
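The 3′ trimming is a fixed-length cut, which might be sketched like this. The name `trim_three_prime` is hypothetical; as for the 5′ reads, the quality string is assumed to be trimmed together with the sequence.

```python
RANDOM_PRIMER_LEN = 6  # random part (N6) of the reverse-transcription primer


def trim_three_prime(seq, qual):
    """Drop the first 6 bases (and their qualities) of a 3' read."""
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return seq[RANDOM_PRIMER_LEN:], qual[RANDOM_PRIMER_LEN:]


# Example with a made-up read whose first 6 bases come from the primer.
seq3, qual3 = trim_three_prime("CTAGCAACGT", "!!!!!!IIII")
```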
Output
Pairs of FASTQ files (5′ and 3′), where all the reads originate from the same RNA sample and all the linkers have been trimmed.
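Putting the two trimming steps together, sample splitting could look roughly like the sketch below. It is not the pipeline's actual implementation: the function names, the `samples` mapping from barcode to sample name, and the choice to skip pairs with an unknown barcode are all assumptions made for illustration. The 5′ and 3′ files are assumed to list mates in the same order.

```python
from collections import defaultdict


def fastq_records(lines):
    """Yield (header, seq, qual) from an iterable of FASTQ lines.

    Assumes the plain 4-line layout; a trailing incomplete record is ignored.
    """
    it = (line.rstrip("\n") for line in lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header, seq, qual


def split_pairs(five_lines, three_lines, samples):
    """Group trimmed read pairs by sample.

    samples: hypothetical dict mapping a 6-base barcode to a sample name.
    Returns {sample: [((h5, seq5, qual5), (h3, seq3, qual3)), ...]}.
    """
    out = defaultdict(list)
    pairs = zip(fastq_records(five_lines), fastq_records(three_lines))
    for (h5, s5, q5), (h3, s3, q3) in pairs:
        sample = samples.get(s5[:6])  # first 6 bases of the 5' read
        if sample is None:
            continue  # assumption: pairs with an unknown barcode are skipped
        five = (h5, s5[9:], q5[9:])   # remove barcode (6) + linker (3)
        three = (h3, s3[6:], q3[6:])  # remove the N6-derived bases
        out[sample].append((five, three))
    return dict(out)


# Made-up example: two pairs, only the first barcode is known.
five = ["@r1/1", "ACAGTGGGGTTCATCGA", "+", "I" * 17,
        "@r2/1", "TTTTTTGGGAAAA", "+", "I" * 13]
three = ["@r1/2", "CTAGCAACGT", "+", "I" * 10,
         "@r2/2", "GATTACACCCC", "+", "I" * 11]
result = split_pairs(five, three, {"ACAGTG": "sampleA"})
```

In practice each sample's pairs would be written to its own 5′ and 3′ FASTQ files; here they are kept in memory to keep the example short.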
Final Output
Mapped paired-end tags in BAM format