CAGEscan mapping protocol: Difference between revisions

Revision as of 18:20, 4 April 2011

The input is a pair of FASTQ files (5′ and 3′ reads) from a paired-end run on an Illumina sequencer.

The first 9 bases of the 5′ reads are trimmed: the first 6 are the index sequence (“barcode”) and the next 3 are the linker (GGG).

The first 6 bases of the 3′ reads are trimmed because they derive from the random part (N6) of the reverse-transcription primer and therefore may not reflect the RNA sequence accurately: the reverse transcriptase tolerates mismatches, even on the last two bases. See Mizuno et al., 1999 for an example of priming over mismatches.
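The trimming rules above can be illustrated with a minimal Python sketch. It is not the in-house PipelinePairedEndExtraction.pl script, whose internals are not described here; the function names and the example read are hypothetical, and the sketch assumes plain four-line FASTQ records.

```python
import io
from typing import Iterator, TextIO, Tuple

BARCODE_LEN = 6        # index sequence at the start of the 5' read
LINKER_LEN = 3         # GGG linker that follows the barcode
FIVE_PRIME_TRIM = BARCODE_LEN + LINKER_LEN   # 9 bases in total
THREE_PRIME_TRIM = 6   # random hexamer (N6) of the reverse-transcription primer


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str, str]]:
    """Yield (header, sequence, quality) from a four-line-per-record FASTQ stream."""
    while True:
        header = handle.readline().rstrip("\n")
        if not header:
            return
        seq = handle.readline().rstrip("\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield header, seq, qual


def split_five_prime(seq: str, qual: str) -> Tuple[str, str, str, str]:
    """Return (barcode, linker, trimmed sequence, trimmed quality) for a 5' read."""
    barcode = seq[:BARCODE_LEN]
    linker = seq[BARCODE_LEN:FIVE_PRIME_TRIM]
    return barcode, linker, seq[FIVE_PRIME_TRIM:], qual[FIVE_PRIME_TRIM:]


def trim_three_prime(seq: str, qual: str) -> Tuple[str, str]:
    """Drop the bases that derive from the random primer of a 3' read."""
    return seq[THREE_PRIME_TRIM:], qual[THREE_PRIME_TRIM:]


# Usage with a made-up 5' read (the sequence is illustrative only).
example_5p = io.StringIO("@read1/1\nACAGATGGGTTTTCCCCAAAA\n+\n" + "I" * 21 + "\n")
for header, seq, qual in read_fastq(example_5p):
    barcode, linker, insert_seq, insert_qual = split_five_prime(seq, qual)
    print(header, barcode, linker, insert_seq)
```

Sequence and quality strings are sliced together so that they stay the same length, which FASTQ parsers require. Returning the barcode and linker separately leaves room for demultiplexing, or for checking that the linker really is GGG, before the bases are discarded.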

We use the in-house command PipelinePairedEndExtraction.pl. It generates pairs of FASTQ files (5′ and 3′).

Each FASTQ file is filtered with TagDust, using an empty construct as library sequences:

AATGATACGGCGACCACCGAGATCTACACTAGTCGAACTGAAGGTCTCCAGCA[barcode]gggAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
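As an illustration of how the `[barcode]` placeholder expands, the following Python sketch builds one empty-construct sequence per barcode and formats them as FASTA. The flanking sequences are copied from the construct above; the helper names and the example barcodes are hypothetical, and the choice of FASTA as the library format is an assumption.

```python
from typing import Iterable

# Flanking sequences copied from the empty construct given in the protocol.
PREFIX = "AATGATACGGCGACCACCGAGATCTACACTAGTCGAACTGAAGGTCTCCAGCA"
LINKER = "GGG"
SUFFIX = "AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG"


def empty_construct(barcode: str) -> str:
    """Return the empty-construct sequence for one 6-base barcode."""
    barcode = barcode.upper()
    if len(barcode) != 6 or set(barcode) - set("ACGT"):
        raise ValueError(f"expected a 6-base ACGT barcode, got {barcode!r}")
    return PREFIX + barcode + LINKER + SUFFIX


def to_fasta(barcodes: Iterable[str]) -> str:
    """Format one empty construct per barcode as FASTA records."""
    records = []
    for barcode in barcodes:
        records.append(f">empty_construct_{barcode.upper()}\n{empty_construct(barcode)}\n")
    return "".join(records)


# Hypothetical barcodes, for illustration only.
print(to_fasta(["ACAGAT", "CTGACT"]))
```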

Each FASTQ file is filtered again to remove reads that match the ribosomal DNA repeated unit (rDNA), with the program rRNAdust.

Mapped paired-end tags in BAM format

@@ Line 14: / Line 14: @@
 <code>AATGATACGGCGACCACCGAGATCTACACTAGTCGAACTGAAGGTCTCCAGCA[barcode]gggAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG</code>
+== Removal of rDNA sequences ==
+Each FASTQ file is filtered again to remove reads that match the ribosomal DNA repeated unit ([[rDNA]]), with the program [[User:Lassmann|rRNAdust]].
 == Final Output ==