CAGEscan mapping protocol: Difference between revisions

(Be more verbose on each step of the CAGEscan pipeline (to be continued).)
(Simplified sample splitting and linker removal.)
Line 3: Line 3:
The input is 5′ and 3′ paired-end FASTQ files from the Illumina sequencers.


* The first 9 bases of the 5′ reads are trimmed. The first 6 are the ''index sequence'' (“barcode”) and the next 3 are the linker (<code>GGG</code>).
=== 5′ ===


* The first 6 bases of the 3′ reads are trimmed because they derive from the random part (N6) of the reverse-transcription primer and therefore may not reflect the RNA sequence accurately: reverse transcriptase tolerates mismatches even on the last two bases. See [http://pubmed.gov/9973624 Mizuno et al., 1999] for an example of priming over mismatches.
The first 9 bases of the 5′ reads are trimmed. The first 6 are the ''index sequence'' (“barcode”) and the next 3 are the linker (<code>GGG</code>).


We use the in-house command <code>PipelinePairedEndExtraction.pl</code>. It generates pairs of FASTQ files (5′ and 3′).
=== 3′ ===

The first 6 bases of the 3′ reads are trimmed because they derive from the random part (N6) of the reverse-transcription primer and therefore may not reflect the RNA sequence accurately: reverse transcriptase tolerates mismatches even on the last two bases. See [http://pubmed.gov/9973624 Mizuno et al., 1999] for an example of priming over mismatches.

=== Output ===

Pairs of FASTQ files (5′ and 3′), where all the reads originate from the same RNA sample and all the linkers have been trimmed.


== Final Output ==

Revision as of 18:10, 4 April 2011

Sample splitting and linker removal

The input is 5′ and 3′ paired-end FASTQ files from the Illumina sequencers.

  • The first 9 bases of the 5′ reads are trimmed. The first 6 are the index sequence (“barcode”) and the next 3 are the linker (GGG).
  • The first 6 bases of the 3′ reads are trimmed because they derive from the random part (N6) of the reverse-transcription primer and therefore may not reflect the RNA sequence accurately: reverse transcriptase tolerates mismatches even on the last two bases. See Mizuno et al., 1999 for an example of priming over mismatches.

We use the in-house command PipelinePairedEndExtraction.pl. It generates pairs of FASTQ files (5′ and 3′).
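
The source of PipelinePairedEndExtraction.pl is not shown here, so the following Python is only a minimal sketch of the trimming and splitting described above, not the in-house implementation. The function names (parse_fastq, split_and_trim) and the in-memory record layout are hypothetical, and the sketch assumes that the 5′ and 3′ reads arrive in the same order and that every 5′ read starts with a 6-base barcode followed by the 3-base linker.

```python
from typing import Dict, Iterable, Iterator, List, Tuple

BARCODE_LEN = 6  # index sequence ("barcode") at the start of each 5′ read
LINKER_LEN = 3   # linker (GGG) that follows the barcode
N6_LEN = 6       # random part (N6) of the reverse-transcription primer

Record = Tuple[str, str, str]  # (header, sequence, quality)


def parse_fastq(lines: List[str]) -> Iterator[Record]:
    """Yield (header, sequence, quality) from four-line FASTQ records."""
    for i in range(0, len(lines) - 3, 4):
        yield lines[i].rstrip(), lines[i + 1].rstrip(), lines[i + 3].rstrip()


def split_and_trim(
    reads5: Iterable[Record], reads3: Iterable[Record]
) -> Dict[str, List[Tuple[Record, Record]]]:
    """Group read pairs by barcode and trim both mates.

    Hypothetical helper: removes barcode + linker from the 5′ read and
    the N6-derived bases from the 3′ read, keeping quality strings in sync.
    """
    samples: Dict[str, List[Tuple[Record, Record]]] = {}
    skip5 = BARCODE_LEN + LINKER_LEN  # 9 bases in total
    for (h5, s5, q5), (h3, s3, q3) in zip(reads5, reads3):
        barcode = s5[:BARCODE_LEN]
        trimmed5 = (h5, s5[skip5:], q5[skip5:])
        trimmed3 = (h3, s3[N6_LEN:], q3[N6_LEN:])
        samples.setdefault(barcode, []).append((trimmed5, trimmed3))
    return samples
```

Each key of the returned dictionary is one barcode, so writing the pairs stored under a key to two files would give one pair of FASTQ files (5′ and 3′) per sample. The sketch does not check that the linker really is GGG; whether the in-house command does so is not stated here.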

Final Output

Mapped paired-end tags in BAM format