CAGEscan mapping protocol: Difference between revisions
Revision as of 11:00, 31 March 2011
Sample splitting and linker removal
Input is 5′ and 3′ paired-end FASTQ files from an Illumina sequencer.
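Because the files are paired, each record in the 5′ file has its mate at the same position in the 3′ file. The sketch below reads the two files in lockstep. It assumes standard 4-line FASTQ records (no line-wrapped sequences) and that both files list the reads in the same order. The names `FastqRecord`, `read_fastq` and `read_pairs` are illustrative and are not part of the CAGEscan pipeline.

```python
from typing import Iterator, NamedTuple, TextIO, Tuple


class FastqRecord(NamedTuple):
    """One FASTQ entry: name (header without '@'), sequence and quality string."""
    name: str
    seq: str
    qual: str


def read_fastq(handle: TextIO) -> Iterator[FastqRecord]:
    """Yield records from a stream of 4-line FASTQ entries."""
    while True:
        header = handle.readline()
        if not header:
            return  # end of file
        seq = handle.readline().rstrip("\r\n")
        plus = handle.readline()
        qual = handle.readline().rstrip("\r\n")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record near {header!r}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch in {header!r}")
        yield FastqRecord(header[1:].rstrip("\r\n"), seq, qual)


def read_pairs(five: TextIO, three: TextIO) -> Iterator[Tuple[FastqRecord, FastqRecord]]:
    """Read the 5' and 3' files in lockstep, yielding (5' mate, 3' mate)."""
    r5, r3 = read_fastq(five), read_fastq(three)
    while True:
        a, b = next(r5, None), next(r3, None)
        if a is None and b is None:
            return  # both files exhausted together
        if a is None or b is None:
            # One file ran out first, so the pairing is broken.
            raise ValueError("5' and 3' files contain different numbers of reads")
        yield a, b
```

Checking that the two files end together is a cheap way to catch truncated or mismatched inputs before any trimming is done.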
5′
The first 9 bases of the 5′ reads are trimmed: the first 6 are the index sequence (“barcode”) and the next 3 are the linker (GGG).
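The 5′ trimming rule can be sketched as follows, assuming the quality string is trimmed together with the sequence. The `linker_ok` flag is a hypothetical addition: the text above does not say whether reads whose bases 7 to 9 differ from GGG are kept or discarded.

```python
from typing import NamedTuple

BARCODE_LEN = 6  # index sequence ("barcode") length
LINKER = "GGG"   # 3-base linker that follows the barcode


class Split5Prime(NamedTuple):
    barcode: str
    linker_ok: bool  # True if bases 7-9 actually read GGG (hypothetical check)
    seq: str         # sequence remaining after the first 9 bases
    qual: str        # quality string trimmed in the same way


def split_five_prime(seq: str, qual: str) -> Split5Prime:
    """Cut a 5' read into barcode, linker check and the remaining sequence."""
    trim = BARCODE_LEN + len(LINKER)  # 9 bases in total
    if len(seq) < trim:
        raise ValueError("read is shorter than barcode + linker")
    barcode = seq[:BARCODE_LEN]
    linker = seq[BARCODE_LEN:trim]
    return Split5Prime(barcode, linker == LINKER, seq[trim:], qual[trim:])
```

For example, `split_five_prime("ACGTACGGGTTAGC", "IIIIIIIIIHHHHH")` gives the barcode `ACGTAC`, a valid linker, and the remainder `TTAGC` with qualities `HHHHH`.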
3′
The first 6 bases of the 3′ reads are trimmed because they derive from the random part (N6) of the reverse-transcription primer. They therefore may not accurately reflect the RNA sequence, since the reverse transcriptase tolerates mismatches even on the last two bases of the primer. See Mizuno et al., 1999 (PubMed 9973624) for an example of priming over mismatches.
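A minimal sketch of the 3′ trimming, keeping the quality string in step with the sequence. The names `trim_three_prime` and `N6_LEN` are illustrative.

```python
from typing import Tuple

N6_LEN = 6  # random (N6) part of the reverse-transcription primer


def trim_three_prime(seq: str, qual: str, n: int = N6_LEN) -> Tuple[str, str]:
    """Drop the first n bases (and their qualities) of a 3' read.

    These bases were copied from the random primer, which can anneal with
    mismatches, so they are not trusted to match the RNA. A read of n bases
    or fewer comes back as two empty strings; a caller may want to discard it.
    """
    if len(seq) != len(qual):
        raise ValueError("sequence and quality strings differ in length")
    return seq[n:], qual[n:]
```

For example, `trim_three_prime("NNNNNNACGT", "######IIII")` returns `("ACGT", "IIII")`.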
Output
Pairs of FASTQ files (5′ and 3′), one pair per RNA sample, in which the barcode, linker and N6-derived bases have been trimmed.
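Combining the two trimming rules, the sample splitting step can be sketched as below. This sketch assumes exact barcode matching (no mismatches allowed) and uses a hypothetical barcode-to-sample table. It writes to in-memory buffers in place of real output files and collects pairs with an unknown barcode under a hypothetical `unmatched` key.

```python
import io
from typing import Dict, Iterable, Tuple

# Hypothetical barcode-to-sample table; real barcodes come from the library design.
SAMPLES: Dict[str, str] = {"ACGTAC": "sampleA", "TGCATG": "sampleB"}

Read = Tuple[str, str, str]  # (name, sequence, quality)


def fastq_entry(name: str, seq: str, qual: str) -> str:
    """Format one record as 4-line FASTQ."""
    return f"@{name}\n{seq}\n+\n{qual}\n"


def demultiplex(pairs: Iterable[Tuple[Read, Read]],
                samples: Dict[str, str]) -> Dict[str, Tuple[io.StringIO, io.StringIO]]:
    """Route each trimmed pair into the 5'/3' output pair of its sample.

    The barcode is the first 6 bases of the 5' read. The 5' read then loses
    9 bases (barcode + GGG linker) and the 3' read loses 6 (N6 primer part).
    """
    out: Dict[str, Tuple[io.StringIO, io.StringIO]] = {}
    for (n5, s5, q5), (n3, s3, q3) in pairs:
        sample = samples.get(s5[:6], "unmatched")
        if sample not in out:
            out[sample] = (io.StringIO(), io.StringIO())
        five, three = out[sample]
        five.write(fastq_entry(n5, s5[9:], q5[9:]))
        three.write(fastq_entry(n3, s3[6:], q3[6:]))
    return out
```

Writing each mate to its sample's 5′ and 3′ buffers in the same order keeps the pairing intact for the mapping step.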
Final Output
Mapped paired-end tags in BAM format