CAGEscan mapping protocol: Difference between revisions
Revision as of 18:16, 4 April 2011
Sample splitting and linker removal
Input is 5′ and 3′ paired-end FASTQ files from the Illumina sequencers.
- The first 9 bases of the 5′ reads are trimmed. The first 6 are the index sequence (“barcode”) and the next 3 are the linker (GGG).
- The first 6 bases of the 3′ reads are trimmed because they derive from the random part (N6) of the reverse-transcription primer and therefore may not reflect the RNA sequence accurately: the reverse transcriptase tolerates mismatches even on the last two bases. See Mizuno et al., 1999 for an example of priming over mismatches.
We use the in-house command PipelinePairedEndExtraction.pl. It generates pairs of FASTQ files (5′ and 3′).
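The in-house script itself is not reproduced on this page. As an illustration only, the trimming rules above could be sketched in Python as follows. The function names (read_fastq, trim, split_pair) are hypothetical, and the sketch assumes plain four-line FASTQ records with the 5′ and 3′ reads supplied in the same order; it is not the implementation of PipelinePairedEndExtraction.pl.

```python
from typing import Iterable, Iterator, Tuple

BARCODE_LENGTH = 6    # index sequence at the start of the 5' read
FIVE_PRIME_TRIM = 9   # 6-base barcode + 3-base GGG linker
THREE_PRIME_TRIM = 6  # random N6 part of the reverse-transcription primer

FastqRecord = Tuple[str, str, str]  # (header, sequence, quality)


def read_fastq(lines: Iterable[str]) -> Iterator[FastqRecord]:
    """Yield (header, sequence, quality) from four-line FASTQ records.

    Assumption: no line wrapping inside records. A truncated trailing
    record is silently dropped.
    """
    it = iter(lines)
    for header, seq, _plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n"), seq.rstrip("\n"), qual.rstrip("\n")


def trim(record: FastqRecord, n: int) -> FastqRecord:
    """Remove the first n bases and their quality values."""
    header, seq, qual = record
    return header, seq[n:], qual[n:]


def split_pair(five: FastqRecord, three: FastqRecord) -> Tuple[str, FastqRecord, FastqRecord]:
    """Return (barcode, trimmed 5' record, trimmed 3' record) for one read pair."""
    barcode = five[1][:BARCODE_LENGTH]
    return barcode, trim(five, FIVE_PRIME_TRIM), trim(three, THREE_PRIME_TRIM)
```

For example, split_pair(("@r/1", "ACGTACGGGTTTTCCCC", "I" * 17), ("@r/2", "NNNNNNGATTACA", "I" * 13)) returns the barcode ACGTAC, a 5′ read starting at TTTTCCCC, and a 3′ read reduced to GATTACA. The barcode can then be used to route each pair to its sample's output files.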
Artefact filtering
Each FASTQ file is filtered with TagDust (http://pubmed.gov/19737799), using an empty construct as the library sequence:
AATGATACGGCGACCACCGAGATCTACACTAGTCGAACTGAAGGTCTCCAGCA[barcode]gggAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
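A minimal sketch of how such a library file might be prepared, assuming that the [barcode] placeholder is replaced by each sample's 6-base index sequence and that the library is supplied as FASTA. The names empty_construct and library_fasta are hypothetical, and the TagDust invocation itself is not shown.

```python
from typing import Iterable

# Empty-construct template copied from the sequence above;
# {barcode} stands for the [barcode] placeholder.
TEMPLATE = (
    "AATGATACGGCGACCACCGAGATCTACACTAGTCGAACTGAAGGTCTCCAGCA"
    "{barcode}"
    "gggAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG"
)


def empty_construct(barcode: str) -> str:
    """Return the empty-construct sequence for one barcode, in upper case."""
    barcode = barcode.upper()
    if len(barcode) != 6 or set(barcode) - set("ACGT"):
        raise ValueError(f"expected a 6-base ACGT barcode, got {barcode!r}")
    return TEMPLATE.format(barcode=barcode).upper()


def library_fasta(barcodes: Iterable[str]) -> str:
    """Build FASTA text with one empty-construct entry per barcode."""
    entries = []
    for barcode in barcodes:
        entries.append(f">empty_construct_{barcode.upper()}\n{empty_construct(barcode)}\n")
    return "".join(entries)
```

Writing the result of library_fasta(["ACGTAC", "TTGACA"]) to a file gives a two-entry library, one entry per sample barcode, that can be passed to the filtering step.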
Final Output
Mapped paired-end tags in BAM format