CAGEscan mapping protocol

Sample splitting and linker removal

The input is a pair of FASTQ files (5′ and 3′ reads) from paired-end sequencing on Illumina sequencers.

The first 9 bases of the 5′ reads are trimmed: the first 6 are the index sequence (“barcode”) and the next 3 are the linker (GGG).

The first 6 bases of the 3′ reads are trimmed because they derive from the random part (N6) of the reverse-transcription primer and therefore may not reflect the RNA sequence accurately: the reverse transcriptase tolerates mismatches, even on the last two bases. See Mizuno et al., 1999 for an example of priming over mismatches.

We use the in-house command PipelinePairedEndExtraction.pl. It generates pairs of FASTQ files (5′ and 3′).
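
As an illustration of the trimming rule only, here is a minimal Python sketch. It is not the code of PipelinePairedEndExtraction.pl, whose internals are not described here; the function name trim_pair and the constant names are hypothetical.

```python
BARCODE_LEN = 6   # index sequence at the start of the 5' read
LINKER_LEN = 3    # linker (expected to be GGG) following the barcode
RANDOM_LEN = 6    # random part (N6) of the primer at the start of the 3' read


def trim_pair(seq5, qual5, seq3, qual3):
    """Trim one read pair following the rule described above.

    Returns the barcode, the linker as observed in the read, and the
    trimmed sequence and quality strings for the 5' and 3' reads.
    Hypothetical helper, not part of the in-house pipeline.
    """
    cut5 = BARCODE_LEN + LINKER_LEN
    barcode = seq5[:BARCODE_LEN]
    linker = seq5[BARCODE_LEN:cut5]
    return barcode, linker, seq5[cut5:], qual5[cut5:], seq3[RANDOM_LEN:], qual3[RANDOM_LEN:]
```

The barcode returned for each pair could then be used to assign the pair to its sample, and the observed linker could be compared with GGG as a sanity check; whether the in-house command does so is not stated here.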

Artefact filtering

Each FASTQ file is filtered with TagDust, using the sequence of an empty construct (a library molecule with no insert) as the library sequence:

AATGATACGGCGACCACCGAGATCTACACTAGTCGAACTGAAGGTCTCCAGCA[barcode]gggAGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG
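
Since the empty construct contains the barcode, one library sequence is needed per barcode. The following Python sketch shows one way to generate them as FASTA records. It assumes that the library is supplied as a FASTA file; the function names and the barcodes in the usage note are hypothetical.

```python
# Constant parts of the empty construct, copied from the sequence above.
PREFIX = "AATGATACGGCGACCACCGAGATCTACACTAGTCGAACTGAAGGTCTCCAGCA"
LINKER = "GGG"
SUFFIX = "AGATCGGAAGAGCGGTTCAGCAGGAATGCCGAGACCGATCTCGTATGCCGTCTTCTGCTTG"


def empty_construct(barcode):
    """Return the empty-construct sequence for one barcode (hypothetical helper)."""
    return PREFIX + barcode.upper() + LINKER + SUFFIX


def constructs_as_fasta(barcodes):
    """Return FASTA text with one empty-construct record per barcode."""
    records = []
    for barcode in barcodes:
        records.append(">empty_construct_" + barcode.upper())
        records.append(empty_construct(barcode))
    return "\n".join(records) + "\n"
```

For example, constructs_as_fasta(["ACAGAT", "ATCGTG"]) returns two records (the barcodes are made up for illustration), which could be written to a file and given to the filtering program.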

Removal of rDNA sequences

Each FASTQ file is then filtered again with the program rRNAdust, to remove reads that match the ribosomal DNA (rDNA) repeat unit.

Synchronisation of the FASTQ files

Because the filters can discard one read of a pair while keeping the other, the resulting pairs of FASTQ files are no longer suitable for paired-end alignment. We use the in-house script sync_paired_fastq to discard unpaired reads and re-sort the FASTQ files.
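
The principle of the synchronisation can be sketched in Python as follows. This is not the code of sync_paired_fastq. It assumes four-line FASTQ records without blank lines, read names that match between the two files once an optional /1 or /2 suffix is removed, and a 3′ file small enough to be held in memory; the function names are hypothetical.

```python
def read_fastq(lines):
    """Yield (read name, record) pairs from an iterable of FASTQ lines.

    A record is the tuple (header, sequence, separator, quality).
    Assumes exactly four lines per record.
    """
    stripped = (line.rstrip("\n") for line in lines)
    for header, seq, sep, qual in zip(stripped, stripped, stripped, stripped):
        name = header[1:].split()[0]          # drop '@' and any description
        if name.endswith("/1") or name.endswith("/2"):
            name = name[:-2]                  # drop the mate suffix
        yield name, (header, seq, sep, qual)


def sync_pairs(lines5, lines3):
    """Keep only reads present in both files, in the order of the 5' file."""
    records3 = dict(read_fastq(lines3))       # index the 3' reads by name
    kept5, kept3 = [], []
    for name, record5 in read_fastq(lines5):
        if name in records3:                  # mate survived the filters
            kept5.append(record5)
            kept3.append(records3[name])
    return kept5, kept3
```

Both returned lists have the same length and the same read order, which is what paired-end aligners generally expect from their two input files.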

Final Output

Mapped paired-end tags in BAM format
