User:Tlassmann/rRNA filtering

Purpose

Remove reads corresponding to rRNA from Helicos CAGE datasets.

Method

Helicos reads have a high error rate, with many insertion and deletion errors, so the only viable option was to match reads against the rRNA reference sequences (GenBank accession U13369.1) using a non-heuristic alignment algorithm. Given the volume of data, an SSE-parallelized version of Myers' bit-parallel algorithm was implemented.

All reads matching the reference rRNA sequences with up to 2 errors are discarded at this step.
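
The matching step can be illustrated with a minimal sketch. It is not the SSE implementation described above: it is a plain Python version of Myers' bit-parallel approximate matching, with Python integers standing in for the bit-vectors. The function names `min_errors` and `filter_reads` are hypothetical. The sketch also assumes that "errors" means edit distance (substitutions, insertions and deletions) and that only the forward strand of the reference is searched.

```python
def min_errors(read, reference):
    """Smallest edit distance between `read` and any substring of `reference`,
    computed with Myers' bit-parallel algorithm (the read is the pattern)."""
    m = len(read)
    if m == 0:
        return 0
    mask = (1 << m) - 1        # keeps the bit-vectors at m bits
    high = 1 << (m - 1)        # bit for the last row of the DP column

    # peq[c] has bit i set when read[i] == c
    peq = {}
    for i, c in enumerate(read):
        peq[c] = peq.get(c, 0) | (1 << i)

    pv, mv = mask, 0           # vertical +1 / -1 deltas of the current column
    score = m                  # value in the last row of the current column
    best = m
    for c in reference:
        eq = peq.get(c, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # horizontal +1 deltas
        mh = pv & xh                    # horizontal -1 deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift without setting bit 0: a match may start anywhere in the reference
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        best = min(best, score)
    return best


def filter_reads(reads, reference, max_errors=2):
    """Keep only (name, sequence) pairs that do NOT match the reference
    within `max_errors` errors (assumed threshold taken from the text)."""
    return [(name, seq) for name, seq in reads
            if min_errors(seq, reference) > max_errors]
```

For example, with a toy reference `"ACGTACGTTTGACCA"`, the read `"TTGCC"` matches the substring `"TTGACC"` with a single deletion and would be discarded, while a read such as `"GGGGGGGG"` would be kept. Each reference character is processed in a constant number of word operations when the read fits in a machine word, which is what makes the approach suitable for SIMD parallelization across many reads.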

Input

Helicos reads in FASTA format.

Output

Reads that do not match rRNA.