User:Tlassmann/rRNA filtering
Purpose
Remove reads corresponding to rRNA from Helicos CAGE datasets.
Method
Since the error rate of Helicos reads is high and includes many insertion and deletion errors, the only viable option was to match the reads against the rRNA reference sequences (U13369.1) using a non-heuristic alignment algorithm. Given the amount of data, an SSE-parallelized version of Myers' bit-parallel algorithm was implemented.
All reads matching the reference rRNA sequences with up to 2 errors are discarded at this step.
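The matching step can be illustrated with a minimal scalar sketch of Myers' bit-parallel approximate matching in Python. It is not the SSE implementation used here: it relies on Python's arbitrary-length integers instead of SIMD registers, searches only the strand it is given, and assumes that an "error" is a substitution, insertion or deletion. The names matches_within and filter_reads are hypothetical and chosen for illustration only.

```python
def matches_within(pattern, text, max_errors):
    """Return True if `pattern` occurs somewhere in `text` with at most
    `max_errors` edits (substitutions, insertions, deletions).

    Scalar sketch of Myers' bit-vector algorithm; one bit per pattern row.
    """
    m = len(pattern)
    if m <= max_errors:
        return True  # deleting the whole pattern already fits the budget

    # peq[c] has bit i set where pattern[i] == c
    peq = {}
    for i, ch in enumerate(pattern):
        peq[ch] = peq.get(ch, 0) | (1 << i)

    mask = (1 << m) - 1      # keep all vectors m bits wide
    high = 1 << (m - 1)      # bit for the last pattern row
    pv, mv = mask, 0         # +1 / -1 vertical deltas of the current column
    score = m                # edit distance in the last row

    for ch in text:
        eq = peq.get(ch, 0)
        xv = eq | mv
        xh = ((((eq & pv) + pv) ^ pv) | eq) & mask
        ph = (mv | ~(xh | pv)) & mask   # +1 horizontal deltas
        mh = pv & xh                    # -1 horizontal deltas
        if ph & high:
            score += 1
        elif mh & high:
            score -= 1
        # Shift in 0: a match may start at any position of the text.
        ph = (ph << 1) & mask
        mh = (mh << 1) & mask
        pv = (mh | ~(xv | ph)) & mask
        mv = ph & xv
        if score <= max_errors:
            return True
    return False


def filter_reads(reads, reference, max_errors=2):
    """Keep the (name, sequence) pairs whose sequence does NOT match
    `reference` within `max_errors` edits."""
    return [(name, seq) for name, seq in reads
            if not matches_within(seq, reference, max_errors)]
```

In this sketch the read is the pattern and the rRNA reference is the text, so each read costs one pass over the reference. The default max_errors=2 mirrors the threshold stated above.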
Input
Helicos reads in FASTA format.
Output
Reads not matching rRNA.