Hey folks,
I'm aligning some PacBio reads to a reference and have a question about blasr that I can't seem to figure out from the documentation. What I want is to map only the corrected reads that are longer than 3 kb, to try to avoid spurious mismapping in repeat/transposase regions.
The -minReadLength flag doesn't behave as I would expect, though. When I peek at the results by running the following:
head -5000 blasr.sam | awk '{if ($1 ~/@.*/) next; else if (length($10) < 3000) print length($10);}'
I see loads of alignments that are shorter than 3 kb, which leaves me puzzled about a few things.
- Anyone know what this flag actually does?
- Am I missing something about what that tenth sequence field in the sam file means?
- The default behavior is to not clip the alignments, so why do I see alignments shorter than the shortest read I'm aligning? Shouldn't it only be reporting full-length alignments?
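In case it helps anyone reproduce this, here is a slightly more careful check than my one-liner. As far as I understand the SAM format, the tenth field (SEQ) leaves out hard-clipped bases, so its length can be smaller than the original read. I don't know whether clipping is what is happening here; this is only a minimal Python sketch that rebuilds each read's length from the CIGAR string so the two numbers can be compared. The function names are my own and not part of blasr, and it assumes a plain-text, tab-delimited SAM file.

```python
import re

# One CIGAR element: a count followed by an operation letter.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")

# Operations that account for bases of the original read. H is included
# because hard-clipped bases belonged to the read even though they are
# absent from SEQ.
READ_OPS = set("MIS=XH")


def original_read_length(cigar, seq):
    """Length of the read before clipping, rebuilt from the CIGAR string."""
    if cigar == "*":
        # No CIGAR available: fall back to whatever SEQ holds.
        return 0 if seq == "*" else len(seq)
    return sum(int(n) for n, op in CIGAR_OP.findall(cigar) if op in READ_OPS)


def short_reads(sam_lines, min_len=3000):
    """Yield (read name, SEQ length, rebuilt length) for reads under min_len."""
    for line in sam_lines:
        if line.startswith("@"):
            continue  # header line
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        qname, cigar, seq = fields[0], fields[5], fields[9]
        seq_len = 0 if seq == "*" else len(seq)
        full_len = original_read_length(cigar, seq)
        if full_len < min_len:
            yield qname, seq_len, full_len
```

To use it, open blasr.sam, pass the file handle to short_reads, and print each tuple. Any record it reports has a rebuilt read length under 3 kb, so a short SEQ caused only by hard clipping would not show up.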
Thanks for your thoughts!
Lizzy