I'm trying to map quality-trimmed Illumina 100 bp reads to a reference in order to collect the leftover reads, which potentially do not come from the reference organism (e.g. bacteria or fungi).
For this I want very stringent mapping with bwa mem. I really want to get rid of all confidently aligned reads and it doesn't matter if some reads end up as unaligned because of more SNPs or other variations.
I tried to achieve this by increasing the minimum seed length (-k), e.g. from 20 to 40, but I'm not sure if that's the right approach.
Should I increase the seed length even more?
Should I also increase the mismatch penalty (-B) and the gap open/extension penalties (-O, -E)?
What about the Z-dropoff (-d)?
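To make the options I'm asking about concrete, here is a minimal sketch that only assembles a `bwa mem` command line with those flags (it does not run anything). The file names `ref.fa` and `reads.fq`, the helper name `bwa_mem_command`, and the chosen values are placeholders for illustration, not recommended settings; the defaults in the comments are the ones listed in the bwa manual.

```python
import shlex

def bwa_mem_command(ref, reads, seed_len=40, mismatch=4,
                    gap_open=6, gap_ext=1, zdrop=100):
    """Build (but do not run) a bwa mem command line as an argument list.

    Hypothetical helper for illustration only.
    """
    return [
        "bwa", "mem",
        "-k", str(seed_len),   # minimum seed length (bwa default: 19)
        "-B", str(mismatch),   # mismatch penalty (bwa default: 4)
        "-O", str(gap_open),   # gap open penalty (bwa default: 6)
        "-E", str(gap_ext),    # gap extension penalty (bwa default: 1)
        "-d", str(zdrop),      # off-diagonal X-dropoff / Z-dropoff (bwa default: 100)
        ref, reads,
    ]

# Placeholder file names; only -k differs from the defaults here.
cmd = bwa_mem_command("ref.fa", "reads.fq")
print(shlex.join(cmd))
```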
The way I've been checking this is by aligning E. coli reads against plant/animal references. With the increased seed length, 99.90% of the reads remain unaligned. Is it biologically plausible that some reads would still align?
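For clarity on what I mean by the unaligned percentage, here is a minimal sketch of how such a fraction could be computed from SAM text output. It assumes each read should be counted once, so secondary (0x100) and supplementary (0x800) records are skipped, and a read counts as unaligned when the 0x4 flag bit is set. The function name and the toy records are made up for illustration.

```python
def unaligned_fraction(sam_lines):
    """Return the fraction of primary SAM records flagged as unmapped (0x4)."""
    total = 0
    unmapped = 0
    for line in sam_lines:
        # Skip blank lines and header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        flag = int(line.split("\t")[1])
        # Skip secondary and supplementary records so each read counts once.
        if flag & 0x100 or flag & 0x800:
            continue
        total += 1
        if flag & 0x4:
            unmapped += 1
    return unmapped / total if total else 0.0

# Toy single-end records (illustrative only, not real data).
toy_sam = [
    "@SQ\tSN:chr1\tLN:1000",
    "r1\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "r2\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",
    "r2\t2048\tchr1\t500\t60\t2M2H\t*\t0\t0\tAC\tII",
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
]
print(f"{unaligned_fraction(toy_sam):.2%} unaligned")
```

For paired-end data the counting would need to be adapted, since each mate has its own record and flag.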
Thanks!