I'm seeing some alignments that don't make sense to me coming out of BWA. The only parameter we are setting is '-n 20', and these are 100-mer reads from metagenomic samples being mapped against a bacterial database.
Our understanding of the '-n' parameter is that it sets the maximum allowable edit distance between the query and the reference for a good alignment, so it's effectively the maximum number of mismatches allowed. But in the SAM output we're seeing alignments where the NM:i tag is 70-75 (NM:i is supposed to report the edit distance to the reference).
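To confirm this isn't just us misreading the output, here is a minimal sketch of how we're pulling NM values out of the SAM file (the function and any file names are just illustrative, not part of BWA):

```python
# Sketch: scan SAM records and report reads whose NM (edit distance)
# tag exceeds the threshold we believed '-n 20' enforced.
def reads_over_threshold(sam_lines, max_edits=20):
    offenders = []
    for line in sam_lines:
        if line.startswith("@"):            # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        for tag in fields[11:]:             # optional tags start at column 12
            if tag.startswith("NM:i:"):
                nm = int(tag[5:])
                if nm > max_edits:
                    offenders.append((fields[0], nm))
    return offenders
```

Running this over the full output, e.g. `reads_over_threshold(open("aln.sam"))`, is how we found the 70-75 values.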
How can BWA even produce an alignment of a 100-mer query that has 70-75 mismatches?
Another oddity: if we reduce the size of the input query file, the exact same reads that previously showed 70-75 mismatches now show < 20. We've seen this behavior with FASTA files of ~600k 100-mer reads; when we break that query file into chunks of 5000 reads each, the same reads no longer give this error. However, the results from the small chunks don't entirely match our BLASTN results: each chunk will either give the same hit as BLASTN, or fail to find a hit that BLASTN finds.
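For reference, this is roughly how we split the query file into chunks (a simplified sketch; the chunk size of 5000 and the in-memory representation are just for illustration):

```python
# Sketch: split a FASTA query file into fixed-size chunks of reads so
# each chunk can be mapped with BWA separately.
def split_fasta(lines, reads_per_chunk=5000):
    chunks, current, n_reads = [], [], 0
    for line in lines:
        if line.startswith(">"):            # a new record starts
            if n_reads and n_reads % reads_per_chunk == 0:
                chunks.append(current)      # close the finished chunk
                current = []
            n_reads += 1
        current.append(line)
    if current:                             # flush the final partial chunk
        chunks.append(current)
    return chunks
```

Each returned chunk is then written to its own FASTA file and mapped independently, which is where we see the discrepancy disappear.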
Is this a known issue? Or could I be doing something wrong by failing to set some needed parameter? I'm using BWA 0.5.7 on a 64-bit machine.
Thanks,
John Martin