First off, I'm using v34.94 and you're using v33.64, so there may be a bit of a discrepancy. But it looks like BBMap is refusing to map to that location (which has an 894bp exact match, cigar 14S894=245S) because there is another location that scores so much higher: 1073 matches and 1bp deletion, cigar 14S765=1D308=66S.
BBMap has some heuristics that skip searching for sites that are substantially lower scoring than others, and it considers 1073 matches, 1 deletion, and 80 soft-clipped bases to be much higher scoring than 894 matches and 259 soft-clipped bases. You can make the read align there several ways:
Any of those will give you the alignment in question, but they each have other disadvantages - the first one will not allow any indels, the second does not allow any indels or mismatches, just clipping (it's very fast, though), and the third requires a minimum of 800bp consecutive exact match. So they work in this case but none will ensure you get all 99% identity alignments.
You can also forcibly align a single query sequence to everything in the reference like this:
...although this is not really what I designed that program for.
So, overall, I don't think you can make BBMap do exactly what you want in general because its opinions of "best alignment" are slightly different than yours, but you could map everything in semiperfectmode and then just re-map the unmapped reads (which you separate using outm and outu streams) allowing 99% identity. That's probably the closest you can get.
BBMap has some heuristics that skip searching for sites that are substantially lower scoring than others, and it considers 1073 matches, 1 deletion, and 80 soft-clipped bases to be much higher scoring than 894 matches and 259 soft-clipped bases. You can make the read align there several ways:
Code:
mapPacBio.sh in=lbc1_read.fasta out=mapped2.sam ow strictmaxindel=0 mapPacBio.sh in=lbc1_read.fasta out=mapped2.sam ow semiperfectmode bbmapskimmer.sh in=lbc1_read.fasta out=mapped2.sam ow kfilter=800
You can also forcibly align a single query sequence to everything in the reference like this:
Code:
msa.sh in=full_reference.fasta out=mapped.sam msa=MultiStateAligner9PacBio query=AGCTTGGAGACAACATGTGGTTCTTGACAGCTCTGCTCCTTTGGGTTCCAGTTGATGGGCAAGTGGATACCACAAAGGCAGTGATCACTTTGCAGCCTCCATGGGTCAGCGTGTTCCAAGAGGAAACTGTAACCTTACAGTGTGAGGTGCCCCGTCTGCCTGGGAGCAGCTCCACACAGTGGTTTCTCAATGGCACAGCCACTCAGACCTCGACTCCCAGCTACAGAATCACCTCTGCCAGTGTCAAGGACAGTGGTGAATACAGGTGCCAGAGAGGTCCCTCAGGGCGAAGTGACCCCATACAGCTGGAAATCCACAGAGACTGGCTACTACTGCAGGTATCCAGCAGAGTCTTCACAGAAGGAGAACCTCTGGCCTTGAGGTGTCATGCATGGAAGGATAAGCTGGTGTACAATGTGCTTTACTATCAAAATGGCAAAGCCTTTAAGTTTTTCTACCGGAATTCTCAACTCACCATTCTGAAAACCAACATAAGTCACAACGGCGCCTACCACTGCTCAGGCATGGGAAAGCATCGCTACACATCAGCAGGAGTATCTGTCACTGTGAAAGAGCTATTTCCAGCTCCAGTGCTGAATGCATCCGTGACATCCCCGCTCCTGGAGGGGAATCTGGTCACCCTGAGCTGTGAAACAAAGTTGCTTCTGCAGAGGCCTGGTTTGCAGCTTTACTTCTCCTTCTACATGGGCAGCAAGACCCTGCGAGGCAGGAACACGTCCTCTGAATACCAAATACTAACTGCTAGAAGAGAAGACTCTGGTTTTACTGGTGCGAGGCCACCACAGAAGACGGAAATGTCCTTAAGCGCAGCCCTGAGTTGGAGCTTCAAGTGCTTGGCCTCCAGTTACCAACTCCTGTCTGGCTTCATGTCCTTTTCTATCTGGTAGTGGGAATAATGTTTTTAGTGAACACTGTTCTCTGGGTGACAATACGTAAAGAACTGAAAAGAAAGAAAAAGTGGAATTTAGAAATCTCTTTGGATTCTGCTCATGAGAAGAAGGTAACTTCCAGCCTTCAAGAAGACAGACATTTAGAAGAAGAGCTGAAGAGTCAGGAACAAGAATAAAGAACAGCTGCAGGAAGGGGTGCACCGGAAGGAGCCTGAGGAGGCCAAGTAGCAGCAGCTCAGTGG
So, overall, I don't think you can make BBMap do exactly what you want in general because its opinions of "best alignment" are slightly different than yours, but you could map everything in semiperfectmode and then just re-map the unmapped reads (which you separate using outm and outu streams) allowing 99% identity. That's probably the closest you can get.
Comment