I performed mapping for my RNA-seq data (150 bp reads) with BBMap, just the default parameters (the only special parameter set was low memory):
When I looked in the resulting sam file, I noticed that for many reads I had CIGAR string looking like this:
This how the alignment looked (upper line is the read, lower is suggested hit):
CAGAAGTCAAGGAGTAATTTAGGAGTAATTTTGACTTTCAAGTCTTATTACTTAATAAATACATTTCATAAAGCTACAGCTGCCATAGATAGTGATTCCTCTGATGGATCTGGGCAAAGTCAATTGAAAACCTTCTGGAAAGGAGTCACC
..TGGA......************G.......C...C..............T..G.G...............G...CT.........G.........GT.....................CA.......................T.....
It is clearly incorrect mapping: this read should be just assigned to unmapped. From 1414 reads assigned to my gene of interest (attached), 949 had editing distance of 21 and more (34 reads had distance of 40 and more, up to 357!), i.e. obviously didn't belong to that gene. It means that a lot of the counts I get with the default settings are totally wrong. I am wondering if this is normal result for the mapper and what parameters I need to adjust to avoid this issue - `minratio`, `minind`, or something else?
Code:
bbmap.sh ref=gencode.v29.transcripts.fa path=index_bb_lomem in=sampleA_1.fq.gz out=sampleA.bam overwrite=f usemodulo
2=4X6=11I1X7=1X3=1X14=1X2=1X1=1X15=1X3=2X9=1X9=2X21=2X23=1X5=
CAGAAGTCAAGGAGTAATTTAGGAGTAATTTTGACTTTCAAGTCTTATTACTTAATAAATACATTTCATAAAGCTACAGCTGCCATAGATAGTGATTCCTCTGATGGATCTGGGCAAAGTCAATTGAAAACCTTCTGGAAAGGAGTCACC
..TGGA......************G.......C...C..............T..G.G...............G...CT.........G.........GT.....................CA.......................T.....
It is clearly incorrect mapping: this read should be just assigned to unmapped. From 1414 reads assigned to my gene of interest (attached), 949 had editing distance of 21 and more (34 reads had distance of 40 and more, up to 357!), i.e. obviously didn't belong to that gene. It means that a lot of the counts I get with the default settings are totally wrong. I am wondering if this is normal result for the mapper and what parameters I need to adjust to avoid this issue - `minratio`, `minind`, or something else?
Comment