Hi Brian,
I came across an instance today while parsing BBMap SAM output for CIGAR strings, to look at how some HIV reads map against a reference. I expected a CIGAR string to be present for every record when using the outm=output.sam parameter, but in one of my mapping records I found an asterisk instead. Am I wrong to assume that outm= is intended to include only mapped reads? I can filter out these reads, but I'd like to make sure my mapping parameters make sense.
From the SAM output:
@HD VN:1.4 SO:unsorted
@SQ SN:NC_001802DRannotations_(modified) LN:9181
@PG ID:BBMap PN:BBMap VN:34.92 CL:java -Djava.library.path=/home/dnanexus/bbmap/jni/ -ea -Xmx10g align2.BBMap build=1 overwrite=true fastareadlen=500 in=reads_file ref=ref_file outm=sam_output minid=.8 strictmaxindel=10 k=8 subfilter=15 -Xmx10g
The offending mapping record:
M01472:214:000000000-AG0YC:1:2108:10755:20410 1:N:0:78 0 NC_001802DRannotations_(modified) 2154 4 * * 0 0 AAGTTAAACAATGGCCATTGACAGAAGAAAAAATAAAAGCATTAGTCATAGTAATATGGGGAAAGACTCCTAAATTTAAATTACCCATACAAAAGGAAACATGGGAAGCATGGTGGACAGAGTATTGGC CCCCCGGGGGGGGGGGGGGGGGGGGGGFDCFGECGGGF<AFEGGFEFGGGGFGGGGGGGGGGGGGGGGGFGFFGGFGEFGGAGFGEAFF<,FGGGGGGGGGGFGGFDGGGFECFGGGGGGGGGGCCCCC AM:i:4
This only occurred after parsing ~8 million mapping records out of a total 50 million.
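For reference, here is a minimal Python sketch of the kind of filter I mean. It assumes a standard tab-delimited SAM file (FLAG in column 2, CIGAR in column 6) and reports records that are not flagged as unmapped (0x4 unset) but still carry "*" as the CIGAR. The function name find_missing_cigar and the sample records are made up for illustration, not anything from BBMap.

```python
def find_missing_cigar(lines):
    """Yield (line_number, qname, flag) for records that are flagged as
    mapped (0x4 bit unset) but have '*' in the CIGAR column."""
    for n, line in enumerate(lines, start=1):
        if line.startswith("@"):
            continue  # skip @HD / @SQ / @PG header lines
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete alignment record
        flag = int(fields[1])   # column 2: bitwise FLAG
        cigar = fields[5]       # column 6: CIGAR
        if not flag & 0x4 and cigar == "*":
            yield n, fields[0], flag


# Small in-memory demo with made-up records (hypothetical read names).
demo = [
    "@HD\tVN:1.4\tSO:unsorted\n",
    "r1\t0\tref\t2154\t4\t*\t*\t0\t0\tACGT\tIIII\tAM:i:4\n",  # mapped, no CIGAR
    "r2\t0\tref\t10\t40\t4M\t*\t0\t0\tACGT\tIIII\n",          # mapped, CIGAR present
    "r3\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII\n",               # unmapped, '*' expected
]
print(list(find_missing_cigar(demo)))  # [(2, 'r1', 0)]
```

On a real file, the same generator can be fed an open file handle (for example `open("output.sam")`), so the 50 million records are streamed rather than loaded into memory.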