It seems to me that the samtools indeed calling function is very buggy. Recently called indels on a dataset where all indels were reported with massive forward strand bias. Two output examples are shown here:
Direct interrogation of the reads revealed that there are plenty of reads from the reverse strand that support the indel. Therefore, we reverse-complemented the fastq dataset and re-called the exact same data mapped to the reverse strand. Again massive forward strand bias, despite the fact that the program had just called the reverse reads prior to them being reverse-complemented:
Option runs were:
samtools version 1.9
I can't think of any possible option that would cause this behavior, so I'm suspecting a bug. Thoughts?
Chr1 122767 . ACCC ACCCC 132.468 . INDEL;IDV=40;IMF=0.909091;DP=44;VDB=0.0700215;SGB=-0.692352;MQ0F=0;AF1=1;AC1=1;DP4=0,0,21,0;MQ=41;FQ=-999 GT:PL 1:170,63,0
Chr1 122924 . TGGGG TGGGGG 76.4669 . INDEL;IDV=18;IMF=0.818182;DP=22;VDB=0.0508436;SGB=-0.676189;MQ0F=0;AF1=1;AC1=1;DP4=0,0,11,0;MQ=37;FQ=-999 GT:PL 1:114,33,0
Chr1 122924 . TGGGG TGGGGG 76.4669 . INDEL;IDV=18;IMF=0.818182;DP=22;VDB=0.0508436;SGB=-0.676189;MQ0F=0;AF1=1;AC1=1;DP4=0,0,11,0;MQ=37;FQ=-999 GT:PL 1:114,33,0
Chr1 122767 . ACCC ACCCC 134.468 . INDEL;IDV=50;IMF=0.909091;DP=55;VDB=0.132339;SGB=-0.693021;MQ0F=0;AF1=1;AC1=1;DP4=0,0,27,0;MQ=42;FQ=-999 GT:PL 1:172,81,0
Chr1 122924 . TGGGG TGGGGG 120.467 . INDEL;IDV=29;IMF=0.852941;DP=34;VDB=0.245878;SGB=-0.692352;MQSB=0.473684;MQ0F=0;AF1=1;AC1=1;DP4=0,0,19,2;MQ=37;FQ=-999 GT:PL 1:158,63,0
Chr1 122924 . TGGGG TGGGGG 120.467 . INDEL;IDV=29;IMF=0.852941;DP=34;VDB=0.245878;SGB=-0.692352;MQSB=0.473684;MQ0F=0;AF1=1;AC1=1;DP4=0,0,19,2;MQ=37;FQ=-999 GT:PL 1:158,63,0
Code:
samtools mpileup -uf
Code:
bcftools --ploidy 1 -v -c
I can't think of any possible option that would cause this behavior, so I'm suspecting a bug. Thoughts?