Tried to re-run the whole pipeline, starting from the BWA alignment (including the faidx indexing of the reference sequence), but no luck.
Moreover, mpileup - bcf - varfilter is able to predict around 81% (1891 out of 2321) of the present indels correctly, and for around 15% it's able to predict the correct "region" (+- 5 nt), but does not report the correct indel.
Seems that for these 15% indels, there are other indels reporterd in the neighborhood of the "reported" indel in the total mpileup-output (before running varFilter), and one of these other indels (not reported) seem to be the true one.
A few lines of the complete mpileup-output is given below, the reported deletion is -GT (fourth line) while the correct one is -TT (last line). The wrong one might be selected because it has more reads supporting it (40 + 32 versus 38 + 30).
Code:
NC_000913 84533 . G . 209 . DP=109;VDB=0.0392;AF1=0;AC1=0;DP4=52,55,0,2;MQ=48;FQ=-282;PV4=0.5,1,0.075,1 PL 0 NC_000913 84533 . GCT G 54.5 . INDEL;DP=109;VDB=0.0808;AF1=1;AC1=2;DP4=1,5,21,14;MQ=57;FQ=-63.5;PV4=0.08,0.27,1,1 GT:PL:GQ 1/1:95,29,0:55 NC_000913 84534 . C . 209 . DP=104;VDB=0.0318;AF1=0;AC1=0;DP4=50,52,1,1;MQ=48;FQ=-282;PV4=1,1,0.081,0.076 PL 0 NC_000913 84534 . CTGT CT 214 . INDEL;DP=104;VDB=0.0777;AF1=1;AC1=2;DP4=0,0,40,32;MQ=52;FQ=-252 GT:PL:GQ 1/1:255,217,0:99 NC_000913 84535 . T . 207 . DP=96;VDB=0.0413;AF1=0;AC1=0;DP4=48,47,0,1;MQ=47;FQ=-282;PV4=1,0,0.17,0.15 PL 0 NC_000913 84536 . G . 205 . DP=96;VDB=0.0639;AF1=0;AC1=0;DP4=47,45,0,0;MQ=47;FQ=-282 PL 0 NC_000913 84536 . GTT G 214 . INDEL;DP=96;VDB=0.0777;AF1=1;AC1=2;DP4=0,0,38,30;MQ=52;FQ=-240 GT:PL:GQ 1/1:255,205,0:99
Leave a comment: