Hi all,
I am using samtools mpileup (version 0.1.18) to call SNPs and indels in bacterial genomes. When I apply this pipeline to simulated Illumina data (generated with dwgsim), mpileup appears to find the regions where indels occur, but in some cases it reports the wrong indel. In total around 2000 indels are reported, and roughly 300 of them behave as in the example below.
For example, at position 3211144, dwgsim introduced a deletion of 3 nucleotides (-GAG) in the reference sequence.
However, at position 3211140 mpileup reports TGCCG as REF and TG as ALT, which would correspond to a deletion of CCG:
ref_id 3211140 . TGCCG TG 214 . INDEL;DP=97;VDB=0.0710;AF1=1;AC1=2;DP4=0,0,24,28;MQ=39;FQ=-192 GT:PL:GQ 1/1:255,157,0:99
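To make my reading of this record explicit, here is a minimal Python sketch of how I decode it. The function name `describe_deletion` is my own and not part of samtools; it assumes a simple deletion in which the bases shared at the start of REF and ALT are kept and the deleted bases follow directly after them.

```python
def describe_deletion(pos, ref, alt):
    """Return (first_deleted_position, deleted_bases) for a simple VCF deletion.

    Assumption: the leading bases shared by REF and ALT are retained, and the
    deleted bases start immediately after that shared prefix.
    """
    if len(ref) <= len(alt):
        raise ValueError("record is not a deletion")

    # Count the leading bases that REF and ALT have in common.
    shared = 0
    while shared < len(alt) and ref[shared] == alt[shared]:
        shared += 1

    deleted_len = len(ref) - len(alt)
    deleted = ref[shared:shared + deleted_len]

    # Sanity check: removing the deleted bases from REF must give ALT.
    if ref[:shared] + ref[shared + deleted_len:] != alt:
        raise ValueError("record is not a simple deletion")

    # VCF POS is 1-based and refers to the first base of REF.
    return pos + shared, deleted


start, bases = describe_deletion(3211140, "TGCCG", "TG")
print(start, bases)  # 3211142 CCG
```

Under this reading, the record removes CCG at positions 3211142 to 3211144, whereas dwgsim lists a GAG deletion at 3211144.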
However, when I look at the alignments in the BAM file (with tview), this does not seem to be correct (see attached).
It seems to me that samtools detects the indel correctly but makes an error when reporting it: it reports an indel of the same size a few nucleotides upstream of the real indel location. Or am I missing something?
Thank you in advance!
Pieter