I use the following commands to call mutations:
samtools mpileup -d 16000 -L 16000 -E -uf $reference $sorted.bam | bcftools view -bvgc - > raw.bcf
bcftools view raw.bcf > flt.vcf
After that, I can see mutations in the output that do not exist. I checked the read sequences in the FASTQ file and also inspected them visually to make sure that some of these mutations really are absent. There are very few of them (and all are short indels), but I would still like to know why they appear and whether I can filter them out. For example:
AMPL7153133463 82 . ATC A 207 . INDEL;DP=5349;VDB=0.0000;AF1=1;AC1=2;DP4=43,1,2551,1380;MQ=60;FQ=-290;PV4=2e-07,2.6e-260,1,1 GT:PL:GQ 1/1:248,255,0:99
AMPL7153133463 85 . ATCTTT ATT 214 . INDEL;DP=5349;VDB=0.0000;AF1=1;AC1=2;DP4=0,0,3141,1824;MQ=60;FQ=-290 GT:PL:GQ 1/1:255,255,0:99
Here the first mutation does not exist, while the second one does. In this example the two indels are in different frames, so as far as I know the error cannot come from a different way of writing the same indel or from an alignment mistake. As one can see, both mutations are called with very good quality, DP4 shows no strand bias, and the genotype qualities are also good. I found that non-existent heterozygous mutations show a strong baseQ bias (the second number in the PV4 field), but as far as I know it does not make sense to use the PV4 field to filter out homozygous mutations.
So the questions are: why do these mutations appear (I tried a lot of different input parameter combinations when running samtools mpileup and bcftools, and nothing helped), and how can I filter them out (in particular, is it reliable to use the PV4 field for that purpose in the heterozygous case)?
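To make the PV4 idea concrete, here is a minimal Python sketch of the kind of filter I have in mind. It assumes the PV4 order given in the samtools VCF header (strand bias, baseQ bias, mapQ bias, tail distance bias), so the baseQ p-value is the second number. The function names and the `1e-4` cutoff are hypothetical, chosen only for illustration and not validated.

```python
def parse_info(info):
    """Split a VCF INFO string into a dict; bare flags such as INDEL map to True."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def baseq_bias_pvalue(vcf_line):
    """Return the baseQ-bias p-value (second PV4 number), or None if PV4 is absent."""
    columns = vcf_line.split()       # CHROM POS ID REF ALT QUAL FILTER INFO ...
    info = parse_info(columns[7])    # INFO is the eighth column
    if "PV4" not in info:
        return None
    return float(info["PV4"].split(",")[1])


def passes_baseq_filter(vcf_line, min_pvalue=1e-4):
    """Keep a record unless its baseQ-bias p-value is below min_pvalue.

    min_pvalue is an arbitrary illustrative cutoff (an assumption).
    Records without PV4 are kept, because there is nothing to test.
    """
    p = baseq_bias_pvalue(vcf_line)
    return p is None or p >= min_pvalue
```

Applied to the two records above, this would drop the first one (baseQ p-value 2.6e-260) and keep the second one, but only because the second record carries no PV4 field at all. That is exactly why I doubt this approach can be used for homozygous calls.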
Thanks in advance.