Hi,
I'm analysing a mixed populaton of reads mapped to a gene and want to get the frequency of ALL SNPs in the data, even ones that samtools / bcftools things are not real, because they don't seem to be really designed for large mixed populations.
The allele count (AC) field doesn't seem to be it, for example at base 19 in this sequence there is a coverage of 222 reads, 191 of which are same base as reference, A, and 31 of which are G. This is a significant population but it is not reported as a SNP, I can get bcftools to report all variants, and get an info line for this base, but it doesn't seem to contain the frequency anywhere, I need to do this for all bases for a lot of samples, so have to use scripts and command line tools, samtools, bcftools. The DP4 field doesn't have the information either because it pools the frequency of all alternative bases, and doesn't give the frequency of each one.
Here's the line I need the frequency from:
reference_gene 19 . A G 3.27827 . DP=222;VDB=3.97257e-05;SGB=-0.691153;RPB=0.609132;MQB=0.848924;BQB=3.59825e-05;MQ0F=0;AC=0;AN=2;DP4=199,0,18,0;MQ=29 GT:PL 0/0:0,255,245
Thanks very much for any help!
S.
edit: my command line:
samtools mpileup -uf A.bam | bcftools call -A -m -Ov > A.vcf
I'm analysing a mixed populaton of reads mapped to a gene and want to get the frequency of ALL SNPs in the data, even ones that samtools / bcftools things are not real, because they don't seem to be really designed for large mixed populations.
The allele count (AC) field doesn't seem to be it, for example at base 19 in this sequence there is a coverage of 222 reads, 191 of which are same base as reference, A, and 31 of which are G. This is a significant population but it is not reported as a SNP, I can get bcftools to report all variants, and get an info line for this base, but it doesn't seem to contain the frequency anywhere, I need to do this for all bases for a lot of samples, so have to use scripts and command line tools, samtools, bcftools. The DP4 field doesn't have the information either because it pools the frequency of all alternative bases, and doesn't give the frequency of each one.
Here's the line I need the frequency from:
reference_gene 19 . A G 3.27827 . DP=222;VDB=3.97257e-05;SGB=-0.691153;RPB=0.609132;MQB=0.848924;BQB=3.59825e-05;MQ0F=0;AC=0;AN=2;DP4=199,0,18,0;MQ=29 GT:PL 0/0:0,255,245
Thanks very much for any help!
S.
edit: my command line:
samtools mpileup -uf A.bam | bcftools call -A -m -Ov > A.vcf
Comment