Hi folks,
We work with sea urchin larvae in our lab. They are very, very, very tiny, so we need to pool a whole bunch of them at a time to get sufficient starting material for NGS. Urchins are also highly polymorphic.
RESULT: There are times when some SNPs are effectively tri-allelic in a single sample, something that simply isn't ever going to happen if your sample consists of a happily diploid individual human (or medical model system of your choice).
To see what happens when there are three alleles at a polymorphic site, I constructed a fake dataset (which I can provide) consisting of three reads from each of three different haplotypes. Using samtools mpileup, I can generate the following line for the base in question:
Code:
dgarfield$ samtools mpileup -f mySeqs.fa combined.bam > combined.pileup
dgarfield$ less combined.pileup | grep 124217
Scaffold1200 124217 G 9 aaattt,,, =========p
Great, the program sees that there are three alleles at 124217.
Now, let's take a look at the results of bcftools view:
Code:
dgarfield$ samtools mpileup -uf mySeqs.fa combined.bam > combined.pileup_u
dgarfield$ bcftools view -cg combined.pileup_u | grep 124217
Scaffold1200 124217 . G T 19.1 . DP=9;AF1=0.5;CI95=0.5,0.5;DP4=0,3,0,6;MQ=60;FQ=19.1;PV4=1,1,1,1 GT:PL:GQ 0/1:49,0,49:49
T? That was not what I was expecting. I was hoping for A,T,G.
That brings me to my two questions.
1) Given the equal balance of alleles at SNP 124217, why does bcftools choose 'T'?
2) Are there any situations in which bcftools can return more than two alleles at a single SNP?
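In case it helps anyone reproduce the numbers, here is a minimal Python sketch of how the allele balance can be tallied from the two lines above. The helper names (tally_pileup_bases, parse_dp4) are my own and purely illustrative, and the parsing assumes a simple pileup base string with no indels or read start/end markers, which holds for this fake dataset but not in general.

```python
from collections import Counter


def tally_pileup_bases(ref, bases):
    """Count alleles in a simple pileup base string.

    Assumption: no indels and no read-start (^x) or read-end ($) markers,
    as in the example line. '.' and ',' mean "matches the reference".
    """
    counts = Counter()
    for ch in bases:
        if ch in ".,":
            counts[ref.upper()] += 1
        elif ch.upper() in "ACGTN":
            counts[ch.upper()] += 1
    return counts


def parse_dp4(info):
    """Return (ref_fwd, ref_rev, alt_fwd, alt_rev) from a VCF INFO string."""
    for field in info.split(";"):
        if field.startswith("DP4="):
            return tuple(int(x) for x in field[4:].split(","))
    raise ValueError("no DP4 tag found")


# Values copied from the pileup and bcftools lines in this post.
pileup_counts = tally_pileup_bases("G", "aaattt,,,")
dp4 = parse_dp4("DP=9;AF1=0.5;CI95=0.5,0.5;DP4=0,3,0,6;MQ=60;FQ=19.1;PV4=1,1,1,1")

non_ref = sum(n for base, n in pileup_counts.items() if base != "G")
print(dict(pileup_counts))  # {'A': 3, 'T': 3, 'G': 3}
print("DP4 ref reads:", dp4[0] + dp4[1])      # 3
print("DP4 non-ref reads:", dp4[2] + dp4[3])  # 6
print("pileup non-ref reads:", non_ref)       # 6
```

So the pileup has three reads each of A, T and G, while the six non-reference reads in DP4 match A and T added together. That suggests the A reads are being counted as non-reference but reported under a single ALT allele, which is what prompts question 1.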
Any insights would be greatly appreciated.
Thanks,
David