I am testing samtools pileup as a SNP caller on a Mosaik-generated alignment of a mix of two bacterial strains against a related reference genome. It seems to me that pileup plus samtools.pl varFilter can't be used to extract information on sites where only a small number of reads differ from the reference. Using samtools pileup with the -c option, for example, I get the output line shown in the code block below.
The consensus and reference bases are both correctly marked as "C". I can see that there are two bases marked as "T" in the read bases column, but when I apply samtools.pl varFilter to the pileup output, this site is not returned. I have tried a wide range of varFilter parameters but haven't hit on any combination that will return info on this or other doubleton sites.
Is it possible to get this info on low frequency alleles using samtools.pl varFilter or do I just have to write my own parser script for the raw pileup output? Can anyone recommend a better tool for deriving allele frequencies?
Code:
species_name 1609034 C C 99 0 57 66 .$.,......,,.....,.,..T.t....,,,...,,,,,,..,,,.,,,,,...,....,,,,.^_.^_, 8=BBC@?C<BB>@B@<CAC@1C;@@>B=?B=>A=B@A@B;CBAABBB>CB>BCBC8=ABB1B?BB%
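If a custom parser does turn out to be necessary, a minimal Python sketch might look like the following. It assumes the ten-column layout of the pileup -c line above (reference base in column 3, depth in column 8, read bases in column 9) and the usual read-bases encoding: "." and "," match the reference, "^" is followed by one mapping-quality character, "$" marks a read end, and "+N"/"-N" introduces an indel of N bases. The function names and the min_count threshold are hypothetical, chosen only for illustration; this is not how varFilter works.

```python
import re
from collections import Counter

# The pileup -c line quoted above (whitespace-separated here).
EXAMPLE_LINE = (
    "species_name 1609034 C C 99 0 57 66 "
    ".$.,......,,.....,.,..T.t....,,,...,,,,,,..,,,.,,,,,...,....,,,,.^_.^_, "
    "8=BBC@?C<BB>@B@<CAC@1C;@@>B=?B=>A=B@A@B;CBAABBB>CB>BCBC8=ABB1B?BB%"
)


def count_alleles(read_bases, ref_base):
    """Count observed bases in a pileup read-bases string (strands merged)."""
    counts = Counter()
    i = 0
    while i < len(read_bases):
        c = read_bases[i]
        if c == "^":
            i += 2  # skip the read-start marker and its mapping-quality char
        elif c == "$":
            i += 1  # read-end marker carries no base
        elif c in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            m = re.match(r"\d+", read_bases[i + 1:])
            if m is None:
                i += 1  # malformed indel; skip the sign and carry on
            else:
                i += 1 + len(m.group()) + int(m.group())
        elif c in ".,":
            counts[ref_base.upper()] += 1  # match on forward/reverse strand
            i += 1
        else:
            counts[c.upper()] += 1  # mismatch base, or "*" deletion placeholder
            i += 1
    return counts


def parse_pileup_line(line):
    """Return (reported depth, allele counts) for one pileup -c line."""
    fields = line.split()
    ref_base, depth, read_bases = fields[2], int(fields[7]), fields[8]
    return depth, count_alleles(read_bases, ref_base)


def minor_alleles(counts, ref_base, min_count=2):
    """Return {allele: frequency} for non-reference alleles seen >= min_count times."""
    total = sum(counts.values())
    return {
        allele: n / total
        for allele, n in counts.items()
        if allele != ref_base.upper() and n >= min_count
    }


if __name__ == "__main__":
    depth, counts = parse_pileup_line(EXAMPLE_LINE)
    print(depth, dict(counts))
    print(minor_alleles(counts, "C", min_count=2))
```

On the example line this counts the two "T" observations against a depth of 66. Base qualities are ignored in this sketch; a real filter would probably want to drop low-quality bases before counting.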