I have a few questions about mpileup and SNP calling:
1) I have a very polymorphic species (theta_silent = 0.03, about 30 times that of humans). Is -C50 still recommended for BWA alignments, or is there a chance it will make me lose divergent haplotypes/SNPs?
2) I am interested in quantifying polymorphism (i.e. not just identifying SNPs, but calculating diversity statistics), so I need a pipeline that also identifies invariant sites and reports how confident it is about them. In the VCF format documentation, column 6 (QUAL) is described as the 'Phred-scaled probability of all samples being homozygous reference'. For sites that are called invariant, does this value give the Phred-scaled probability of at least one sample being heterozygous? In other words, can I meaningfully use the QUAL value to decide whether I'm confident that a site is NOT variant?
3) I may have missed it, but I can't find what the file format should look like for the -l option if I want to give mpileup a list of sites to analyse.
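As a rough illustration of the arithmetic behind questions 1 and 2, here is a minimal Python sketch. It assumes a hypothetical 100 bp read length and models each read position as differing from the reference independently with probability theta, which is only a crude approximation of real divergence. The Phred conversion uses the standard definition Q = -10 log10(p); the QUAL value is made up, and treating 1 - p as the probability that at least one sample is not homozygous reference simply follows from the documentation wording quoted above.

```python
import math


# Question 1: rough mismatch load per read.
# Assumption: each read position differs from the reference independently
# with probability theta (a crude stand-in for real divergence).
def prob_at_least_k_mismatches(read_len: int, theta: float, k: int) -> float:
    """Binomial upper tail: P(X >= k) with X ~ Binomial(read_len, theta)."""
    return sum(
        math.comb(read_len, i) * theta**i * (1 - theta) ** (read_len - i)
        for i in range(k, read_len + 1)
    )


# Question 2: Phred scaling, Q = -10 * log10(p).
def phred_to_prob(q: float) -> float:
    """Convert a Phred-scaled value back to a probability."""
    return 10 ** (-q / 10)


def prob_to_phred(p: float) -> float:
    """Convert a probability to a Phred-scaled value."""
    return -10 * math.log10(p)


if __name__ == "__main__":
    read_len, theta = 100, 0.03  # hypothetical read length; theta from the question
    print(f"expected mismatches per {read_len} bp read: {read_len * theta:.1f}")
    for k in (3, 5, 8):
        tail = prob_at_least_k_mismatches(read_len, theta, k)
        print(f"P(>= {k} mismatches) = {tail:.3f}")

    qual = 30.0  # hypothetical QUAL value
    p_all_homref = phred_to_prob(qual)
    # Under the quoted definition, the complement is P(at least one sample not hom-ref).
    print(f"P(all samples hom-ref) = {p_all_homref:.4g}, complement = {1 - p_all_homref:.4g}")
```

The binomial model ignores clustering of variants and sequencing errors, so it only gives a feel for how many mismatches a typical read might carry relative to the reference at this level of diversity.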
Thanks!