Hello all-
I've got several full genomes sequenced off a HiSeq. There's good coverage (~30x). Alignments were performed with default parameters in BWA. These are Caucasian individuals.
Variant detection was performed with mpileup, using almost exactly the parameters listed on the samtools page:
samtools mpileup -uf ref.fa output.sorted.bam | bcftools view -bvcg - > var.raw.bcf
bcftools view var.raw.bcf | vcfutils.pl varFilter -D100000 > var.flt.vcf
Not a single indel was called, and only ~2 million SNPs were called, which is well below the >3 million I expected.
Has anyone aligned a full genome at a similar depth of coverage, called variants with mpileup, and gotten a more reasonable variant count? If so, would you be willing to share the parameters you used?
For comparison, the raw BCF file (before varFilter) has ~3.9 million variants.
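To see where the calls drop out, it may help to split the counts into SNPs and indels before and after filtering. Below is a minimal sketch, not a definitive recipe: count_variants is a hypothetical helper name, and it assumes the usual mpileup convention of flagging indel records with INDEL in the INFO column (column 8) of the VCF.

```shell
# count_variants: hypothetical helper that reads VCF text on stdin
# and prints how many records are SNPs and how many are indels.
# Assumption: indel records carry an INDEL flag in the INFO column.
count_variants() {
    awk -F'\t' '
        /^#/ { next }                              # skip header lines
        $8 ~ /(^|;)INDEL(;|$)/ { indels++; next }  # INFO contains the INDEL flag
        { snps++ }                                 # everything else counted as a SNP
        END { printf "snps=%d indels=%d\n", snps, indels }
    '
}

# Example usage with the files from the commands above:
#   bcftools view var.raw.bcf | count_variants
#   count_variants < var.flt.vcf
```

If the raw file already shows zero indels, the problem is upstream of varFilter; if indels are present in the raw file but missing afterwards, the filter settings are the place to look.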
Any ideas/suggestions?