Understanding coverage in VCF

elfuser

Member

Join Date: Dec 2009

Posts: 16
- Share
- Tweet
#1

Understanding coverage in VCF

05-31-2012, 03:25 PM

Hello and sorry if this question has been asked before or it is super obvious.
I'm having a bit of a hard time finding out what the numbers mean in the vcf pileup.
I created an alignment BAM file of my illumina paired-end reads to a bacterial genome using the following:

Code:

bwa aln ref.fa r1.fq > r1.sai bwa aln ref.fa r2.fq > r2.sai bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq > aln.sam samtools view -bS aln.sam > aln.bam

Then I create the pileup with

Code:

samtools mpileup -uf aln.bam > pile.bcf bcftools view pile.bcf > pile.vcf

I also create a depth file with

Code:

samtools depth aln.bam > depth.txt

Let's take position 214 in the reference genome
In the vcf

Code:

gi|218888746|ref|NC_011770.1| 214 . G C 202 . DP=45;AF1=1;AC1=2;DP4=0,0,25,13;MQ=55;FQ=-141 GT:PL:GQ 1/1:235,114,0:99

In the depth

Code:

gi|218888746|ref|NC_011770.1| 214 53

I understand that the DP(45) value in the vcf file refers to the raw coverage of the given position with high quality(q13?). An that the coverage number in the depth file (53) refers to the number of bases called at that given position regardless of quality and/or conflict. I also understand that the DP4 values (0,0,25,13) reflect if the reads that cover that position match the reference or an alternative variant. In this case 38 (25+13) high quality reads match the alternative variant and 0 the reference. If this is right what happened to the other 7 (45-38) high quality reads that cover that position.

Another doubt I have is when I find multiple genotypes at a position. For example:
In the VCF

Code:

gi|218888746|ref|NC_011770.1| 3081 . C T,G 213 . DP=70;AF1=1;AC1=2;DP4=0,0,31,32;MQ=50;FQ=-214 GT:PL:GQ 1/1:246,187,0,247,170,244:99

My understanding is that in this case that 63 reads support either a T or a G at this position. So what I am trying to understand is how I can tell the support for each genotype. I tried looking at the GT:PL:GQ, but I dont quite understand it.
Thanks!
Tags: bwa, samtools bcftools

Previous template Next

Reply to Nine Things a Sample Prep Scientist Thinks About Before Sequencing

by GATTACAT

Love this - good data definitely starts from good input, and poor input can only give relatively poor data. I particularly like the mention of Nanodrop/absorbance based methods for quantification. It's such a toss up if you'll get an accurate reading or what amounts to a randomly generated number, and a lot of library/sequencing related issues can be traced back to poor quant.
- Channel: Articles
07-01-2026, 11:43 AM
Nine Things a Sample Prep Scientist Thinks About Before Sequencing

by SEQadmin2

I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

Here are nine questions we think about, in roughly the order they matter, before...
- Channel: Articles
06-18-2026, 07:11 AM

Topics	Statistics	Last Post
High-Resolution Sequencing Exposes Hidden Toxoplasma Diversity by SEQadmin2 Started by SEQadmin2, Yesterday, 11:08 AM	0 responses 6 views 0 reactions	Last Post by SEQadmin2 Yesterday, 11:08 AM
New AI Model Captures Long-Range Genomic Signals to Improve RNA Splice Site Prediction by SEQadmin2 Started by SEQadmin2, 06-30-2026, 05:37 AM	0 responses 11 views 0 reactions	Last Post by SEQadmin2 06-30-2026, 05:37 AM
Large-Scale Protein Screen Uncovers Hidden Regulators of Alternative Polyadenylation by SEQadmin2 Started by SEQadmin2, 06-26-2026, 11:10 AM	0 responses 19 views 0 reactions	Last Post by SEQadmin2 06-26-2026, 11:10 AM
Whole-Genome Sequencing Traces Faroe Islands Ancestry to a North Atlantic Founder Population by SEQadmin2 Started by SEQadmin2, 06-17-2026, 06:09 AM	0 responses 53 views 0 reactions	Last Post by SEQadmin2 06-17-2026, 06:09 AM

Unconfigured Ad

Understanding coverage in VCF

Latest Articles

ad_right_rmr

News