Hello, and sorry if this question has been asked before or is super obvious.
I'm having a hard time finding out what the numbers mean in the VCF pileup output.
I created a BAM alignment of my Illumina paired-end reads against a bacterial genome using the following:
Code:
bwa aln ref.fa r1.fq > r1.sai
bwa aln ref.fa r2.fq > r2.sai
bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq > aln.sam
samtools view -bS aln.sam > aln.bam
Then I created the pileup with:
Code:
samtools mpileup -uf ref.fa aln.bam > pile.bcf
bcftools view pile.bcf > pile.vcf
I also created a depth file with:
Code:
samtools depth aln.bam > depth.txt
Let's take position 214 in the reference genome.
In the VCF:
Code:
gi|218888746|ref|NC_011770.1| 214 . G C 202 . DP=45;AF1=1;AC1=2;DP4=0,0,25,13;MQ=55;FQ=-141 GT:PL:GQ 1/1:235,114,0:99
In the depth file:
Code:
gi|218888746|ref|NC_011770.1| 214 53
I understand that the DP value (45) in the VCF file refers to the raw coverage of the given position with high quality (Q13?), and that the coverage number in the depth file (53) refers to the number of bases called at that position regardless of quality and/or conflict. I also understand that the DP4 values (0,0,25,13) reflect whether the reads covering that position match the reference or an alternative variant. In this case, 38 (25+13) high-quality reads match the alternative variant and 0 match the reference. If this is right, what happened to the other 7 (45-38) high-quality reads that cover that position?
Another question I have is about positions where I find multiple genotypes. For example:
In the VCF:
Code:
gi|218888746|ref|NC_011770.1| 3081 . C T,G 213 . DP=70;AF1=1;AC1=2;DP4=0,0,31,32;MQ=50;FQ=-214 GT:PL:GQ 1/1:246,187,0,247,170,244:99
My understanding is that in this case 63 reads support either a T or a G at this position. What I am trying to understand is how I can tell the support for each genotype. I tried looking at GT:PL:GQ, but I don't quite understand it.
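To make the numbers in the two VCF records above concrete, here is a minimal Python sketch that pulls DP and DP4 out of the INFO column and pairs each PL value with a genotype. It assumes the PL values follow the genotype ordering given in the VCF specification (genotype j/k with j <= k sits at index k*(k+1)/2 + j). The function names are made up for illustration and are not part of samtools or bcftools.

```python
def parse_info(info):
    """Turn 'DP=45;AF1=1;...' into a dict of strings (bare flags map to True)."""
    fields = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        fields[key] = value if sep else True
    return fields


def depth_summary(info):
    """Return (DP, ref-supporting, alt-supporting, DP minus the DP4 total)."""
    fields = parse_info(info)
    dp = int(fields["DP"])
    # DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in fields["DP4"].split(","))
    ref = ref_fwd + ref_rev
    alt = alt_fwd + alt_rev
    return dp, ref, alt, dp - (ref + alt)


def genotype_pls(ref, alt, pl):
    """Pair each PL value with its genotype, ASSUMING VCF-spec ordering."""
    alleles = [ref] + alt.split(",")
    values = [int(x) for x in pl.split(",")]
    table = {}
    for k in range(len(alleles)):
        for j in range(k + 1):
            table[alleles[j] + "/" + alleles[k]] = values[k * (k + 1) // 2 + j]
    return table


# Position 214
print(depth_summary("DP=45;AF1=1;AC1=2;DP4=0,0,25,13;MQ=55;FQ=-141"))
# Position 3081
print(depth_summary("DP=70;AF1=1;AC1=2;DP4=0,0,31,32;MQ=50;FQ=-214"))
print(genotype_pls("C", "T,G", "246,187,0,247,170,244"))
```

For position 214 this gives 45 for DP, 38 for the DP4 total, and a difference of 7. For position 3081 it gives 70, 63, and again 7. Under the assumed ordering, the six PL values at position 3081 line up as C/C, C/T, T/T, C/G, T/G, G/G, so the PL of 0 falls on T/T. As far as I can tell, DP4 only separates reference from non-reference bases, so it cannot split the 63 between T and G by itself.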
Thanks!