Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Understanding coverage in VCF

    Hello and sorry if this question has been asked before or it is super obvious.
    I'm having a bit of a hard time finding out what the numbers mean in the vcf pileup.
    I created an alignment BAM file of my illumina paired-end reads to a bacterial genome using the following:

    Code:
    bwa aln ref.fa r1.fq > r1.sai
    bwa aln ref.fa r2.fq > r2.sai
    bwa sampe ref.fa r1.sai r2.sai r1.fq r2.fq > aln.sam
    samtools view -bS aln.sam > aln.bam
    Then I create the pileup with
    Code:
    samtools mpileup -uf aln.bam > pile.bcf
    bcftools view pile.bcf > pile.vcf
    I also create a depth file with
    Code:
    samtools depth aln.bam > depth.txt
    Let's take position 214 in the reference genome
    In the vcf
    Code:
    gi|218888746|ref|NC_011770.1|	214	.	G	C	202	.	DP=45;AF1=1;AC1=2;DP4=0,0,25,13;MQ=55;FQ=-141	GT:PL:GQ	1/1:235,114,0:99
    In the depth
    Code:
    gi|218888746|ref|NC_011770.1|	214	53
    I understand that the DP(45) value in the vcf file refers to the raw coverage of the given position with high quality(q13?). An that the coverage number in the depth file (53) refers to the number of bases called at that given position regardless of quality and/or conflict. I also understand that the DP4 values (0,0,25,13) reflect if the reads that cover that position match the reference or an alternative variant. In this case 38 (25+13) high quality reads match the alternative variant and 0 the reference. If this is right what happened to the other 7 (45-38) high quality reads that cover that position.

    Another doubt I have is when I find multiple genotypes at a position. For example:
    In the VCF
    Code:
    gi|218888746|ref|NC_011770.1|	3081	.	C	T,G	213	.	DP=70;AF1=1;AC1=2;DP4=0,0,31,32;MQ=50;FQ=-214	GT:PL:GQ	1/1:246,187,0,247,170,244:99
    My understanding is that in this case that 63 reads support either a T or a G at this position. So what I am trying to understand is how I can tell the support for each genotype. I tried looking at the GT:PL:GQ, but I dont quite understand it.
    Thanks!

Latest Articles

Collapse

  • seqadmin
    Recent Advances in Sequencing Analysis Tools
    by seqadmin


    The sequencing world is rapidly changing due to declining costs, enhanced accuracies, and the advent of newer, cutting-edge instruments. Equally important to these developments are improvements in sequencing analysis, a process that converts vast amounts of raw data into a comprehensible and meaningful form. This complex task requires expertise and the right analysis tools. In this article, we highlight the progress and innovation in sequencing analysis by reviewing several of the...
    05-06-2024, 07:48 AM
  • seqadmin
    Essential Discoveries and Tools in Epitranscriptomics
    by seqadmin




    The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
    04-22-2024, 07:01 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Today, 07:03 AM
0 responses
9 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-10-2024, 06:35 AM
0 responses
27 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-09-2024, 02:46 PM
0 responses
35 views
0 likes
Last Post seqadmin  
Started by seqadmin, 05-07-2024, 06:57 AM
0 responses
30 views
0 likes
Last Post seqadmin  
Working...
X