  • What does the DV FORMAT field mean in a VCF file?

    Hello everyone,

    I have a VCF file with the following header line:

    ##FORMAT=<ID=DV,Number=1,Type=Integer,Description="Number of high-quality non-reference bases">
    My question is pretty simple, but I couldn't find an answer: what counts as "high quality"? How can I find out which threshold was used?

    I used samtools mpileup and then bcftools call with the -m option.

  • #2
    The '-Q' option of samtools mpileup sets the quality threshold for bases; by default any base with a quality of less than 13 is ignored.
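
To make the cutoff concrete, here is a minimal Python sketch of what "ignoring bases with a quality of less than 13" amounts to. It is not samtools code: the function names and the example quality string are made up for illustration, and it assumes the usual Phred+33 encoding of base qualities.

```python
def phred33_to_quals(qual_string):
    """Convert a Phred+33 encoded quality string to integer base qualities."""
    return [ord(ch) - 33 for ch in qual_string]


def count_high_quality(qual_string, min_base_quality=13):
    """Count bases whose quality is at least min_base_quality.

    Assumption: this only mimics the idea behind the -Q cutoff described
    above (bases with quality < 13 are ignored by default).
    """
    return sum(q >= min_base_quality for q in phred33_to_quals(qual_string))


# Hypothetical qualities for six bases piled up at one position.
example_quals = "I5#+.?"
print(phred33_to_quals(example_quals))   # [40, 20, 2, 10, 13, 30]
print(count_high_quality(example_quals)) # 4
```

Two of the six bases (qualities 2 and 10) fall below the cutoff, so only four would count as "high quality" here.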


    • #3
      The SNP is not filtered out, because it appears in my VCF file.
      For example, I can have a SNP with a DP equal to 40, but the DV will be 4, 3 and 5 for my 3 samples.

      So I don't know what that means...


      • #4
        The '-Q' option of mpileup applies to individual bases, not to entire reads or to SNPs inferred from reads. Quality can vary over the length of a read, so this statistic only counts the bases that are high quality at that position.


        • #5
          Originally posted by ClemBuntu
          The SNP is not filtered out, because it appears in my VCF file.
          For example, I can have a SNP with a DP equal to 40, but the DV will be 4, 3 and 5 for my 3 samples.

          So I don't know what that means...
          DP is the total number of reads that cover the SNP position; of those, four contained the SNP (e.g., G when the reference is T) for sample A, three for sample B, and five for sample C. The remaining reads typically match the reference base (T), although it's possible that they contain non-SNP/non-reference calls (A or C).
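
As a rough illustration of reading these numbers per sample, here is a small Python sketch that pulls DP and DV out of the sample columns of one VCF record. The function name and the record itself are hypothetical: only the DV values (4, 3, 5) and the INFO DP of 40 come from the question; the per-sample DP values, position and alleles are invented, and it assumes both DP and DV are present in the FORMAT column.

```python
def per_sample_counts(vcf_line):
    """Return a list of (DP, DV, DV/DP) tuples, one per sample column."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")      # column 9 names the per-sample keys
    results = []
    for sample in fields[9:]:               # columns 10+ hold one sample each
        values = dict(zip(format_keys, sample.split(":")))
        dp = int(values["DP"])
        dv = int(values["DV"])
        fraction = dv / dp if dp else 0.0
        results.append((dp, dv, fraction))
    return results


# Hypothetical record: DV values are from the question, the rest is made up.
line = "\t".join([
    "chr1", "1000", ".", "T", "G", "50", ".", "DP=40",
    "GT:DP:DV", "0/1:14:4", "0/1:12:3", "0/1:14:5",
])
for dp, dv, frac in per_sample_counts(line):
    print(dp, dv, round(frac, 2))
```

With these invented depths, each sample has roughly a quarter to a third of its high-quality bases supporting the non-reference allele.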


          • #6
            Originally posted by HESmith
            DP is the total number of reads that cover the SNP position; of those, four contained the SNP (e.g., G when the reference is T) for sample A, three for sample B, and five for sample C. The remaining reads typically match the reference base (T), although it's possible that they contain non-SNP/non-reference calls (A or C).
            Then why is the DV description "Number of high-quality non-reference bases" and not "Number of non-reference bases"?

            Plus, you're talking about the DP that is in INFO:
            ##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">

            But if you look at the DP that is in FORMAT:
            ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Number of high-quality bases">

            I could also ask what's the difference between DP and DP4:

            ##INFO=<ID=DP4,Number=4,Type=Integer,Description="Number of high-quality ref-forward , ref-reverse, alt-forward and alt-reverse bases">


            • #7
              Then why is the DV description "Number of high-quality non-reference bases" and not "Number of non-reference bases"?
              Because it's usually only the high-quality bases that should be looked at when identifying variants.

              I could also ask what's the difference between DP and DP4
              DP4 gives another way of identifying poorly covered regions. Variants should be supported by roughly equal numbers of forward and reverse reads (assuming a sample prep that is not strand-specific). An imbalance between the strands may indicate that something odd is going on with the sequences that span the region.
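
A quick way to eyeball that balance is to read the four DP4 counts and look at how the alt bases split between strands. The Python sketch below is only an illustration: the function names and both INFO strings are invented, and a simple fraction is used in place of any formal strand-bias test.

```python
def parse_dp4(info):
    """Extract the four DP4 counts from a VCF INFO string."""
    for entry in info.split(";"):
        if entry.startswith("DP4="):
            ref_fwd, ref_rev, alt_fwd, alt_rev = map(int, entry[4:].split(","))
            return ref_fwd, ref_rev, alt_fwd, alt_rev
    raise ValueError("no DP4 entry in INFO field")


def alt_forward_fraction(info):
    """Fraction of high-quality alt bases on the forward strand.

    Returns None when there are no alt bases at all.
    """
    _, _, alt_fwd, alt_rev = parse_dp4(info)
    total = alt_fwd + alt_rev
    return alt_fwd / total if total else None


# Hypothetical INFO strings, invented for illustration.
balanced = "DP=40;DP4=14,14,6,6"
skewed = "DP=40;DP4=14,14,12,0"
print(alt_forward_fraction(balanced))  # 0.5
print(alt_forward_fraction(skewed))    # 1.0
```

A value near 0.5 is what you would expect from a non-strand-specific prep; a value near 0 or 1 means all the alt support comes from one strand and is worth a closer look.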


              • #8
                Originally posted by gringer
                Because it's usually only the high-quality bases that should be looked at when identifying variants.
                I agree. And according to your previous post, "high quality" means at or above the threshold set by the "-Q" option of mpileup (i.e. 13 by default), right?


                • #9
                  I expect so. Modifying the '-Q' option changes how many bases are shown in the mpileup output, and I would expect that those are the only bases that make it through for the variant calculations.
