  • AmitL
    Member
    • Aug 2011
    • 36

    Missing information in VCFs

    Hi everybody!
    I've just started working with all these bioinformatics analysis tools, so please forgive me if my question is stupid.

    I am comparing the VCF files I get from SAMTools with the old Pileup files, and some information that is present in the Pileup is absent from the VCFs.
    The part I mean is the last two columns of the Pileup line below:

    Pileup:
    chr10 93489 G S 26 26 33 8 c,,,cc,, W]Z]QMZJ

    VCF:
    chr10 94083 . C T 71 . DP=43;AF1=0.5;CI95=0.5,0.5;DP4=5,5,2,25;MQ=23;FQ=47;PV4=0.0092,4.4e-22,1,1 GT:PL:GQ 0/1:101,0,74:77

    Does anyone know how to get this info out of a VCF file?

    Thanks a lot
    Amit
  • sdvie
    Member
    • Jul 2010
    • 68

    #2
    Hi Amit,

    the part of the pileup that you underlined corresponds to two columns:

    c,,,cc,, the bases of all reads (8 in this case) covering the given position, and
    W]Z]QMZJ the base qualities for those 8 bases

    Check here for the details of the pileup format and on how bases and qualities are annotated.
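
To make the bases column a bit more concrete, here is a minimal Python sketch (not part of samtools) that counts how many reads match the reference and how many carry a different base. The function name `count_pileup_bases` is made up for illustration. It relies on the usual pileup conventions: `.` and `,` are reference matches on the forward and reverse strand, letters are mismatches, `^` is followed by a mapping-quality character, `$` marks a read end, and `+N`/`-N` introduce indels.

```python
import re


def count_pileup_bases(bases: str) -> dict:
    """Count reference matches and mismatches in a pileup read-bases string."""
    counts = {"ref": 0, "alt": 0}
    i = 0
    while i < len(bases):
        ch = bases[i]
        if ch == "^":
            # Read start: the next character encodes mapping quality, skip both.
            i += 2
            continue
        if ch in "+-":
            # Indel: sign, length digits, then that many inserted/deleted bases.
            m = re.match(r"\d+", bases[i + 1:])
            if m is None:
                i += 1
                continue
            i += 1 + len(m.group()) + int(m.group())
            continue
        if ch in ".,":
            counts["ref"] += 1
        elif ch.upper() in "ACGTN":
            counts["alt"] += 1
        # '$' (read end) and '*' (deleted base) are not counted here.
        i += 1
    return counts


print(count_pileup_bases("c,,,cc,,"))  # {'ref': 5, 'alt': 3}
```

For the line in the question this gives 5 reads matching the reference G and 3 reads showing C on the reverse strand.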

    This information does appear in the VCF format, in a somewhat summarized form:

    The 4th and 5th columns show the reference and alternative base, as summarized from the frequency of each base at this position in the pileup. (Note that in your example the pileup and the VCF lines are not at the same position.)

    As for the base qualities, the VCF format defines an INFO field named BQ that gives the root mean square of the base qualities at this position, as a summary of the individual base qualities in the pileup. (Your example does not have this field, though; you might try another variant caller.)

    Check here for the details of the vcf format.
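
In case the "root mean square" part is unclear, this is a small Python sketch of how such a summary could be computed from the qualities column of a pileup line. The function name `rms_base_quality` is hypothetical, and the ASCII offset of 33 is an assumption: data from older Illumina pipelines may use an offset of 64 instead, so check which encoding your files use.

```python
import math


def rms_base_quality(quals: str, offset: int = 33) -> float:
    """Root mean square of the Phred scores encoded in a quality string."""
    if not quals:
        raise ValueError("empty quality string")
    # Each character encodes one Phred score as ASCII code minus the offset.
    scores = [ord(c) - offset for c in quals]
    return math.sqrt(sum(q * q for q in scores) / len(scores))


# Qualities column from the pileup line in the question (offset is assumed).
print(round(rms_base_quality("W]Z]QMZJ"), 1))
```

Because the scores are squared before averaging, high qualities weigh a little more than they would in a plain mean.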

    The pileup format gives you more per-read detail, whereas the VCF focuses on the called variant. So the information is basically kept, just mostly in summarized form.

    Cheers!


    • AmitL
      Member
      • Aug 2011
      • 36

      #3
      Hi sdvie

      Thanks for the quick reply!

      I am currently using SAMTools 0.1.16 for this process. What would you suggest I use to get this information?

      cheers


      • sdvie
        Member
        • Jul 2010
        • 68

        #4
        Hi Amit,

        unfortunately, I could not find any tool that outputs a VCF file containing the BQ field.
        (I mostly use the GATK pipeline and the GATK Unified Genotyper on my BAM files.)

        Maybe someone else knows more...

        cheers,
        Sophia


        • sdvie
          Member
          • Jul 2010
          • 68

          #5
          Sorry, I have to correct myself:

          there is an option in samtools:

          calmd -r

          It looks like you first have to generate an extended SAM file with this command and then generate the pileup from that file to get the BQ tag included.

          I have never used this one before... live and learn.

          cheers!
          Last edited by sdvie; 08-30-2011, 02:21 AM.


          • AmitL
            Member
            • Aug 2011
            • 36

            #6
            Thank you Sophia!
            Another thing: I need to know how many reads support the reference and how many support the variant.
            Is this information present in the VCF?

            Good day
            Amit


            • nilshomer
              Nils Homer
              • Nov 2008
              • 1283

              #7
              Yes, see the DP and DP4 fields.
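
As a rough illustration, here is a minimal Python sketch that pulls the reference and variant read counts out of DP4 for the VCF line from the first post. The helper names `parse_info` and `ref_alt_counts` are made up for this example. It assumes the samtools meaning of DP4, which is four counts of high-quality bases: reference forward, reference reverse, alternate forward, alternate reverse.

```python
def parse_info(info: str) -> dict:
    """Split a VCF INFO column into a key -> value dictionary."""
    fields = {}
    for item in info.split(";"):
        # Flag-style entries without '=' end up with an empty string value.
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def ref_alt_counts(vcf_line: str) -> tuple:
    """Return (reference reads, variant reads) taken from the DP4 field."""
    columns = vcf_line.split()  # real VCF files are tab-separated
    info = parse_info(columns[7])  # INFO is the 8th column
    ref_fwd, ref_rev, alt_fwd, alt_rev = (int(x) for x in info["DP4"].split(","))
    return ref_fwd + ref_rev, alt_fwd + alt_rev


line = ("chr10 94083 . C T 71 . "
        "DP=43;AF1=0.5;CI95=0.5,0.5;DP4=5,5,2,25;MQ=23;FQ=47;"
        "PV4=0.0092,4.4e-22,1,1 GT:PL:GQ 0/1:101,0,74:77")
print(ref_alt_counts(line))  # (10, 27)
```

Note that the DP4 counts add up to 37 while DP is 43: DP is the raw depth, whereas DP4 only counts bases that pass the quality filter.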


              • AmitL
                Member
                • Aug 2011
                • 36

                #8
                YES!! Just what I needed!

                Thanks a lot, you guys, you saved my life!

                Have a cheerful day =D

