Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • zee
    NGS specialist
    • Apr 2008
    • 249

    samtools mpileup VCF4.0 format problem

    Hi All,

    I was just wondering if somebody was getting the same problem as I have when running samtools mpileup

    My command

    Code:
     samtools mpileup -ugf Human_hg18.fa SRR040810.recal.bam |bcftools view -vcg -
    My output VCF

    Code:
    ##fileformat=VCFv4.0
    #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SRR040810
    chr1    583     .       G       A       8.44    .       DP=2;AF1=1;CI95=0.5,1;DP4=0,0,0,2;MQ=43 PL:GT:GQ        39,6,0:1/1:49
    chr1    59133   .       A       G       27      .       DP=3;AF1=1;CI95=0.5,1;DP4=0,0,3,0;MQ=24 PL:GT:GQ        59,9,0:1/1:63
    chr1    59374   .       A       G       99      .       DP=32;AF1=1;CI95=1,1;DP4=0,0,19,11;MQ=45        PL:GT:GQ        255,90,0:1/1:99
    chr1    742584  .       A       G       5.29    .       DP=2;AF1=1;CI95=0.5,1;DP4=0,0,0,2;MQ=60 PL:GT:GQ        35,6,0:1/1:49
    chr1    745803  .       C       T       29.8    .       DP=2;AF1=1;CI95=0.5,1;DP4=0,0,0,2;MQ=60 PL:GT:GQ        61,6,0:1/1:49
    chr1    746467  .       A       G       3.02    .       DP=2;AF1=0.9999;CI95=0.5,1;DP4=0,0,1,0;MQ=60    PL:GT:GQ        30,3,0:1/1:41
    chr1    748187  .       T       C       6.98    .       DP=1;AF1=0.9999;CI95=0.5,1;DP4=0,0,1,0;MQ=60    PL:GT:GQ        36,3,0:1/1:41
    chr1    774913  .       G       A       9.31    .       DP=2;AF1=1;CI95=0.5,1;DP4=0,0,2,0;MQ=60 PL:GT:GQ        40,6,0:1/1:49
    chr1    777068  .       G       A       4.77    .       DP=1;AF1=0.9999;CI95=0.5,1;DP4=0,0,1,0;MQ=60    PL:GT:GQ        33,3,0:1/1:41
    chr1    789326  .       T       C       9.53    .       DP=1;AF1=0.9999;CI95=0.5,1;DP4=0,0,0,1;MQ=60    PL:GT:GQ        39,3,0:1/1:41
    chr1    792183  .       G       A       3.55    .       DP=1;AF1=0.9999;CI95=0.5,1;DP4=0,0,1,0;MQ=31    PL:GT:GQ        31,3,0:1/1:41
    chr1    798494  .       G       A       99      .       DP=14;AF1=1;CI95=1,1;DP4=0,0,14,0;MQ=60 PL:GT:GQ        174,42,0:1/1:99
    chr1    798785  .       G       A       99      .       DP=20;AF1=1;CI95=1,1;DP4=0,0,13,7;MQ=53 PL:GT:GQ        235,60,0:1/1:99
    chr1    798791  .       C       T       99      .       DP=22;AF1=1;CI95=1,1;DP4=1,0,14,7;MQ=52;PV4=1,0.00035,1,1       PL:GT:GQ        235,53,0:1/1:99
    chr1    811811  .       T       C       6.98    .       DP=1;AF1=0.9999;CI95=0.5,1;DP4=0,0,1,0;MQ=60    PL:GT:GQ        36,3,0:1/1:41
    chr1    815113  .       A       G       7.8     .       DP=1;AF1=0.9999;CI95=0.5,1;DP4=0,0,0,1;MQ=60    PL:GT:GQ        37,3,0:1/1:41
    chr1    815845  .       C       G       5.47    .       DP=3;AF1=0.5007;CI95=0.5,0.5;DP4=1,0,1,1;MQ=52;PV4=1,0.022,0.33,1       PL:GT:GQ        34,0,25:0/1:27
    chr1    815857  .       T       A       16.1    .       DP=4;AF1=0.5;CI95=0.5,0.5;DP4=2,0,1,1;MQ=54;PV4=1,0.031,0.21,1  PL:GT:GQ        46,0,55:0/1:48
    chr1    835972  .       G       C       6.98    .       DP=1;AF1=0.9999;CI95=0.5,1;DP4=0,0,0,0;MQ=0     PL:GT:GQ        36,3,0:1/1:41
    chr1    836201  .       A       G       65      .       DP=3;AF1=1;CI95=0.5,1;DP4=0,0,0,0;MQ=0  PL:GT:GQ        97,9,0:1/1:63
    chr1    840643  .       C       T       4.77    .       DP=1;AF1=0.9999;CI95=0.5,1;DP4=0,0,0,0;MQ=0     PL:GT:GQ        33,3,0:1/1:41
    And the validation error

    Code:
    Expected GT as the first genotype field at chr1:583
    INFO field at chr1:583 .. INFO tag [MQ] not listed in the header,INFO tag [DP4] not listed in the header,INFO tag [CI95] not listed in the header,INFO tag [DP] not listed in the header,INFO tag [AF1] not listed in the header
    SRR040810 column at chr1:583 .. FORMAT tag [GQ] not listed in the header,FORMAT tag [GT] not listed in the header,FORMAT tag [PL] not listed in the header
    Expected GT as the first genotype field at chr1:59133
    Expected GT as the first genotype field at chr1:59374
    Expected GT as the first genotype field at chr1:742584
    Expected GT as the first genotype field at chr1:745803
    Expected GT as the first genotype field at chr1:746467
    Expected GT as the first genotype field at chr1:748187
    Expected GT as the first genotype field at chr1:774913
    Expected GT as the first genotype field at chr1:777068
    Expected GT as the first genotype field at chr1:789326
    Expected GT as the first genotype field at chr1:792183
    Expected GT as the first genotype field at chr1:798494
    Expected GT as the first genotype field at chr1:798785
    Expected GT as the first genotype field at chr1:798791
    INFO field at chr1:798791 .. INFO tag [PV4] not listed in the header
    Expected GT as the first genotype field at chr1:811811
    Expected GT as the first genotype field at chr1:815113
    Expected GT as the first genotype field at chr1:815845
    Expected GT as the first genotype field at chr1:815857
    Expected GT as the first genotype field at chr1:835972
    Expected GT as the first genotype field at chr1:836201
    Expected GT as the first genotype field at chr1:840643
    Expected GT as the first genotype field at chr1:841620
    I am not quite sure if I am doing the right thing to produce VCF4.0 format. I used the vcf-validator from vcftools to check the VCF format.

    Some help would be greatly beneficial.

    Thanks
  • lh3
    Senior Member
    • Feb 2008
    • 686

    #2
    There is a bcf-fix.pl script to fix this issue. in VCF, GT is required to be the first field.

    EDIT:

    Could you point me to the link to SRR040810.recal.bam? I see the zero DP4 issue again.

    Comment

    • zee
      NGS specialist
      • Apr 2008
      • 249

      #3
      Thanks lh3, that has been immensely helpful. I will test it out and send you a location for the BAM file.

      Comment

      • lh3
        Senior Member
        • Feb 2008
        • 686

        #4
        Thanks for the example. However, I cannot reproduce the zero-DP4 problem with your alignment. As I tested, DP4 at 840643 is "DP4=0,0,1,0" not "DP4=0,0,0,0" in your example. I would assume this has been solved.

        Comment

        • zee
          NGS specialist
          • Apr 2008
          • 249

          #5
          I tried running it again on and still got the DP4 error

          my command was

          Code:
          samtools mpileup -ugf Human_hg18.fa SRR040810.recal.bam 2>err1 | bcftools view -vcg -

          Code:
          chr1    836201  .       A       G       65      .       DP=3;AF1=1;CI95=0.5,1;DP4=0,0,0,0;MQ=0  PL:GT:GQ        97,9,0:1/1:63
          chr1    840643  .       C       T       4.77    .       DP=1;AF1=0.9999;CI95=0.5,1;DP4=0,0,0,0;MQ=0     PL:GT:GQ        33,3,0:1/1:41
          chr1    841620  .       A       G       5.46    .       DP=2;AF1=0.9999;CI95=0.5,1;DP4=0,0,0,0;MQ=0     PL:GT:GQ        34,3,0:1/1:41
          I am using samtools Version: 0.1.11 (r851) and svn is at revision 855.

          Comment

          • lh3
            Senior Member
            • Feb 2008
            • 686

            #6
            That is weird. I am using the same version and it has no problem.

            Could you try:

            samtools mpileup -ugf Human_hg18.fa SRR040810.recal.bam 2>err1 | bcftools view -

            and show me the line at 841620?

            Comment

            • zee
              NGS specialist
              • Apr 2008
              • 249

              #7
              As requested

              Code:
              chr1    841620  .       A       G,X     0       .       DP=2;I16=0,0,0,1,0,0,34,1156,0,0,60,3600,0,0,3,9        PL      34,3,34,0,3,34

              Comment

              • lh3
                Senior Member
                • Feb 2008
                • 686

                #8
                Could you try the following (need to index first):

                samtools mpileup -ugf Human_hg18.fa -r chr1:841620-841620 SRR040810.recal.bam > 1.bcf

                bcftools view 1.bcf

                bcftools view -vc 1.bcf

                Could you show me the output from the two BCF commands? Thanks!

                Comment

                • zee
                  NGS specialist
                  • Apr 2008
                  • 249

                  #9
                  Code:
                  $ samtools mpileup -ugf Human_hg18.fa -r chr1:841620-841620 SRR040810.recal.bam > 1.bcf
                  [mpileup] 1 samples in 1 input files
                  <mpileup> Set max per-sample depth to 8000
                  
                  $ bcftools view 1.bcf
                  ##fileformat=VCFv4.0
                  #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SRR040810
                  chr1    841620  .       A       G,X     0       .       DP=2;I16=0,0,0,1,0,0,34,1156,0,0,60,3600,0,0,3,9        PL      34,3,34,0,3,34
                  
                  $ bcftools view -vc 1.bcf
                  ##fileformat=VCFv4.0
                  #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SRR040810
                  chr1    841620  .       A       G       5.46    .       DP=2;AF1=0.9999;CI95=0.5,1;DP4=0,0,0,1;MQ=60    PL      34,3,0
                  [afs] 0:0.284 1:0.358 2:0.357

                  Comment

                  • lh3
                    Senior Member
                    • Feb 2008
                    • 686

                    #10
                    Ok, the first try. r856

                    Please run the orginal command:

                    samtools mpileup -ugf Human_hg18.fa SRR040810.recal.bam 2>err1 | bcftools view -vcg -

                    and check position 841620

                    Comment

                    • zee
                      NGS specialist
                      • Apr 2008
                      • 249

                      #11
                      OK, I updated my SVN

                      Code:
                      URL: https://samtools.svn.sourceforge.net/svnroot/samtools/trunk/samtools
                      Repository Root: https://samtools.svn.sourceforge.net/svnroot/samtools
                      Repository UUID: fd675618-d7d1-4fc3-a54f-e58cce934862
                      Revision: 856
                      Node Kind: directory
                      Schedule: normal
                      Last Changed Author: lh3lh3
                      Last Changed Rev: 856

                      And then reran the mpileup command

                      Code:
                      chr1    836201  .       A       G       65      .       DP=3;AF1=1;CI95=0.5,1;DP4=0,0,0,0;MQ=0  PL:GT:GQ        97,9,0:1/1:63
                      chr1    840643  .       C       T       4.77    .       DP=1;AF1=0.9999;CI95=0.5,1;DP4=0,0,0,0;MQ=0     PL:GT:GQ        33,3,0:1/1:41
                      chr1    841620  .       A       G       5.46    .       DP=2;AF1=0.9999;CI95=0.5,1;DP4=0,0,0,0;MQ=0     PL:GT:GQ        34,3,0:1/1:41
                      chr1    842827  .       T       G       14.9    .       DP=2;AF1=1;CI95=0.5,1;DP4=0,0,0,0;MQ=0  PL:GT:GQ        46,6,0:1/1:49
                      bcf view output

                      Code:
                       $bcftools view 1.bcf
                      ##fileformat=VCFv4.0
                      #CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  SRR040810
                      chr1    841620  .       A       G,X     0       .       DP=2;I16=0,0,0,1,0,0,34,1156,0,0,60,3600,0,0,3,9        PL      34,3,34,0,3,34
                      Do I have to do any re-indexing of bam file or reference fasta?

                      Comment

                      • lh3
                        Senior Member
                        • Feb 2008
                        • 686

                        #12
                        Em... I do not know what is happening now...

                        The last thing I can think of is:

                        samtools mpileup -ugf Human_hg18.fa SRR040810.recal.bam > 1.bcf

                        bcftools view 1.bcf

                        bcftools view -vcg 1.bcf

                        and check the output at 841620. You can control-C after you run mpileup for a few minutes. These outputs are kind of weird. Still I am not sure if this is a samtools problem or bcftools problem. The difficult part is I cannot reproduce the problem here. Valgrind does not report errors around that position, either...

                        Thanks very much.

                        Comment

                        • zee
                          NGS specialist
                          • Apr 2008
                          • 249

                          #13
                          OK, here is the output:

                          Code:
                          bcftools view 1.bcf |grep -w 841620
                          chr1    841620  .       A       G,X     0       .       DP=2;I16=0,0,0,1,0,0,34,1156,0,0,60,3600,0,0,3,9        PL      34,3,34,0,3,34
                          
                          $ bcftools view -vcg 1.bcf | grep -w 841620
                          chr1    841620  .       A       G       5.46    .       DP=2;AF1=0.9999;CI95=0.5,1;DP4=0,0,0,0;MQ=0     PL:GT:GQ        34,3,0:1/1:41
                          I even checked the md5sum for the bam file I put up for download vs the one in my directory and they are the same. Weird indeed.

                          Comment

                          • lh3
                            Senior Member
                            • Feb 2008
                            • 686

                            #14
                            The first 4 numbers in I16 should be identical to the 4 numbers in DP4, but they different. I do not really know why only the few sites are different. Could you send me a link to 1.bcf? Thanks.

                            Comment

                            • lh3
                              Senior Member
                              • Feb 2008
                              • 686

                              #15
                              Hi Zee,

                              I have thought of one possible cause of the error. Could you help to try? You may comment out the 133-rd line in bcftools/call1.c. That line looks like:

                              Code:
                              if (anno[i] == 0 && (errno == EINVAL || errno == ERANGE)) return -2;
                              Another possible fix is to change the line 132 to:

                              Code:
                              errno = 0; anno[i] = strtol(p, &p, 10);
                              I could not reproduce the problem at my hand, so I have to rely on you. Thank you in advance!
                              Last edited by lh3; 12-02-2010, 12:54 PM.

                              Comment

                              Latest Articles

                              Collapse

                              • SEQadmin2
                                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                                by SEQadmin2


                                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                                Here are nine questions we think about, in roughly the order they matter, before...
                                06-18-2026, 07:11 AM
                              • SEQadmin2
                                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                                by SEQadmin2


                                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                                ...
                                06-02-2026, 10:05 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by SEQadmin2, 06-17-2026, 06:09 AM
                              0 responses
                              36 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-09-2026, 11:58 AM
                              0 responses
                              100 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-05-2026, 10:09 AM
                              0 responses
                              120 views
                              0 reactions
                              Last Post SEQadmin2  
                              Started by SEQadmin2, 06-04-2026, 08:59 AM
                              0 responses
                              113 views
                              0 reactions
                              Last Post SEQadmin2  
                              Working...