Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • JimC
    Member
    • Nov 2008
    • 10

    Odd behavior using samtools mpileup, bcftools

    I've tried this a few times and I'm getting very occassional / odd behavior in the output from mpileup / bcftools / vcfutils varFilter. The output reports an indel, but upon examination of the reads (with IGV), a single read contains the indel, while all the others do not, yet the vcf output file for my data reports:

    Chr1-gr2-PU5 662026 . ATGGAATGGTGGATTGG ATGGAATGGTGGATTGGAATGGTGGATTGG 99 . INDEL;DP=398;AF1=1;CI95=1,1;DP4=0,0,173,125;MQ=60 PL:GT:GQ 255,255,0:1/1:99

    One problem is that this output appears to state that there are 173, and 125 reads supporting the indel, when in fact there is 1. Other than manual inspection, I can't tell this apart from other true indels.

    I'm curious if this is due to a parameter in the command line? or if this is some kind of bug? (probably the former).

    The versions and command line parameters are as follows:
    This is a bacterial genome resequencing project with read depth > 300.
    Paired-end, 100 nt reads were aligned with default BWA parameters to a reference genome (BWA version 0.5.9-r16. (The genome contains some IUPAC codes, but none in the region of the problem I'm reporting).

    The resulting (sorted, indexed) bam file was used for input into samtools mpileup (0.1.12a)

    >samtools mpileup -uf ref.fasta aln.sorted.bam |bcftools view -bvcg - > aln.raw.bcf

    >bcftools view aln.raw.bcf | vcfutils.pl varFilter > aln.flt.vcf


    Thanks for any help !

    Jim
  • narain
    Banned
    • Aug 2011
    • 73

    #2
    in the varfilter there is an option of specifying the minimum depth of reads that are aligned to be called as a SNP/InDel . I guess its the -D parameter.

    Comment

    • swbarnes2
      Senior Member
      • May 2008
      • 910

      #3
      Between mpileup, and eyeballing IGV, I'd be more inclined to think that IGV is for some reason refusing to display the relevant reads, than that samtools is just making up a bunch of reads that really aren't there.

      Look at the .sam entries for that region. That's the sure way.
      Last edited by swbarnes2; 05-15-2013, 03:51 PM.

      Comment

      • narain
        Banned
        • Aug 2011
        • 73

        #4
        I havea a similar problems with calling InDels

        I tried to see if I can get InDels by using some contigs from E.Coli that was generated by SGA assembler. In a way the contigs would serve as a single end NGS data with the sequences being much longer.

        I first aligned the contigs to reference e.coli using bwa-sw to obtain SAM/BAM file. The BAM file was then sorted and indexed.

        Then, here is what I did for pileup for processing for a small region of interest:

        samtools mpileup -r 0:4,000,679-5,000,894 -uf e.coli.fa contigalignsorted.bam > contigalignsorted.region.mpileup

        Then to obtain SNPs and InDels from mpileup file, here is what I did:

        bcftools view -bvcg contigalignsorted.region.mpileup > contigalignsorted.region.bcf

        bcftools view contigalignsorted.region.bcf > contigalignsorted.region.vcf


        However, when i see my alignment of BAM file on IGV for this region, I notice that the final VCF file obtained through commands above, has a lot of SNPs and InDels missing!


        What went wrong in the calls ?

        Comment

        • abi
          Member
          • May 2013
          • 18

          #5
          More importantly , it even shows up wrong SNPs which otherwise is not a SNP as per the bam file loaded on IGV.


          0 4009334 . A C 125 . DP=5;VDB=3.277706e-02;RPB=-1.291774e+00;AF1=0.5;AC1=1;DP4=1,1,2,1;MQ=59;FQ=74;PV4=1,0,1,0.38 GT:PL:GQ 0/1:155,0,101:99
          0 4009336 . T A 17.1 . DP=5;VDB=6.080000e-02;RPB=1.291774e+00;AF1=0.5;AC1=1;DP4=2,1,1,1;MQ=59;FQ=20.1;PV4=1,1,0.14,1 GT:PL:GQ 0/1:47,0,154:50

          As above 4009334 is a SNP but not as per IGV picture below:
          Attached Files

          Comment

          • swbarnes2
            Senior Member
            • May 2008
            • 910

            #6
            IGV is not designed to visualize exactly what samtools mpileup is doing. They are different pieces of software, and they filter things differently.

            Likely, IGV is for some reason filtering away the relevant reads, or something else is different between the two pieces of software.

            Your vcf says there is a mixed C/A SNP, and your picture shows a mixed C/A SNP. Maybe there is a discrepancy in the references between the reference you aligned to for the vcf, and the reference you are looking at in IGV.

            Examine the raw read themselves, in the .bam. That's the way to be sure.
            Last edited by swbarnes2; 05-21-2013, 11:22 AM.

            Comment

            • abi
              Member
              • May 2013
              • 18

              #7
              @swbarnes2

              I did the indexing of my genome file which was used for alignment. I used the same genome to be uploaded to IGV and not ones already provided by IGV drop-down options. You can clearly see in the picture that IGV does not point out a SNP at location 4009334 while SAMtools/mpileup/bcftools extract it out! One got to be wrong, possibly the latter.

              Comment

              • abi
                Member
                • May 2013
                • 18

                #8
                Thanks @swbarnes2 for pointing out the difference between IGV and SAMtools. I am now using TVIEW and am able to see the SNP from my BAM file.

                Comment

                Latest Articles

                Collapse

                • SEQadmin2
                  Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                  by SEQadmin2


                  I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.


                  Here are nine questions we think about, in roughly the order they matter, before...
                  06-18-2026, 07:11 AM
                • SEQadmin2
                  From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                  by SEQadmin2


                  Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                  The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                  ...
                  06-02-2026, 10:05 AM
                • SEQadmin2
                  Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                  by SEQadmin2


                  With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                  Introduction

                  Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                  05-22-2026, 06:42 AM

                ad_right_rmr

                Collapse

                News

                Collapse

                Topics Statistics Last Post
                Started by SEQadmin2, 06-17-2026, 06:09 AM
                0 responses
                22 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 06-09-2026, 11:58 AM
                0 responses
                40 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 06-05-2026, 10:09 AM
                0 responses
                47 views
                0 reactions
                Last Post SEQadmin2  
                Started by SEQadmin2, 06-04-2026, 08:59 AM
                0 responses
                49 views
                0 reactions
                Last Post SEQadmin2  
                Working...