Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • mikael
    Junior Member
    • Feb 2011
    • 5

    Samtools don't read the whole data

    Hello,

    I'm having troubles running samtools. When launching the following command:
    Code:
    samtools mpileup -uf ref.fa aln1.bam | bcftools view -bvcg - > var.raw.bcf
    It ends without error but it only considered the first 3Mbp of the first human chromosome… The results seem to be consistent but very partial (I have reads for all the chromosomes)

    If I launch that command
    Code:
    samtools mpileup -f ref.fa aln1.bam
    I got the output for all the positions in pileup format, as expected. That is not exactly what I want since there is no SNP calling but that shows mpileup process the whole data.

    So, I assume that samtools mpileup is working properly but then why does bcftools stop after a few Mbp? Any idea would be welcome
    Thanks!
  • 12jrowley2
    Member
    • Dec 2010
    • 14

    #2
    Did this get resolved? I am having the same issue.
    Jesse

    Comment

    • nilshomer
      Nils Homer
      • Nov 2008
      • 1283

      #3
      What happens when you exclude the "-v" option:
      Code:
      samtools mpileup -uf ref.fa aln1.bam | bcftools view -bcg - > var.raw.bcf
      This should call all positions. Perhaps there are no variants called at later positions...

      Comment

      • 12jrowley2
        Member
        • Dec 2010
        • 14

        #4
        samtools doesn't read whole data

        I updated to samtools 0.1.17 from 0.1.12a. Now samtools doesn't seem to begin processing the data:

        ./samtools-0.1.17/samtools mpileup -I -C 0 -Q 40 -u -f ./hg19.fasta ./7741X2_Converted_reheaded.bam | ./samtools-0.1.17/bcftools/bcftools view -vcg - > ./7741X2.var.raw4

        [mpileup] 1 samples in 1 input files
        <mpileup> Set max per-file depth to 8000
        and stops here

        when I do -bcg instead it looks like reads begin processing:


        [mpileup] 1 samples in 1 input files
        <mpileup> Set max per-file depth to 8000
        [bcfview] 100000 sites processed.
        [afs] 0:99850.000 1:100.000 2:50.000
        [bcfview] 200000 sites processed.
        etc.

        before with 0.1.12a, both the -bcg and -vcg options began processing, only -vcg stopped prematurely.

        Is there a way to know if samtools gets all the way through the data?
        this is my header

        @HD VN:1.0 SO:coordinate
        @SQ SN:chr10 LN:135534747 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr11 LN:135006516 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr12 LN:133851895 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr13 LN:115169878 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr14 LN:107349540 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr15 LN:102531392 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr16 LN:90354753 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr17 LN:81195210 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr18 LN:78077248 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr19 LN:59128983 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr1 LN:249250621 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr20 LN:63025520 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr21 LN:48129895 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr22 LN:51304566 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr2 LN:243199373 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr3 LN:198022430 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr4 LN:191154276 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr5 LN:180915260 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr6 LN:171115067 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr7 LN:159138663 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr8 LN:146364022 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr9 LN:141213431 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrAdapter LN:9260 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrM LN:16571 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrPhiX_Illumina LN:5386 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrX LN:155270560 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrY LN:59373566 AS:hg19EnsKnGn46bpAdaptPhi.novoindex

        thanks,

        Comment

        • swbarnes2
          Senior Member
          • May 2008
          • 910

          #5
          The mpileup step is very very long, much longer than the bcftools steps. Piping it doesn't really save much time, because the bcftools steps are a lot faster. And I guess it makes sense that if you ask it to make a all-points bcf, it'll show more progress. But is that really what you want?

          Comment

          • 12jrowley2
            Member
            • Dec 2010
            • 14

            #6
            Thanks for your input. Samtools pipes output line by line- as they are processed- into BCF-tools, thus the benefit of piping, right? I tried the -b option as troubleshooting to figure out why -v option is not processing the data. Any other suggestions?
            Last edited by 12jrowley2; 08-09-2011, 11:11 AM.

            Comment

            • swbarnes2
              Senior Member
              • May 2008
              • 910

              #7
              Originally posted by 12jrowley2 View Post
              Thanks for your input. Samtools pipes output line by line- as they are processed- into BCF-tools, thus the benefit of piping, right? I tried the -b option as troubleshooting to figure out why -v option is not processing the data. Any other suggestions?
              If it takes mpileup 2 hours to process all your reads, and then bcftools can process that output in 2 minutes, then piping is saving you all of 2 minutes.

              When I run mpileup solo, it sit on the "[mpileup] 1 samples in 1 input files
              <mpileup> Set max per-file depth to 8000" step for a long time. But the output file grows all the time. Then I run bcftools on that when it's done. (Or while it's running, just to get an idea of how far along it is, and about how many SNPs its finding) I can use the mpileup output to make a vcf for variant calling, and an all-points vcf for use in the vcf2fq program.

              Comment

              Latest Articles

              Collapse

              • SEQadmin2
                From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                by SEQadmin2


                Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                ...
                06-02-2026, 10:05 AM
              • SEQadmin2
                Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                by SEQadmin2


                With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                Introduction

                Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                05-22-2026, 06:42 AM
              • SEQadmin2
                Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                by SEQadmin2

                Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                05-06-2026, 09:04 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by SEQadmin2, Today, 08:59 AM
              0 responses
              11 views
              0 reactions
              Last Post SEQadmin2  
              Started by SEQadmin2, 06-02-2026, 12:03 PM
              0 responses
              21 views
              0 reactions
              Last Post SEQadmin2  
              Started by SEQadmin2, 06-02-2026, 11:40 AM
              0 responses
              17 views
              0 reactions
              Last Post SEQadmin2  
              Started by SEQadmin2, 05-28-2026, 11:40 AM
              0 responses
              31 views
              0 reactions
              Last Post SEQadmin2  
              Working...