Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Samtools don't read the whole data

    Hello,

    I'm having troubles running samtools. When launching the following command:
    Code:
    samtools mpileup -uf ref.fa aln1.bam | bcftools view -bvcg - > var.raw.bcf
    It ends without error but it only considered the first 3Mbp of the first human chromosome… The results seem to be consistent but very partial (I have reads for all the chromosomes)

    If I launch that command
    Code:
    samtools mpileup -f ref.fa aln1.bam
    I got the output for all the positions in pileup format, as expected. That is not exactly what I want since there is no SNP calling but that shows mpileup process the whole data.

    So, I assume that samtools mpileup is working properly but then why does bcftools stop after a few Mbp? Any idea would be welcome
    Thanks!

  • #2
    Did this get resolved? I am having the same issue.
    Jesse

    Comment


    • #3
      What happens when you exclude the "-v" option:
      Code:
      samtools mpileup -uf ref.fa aln1.bam | bcftools view -bcg - > var.raw.bcf
      This should call all positions. Perhaps there are no variants called at later positions...

      Comment


      • #4
        samtools doesn't read whole data

        I updated to samtools 0.1.17 from 0.1.12a. Now samtools doesn't seem to begin processing the data:

        ./samtools-0.1.17/samtools mpileup -I -C 0 -Q 40 -u -f ./hg19.fasta ./7741X2_Converted_reheaded.bam | ./samtools-0.1.17/bcftools/bcftools view -vcg - > ./7741X2.var.raw4

        [mpileup] 1 samples in 1 input files
        <mpileup> Set max per-file depth to 8000
        and stops here

        when I do -bcg instead it looks like reads begin processing:


        [mpileup] 1 samples in 1 input files
        <mpileup> Set max per-file depth to 8000
        [bcfview] 100000 sites processed.
        [afs] 0:99850.000 1:100.000 2:50.000
        [bcfview] 200000 sites processed.
        etc.

        before with 0.1.12a, both the -bcg and -vcg options began processing, only -vcg stopped prematurely.

        Is there a way to know if samtools gets all the way through the data?
        this is my header

        @HD VN:1.0 SO:coordinate
        @SQ SN:chr10 LN:135534747 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr11 LN:135006516 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr12 LN:133851895 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr13 LN:115169878 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr14 LN:107349540 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr15 LN:102531392 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr16 LN:90354753 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr17 LN:81195210 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr18 LN:78077248 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr19 LN:59128983 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr1 LN:249250621 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr20 LN:63025520 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr21 LN:48129895 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr22 LN:51304566 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr2 LN:243199373 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr3 LN:198022430 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr4 LN:191154276 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr5 LN:180915260 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr6 LN:171115067 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr7 LN:159138663 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr8 LN:146364022 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chr9 LN:141213431 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrAdapter LN:9260 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrM LN:16571 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrPhiX_Illumina LN:5386 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrX LN:155270560 AS:hg19EnsKnGn46bpAdaptPhi.novoindex
        @SQ SN:chrY LN:59373566 AS:hg19EnsKnGn46bpAdaptPhi.novoindex

        thanks,

        Comment


        • #5
          The mpileup step is very very long, much longer than the bcftools steps. Piping it doesn't really save much time, because the bcftools steps are a lot faster. And I guess it makes sense that if you ask it to make a all-points bcf, it'll show more progress. But is that really what you want?

          Comment


          • #6
            Thanks for your input. Samtools pipes output line by line- as they are processed- into BCF-tools, thus the benefit of piping, right? I tried the -b option as troubleshooting to figure out why -v option is not processing the data. Any other suggestions?
            Last edited by 12jrowley2; 08-09-2011, 11:11 AM.

            Comment


            • #7
              Originally posted by 12jrowley2 View Post
              Thanks for your input. Samtools pipes output line by line- as they are processed- into BCF-tools, thus the benefit of piping, right? I tried the -b option as troubleshooting to figure out why -v option is not processing the data. Any other suggestions?
              If it takes mpileup 2 hours to process all your reads, and then bcftools can process that output in 2 minutes, then piping is saving you all of 2 minutes.

              When I run mpileup solo, it sit on the "[mpileup] 1 samples in 1 input files
              <mpileup> Set max per-file depth to 8000" step for a long time. But the output file grows all the time. Then I run bcftools on that when it's done. (Or while it's running, just to get an idea of how far along it is, and about how many SNPs its finding) I can use the mpileup output to make a vcf for variant calling, and an all-points vcf for use in the vcf2fq program.

              Comment

              Latest Articles

              Collapse

              • seqadmin
                Essential Discoveries and Tools in Epitranscriptomics
                by seqadmin




                The field of epigenetics has traditionally concentrated more on DNA and how changes like methylation and phosphorylation of histones impact gene expression and regulation. However, our increased understanding of RNA modifications and their importance in cellular processes has led to a rise in epitranscriptomics research. “Epitranscriptomics brings together the concepts of epigenetics and gene expression,” explained Adrien Leger, PhD, Principal Research Scientist...
                04-22-2024, 07:01 AM
              • seqadmin
                Current Approaches to Protein Sequencing
                by seqadmin


                Proteins are often described as the workhorses of the cell, and identifying their sequences is key to understanding their role in biological processes and disease. Currently, the most common technique used to determine protein sequences is mass spectrometry. While still a valuable tool, mass spectrometry faces several limitations and requires a highly experienced scientist familiar with the equipment to operate it. Additionally, other proteomic methods, like affinity assays, are constrained...
                04-04-2024, 04:25 PM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by seqadmin, 04-25-2024, 11:49 AM
              0 responses
              19 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-24-2024, 08:47 AM
              0 responses
              17 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-11-2024, 12:08 PM
              0 responses
              62 views
              0 likes
              Last Post seqadmin  
              Started by seqadmin, 04-10-2024, 10:19 PM
              0 responses
              60 views
              0 likes
              Last Post seqadmin  
              Working...
              X