  • bug in samtools view -s?

    Hi, I'm experiencing difficulties trying to downsample .bam files using samtools view -s. Specifically some of the commands fail while others work; this seems sometimes to be correlated with the -s float argument being > 0.5 (but not always). Here I'm c/p'ing some of the code that worked and some which failed.

    Thanks for any helpful suggestions!

    samtools view -b -s 0.271 1.bam > 1_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.5077 2.bam > 2_ds.bam # gives 0 reads unexpectedly
    samtools view -b -s 0.2113 3.bam > 3_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.3322 4.bam > 4_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.5306 5.bam > 5_ds.bam # gives 0 reads unexpectedly
    samtools view -b -s 0.204 6.bam > 6_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.3841 7.bam > 7_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.4691 8.bam > 8_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.6861 9.bam > 9_ds.bam # gives 0 reads unexpectedly
    samtools view -b -s 0.2261 10.bam > 10_ds.bam # gives 730697 reads as expected

    samtools view -b -s 0.6653 23.bam > 23_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0444 24.bam > 24_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0492 25.bam > 25_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.1648 26.bam > 26_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0801 27.bam > 27_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.171 28.bam > 28_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0979 29.bam > 29_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0511 30.bam > 30_ds.bam # gives 730697 reads as expected

  • #2
    Not answering your question directly, but you could also do this with "reformat.sh" from the BBMap suite. It lets you specify sampling parameters with more granularity (for example, a fixed number of reads).
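A minimal sketch of what such a call might look like, assuming the `samplerate` and `sampleseed` parameters listed in the reformat.sh usage text (worth checking against your installed BBMap version). The wrapper name `reformat_cmd` is hypothetical and only prints the command so it can be inspected before running; the file names are taken from the first post.

```shell
#!/usr/bin/env bash
# reformat_cmd is a hypothetical helper, not part of BBMap.
# It prints a reformat.sh command line instead of executing it.
# Assumed parameters: samplerate (fraction of reads to keep) and
# sampleseed (random seed), as described in the reformat.sh usage text.
reformat_cmd() {
  local in_bam="$1" out_bam="$2" rate="$3" seed="$4"
  printf 'reformat.sh in=%s out=%s samplerate=%s sampleseed=%s\n' \
    "$in_bam" "$out_bam" "$rate" "$seed"
}

# Example: print the command for the file that failed with samtools.
reformat_cmd 9.bam 9_ds.bam 0.6861 1
```

For a fixed number of output reads rather than a fraction, reformat.sh also documents a `samplereadstarget` parameter; reading or writing BAM through reformat.sh generally needs samtools available on the PATH.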

    • #3
      Originally posted by jkzebrafish
      Hi, I'm experiencing difficulties trying to downsample .bam files using samtools view -s. Specifically some of the commands fail while others work; this seems sometimes to be correlated with the -s float argument being > 0.5 (but not always). Here I'm c/p'ing some of the code that worked and some which failed.
      It seems there are specific read alignments that are causing the failures. You could confirm this by taking one of the .bam files that failed and running it with different random seeds and a small sample fraction; you should then see the failure some percentage of the time.

      Are these alignments of very long reads (> 65k bp)? Alignments whose CIGAR strings exceed the 16-bit integer limit (65,535 operations) can behave strangely.
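The seed test suggested above could be sketched roughly as follows. In `samtools view -s INT.FRAC`, the integer part is the seed and the fractional part is the fraction kept, so changing only the integer part changes the seed. The helper names `subsample_arg` and `count_with_seeds` are hypothetical, and `-c` is used to count alignments without writing output files.

```shell
#!/usr/bin/env bash
# subsample_arg is a hypothetical helper: it joins a seed and a fraction
# written as 0.xxx into the INT.FRAC form expected by `samtools view -s`.
subsample_arg() {
  local seed="$1" frac="$2"
  # "${frac#*.}" drops everything up to the decimal point (0.05 -> 05).
  printf '%s.%s\n' "$seed" "${frac#*.}"
}

# count_with_seeds is a hypothetical helper: for each seed, count the
# alignments kept at the given fraction. Requires samtools on the PATH.
count_with_seeds() {
  local bam="$1" frac="$2"; shift 2
  local seed arg
  for seed in "$@"; do
    arg="$(subsample_arg "$seed" "$frac")"
    printf 'seed=%s kept=%s\n' "$seed" "$(samtools view -c -s "$arg" "$bam")"
  done
}

# Usage (not run here): count_with_seeds 9.bam 0.05 1 2 3 4 5
```

If the failure is tied to particular alignments, some seeds should give a count of 0 while others give roughly 5% of the input; if every seed gives the same outcome, the cause is more likely elsewhere.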

      • #4
        Thanks cstack for the response. These are paired end 75bp reads, nothing crazy.

        Here is a little more information:

        samtools view -b -s 0.6861 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 0.4861 9.bam > 9_ds.bam # gives ~50k reads
        samtools view -b -s 0.5 9.bam > 9_ds.bam # gives ~50k reads
        samtools view -b -s 0.5001 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 1.6861 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 5.6861 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 100.6861 9.bam > 9_ds.bam # gives 0 reads

        No errors or warnings are given, hence my confusion. Thanks for any insight.
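To narrow down where the counts drop to zero, the same check can be repeated over a range of fractions with a fixed seed, as a rough sketch. The helper names `fraction_args` and `sweep_fractions` are hypothetical; `samtools view -c` counts the retained alignments directly, so no intermediate files are overwritten. Reporting the output of `samtools --version` alongside the counts would also help others reproduce the behaviour.

```shell
#!/usr/bin/env bash
# fraction_args is a hypothetical helper: print one INT.FRAC argument per
# fraction, all sharing the same seed. Fractions must be written as 0.xxx.
fraction_args() {
  local seed="$1"; shift
  local frac
  for frac in "$@"; do
    printf '%s.%s\n' "$seed" "${frac#*.}"
  done
}

# sweep_fractions is a hypothetical helper: count the alignments kept for
# each fraction. Requires samtools on the PATH.
sweep_fractions() {
  local bam="$1" seed="$2"; shift 2
  local arg
  for arg in $(fraction_args "$seed" "$@"); do
    printf '%s\t%s\n' "$arg" "$(samtools view -c -s "$arg" "$bam")"
  done
}

# Usage (not run here): sweep_fractions 9.bam 0 0.4861 0.5 0.5001 0.6861
```

Note that in the commands above, 1.6861, 5.6861 and 100.6861 all request the same fraction (0.6861) with different seeds, so those three runs vary only the seed.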
