  • bug in samtools view -s?

    Hi, I'm experiencing difficulties trying to downsample .bam files using samtools view -s. Specifically some of the commands fail while others work; this seems sometimes to be correlated with the -s float argument being > 0.5 (but not always). Here I'm c/p'ing some of the code that worked and some which failed.

    Thanks to any helpful suggestions!

    samtools view -b -s 0.271 1.bam > 1_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.5077 2.bam > 2_ds.bam # gives 0 reads unexpectedly
    samtools view -b -s 0.2113 3.bam > 3_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.3322 4.bam > 4_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.5306 5.bam > 5_ds.bam # gives 0 reads unexpectedly
    samtools view -b -s 0.204 6.bam > 6_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.3841 7.bam > 7_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.4691 8.bam > 8_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.6861 9.bam > 9_ds.bam # gives 0 reads unexpectedly
    samtools view -b -s 0.2261 10.bam > 10_ds.bam # gives 730697 reads as expected

    samtools view -b -s 0.6653 23.bam > 23_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0444 24.bam > 24_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0492 25.bam > 25_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.1648 26.bam > 26_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0801 27.bam > 27_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.171 28.bam > 28_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0979 29.bam > 29_ds.bam # gives 730697 reads as expected
    samtools view -b -s 0.0511 30.bam > 30_ds.bam # gives 730697 reads as expected

  • #2
    Not answering your question directly, but you could also use "reformat.sh" from the BBMap suite to do this. It lets you specify sampling parameters with more granularity (for example, as a target number of reads).


    • #3
      Originally posted by jkzebrafish
      Hi, I'm experiencing difficulties trying to downsample .bam files using samtools view -s. Specifically some of the commands fail while others work; this seems sometimes to be correlated with the -s float argument being > 0.5 (but not always). Here I'm c/p'ing some of the code that worked and some which failed.
      It seems there are specific read alignments that are causing the failures. You could confirm this by taking one of the .bam files that failed and running it with different random seeds and a small sample fraction; you should see the failure some percentage of the time.

      Are these alignments of very long reads (> 65k bp)? Alignments with CIGAR strings longer than the 16-bit integer limit (65,535) can behave strangely.
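The seed test suggested above could be sketched roughly as follows. It relies on the documented form of the `-s` argument, INT.FRAC, where the integer part is the random seed and the part after the decimal point is the fraction of reads to keep. The helper name `subsample_arg`, the seeds 1 to 5, and the 0.05 fraction are arbitrary choices for illustration, not anything from the original posts; `9.bam` is simply one of the inputs reported as failing.

```shell
#!/bin/sh
# Hypothetical helper: build the INT.FRAC argument for `samtools view -s`.
# The integer part is the seed, the part after the dot is the fraction kept.
subsample_arg() {
    seed=$1
    frac=$2                      # written as 0.NNN, e.g. 0.05
    printf '%s.%s\n' "$seed" "${frac#*.}"
}

bam=9.bam                        # assumption: one of the files that gave 0 reads

# Only run the samtools part when the tool and the file are actually present.
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    for seed in 1 2 3 4 5; do
        arg=$(subsample_arg "$seed" 0.05)
        # -c counts the records that survive subsampling instead of writing them
        n=$(samtools view -c -s "$arg" "$bam")
        echo "seed=$seed  -s $arg  records=$n"
    done
fi
```

If some seeds give a plausible count and others give 0, that would point toward specific records (or the seed itself) rather than the fraction.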


      • #4
        Thanks cstack for the response. These are paired end 75bp reads, nothing crazy.

        Here is a little more information:

        samtools view -b -s 0.6861 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 0.4861 9.bam > 9_ds.bam # gives ~50k reads
        samtools view -b -s 0.5 9.bam > 9_ds.bam # gives ~50k reads
        samtools view -b -s 0.5001 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 1.6861 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 5.6861 9.bam > 9_ds.bam # gives 0 reads
        samtools view -b -s 100.6861 9.bam > 9_ds.bam # gives 0 reads

        No errors or warnings are given, hence my confusion. Thanks for any insight.
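One thing worth keeping in mind when reading the commands above: because the `-s` argument has the form INT.FRAC, values such as 1.6861, 5.6861 and 100.6861 all request the same fraction (0.6861) with different seeds (1, 5 and 100). A minimal sketch for making that explicit and counting records without writing output files is below; `split_s_arg` is a hypothetical helper name, and the samtools loop only runs if `samtools` and `9.bam` are available.

```shell
#!/bin/sh
# Hypothetical helper: split a `samtools view -s` argument into seed and fraction.
split_s_arg() {
    arg=$1
    case $arg in
        *.*) seed=${arg%%.*}; frac=0.${arg#*.} ;;   # INT.FRAC form
        *)   seed=$arg; frac=0 ;;                   # no decimal point, no fraction
    esac
    printf 'seed=%s fraction=%s\n' "${seed:-0}" "$frac"
}

# The values tried in this post
args="0.4861 0.5 0.5001 0.6861 1.6861 5.6861 100.6861"

for arg in $args; do
    echo "-s $arg means: $(split_s_arg "$arg")"
done

bam=9.bam
if command -v samtools >/dev/null 2>&1 && [ -f "$bam" ]; then
    echo "total records: $(samtools view -c "$bam")"
    for arg in $args; do
        # -c reports how many records pass the subsampling filter
        echo "-s $arg -> $(samtools view -c -s "$arg" "$bam") records"
    done
fi
```

Comparing each count against the total and the requested fraction shows whether a run is merely off or returns nothing at all.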
