  • Papillon
    Member
    • Mar 2011
    • 13

    #16
    Originally posted by Papillon
    I now first filter out reads with a low mapping quality within the SAM-file and reads that are 'B' flagged for over 90% by Illumina within FASTQ-files...
    This doesn't seem to work. Although the alignment looks okay at first sight, creating the SAM file fails hopelessly (as you can see below). The same thing happened when I tried to map FASTQ files in which I had trimmed the bad read ends.

    Since mapping went well with untouched FASTQ files, would it be safe for me to assume that tampering with these files can be quite tricky?
    The only other cause could be using a newer version of BWA - v. 0.5.9b - and using the -I option for Illumina scoring.
    Those are the only differences between being able to map at all and the absolute failures.

    In retrospect, this method wasn't powerful enough to begin with, so I removed it from the pipeline.

    [bwa_sai2sam_pe_core] time elapses: 10.97 sec
    [bwa_sai2sam_pe_core] changing coordinates of 18 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 19395 out of 60331 Q17 singletons are mated.
    [bwa_paired_sw] 238 out of 193695 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 22668.70 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.81 sec
    [bwa_sai2sam_pe_core] print alignments... 2.02 sec
    [bwa_sai2sam_pe_core] 262144 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate...
    [infer_isize] (25, 50, 75) percentile: (16981, 39972, 70412)
    [infer_isize] low and high boundaries: 101 and 177274 for estimating avg and std
    [infer_isize] inferred external isize from 39 pairs: 42807.359 +/- 29283.488
    [infer_isize] skewness: 0.344; kurtosis: -1.096; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 220265 (6.06 sigma)
    [bwa_sai2sam_pe_core] time elapses: 11.07 sec
    [bwa_sai2sam_pe_core] changing coordinates of 12 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 20365 out of 59571 Q17 singletons are mated.
    [bwa_paired_sw] 253 out of 194428 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 27222.88 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.83 sec
    [bwa_sai2sam_pe_core] print alignments... 2.03 sec
    [bwa_sai2sam_pe_core] 524288 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate...
    [infer_isize] fail to infer insert size: too few good pairs

    • genlyai
      Member
      • Aug 2009
      • 39

      #17
      It should not be tricky, but you need to remove both sequences from a pair of reads if you decide to remove one. Although I'm not familiar with BWA, that seems like your problem.
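A minimal sketch of what "remove both sequences from a pair" could look like, assuming two plain 4-line-per-record FASTQ files with mates in the same order. The function names, the `max_b` threshold, and the use of a literal 'B' quality character as the filter criterion are illustrative assumptions, not anything taken from BWA or from the pipeline discussed in this thread.

```python
from itertools import islice


def read_fastq(handle):
    """Yield (header, seq, plus, qual) tuples from a 4-line-per-record FASTQ."""
    while True:
        record = [line.rstrip("\n") for line in islice(handle, 4)]
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        yield tuple(record)


def b_fraction(qual):
    """Fraction of quality characters equal to 'B' (empty reads count as 1.0)."""
    return qual.count("B") / len(qual) if qual else 1.0


def filter_pairs(in1, in2, out1, out2, max_b=0.9):
    """Drop a pair if EITHER mate exceeds max_b; write survivors in lockstep.

    Both inputs are read record by record in the same order, so the two
    output files keep the same number of records in the same order.
    """
    kept = dropped = 0
    for r1, r2 in zip(read_fastq(in1), read_fastq(in2)):
        if b_fraction(r1[3]) > max_b or b_fraction(r2[3]) > max_b:
            dropped += 1
            continue
        out1.write("\n".join(r1) + "\n")
        out2.write("\n".join(r2) + "\n")
        kept += 1
    return kept, dropped
```

The point of the design is that the decision is made per pair, not per read, so neither output file can end up with a record that the other one lacks.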

      • Papillon
        Member
        • Mar 2011
        • 13

        #18
        In the end I only removed ~109,000 reads out of ~92,000,000 per FASTQ file, so I had expected that BWA would only have difficulties with pairing those few reads. Trimming reads (not removing them) seemed to cause similar problems, although I have to admit that more than ~50,000 reads were 100% Q2/B-flagged, so trimming those would be identical to removing them.

        Somewhere on this forum, the same problem is described (thread: 'BWA sampe hanging'). Same symptoms, slightly different causes, but it seems to agree with your answer that all pairs have to be lined up correctly and it seems that a small change can cause serious problems.

        • genlyai
          Member
          • Aug 2009
          • 39

          #19
          I'm not familiar with BWA, but Bowtie just uses the read order to find read pairs, so as soon as you remove one read but not its mate, all subsequent reads go out of sync and can't be correctly paired.
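One way to check whether two FASTQ files are still in the same order is to compare read names position by position. This is only a sketch: it assumes 4-line records and read names that are identical apart from an optional trailing /1 or /2, which may not match every naming scheme. `read_names` and `first_mismatch` are hypothetical helper names.

```python
from itertools import islice, zip_longest


def read_names(handle):
    """Yield the read name of each 4-line FASTQ record.

    The leading '@' and a trailing '/1' or '/2' (assumed mate suffix)
    are stripped so that mates compare equal.
    """
    for header in islice(handle, 0, None, 4):
        fields = header.split()
        if not fields:
            continue  # ignore a blank header line, e.g. at end of file
        name = fields[0]
        if name.startswith("@"):
            name = name[1:]
        if name.endswith(("/1", "/2")):
            name = name[:-2]
        yield name


def first_mismatch(handle1, handle2):
    """Return (record_index, name1, name2) for the first out-of-sync
    position, or None if both files list the same names in the same order.
    A missing record in the shorter file is reported as None."""
    pairs = zip_longest(read_names(handle1), read_names(handle2))
    for index, (name1, name2) in enumerate(pairs):
        if name1 != name2:
            return index, name1, name2
    return None
```

If a single read is missing from one file, the first mismatch is reported at the position of that read, and every record after it would be shifted by one as well.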

          • Papillon
            Member
            • Mar 2011
            • 13

            #20
            Thank you for responding! I remapped my data using the latest version of BWA and it worked out great, so the most likely cause of my previous failure would indeed be that the read order was disturbed, which therefore caused all the problems.

            BWA does try to fix it, so it seems, but it takes an insane amount of time and in the end the results are not quite what you would expect (see my previous copy-paste of BWA's output).
