  • #16
    Originally posted by Papillon
    I now first filter out reads with a low mapping quality within the SAM-file and reads that are 'B' flagged for over 90% by Illumina within FASTQ-files...
    This doesn't seem to work. Although the alignment looks okay at first sight, creating the SAM file fails hopelessly (as you can see below). The same thing happened when I tried to map fastq-files in which I had trimmed the bad read-ends.

    Since mapping went well with untouched fastq-files, would it be safe for me to assume that tampering with these files can be quite tricky?
    The only other cause could be using a newer version of BWA - v. 0.5.9b - and using the -I option for Illumina scoring.
    Those are the only differences between being able to map at all and the absolute failures.

    In retrospect, this method wasn't powerful enough to begin with, so I removed it from the pipeline.

    [bwa_sai2sam_pe_core] time elapses: 10.97 sec
    [bwa_sai2sam_pe_core] changing coordinates of 18 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 19395 out of 60331 Q17 singletons are mated.
    [bwa_paired_sw] 238 out of 193695 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 22668.70 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.81 sec
    [bwa_sai2sam_pe_core] print alignments... 2.02 sec
    [bwa_sai2sam_pe_core] 262144 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate...
    [infer_isize] (25, 50, 75) percentile: (16981, 39972, 70412)
    [infer_isize] low and high boundaries: 101 and 177274 for estimating avg and std
    [infer_isize] inferred external isize from 39 pairs: 42807.359 +/- 29283.488
    [infer_isize] skewness: 0.344; kurtosis: -1.096; ap_prior: 1.00e-05
    [infer_isize] inferred maximum insert size: 220265 (6.06 sigma)
    [bwa_sai2sam_pe_core] time elapses: 11.07 sec
    [bwa_sai2sam_pe_core] changing coordinates of 12 alignments.
    [bwa_sai2sam_pe_core] align unmapped mate...
    [bwa_paired_sw] 20365 out of 59571 Q17 singletons are mated.
    [bwa_paired_sw] 253 out of 194428 Q17 discordant pairs are fixed.
    [bwa_sai2sam_pe_core] time elapses: 27222.88 sec
    [bwa_sai2sam_pe_core] refine gapped alignments... 0.83 sec
    [bwa_sai2sam_pe_core] print alignments... 2.03 sec
    [bwa_sai2sam_pe_core] 524288 sequences have been processed.
    [bwa_sai2sam_pe_core] convert to sequence coordinate...
    [infer_isize] fail to infer insert size: too few good pairs


    • #17
      It should not be tricky, but you need to remove both sequences from a pair of reads if you decide to remove one. Although I'm not familiar with BWA, that seems like your problem.
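
A minimal sketch of the point above, assuming two FASTQ files with mates in the same order and the "more than 90% 'B' qualities" criterion from post #16. The function names (`read_fastq`, `b_fraction`, `filter_pairs`) and the `max_b` parameter are hypothetical and not part of BWA or any other tool; the only idea illustrated is that a pair is kept or dropped as a unit.

```python
from itertools import islice


def read_fastq(lines):
    """Yield (header, sequence, quality) tuples from FASTQ lines (4 lines per record)."""
    it = iter(lines)
    while True:
        record = list(islice(it, 4))
        if not record:
            return
        if len(record) < 4:
            raise ValueError("truncated FASTQ record")
        header, seq, _plus, qual = (line.rstrip("\n") for line in record)
        yield header, seq, qual


def b_fraction(qual):
    """Fraction of quality characters that are 'B' (empty reads count as fully 'B')."""
    return qual.count("B") / len(qual) if qual else 1.0


def filter_pairs(reads1, reads2, max_b=0.9):
    """Keep a pair only if BOTH mates pass; otherwise drop both so the files stay in sync."""
    for r1, r2 in zip(reads1, reads2):
        if b_fraction(r1[2]) <= max_b and b_fraction(r2[2]) <= max_b:
            yield r1, r2
```

With real files, `read_fastq(open(path))` could be passed for each mate file and the surviving pairs written back out to two new files in the same order.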


      • #18
        In the end I only removed ~109,000 reads out of ~92,000,000 per fastq file, so I had expected BWA to have difficulties pairing only those few reads. Trimming reads (rather than removing them) seemed to cause similar problems, although I have to admit that more than ~50,000 reads had a 100% Q2/B flag, so trimming those would be identical to removing them.
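
One possible way to trim without accidentally deleting a record is to cap the trim so that at least one base always remains. This is only a sketch: `trim_b_tail` and `min_len` are hypothetical names, and whether a given aligner handles such very short reads well is not something this thread establishes.

```python
def trim_b_tail(seq, qual, min_len=1):
    """Trim the trailing run of 'B' qualities, but keep at least min_len bases.

    Keeping a stub means a read that is 100% 'B' still occupies its slot in
    the file, so its mate in the other file does not shift out of position.
    """
    keep = len(qual.rstrip("B"))              # length without the trailing 'B' run
    keep = max(keep, min(min_len, len(seq)))  # never trim below min_len bases
    return seq[:keep], qual[:keep]
```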

        Somewhere on this forum, the same problem is described (thread: 'BWA sampe hanging'). Same symptoms, slightly different causes, but it seems to agree with your answer that all pairs have to be lined up correctly and it seems that a small change can cause serious problems.


        • #19
          I'm not familiar with BWA, but on bowtie, it just uses the read order to find read pairs, so as soon as you remove one read but not its pair, all subsequent reads go out of alignment and can't be correctly paired.
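
A quick sanity check along these lines, sketched under the assumption that mate names differ only by a trailing `/1` or `/2` (the older Illumina naming convention). `pair_key` and `first_mismatch` are hypothetical helpers; they only compare header lines and say nothing about how BWA or bowtie pair reads internally.

```python
from itertools import zip_longest


def pair_key(header):
    """Read name without the leading '@' and without a trailing /1 or /2 suffix."""
    name = header.split()[0]
    if name.startswith("@"):
        name = name[1:]
    if name.endswith(("/1", "/2")):
        name = name[:-2]
    return name


def first_mismatch(headers1, headers2):
    """Return the 0-based index of the first out-of-sync pair, or None if all match."""
    for i, (h1, h2) in enumerate(zip_longest(headers1, headers2)):
        # A missing header means one file has more reads than the other.
        if h1 is None or h2 is None or pair_key(h1) != pair_key(h2):
            return i
    return None
```

If the returned index is small compared to the number of reads, nearly every subsequent pair is shifted, which would fit the "too few good pairs" message in the log from post #16.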


          • #20
            Thank you for responding! I remapped my data using the latest version of BWA and it worked out great, so the most likely cause of my previous failure is indeed that the read order was disturbed, which caused all the problems.

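            BWA does seem to try to fix it, but that takes an enormous amount of time (22668.70 and 27222.88 seconds for the two batches in the output I pasted earlier), and in the end the results are not what you would expect: the inferred insert sizes are implausible, and for the second batch BWA fails to infer an insert size at all.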
