  • fkrueger
    Senior Member
    • Sep 2009
    • 627

    #16
    Hi Iris,

    There is another potential issue I can think of which might cause such a low mapping percentage, and that is quite long reads paired with a high error rate.

    As you are apparently using FASTA sequences, a Phred score of 40 is assumed for each base (making it impossible to account for base call error rates). If the sequencing run had a high error rate towards the end of the run, you will potentially accumulate too many high-quality mismatches, which might result in the rejection of alignments. If this is the case, you could try raising the ceiling of cumulative mismatch scores (default 70) to 150 or so (-e 150). Alternatively, you could trim the reads down to a length which should not have a high error rate, e.g. trim 100 bp reads down to 50 or 60 bases, which should be plenty to allow unique mapping.
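To illustrate the trimming suggestion, here is a minimal Python sketch that keeps only the first 50 bases of each read in a FASTA file. The function names `read_fasta` and `trim_reads` and the example reads are made up for illustration; the sketch assumes the 5' end is at the start of each sequence and that plain FASTA (no qualities) is used. If I recall the manual correctly, bowtie's own -3/--trim3 option can do the trimming at alignment time instead.

```python
from typing import Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header = None
    chunks = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may span several lines
    if header is not None:
        yield header, "".join(chunks)


def trim_reads(records: Iterable[Tuple[str, str]], keep: int = 50):
    """Keep only the first `keep` bases (assumed to be the 5' end) of each read."""
    for header, seq in records:
        yield header, seq[:keep]


# Hypothetical input: one 100 bp read and one read already shorter than 50 bp.
example = [">read_1", "ACGT" * 25, ">read_2", "ACGTACGT"]
trimmed = list(trim_reads(read_fasta(example), keep=50))
for header, seq in trimmed:
    print(f">{header}\n{seq}")
```

Reads shorter than the cutoff pass through unchanged, so the same script can be run on mixed-length input.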

    Good luck!


    • IrisZhu
      Member
      • Jul 2010
      • 25

      #17
      Originally posted by fkrueger View Post
      Hi Iris,

      There is another potential issue I can think of which might cause such a low mapping percentage, and that is quite long reads paired with a high error rate.

      As you are apparently using FASTA sequences, a Phred score of 40 is assumed for each base (making it impossible to account for base call error rates). If the sequencing run had a high error rate towards the end of the run, you will potentially accumulate too many high-quality mismatches, which might result in the rejection of alignments. If this is the case, you could try raising the ceiling of cumulative mismatch scores (default 70) to 150 or so (-e 150). Alternatively, you could trim the reads down to a length which should not have a high error rate, e.g. trim 100 bp reads down to 50 or 60 bases, which should be plenty to allow unique mapping.

      Good luck!
      Thanks for your reply.
      Is what you suggested (raising the mismatch score) equivalent to increasing the number of mismatches? The default is now 2 (allowing 2 mismatched bases), so could I increase it to, say, 5? But either way will compromise the accuracy, right? More unqualified reads will get mapped.


      • Lee Sam
        Member
        • Oct 2008
        • 57

        #18
        Originally posted by fkrueger View Post
        Hi Iris,

        There is another potential issue I can think of which might cause such a low mapping percentage, and that is quite long reads paired with a high error rate.

        As you are apparently using FASTA sequences, a Phred score of 40 is assumed for each base (making it impossible to account for base call error rates). If the sequencing run had a high error rate towards the end of the run, you will potentially accumulate too many high-quality mismatches, which might result in the rejection of alignments. If this is the case, you could try raising the ceiling of cumulative mismatch scores (default 70) to 150 or so (-e 150). Alternatively, you could trim the reads down to a length which should not have a high error rate, e.g. trim 100 bp reads down to 50 or 60 bases, which should be plenty to allow unique mapping.

        Good luck!
        A high error rate shouldn't make a difference to bowtie alignments with the parameters she specifies, since a seed region from the 5' end of the read is used to do the alignment (if memory serves me correctly).

        The bowtie manual (http://bowtie-bio.sourceforge.net/manual.shtml) says:
        -l/--seedlen <int>
        The "seed length"; i.e., the number of bases on the high-quality end of the read to which the -n ceiling applies. The lowest permitted setting is 5 and the default is 28. bowtie is faster for larger values of -l.
        If she was using the end-to-end alignment mode (which at least used to be an option in the past), errors in the reads might be causing this, but the default is the seed-and-extend method. It shouldn't be that hard to find unique 28-mers...
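As a rough illustration of the manual text quoted above, the sketch below counts mismatches only within the first -l bases of a read and compares that count with the -n ceiling. It is not bowtie's code: the helper names are hypothetical, the alignment is assumed to be ungapped with the high-quality end at the start of the read, and it covers only the -n ceiling over the seed, not any other limit bowtie may apply to the rest of the read.

```python
def seed_mismatches(read: str, ref: str, seed_len: int = 28) -> int:
    """Count mismatches within the first `seed_len` bases (the assumed seed)."""
    return sum(1 for r, g in zip(read[:seed_len], ref[:seed_len]) if r != g)


def passes_n_ceiling(read: str, ref: str, n: int = 2, seed_len: int = 28) -> bool:
    """True if the seed has at most `n` mismatches (mirrors -n and -l defaults)."""
    return seed_mismatches(read, ref, seed_len) <= n


# Hypothetical 50 bp reference window and two reads aligned to it.
ref = "A" * 50
errors_in_tail = "A" * 40 + "C" * 10   # 10 mismatches, all outside the seed
errors_in_seed = "C" * 3 + "A" * 47    # 3 mismatches, all inside the seed

print(passes_n_ceiling(errors_in_tail, ref))  # seed is clean
print(passes_n_ceiling(errors_in_seed, ref))  # 3 seed mismatches vs n = 2
```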
        Last edited by Lee Sam; 08-11-2010, 12:28 PM.


        • IrisZhu
          Member
          • Jul 2010
          • 25

          #19
          Originally posted by Lee Sam View Post
          A high error rate shouldn't make a difference to bowtie alignments with the parameters she specifies, since a seed region from the 5' end of the read is used to do the alignment (if memory serves me correctly).

          The bowtie manual (http://bowtie-bio.sourceforge.net/manual.shtml) says:


          If she was using the end-to-end alignment mode (which at least used to be an option), errors in the reads might be causing this, but the default is the seed-and-extend method. It shouldn't be that hard to find unique 28-mers...
          Oh I see, I got it wrong. The 2 mismatches apply to the seed (28 bp from the high-quality end). But does that mean it allows lots of mismatches at the other end?
          Thank you so much guys. I learned a lot!


          • fkrueger
            Senior Member
            • Sep 2009
            • 627

            #20
            I wasn't talking about an error rate during the bowtie alignment, but about the error rate of the sequencing run. We have seen quite a few runs where the error rate increased drastically towards the end of long runs (such as 75 bp). If actual errors from the sequencing run end up in FASTA files, a Phred score of 40, and therefore a very reliable (but wrong) base call, is assumed for each such base.

            I was talking about the -e ceiling (not the -n ceiling, which applies to the seed and can be anything between 0 and 3), which is defined as follows:

            -e/--maqerr <int>
            Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.

            This means that each mismatch in a FASTA file counts as 30, i.e. 3 mismatches (even if they are quite close to the 3' end of the sequence) add up to 90, which will exceed the default value of 70 and cause the read to be rejected (and scored as not aligned).
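The arithmetic can be sketched in a few lines of Python. This is only my reading of the manual text quoted above, not bowtie's actual code: the function names are made up, the alignment is assumed to be ungapped, quality values are assumed to round half up to the nearest 10, and with rounding switched off the raw quality is simply summed (an assumption about --nomaqround).

```python
from typing import Sequence


def maq_round(q: int) -> int:
    """Round a Phred quality to the nearest 10 and saturate at 30."""
    return min(30, ((q + 5) // 10) * 10)


def q_distance(read: str, ref: str, quals: Sequence[int],
               maq_rounding: bool = True) -> int:
    """Sum the quality values at every mismatched position of the read."""
    if not (len(read) == len(ref) == len(quals)):
        raise ValueError("read, ref and quals must have the same length")
    total = 0
    for r, g, q in zip(read, ref, quals):
        if r != g:
            total += maq_round(q) if maq_rounding else q
    return total


def passes_e_ceiling(read: str, ref: str, quals: Sequence[int], e: int = 70) -> bool:
    """True if the summed mismatch qualities do not exceed the -e ceiling."""
    return q_distance(read, ref, quals) <= e


# FASTA input: every base is assumed to have Phred 40, which saturates to 30.
ref = "A" * 50
quals = [40] * 50
two_mm = "A" * 48 + "CC"      # 2 mismatches near the 3' end
three_mm = "A" * 47 + "CCC"   # 3 mismatches near the 3' end

print(q_distance(two_mm, ref, quals), passes_e_ceiling(two_mm, ref, quals))
print(q_distance(three_mm, ref, quals), passes_e_ceiling(three_mm, ref, quals))
print(passes_e_ceiling(three_mm, ref, quals, e=150))
```

Under these assumptions, two mismatches total 60 and stay under the default of 70, three total 90 and are rejected, and -e 150 would admit up to five mismatches per FASTA read (5 × 30 = 150). The -n ceiling on the seed would still apply on top of this.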


            • Lee Sam
              Member
              • Oct 2008
              • 57

              #21
              Originally posted by fkrueger View Post
              I was talking about the -e ceiling (not the -n ceiling, which applies to the seed and can be anything between 0 and 3), which is defined as follows:

              -e/--maqerr <int>
              Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.

              This means that each mismatch in a FASTA file counts as 30, i.e. 3 mismatches (even if they are quite close to the 3' end of the sequence) add up to 90, which will exceed the default value of 70 and cause the read to be rejected (and scored as not aligned).
              I see the manual changed a little since 0.10.0 (last version where I regularly read the manual), which read:

              -e/--maqerr <int> The maximum permitted total of quality values at
              mismatched read positions. This total is also
              called the "quality-weighted hamming distance" or
              "Q-distance." This is analogous to the -e option
              for "maq map". The default is 70. Note that,
              like Maq, Bowtie rounds quality values to the
              nearest 10 and saturates at 30.
              I was confused in my understanding of how the -e parameter works then. I had been under the impression that the totals were only within the seed region. I stand corrected.


              • Sol
                Member
                • Oct 2010
                • 13

                #22
                Please help.

                I ran a SOLiD transcriptome experiment, but not all of the reads aligned: I'm losing 16 million of the reads. What happened? Where can I find out? Which software should I use? I used BioScope. Can I use a Phred filter?
                Thanks
