  • ranel
    Junior Member
    • Mar 2009
    • 1

    question about bowtie -e parameter

    Hi,

    My question concerns Bowtie's -e (--maqerr) parameter.

    If I understand correctly, -n sets the maximum number of mismatches permitted in the "seed", while -e limits the mismatches over the entire read length (as a maximum sum of the quality values at mismatched positions). Indeed, increasing this parameter can greatly increase the number of aligned reads. For example, in one sample that I tested, increasing -e from 70 (default) to 140 increased the percentage of aligned reads from 31% to 48%.

    Default (-e 70)
    --------------------------------
    # reads processed: 4424341
    # reads with at least one reported alignment: 1395000 (31.53%)
    # reads that failed to align: 3029341 (68.47%)

    -e 140
    --------------
    # reads processed: 4424341
    # reads with at least one reported alignment: 2125127 (48.03%)
    # reads that failed to align: 2299214 (51.97%)
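
The two summaries above can be compared directly. This is a minimal sketch in Python that parses the "# reads ..." lines exactly as pasted in this post; the function names are made up for illustration and are not part of Bowtie.

```python
import re

# Summary blocks copied from the two runs above.
DEFAULT_E70 = """\
# reads processed: 4424341
# reads with at least one reported alignment: 1395000 (31.53%)
# reads that failed to align: 3029341 (68.47%)
"""

E140 = """\
# reads processed: 4424341
# reads with at least one reported alignment: 2125127 (48.03%)
# reads that failed to align: 2299214 (51.97%)
"""

PATTERNS = {
    "processed": r"# reads processed: (\d+)",
    "aligned": r"# reads with at least one reported alignment: (\d+)",
    "failed": r"# reads that failed to align: (\d+)",
}


def parse_bowtie_summary(text):
    """Return the three read counts found in a pasted summary block."""
    counts = {}
    for key, pattern in PATTERNS.items():
        match = re.search(pattern, text)
        if match is None:
            raise ValueError(f"summary line for {key!r} not found")
        counts[key] = int(match.group(1))
    return counts


def aligned_percent(counts):
    """Percentage of processed reads with at least one reported alignment."""
    return 100.0 * counts["aligned"] / counts["processed"]


base = parse_bowtie_summary(DEFAULT_E70)
relaxed = parse_bowtie_summary(E140)
gained = relaxed["aligned"] - base["aligned"]

print(f"-e 70 : {aligned_percent(base):.2f}% aligned")
print(f"-e 140: {aligned_percent(relaxed):.2f}% aligned")
print(f"extra reads aligned at -e 140: {gained}")  # 730127
```

The gain says nothing about whether the extra reads are placed correctly, which is the tradeoff discussed in the replies.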

    Is the default value (70) the recommended level for 36 bp reads? Has anyone tested how -e should be scaled as read length increases? For example, is there any recommendation for -e when the reads are 80 bp long?

    Many thanks in advance,
    Rani
  • adamdeluca
    Member
    • Jul 2010
    • 95

    #2
    It all depends on your application, because this is a tradeoff between the quality of the alignment and the number of reads aligned. Budget between 30 and 40 of -e for each mismatch you want to allow.

    • mrawlins
      Member
      • Apr 2010
      • 63

      #3
      For RNA-Seq the desired output is usually a read count, so the reads only have to be of sufficient quality to map to the right location. The value for -e in those applications can be 300+, depending on read length, without sacrificing the quality of the results.
      For SNP calling the quality of the reads is more important than the quantity, so a much lower -e is useful. For longer reads (80 bases) I wouldn't go lower than 100.

      I have generally used this method to set -e:
      How many of the bases not covered by the seed would I tolerate being wrong, assuming they are high-quality bases? I multiply that number by 30 to get -e. If you don't care about what comes after the seed, take the number of non-seed bases and multiply by 30.
      Larger values for -e seem to slow bowtie down.
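
The rule of thumb above can be written down in a few lines. This is only a sketch of the heuristic as described in this post, not anything Bowtie computes itself; the function names are hypothetical, and the seed length of 28 is an assumption (check the -l value you actually run with).

```python
# Cost charged per tolerated high-quality mismatch in the rule of thumb above.
HIGH_QUALITY_PENALTY = 30

# Assumed seed length; replace with the -l value used in your own run.
ASSUMED_SEED_LENGTH = 28


def suggest_maqerr(tolerated_mismatches, penalty=HIGH_QUALITY_PENALTY):
    """-e that allows this many high-quality mismatches over the read."""
    if tolerated_mismatches < 0:
        raise ValueError("tolerated_mismatches must be non-negative")
    return tolerated_mismatches * penalty


def maqerr_ignoring_non_seed(read_length, seed_length=ASSUMED_SEED_LENGTH,
                             penalty=HIGH_QUALITY_PENALTY):
    """-e large enough that every non-seed base could mismatch."""
    non_seed_bases = max(read_length - seed_length, 0)
    return non_seed_bases * penalty


print(suggest_maqerr(3))                 # 90
print(maqerr_ignoring_non_seed(36))      # 240
print(maqerr_ignoring_non_seed(80))      # 1560
```

The second function effectively switches the -e filter off for the non-seed part of the read, so it only makes sense when mapping location is all that matters.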

      • fkrueger
        Senior Member
        • Sep 2009
        • 627

        #4
        Quality values get rounded to the nearest 10 and saturate at 30, which means that at the default -e of 70 a read will be rejected if it has 3 high-quality mismatches (3 x 30 = 90). If the basecall quality is quite bad, however, you can easily end up with 10 or 15 low-scoring mismatches.

        As adamdeluca already mentioned, there might be applications where it is worth increasing the limit (e.g. many high-quality SNPs if you are sequencing another strain). Increasing -e does increase the alignment time considerably, however.

        It might be worth performing some quality control on the data (e.g. with FastQC) to see whether the error rates start to increase drastically towards later cycles. If so, you might just trim all sequences to a cycle where you still trust the basecalls before running bowtie.
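
The arithmetic in the paragraph above can be checked with a small sketch. It models only the rounding, the cap at 30, and the comparison against -e as described in this post; it is not Bowtie's code, the function names are hypothetical, and rounding halves upward (so Q5 becomes 10) is an assumption.

```python
def maq_rounded(quality, cap=30):
    """Round a Phred quality to the nearest 10 and cap it at `cap`.

    Halves are rounded up here, which is an assumption of this sketch.
    """
    rounded = ((quality + 5) // 10) * 10
    return min(rounded, cap)


def mismatch_penalty(mismatch_qualities):
    """Sum of rounded, capped qualities at the mismatched positions."""
    return sum(maq_rounded(q) for q in mismatch_qualities)


def passes_maqerr(mismatch_qualities, maqerr=70):
    """True if the summed penalty does not exceed the -e limit."""
    return mismatch_penalty(mismatch_qualities) <= maqerr


# Three high-quality mismatches: 3 x 30 = 90, above the default of 70.
print(passes_maqerr([40, 40, 40]))   # False
# Two high-quality mismatches: 60, within the default.
print(passes_maqerr([40, 40]))       # True
# Fifteen very low-quality mismatches (Q3 rounds to 0) cost nothing.
print(passes_maqerr([3] * 15))       # True
```

This looks only at the -e sum; the separate -n limit on seed mismatches is not modelled.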
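
Trimming every read to a fixed cycle, as suggested above, could look like the following. This is a minimal sketch assuming plain four-line FASTQ records; the function name and the example read are invented for illustration, and the cycle to keep has to come from your own quality plots.

```python
def trim_fastq_records(lines, keep):
    """Yield FASTQ lines with sequence and quality cut to `keep` bases.

    Assumes four lines per record (header, sequence, '+', qualities).
    A trailing incomplete record is silently dropped.
    """
    if keep <= 0:
        raise ValueError("keep must be positive")
    it = iter(lines)
    # zip over the same iterator four times groups the lines into records.
    for header, seq, plus, qual in zip(it, it, it, it):
        yield header.rstrip("\n")
        yield seq.rstrip("\n")[:keep]
        yield plus.rstrip("\n")
        yield qual.rstrip("\n")[:keep]


# Made-up read whose last four cycles have poor qualities.
example = ["@read1\n", "ACGTACGTAC\n", "+\n", "IIIIIH####\n"]
for line in trim_fastq_records(example, keep=6):
    print(line)
```

Sequence and quality strings are cut to the same length so the record stays valid FASTQ.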
