  • sparks
    Senior Member
    • Mar 2008
    • 126

    #16
    Hi Chipper,

    With novoalign try setting option -t60. This will limit to 2 mismatches at high quality base positions or maybe a 1 base insert/delete. It should run a bit faster.

    If you want to try novoalign with no indel capability, set -o200 or something like that. It'll make a gap open so expensive that novoalign will do an ungapped alignment. It should improve performance further.

    The -t option of Novoalign is a bit like the -e option of Bowtie. Novoalign limits the mismatch penalty (quality) to 30 for all bases, so even a base with a Phred quality of 50 is penalised only 30 points for a mismatch; this allows for SNP rates.
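
As a rough illustration of the scoring described above, here is a minimal Python sketch. It is a toy model, not Novoalign's actual code: the function names are made up, gap penalties are ignored, and treating the -t value as an inclusive upper bound on the summed penalty is an assumption.

```python
# Toy model of quality-capped mismatch scoring; not Novoalign's real implementation.
QUALITY_CAP = 30  # per-mismatch penalty ceiling mentioned in the post


def mismatch_score(mismatch_quals, cap=QUALITY_CAP):
    """Sum the capped penalties for the Phred qualities at mismatched positions."""
    return sum(min(q, cap) for q in mismatch_quals)


def passes_threshold(mismatch_quals, threshold=60):
    """True if the summed penalty stays within a -t style threshold (assumed inclusive)."""
    return mismatch_score(mismatch_quals) <= threshold


print(mismatch_score([50]))            # 30: a Phred-50 mismatch is capped at 30
print(passes_threshold([40, 35]))      # True: 30 + 30 = 60
print(passes_threshold([40, 35, 30]))  # False: 30 + 30 + 30 = 90
```

Under this model a threshold of 60 admits two high-quality mismatches but not three, which matches the behaviour described for -t60.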

    Memory will still be higher than Bowtie.

    Cheers, Colin

    Comment

    • swbarnes2
      Senior Member
      • May 2008
      • 910

      #17
      Could Bowtie be altered to have an iterative trimming function, like SOAP has? I just did a quick comparison: with no trimming, SOAP and Bowtie aligned about the same number of reads (and Bowtie was much faster), but iteratively trimming the last bases with SOAP, 8 at a time, gives a huge boost to the number of reads that align, up to 30%.

      Comment

      • Chipper
        Senior Member
        • Mar 2008
        • 323

        #18
        It already has an option to trim before alignment (-3 / -5), so why not try that? It would help, though, if the unaligned reads were saved separately.

        Comment

        • swbarnes2
          Senior Member
          • May 2008
          • 910

          #19
          But I don't want to trim accurate bases needlessly. The virtue of iterative trimming is that it only trims as many as it needs.

          I could run the program a bunch of times with different trimming, and recombine the data after, but that's a pain, and might not be as efficient as having the program trim each read as it is handling it.
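
The per-read trimming idea can be sketched in Python as follows. This is only a toy: the brute-force ungapped matcher stands in for a real aligner, and the function names, the 24-base minimum length, and the use of N for unreadable bases are assumptions made for the example, not SOAP's or Bowtie's behaviour.

```python
def count_mismatches(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))


def best_hit(read, reference, max_mismatches):
    """Toy ungapped aligner: first offset with <= max_mismatches, else None."""
    for pos in range(len(reference) - len(read) + 1):
        window = reference[pos:pos + len(read)]
        if count_mismatches(read, window) <= max_mismatches:
            return pos
    return None


def align_with_iterative_trimming(read, reference, step=8, min_len=24,
                                  max_mismatches=2):
    """Trim `step` bases off the 3' end until the read aligns or gets too short.

    Returns (offset, aligned_length), or None if nothing aligns.
    """
    length = len(read)
    while length >= min_len:
        pos = best_hit(read[:length], reference, max_mismatches)
        if pos is not None:
            return pos, length
        length -= step
    return None


reference = "GATTACAGGCTTACCGATCGGATAACCTGAGTCCATGCAAGTTCGGACTTAGCCGTAACGTTGA"
# A 48-base read: 32 good bases followed by 16 unreadable ones at the 3' end.
read = reference[10:42] + "N" * 16
hit = align_with_iterative_trimming(read, reference)
print(hit[1])  # 32: the 48- and 40-base attempts fail, the 32-base one aligns
```

Each read is trimmed only as far as it needs to be, so a clean 48-mer is never shortened.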

          Comment

          • Ben Langmead
            Senior Member
            • Sep 2008
            • 200

            #20
            Hi swbarnes2 - I'm going to add your suggestion to the sourceforge feature request list. You seem invested in SOAP-style alignment, but I would note that Maq-style (the default) accomplishes something like iterative trimming by simply discounting the penalty associated with mismatches at low-quality positions (usually clustered at the 3' end).

            Out of curiosity, do you use SOAP's mode for aligning with indels?

            Comment

            • swbarnes2
              Senior Member
              • May 2008
              • 910

              #21
              Originally posted by Ben Langmead:
              Hi swbarnes2 - I'm going to add your suggestion to the sourceforge feature request list. You seem invested in SOAP-style alignment, but I would note that Maq-style (the default) accomplishes something like iterative trimming by simply discounting the penalty associated with mismatches at low-quality positions (usually clustered at the 3' end).
              Yes, I did notice that (and I'm running pretty close to default: --best -p 4 -t), but I still see about the same number of aligned reads as I do with SOAP set to no trimming. SOAP's iterative trimming yields a whole lot more. In a test I ran last night, allowing SOAP to iteratively trim every base pair until there were no more than 2 mismatches yielded an extra million reads aligning in one lane compared to Bowtie. During the day, I run the faster 'trim 8 bp at a time', but the difference is still substantial.

              Out of curiosity, do you use SOAP's mode for aligning with indels?
              Yes, that's part of the reason. Maq's indel detection is pretty hopeless, or it was when I looked at it last. And I think that it was not handling repeats at all, but maybe I'm misremembering.

              I've tried Maq, and I use it as a complement to SOAP, but I didn't like that the output was all processed for me. I wanted to see qualities and repetitiveness and read IDs and pair distances across the genome, and the pile-up view doesn't show that, and I'm pretty sure that Maqview won't either. But the output of Maq didn't give me the raw alignment info to construct a file with all that info, so I use SOAP, and process that output.

              Comment

              • Chipper
                Senior Member
                • Mar 2008
                • 323

                #22
                Are the extra million reads aligned after truncation really correctly placed?

                Comment

                • swbarnes2
                  Senior Member
                  • May 2008
                  • 910

                  #23
                  They seem to be. But I'm only doing bacteria, and that's easier to align correctly to. Reference genomes are rarely what they are cracked up to be, but when I look across the genome at what aligned where, I see mostly 48-mers, but also 40-mers, 32-mers and occasionally 24-mers, when I trim by 8's. And I know that when I compare the two output files of my test, the reads that show up in SOAP but not in Bowtie are all ones that SOAP trimmed.

                  Comment

                  • jyli
                    Member
                    • Nov 2008
                    • 21

                    #24
                    Memory requirement on 32-bit Windows

                    I tried to test the human index downloaded from the recommended site with the command:

                    bowtie -c h_sapiens_asm ATTCAGTAGGTACTATAAATGGCCGAT

                    then, I got error:

                    Out of memory allocating ebwt[] in Ebwt::read() at ebwt.h:2811
                    terminate called after throwing an instance of 'std::bad_alloc'
                    what(): std::bad_alloc

                    This application has requested the Runtime to terminate it in an unusual way.
                    Please contact the application's support team for more information.

                    So, is this a memory allocation problem, or did I do something wrong?

                    Thank you for your attention.

                    Comment

                    • Ben Langmead
                      Senior Member
                      • Sep 2008
                      • 200

                      #25
                      Hi jyli,

                      The memory footprint of the whole-human index is about 2.2 GB without the -z ("phased") option. With the -z option it's closer to 1.3 GB (last I checked). If your machine has 3 GB of RAM or more and you'd like to align to human, the default mode should be fine. If your machine has 2 gigabytes of RAM and you'd like to align to human, you'll need to use the -z option.

                      (The unfriendly error message is my fault! - I'm going to fix that for the next release.)

                      Thanks,
                      Ben

                      Comment

                      • myrna
                        Member
                        • Feb 2008
                        • 44

                        #26
                        memory issues when creating index file

                        I am unable to index the human genome on my MacPro (16G RAM). I have the same problem whether I use the provided Mac binary or compile from source. I have posted the error output below.

                        Any ideas?

                        Thanks

                        ./bowtie-build -f ../../genomes/all_human_build_36.fa human_all
                        Settings:
                        Output files: "human_all.*.ebwt"
                        Line rate: 6 (line is 64 bytes)
                        Lines per side: 1 (side is 64 bytes)
                        Offset rate: 5 (one in 32)
                        FTable chars: 10
                        Max bucket size: default
                        Max bucket size, sqrt multiplier: default
                        Max bucket size, len divisor: 4
                        Difference-cover sample period: 1024
                        Reference base cutoff: none
                        Endianness: little
                        Actual local endianness: little
                        Sanity checking: disabled
                        Assertions: disabled
                        Random seed: 0
                        Sizeofs: void*:4, int:4, long:4, size_t:4
                        Input files DNA, FASTA:
                        ../../genomes/all_human_build_36.fa
                        Reading reference sizes
                        Choose best chunkRate: 15
                        Time reading reference sizes: 00:01:09
                        Calculating joined length
                        = 2860744704 (5384364 characters of padding)
                        Writing header
                        Reserving space for joined string
                        bowtie-build(6713) malloc: *** mmap(size=2860744704) failed (error code=12)
                        *** error: can't allocate region
                        *** set a breakpoint in malloc_error_break to debug
                        Out of memory creating joined string in Ebwt::initFromVector() at ebwt.h:586

                        Comment

                        • Ben Langmead
                          Senior Member
                          • Sep 2008
                          • 200

                          #27
                          Hello myrna,

                          Yes, sorry, other users have seen that problem too. It seems that even if your machine has plenty of RAM in total, the memory allocator may not be able to dole it out in large enough chunks to satisfy Bowtie (due to memory fragmentation within the allocator). I'm working on a solution for the 0.9.8 release. For now, you can usually work around the problem by using bowtie-build-packed, which uses 2-bit-per-base encoding to save memory.

                          BTW, a good place to report issues is the sourceforge bug tracker: (https://sourceforge.net/tracker/?fun...7&atid=1101606). It leaves a better paper trail.

                          Thanks!
                          Ben

                          Comment

                          • myrna
                            Member
                            • Feb 2008
                            • 44

                            #28
                            memory issues when creating index file

                            Hi Ben.
                            Thanks for your prompt reply. This time around I see this error (after quite a while):

                            bowtie-build-packed(14780) malloc: *** mmap(size=2860744704) failed (error code=12)
                            *** error: can't allocate region
                            *** set a breakpoint in malloc_error_break to debug
                            Could not allocate a suffix-array block of 2860744708 bytes
                            Please try using a larger number of blocks by specifying a smaller --bmax or
                            --bmaxmultsqrt or a larger --bmaxdivn

                            I will play with bmaxdivn and bmaxmultsqrt to see if I can get a successful build. Any suggestions?

                            Regards,

                            Ryan

                            Comment

                            • Ben Langmead
                              Senior Member
                              • Sep 2008
                              • 200

                              #29
                              Hello myrna,

                              As soon as the *next* version of Bowtie comes out, this pain will go away because there will be a "-a/--auto" option that automatically follows the suggestion printed in the error message. As 0.9.7.1 stands, you'll have to do what it says yourself, i.e., just try larger values of --bmaxdivn until it fits in memory. Again - I promise this will be easier in the next version.
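
Until that option exists, the manual retry loop can be scripted. The Python sketch below is one way to do it under stated assumptions: the function name, the doubling schedule, the upper limit, and the idea that a failed build returns a non-zero exit status are all illustrative guesses. Only bowtie-build-packed, -f and --bmaxdivn come from this thread; the starting value of 4 is the default divisor shown in the build log above.

```python
import subprocess


def build_with_growing_bmaxdivn(fasta, out_base, start=4, limit=1024,
                                run=subprocess.run):
    """Retry the index build, doubling --bmaxdivn until the command succeeds.

    `run` is injectable so the loop can be exercised without Bowtie installed.
    Assumes (not verified) that a failed build exits with a non-zero status.
    """
    divn = start
    while divn <= limit:
        cmd = ["bowtie-build-packed", "--bmaxdivn", str(divn),
               "-f", fasta, out_base]
        if run(cmd).returncode == 0:
            return divn  # the first divisor that fit in memory
        divn *= 2
    raise RuntimeError(f"build still failing at --bmaxdivn {limit}")


# Hypothetical usage, with the file names from the earlier post:
# build_with_growing_bmaxdivn("../../genomes/all_human_build_36.fa", "human_all")
```

A larger divisor means more, smaller suffix-array blocks, so each attempt asks the allocator for less contiguous memory at the cost of a slower build.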

                              Thanks,
                              Ben

                              Comment

                              • myrna
                                Member
                                • Feb 2008
                                • 44

                                #30
                                Bowtie on a Mac

                                I found a way to fix the memory issue I mentioned in this thread on a Mac. It seems that the binary was run as a 32-bit Intel process, which forces it to use 32-bit memory addressing. This meant that as soon as the process hit the 32-bit memory ceiling, it choked. I edited the Makefile and recompiled, and it runs as a 64-bit process now. I no longer get any complaints about memory, and don't have to tweak any of the runtime parameters.

                                Makefile modification:
                                old:
                                EXTRA_FLAGS =
                                new:
                                EXTRA_FLAGS = -arch x86_64
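
The arithmetic behind the 32-bit ceiling can be checked with a few lines of Python. The byte counts are copied from the error logs earlier in the thread; the helper names are invented for the example, and the calculation ignores whatever else the process and operating system keep in the address space.

```python
JOINED_STRING_BYTES = 2860744704  # mmap size from the bowtie-build log
SUFFIX_BLOCK_BYTES = 2860744708   # suffix-array block from the packed-build log


def address_space(pointer_bytes):
    """Total bytes addressable with pointers of the given width."""
    return 2 ** (8 * pointer_bytes)


def fraction_of_address_space(n_bytes, pointer_bytes):
    """Share of the address space that one allocation of n_bytes would take."""
    return n_bytes / address_space(pointer_bytes)


# The log reports sizeof(void*) == 4, i.e. a 32-bit build.
print(address_space(4))                                               # 4294967296
print(round(fraction_of_address_space(JOINED_STRING_BYTES, 4), 3))    # 0.666
print(JOINED_STRING_BYTES + SUFFIX_BLOCK_BYTES > address_space(4))    # True
print(JOINED_STRING_BYTES + SUFFIX_BLOCK_BYTES < address_space(8))    # True
```

One such allocation already needs about two thirds of a 32-bit address space as a single contiguous region, and two of them cannot coexist in it at all, regardless of how much physical RAM the machine has. With 8-byte pointers the same requests are negligible.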

                                Ryan

                                Comment
