  • #16
    Hi Chipper,

    With Novoalign, try setting option -t60. This will limit alignments to 2 mismatches at high-quality base positions, or maybe a 1-base insertion/deletion. It should run a bit faster.

    If you want to try Novoalign with no indel capability, set -o200 or something like that. It makes a gap open so expensive that Novoalign will do an ungapped alignment. It should improve performance further.

    The -t option of Novoalign is a bit like the -e option of Bowtie. Novoalign caps the mismatch penalty at 30 for every base, so even a base with a Phred quality of 50 is penalised only 30 points for a mismatch. This allows for SNP rates.
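As a rough illustration of the arithmetic described above, the following sketch sums capped Phred qualities at mismatched positions and compares the total with a threshold of 60. It is a simplified model, not Novoalign's actual scoring code: the function names are made up for this example, and gap penalties are ignored.

```python
# Simplified model of a capped, quality-based mismatch penalty.
# Function names are hypothetical; real Novoalign scoring also includes gaps.
PENALTY_CAP = 30  # per-base cap described in the post


def mismatch_score(mismatch_quals, cap=PENALTY_CAP):
    """Sum the Phred qualities at mismatched positions, each capped at `cap`."""
    return sum(min(q, cap) for q in mismatch_quals)


def passes_threshold(mismatch_quals, threshold=60):
    """True if the total penalty does not exceed the threshold (as with -t60)."""
    return mismatch_score(mismatch_quals) <= threshold


# Two high-quality mismatches cost 30 + 30 and still fit under a threshold of 60.
print(mismatch_score([50, 40]), passes_threshold([50, 40]))
# A third high-quality mismatch pushes the total past the threshold.
print(mismatch_score([50, 40, 35]), passes_threshold([50, 40, 35]))
```

Under this model, low-quality mismatches cost less than 30 each, so more than two of them can fit under the same threshold.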

    Memory will still be higher than Bowtie.

    Cheers, Colin


    • #17
      Could Bowtie be altered to have an iterative trimming function, like SOAP has? I just did a quick comparison: with no trimming, SOAP and Bowtie aligned about the same number of reads (and Bowtie was much faster), but iteratively trimming the last bases with SOAP, 8 at a time, gives a huge boost to the number of reads that align, up to 30%.


      • #18
        It already has an option to trim before alignment (-3 / -5), so why not try that? It would help, though, if the unaligned reads were saved separately.


        • #19
          But I don't want to trim accurate bases needlessly. The virtue of iterative trimming is that it only trims as many as it needs.

          I could run the program a bunch of times with different trimming, and recombine the data after, but that's a pain, and might not be as efficient as having the program trim each read as it is handling it.
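The per-read trimming idea can be sketched as follows. This is only an illustration of the general technique, not how SOAP or Bowtie implement it: `try_align` is a hypothetical callable standing in for a real aligner, and the toy aligner below accepts exact substring matches only.

```python
def align_with_iterative_trimming(read, try_align, step=8, min_len=24):
    """Trim `step` bases from the 3' end until `try_align` succeeds.

    `try_align` is a hypothetical callable that returns an alignment
    (here, a position) or None. Returns (alignment, aligned_length),
    or (None, 0) if nothing aligns down to `min_len`.
    """
    length = len(read)
    while length >= min_len:
        hit = try_align(read[:length])
        if hit is not None:
            return hit, length
        length -= step  # drop another `step` bases from the 3' end
    return None, 0


# Toy aligner: a read "aligns" only if it is an exact substring of the reference.
reference = "ACGTACGTTAGCCGATAGGCTTAACGGATCGATCGGATTACAGCTAGCTAGGATCCA"


def toy_align(seq):
    pos = reference.find(seq)
    return pos if pos >= 0 else None


# 32 correct bases followed by 8 bad bases at the 3' end.
read = reference[4:36] + "TTTTTTTT"
print(align_with_iterative_trimming(read, toy_align))
```

Each read is trimmed only as far as it needs to be, which is the property that separate fixed-trim runs (for example with -3) cannot provide without recombining outputs afterwards.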


          • #20
            Hi swbarnes2 - I'm going to add your suggestion to the sourceforge feature request list. You seem invested in SOAP-style alignment, but I would note that Maq-style (the default) accomplishes something like iterative trimming by simply discounting the penalty associated with mismatches at low-quality positions (usually clustered at the 3' end).

            Out of curiosity, do you use SOAP's mode for aligning with indels?


            • #21
              Originally posted by Ben Langmead
              Hi swbarnes2 - I'm going to add your suggestion to the sourceforge feature request list. You seem invested in SOAP-style alignment, but I would note that Maq-style (the default) accomplishes something like iterative trimming by simply discounting the penalty associated with mismatches at low-quality positions (usually clustered at the 3' end).
              Yes, I did notice that (and I'm running pretty close to default: --best -p 4 -t), but I still see about the same number of aligned reads as I do with SOAP set to no trimming. Iterative trimming with SOAP yields a whole lot more. In a test I ran last night, allowing SOAP to iteratively trim one base pair at a time until there were no more than 2 mismatches yielded an extra million aligned reads in one lane compared to Bowtie. During the day, I run the faster 'trim 8 bp at a time' mode, but the difference is still substantial.

              Out of curiosity, do you use SOAP's mode for aligning with indels?
              Yes, that's part of the reason. Maq's indel detection is pretty hopeless, or it was when I looked at it last. And I think that it was not handling repeats at all, but maybe I'm misremembering.

              I've tried Maq, and I use it as a complement to SOAP, but I didn't like that the output was all processed for me. I wanted to see qualities, repetitiveness, read IDs and pair distances across the genome; the pile-up view doesn't show that, and I'm pretty sure that Maqview won't either. The output of Maq didn't give me the raw alignment info to construct a file with all of that, so I use SOAP and process its output.


              • #22
                Are the extra million reads aligned after truncation really correctly placed?


                • #23
                  They seem to be. But I'm only doing bacteria, which are easier to align to correctly. Reference genomes are rarely what they are cracked up to be, but when I look across the genome at what aligned where, I see mostly 48-mers, but also 40-mers, 32-mers and occasionally 24-mers, when I trim by 8's. And when I compare the two output files of my test, the reads that show up in SOAP but not in Bowtie are all ones that SOAP trimmed.


                  • #24
                    Memory requirement on 32-bit Windows

                    I tried to test the human index downloaded from the recommended site with the command:

                    bowtie -c h_sapiens_asm ATTCAGTAGGTACTATAAATGGCCGAT

                    Then I got this error:

                    Out of memory allocating ebwt[] in Ebwt::read() at ebwt.h:2811
                    terminate called after throwing an instance of 'std::bad_alloc'
                    what(): std::bad_alloc

                    This application has requested the Runtime to terminate it in an unusual way.
                    Please contact the application's support team for more information.

                    So my question is: is this a memory allocation problem, or did I do something wrong?

                    Thank you for your attention.


                    • #25
                      Hi jyli,

                      The memory footprint of the whole-human index is about 2.2 GB without the -z ("phased") option. With the -z option it's closer to 1.3 GB (last I checked). If your machine has 3 GB of RAM or more and you'd like to align to human, the default mode should be fine. If your machine has 2 GB of RAM and you'd like to align to human, you'll need to use the -z option.
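The guidance above can be written down as a small decision rule. This is only a sketch of the figures quoted in the post: the function name is hypothetical, and treating everything from 2 GB up to (but not including) 3 GB as the -z case is an assumption, since the post only mentions the 2 GB and 3 GB cases explicitly.

```python
def bowtie_mode_for_ram(ram_gb):
    """Suggest a mode for the whole-human index given installed RAM in GB.

    Thresholds come from the post; the 2-3 GB range mapping to "-z"
    is an assumption made for this sketch.
    """
    if ram_gb >= 3:
        return "default"  # ~2.2 GB index footprint should fit
    if ram_gb >= 2:
        return "-z"       # ~1.3 GB footprint with the phased option
    return None           # not covered by the guidance in the post


for ram in (4, 2, 1):
    print(ram, "GB ->", bowtie_mode_for_ram(ram))
```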

                      (The unfriendly error message is my fault! - I'm going to fix that for the next release.)

                      Thanks,
                      Ben


                      • #26
                        memory issues when creating index file

                        I am unable to index the human genome on my Mac Pro (16 GB RAM). I have the same problem whether I use the provided Mac binary or compile from source. I have posted the error output below.

                        Any ideas?

                        Thanks

                        ./bowtie-build -f ../../genomes/all_human_build_36.fa human_all
                        Settings:
                        Output files: "human_all.*.ebwt"
                        Line rate: 6 (line is 64 bytes)
                        Lines per side: 1 (side is 64 bytes)
                        Offset rate: 5 (one in 32)
                        FTable chars: 10
                        Max bucket size: default
                        Max bucket size, sqrt multiplier: default
                        Max bucket size, len divisor: 4
                        Difference-cover sample period: 1024
                        Reference base cutoff: none
                        Endianness: little
                        Actual local endianness: little
                        Sanity checking: disabled
                        Assertions: disabled
                        Random seed: 0
                        Sizeofs: void*:4, int:4, long:4, size_t:4
                        Input files DNA, FASTA:
                        ../../genomes/all_human_build_36.fa
                        Reading reference sizes
                        Choose best chunkRate: 15
                        Time reading reference sizes: 00:01:09
                        Calculating joined length
                        = 2860744704 (5384364 characters of padding)
                        Writing header
                        Reserving space for joined string
                        bowtie-build(6713) malloc: *** mmap(size=2860744704) failed (error code=12)
                        *** error: can't allocate region
                        *** set a breakpoint in malloc_error_break to debug
                        Out of memory creating joined string in Ebwt::initFromVector() at ebwt.h:586


                        • #27
                          Hello myrna,

                          Yes, sorry, other users have seen that problem too. It seems that even if your machine has plenty of RAM in total, the memory allocator may not be able to dole it out in large enough chunks to satisfy Bowtie (due to memory fragmentation within the allocator). I'm working on a solution for the 0.9.8 release. For now, you can usually work around the problem by using bowtie-build-packed, which uses 2-bit-per-base encoding to save memory.

                          BTW, a good place to report issues is the sourceforge bug tracker: (https://sourceforge.net/tracker/?fun...7&atid=1101606). It leaves a better paper trail.

                          Thanks!
                          Ben


                          • #28
                            memory issues when creating index file

                            Hi Ben.
                            Thanks for your prompt reply. This time around I see this error (after quite a while):

                            bowtie-build-packed(14780) malloc: *** mmap(size=2860744704) failed (error code=12)
                            *** error: can't allocate region
                            *** set a breakpoint in malloc_error_break to debug
                            Could not allocate a suffix-array block of 2860744708 bytes
                            Please try using a larger number of blocks by specifying a smaller --bmax or
                            --bmaxmultsqrt or a larger --bmaxdivn

                            I will play with bmaxdivn and bmaxmultsqrt to see if I can get a successful build. Any suggestions?

                            Regards,

                            Ryan


                            • #29
                              Hello myrna,

                              As soon as the *next* version of Bowtie comes out, this pain will go away because there will be a "-a/--auto" option that automatically follows the suggestion printed in the error message. As 0.9.7.1 stands, you'll have to do what it says yourself, i.e., just try larger values of --bmaxdivn until it fits in memory. Again - I promise this will be easier in the next version.
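The manual workaround described above (retrying with larger --bmaxdivn values) could be automated along these lines. This is a sketch under stated assumptions, not part of Bowtie: the function name and its parameters are hypothetical, the command line is modelled on the one posted earlier in the thread, and it assumes a failed build exits with a non-zero status.

```python
import subprocess


def build_with_growing_bmaxdivn(fasta, out_base, builder="bowtie-build",
                                run=subprocess.run, start=8, factor=2,
                                limit=256):
    """Retry the index build with larger --bmaxdivn until it exits cleanly.

    Assumes (not verified here) that a failed build returns a non-zero
    exit status. Returns the --bmaxdivn value that worked, or None.
    """
    divn = start  # the log above shows the default divisor of 4 failing
    while divn <= limit:
        cmd = [builder, "--bmaxdivn", str(divn), "-f", fasta, out_base]
        result = run(cmd)
        if result.returncode == 0:
            return divn
        divn *= factor  # a larger divisor means smaller suffix-array blocks
    return None


# Example call (not executed here; requires the builder on PATH):
# build_with_growing_bmaxdivn("all_human_build_36.fa", "human_all",
#                             builder="bowtie-build-packed")
```

The `run` parameter exists only so the loop can be exercised with a stand-in for `subprocess.run`; each real attempt may take a long time before failing, so a coarse doubling schedule keeps the number of attempts small.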

                              Thanks,
                              Ben


                              • #30
                                Bowtie on a Mac

                                I found a way to fix the memory issue I mentioned in this thread on a Mac. It seems that the binary was running as a 32-bit Intel process, which forces it to use 32-bit memory addressing. This meant that as soon as the process hit the 32-bit memory ceiling, it choked. I edited the Makefile and recompiled, and it now runs as a 64-bit process. I no longer get any complaints about memory, and I don't have to tweak any of the runtime parameters.

                                Makefile modification:
                                old:
                                EXTRA_FLAGS =
                                new:
                                EXTRA_FLAGS = -arch x86_64

                                Ryan
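A quick back-of-the-envelope check makes the 32-bit ceiling concrete. The build log earlier in the thread reports `Sizeofs: void*:4 ... size_t:4`, i.e. 4-byte pointers, and a failed `mmap` of 2860744704 bytes. The sketch below only does arithmetic on those two numbers; the helper name is made up for this example.

```python
JOINED_BYTES = 2_860_744_704   # size of the failed mmap in the build log
POINTER_BYTES = 4              # "void*:4" in the same log
ADDRESS_SPACE = 2 ** (8 * POINTER_BYTES)  # bytes addressable by one pointer


def fraction_of_address_space(nbytes, space=ADDRESS_SPACE):
    """Fraction of the process's address space that one allocation would need."""
    return nbytes / space


print(f"{JOINED_BYTES / 2**30:.2f} GiB requested in one contiguous block")
print(f"{fraction_of_address_space(JOINED_BYTES):.0%} of a 32-bit address space")
```

A single contiguous block of roughly two thirds of the whole address space has to coexist with the program, its libraries and its other allocations, so the request can fail regardless of how much physical RAM the machine has. With 8-byte pointers the same request is a negligible fraction of the address space.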
