
**sparks** · 11-11-2008, 12:55 AM

Hi Chipper,

With novoalign try setting option -t60. This will limit to 2 mismatches at high quality base positions or maybe a 1 base insert/delete. It should run a bit faster.

If you want to try Novoalign with no indel capability, set -o200 or something like that. It makes a gap open so expensive that Novoalign will do an ungapped alignment, which should improve performance further.

The -t option of Novoalign is a bit like the -e option of Bowtie. Novoalign caps the penalty (quality) at 30 for all bases, so even a base with a Phred quality of 50 is only penalised 30 points for a mismatch. This allows for SNP rates.
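To make the arithmetic concrete, here is a minimal sketch of the capped scoring described above. It assumes an alignment is accepted when the summed mismatch penalties do not exceed the threshold, and it ignores gap penalties entirely; the function names are made up for illustration and are not Novoalign's.

```python
CAP = 30  # per-base penalty cap described in the post


def mismatch_penalty(mismatch_quals, cap=CAP):
    """Sum the capped penalties for the Phred qualities at mismatched positions."""
    return sum(min(q, cap) for q in mismatch_quals)


def passes_threshold(mismatch_quals, threshold=60):
    """Assumption: a read is kept when its total penalty is <= the threshold."""
    return mismatch_penalty(mismatch_quals) <= threshold


# Two high-quality mismatches cost 30 + 30 = 60, which still fits under -t60.
print(passes_threshold([50, 40]))
# A third high-quality mismatch pushes the total to 90.
print(passes_threshold([50, 40, 35]))
# Low-quality mismatches are cheaper, so more of them can fit.
print(passes_threshold([10, 10, 10, 30]))
```

Under these assumptions a threshold of 60 works out to two mismatches at high-quality positions, which matches the behaviour described for -t60.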

Memory will still be higher than Bowtie.

Cheers, Colin

**swbarnes2** · 11-11-2008, 03:11 PM

Could Bowtie be altered to have an iterative trimming function, like SOAP has? I just did a quick comparison: with no trimming, SOAP and Bowtie aligned about the same number of reads (and Bowtie was much faster). But iteratively trimming the last bases with SOAP, 8 at a time, gives a huge boost to the number of reads that align, up to 30%.

**Chipper** · 11-12-2008, 04:35 AM

It has an option already to trim before alignment (-3 / -5) so why not try with that. It would help though if the unaligned reads were saved separately.

**swbarnes2** · 11-12-2008, 11:07 AM

But I don't want to trim accurate bases needlessly. The virtue of iterative trimming is that it only trims as many as it needs.

I could run the program a bunch of times with different trimming, and recombine the data after, but that's a pain, and might not be as efficient as having the program trim each read as it is handling it.
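For readers unfamiliar with the idea, iterative trimming can be sketched roughly as follows. This is a toy illustration, not SOAP's or Bowtie's implementation: the brute-force `best_hit` search, the `min_len` floor, and all function names are assumptions made for the example.

```python
def count_mismatches(read, ref, pos):
    """Count mismatches for an ungapped placement of read at ref[pos:]."""
    return sum(1 for a, b in zip(read, ref[pos:pos + len(read)]) if a != b)


def best_hit(read, ref, max_mismatches):
    """Return (pos, mismatches) of the best ungapped placement, or None."""
    best = None
    for pos in range(len(ref) - len(read) + 1):
        mm = count_mismatches(read, ref, pos)
        if mm <= max_mismatches and (best is None or mm < best[1]):
            best = (pos, mm)
    return best


def align_with_iterative_trimming(read, ref, max_mismatches=2, step=8, min_len=16):
    """Try the full read first; on failure, drop `step` bases from the 3' end and retry."""
    length = len(read)
    while length >= min_len:
        hit = best_hit(read[:length], ref, max_mismatches)
        if hit is not None:
            pos, mm = hit
            return pos, length, mm  # only as many bases trimmed as were needed
        length -= step
    return None


ref = "ACGTTGCAAGCTTAGGCTAACCGGTTAACGATCGATCGGATCCATGCA"
read = ref[5:29] + "N" * 8  # 24 good bases followed by 8 unusable 3' bases
print(align_with_iterative_trimming(read, ref))
```

A read whose full length already aligns is returned untrimmed, which is the point made above: accurate bases are only discarded when the longer read fails.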

**Ben Langmead** · 11-13-2008, 05:37 AM

Hi swbarnes2 - I'm going to add your suggestion to the sourceforge feature request list. You seem invested in SOAP-style alignment, but I would note that Maq-style (the default) accomplishes something like iterative trimming by simply discounting the penalty associated with mismatches at low-quality positions (usually clustered at the 3' end).

Out of curiosity, do you use SOAP's mode for aligning with indels?

**swbarnes2** · 11-13-2008, 08:56 AM

Originally posted by Ben Langmead:

Hi swbarnes2 - I'm going to add your suggestion to the sourceforge feature request list. You seem invested in SOAP-style alignment, but I would note that Maq-style (the default) accomplishes something like iterative trimming by simply discounting the penalty associated with mismatches at low-quality positions (usually clustered at the 3' end).

Yes, I did notice that (and I'm running pretty close to default: --best -p 4 -t), but I still see about the same number of aligned reads as I do with SOAP set to no trimming. Iterative trimming in SOAP yields a whole lot more. In a test I ran last night, allowing SOAP to iteratively trim one base at a time until there were no more than 2 mismatches yielded an extra million aligned reads in one lane compared to Bowtie. During the day, I run the faster 'trim 8 bp at a time', but the difference is still substantial.

Out of curiosity, do you use SOAP's mode for aligning with indels?

Yes, that's part of the reason. Maq's indel detection is pretty hopeless, or it was when I looked at it last. And I think that it was not handling repeats at all, but maybe I'm misremembering.

I've tried Maq, and I use it as a complement to SOAP, but I didn't like that the output was all processed for me. I wanted to see qualities, repetitiveness, read IDs and pair distances across the genome, and the pile-up view doesn't show that, and I'm pretty sure that Maqview won't either. The output of Maq didn't give me the raw alignment info to construct a file with all of that, so I use SOAP and process that output.

**Chipper** · 11-13-2008, 11:31 AM

Are the extra million reads aligned after truncation really correctly placed?

**swbarnes2** · 11-13-2008, 12:24 PM

They seem to be. But I'm only doing bacteria, and those are easier to align to correctly. Reference genomes are rarely what they are cracked up to be, but when I look across the genome at what aligned where, I see mostly 48-mers, but also 40-mers, 32-mers and occasionally 24-mers, when I trim by 8's. And when I compare the two output files of my test, the reads that show up in SOAP but not in Bowtie are all ones that SOAP trimmed.

**jyli** · 11-13-2008, 01:07 PM

Memory requirement on 32-bit Windows

I tried to test the human index downloaded from the recommended site with the command:

bowtie -c h_sapiens_asm ATTCAGTAGGTACTATAAATGGCCGAT

Then I got this error:

Out of memory allocating ebwt[] in Ebwt::read() at ebwt.h:2811
terminate called after throwing an instance of 'std::bad_alloc'
what(): std::bad_alloc

This application has requested the Runtime to terminate it in an unusual way.
Please contact the application's support team for more information.

So my question is: is this a memory allocation problem, or did I do something wrong?

Thank you for your attention.

**Ben Langmead** · 11-14-2008, 07:00 AM

Hi jyli,

The memory footprint of the whole-human index is about 2.2 GB without the -z ("phased") option. With the -z option it's closer to 1.3 GB (last I checked). If your machine has 3 GB of RAM or more and you'd like to align to human, the default mode should be fine. If your machine has 2 gigabytes of RAM and you'd like to align to human, you'll need to use the -z option.

(The unfriendly error message is my fault! - I'm going to fix that for the next release.)

Thanks,
Ben

**myrna** · 11-18-2008, 09:57 AM

memory issues when creating index file

I am unable to index the human genome on my Mac Pro (16 GB RAM). I have the same problem whether I use the provided Mac binary or compile from source. I have posted the error output below.

Any ideas?

Thanks

./bowtie-build -f ../../genomes/all_human_build_36.fa human_all
Settings:
Output files: "human_all.*.ebwt"
Line rate: 6 (line is 64 bytes)
Lines per side: 1 (side is 64 bytes)
Offset rate: 5 (one in 32)
FTable chars: 10
Max bucket size: default
Max bucket size, sqrt multiplier: default
Max bucket size, len divisor: 4
Difference-cover sample period: 1024
Reference base cutoff: none
Endianness: little
Actual local endianness: little
Sanity checking: disabled
Assertions: disabled
Random seed: 0
Sizeofs: void*:4, int:4, long:4, size_t:4
Input files DNA, FASTA:
../../genomes/all_human_build_36.fa
Reading reference sizes
Choose best chunkRate: 15
Time reading reference sizes: 00:01:09
Calculating joined length
= 2860744704 (5384364 characters of padding)
Writing header
Reserving space for joined string
bowtie-build(6713) malloc: *** mmap(size=2860744704) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Out of memory creating joined string in Ebwt::initFromVector() at ebwt.h:586

**Ben Langmead** · 11-18-2008, 11:13 AM

Hello myrna,

Yes, sorry, other users have seen that problem too. It seems that even if your machine has plenty of RAM in total, the memory allocator may not be able to dole it out in large enough chunks to satisfy Bowtie (due to memory fragmentation within the allocator). I'm working on a solution for the 0.9.8 release. For now, you can usually work around the problem by using bowtie-build-packed, which uses 2-bit-per-base encoding to save memory.

BTW, a good place to report issues is the sourceforge bug tracker: (https://sourceforge.net/tracker/?fun...7&atid=1101606). It leaves a better paper trail.

Thanks!
Ben

**myrna** · 11-18-2008, 02:57 PM

memory issues when creating index file

Hi Ben.
Thanks for your prompt reply. This time around I see this error (after quite a while):

bowtie-build-packed(14780) malloc: *** mmap(size=2860744704) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug
Could not allocate a suffix-array block of 2860744708 bytes
Please try using a larger number of blocks by specifying a smaller --bmax or
--bmaxmultsqrt or a larger --bmaxdivn

I will play with bmaxdivn and bmaxmultsqrt to see if I can get a successful build. Any suggestions?

Regards,

Ryan

**Ben Langmead** · 11-18-2008, 03:02 PM

Hello myrna,

As soon as the *next* version of Bowtie comes out, this pain will go away because there will be a "-a/--auto" option that automatically follows the suggestion printed in the error message. As 0.9.7.1 stands, you'll have to do what it says yourself, i.e., just try larger values of --bmaxdivn until it fits in memory. Again - I promise this will be easier in the next version.
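The manual workaround amounts to a simple retry loop. The sketch below assumes a hypothetical callable `try_build(divisor)` that runs bowtie-build with the given --bmaxdivn value and returns True on success; the doubling schedule and the upper bound are illustrative choices, not something Bowtie prescribes.

```python
def build_with_retries(try_build, start_divisor=8, max_divisor=1024):
    """Retry the build with larger --bmaxdivn values until one fits in memory.

    try_build is a hypothetical callable: it takes the divisor, attempts the
    build, and returns True on success or False on an out-of-memory failure.
    The default start of 8 is one step above the divisor of 4 shown in the
    bowtie-build settings printout earlier in the thread.
    """
    divisor = start_divisor
    while divisor <= max_divisor:
        if try_build(divisor):
            return divisor
        divisor *= 2  # a larger divisor means smaller suffix-array blocks
    raise MemoryError(f"build failed for every divisor up to {max_divisor}")


# Stand-in for a real build: pretend anything below 32 runs out of memory.
def fake_build(divisor):
    return divisor >= 32


print(build_with_retries(fake_build))
```

In real use, `try_build` would wrap the actual bowtie-build invocation and check its exit status.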

Thanks,
Ben

**myrna** · 11-19-2008, 11:55 AM

Bowtie on a Mac

I found a way to fix the memory issue I mentioned in this thread on a Mac. It seems that the binary was running as a 32-bit Intel process, which forces it to use 32-bit memory addressing. As soon as the process hit the 32-bit memory ceiling, it choked. I edited the Makefile and recompiled, and it now runs as a 64-bit process. I no longer get any complaints about memory, and I don't have to tweak any of the runtime parameters.

Makefile modification:
old:
EXTRA_FLAGS =
new:
EXTRA_FLAGS = -arch x86_64
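A quick back-of-the-envelope check shows why the 32-bit build struggled. The request size is taken from the failed mmap() in the log above, and the `Sizeofs` line in the earlier bowtie-build output (void*:4) is consistent with a 32-bit process; the helper name is made up for illustration.

```python
REQUEST_BYTES = 2_860_744_704  # size of the failed mmap() in the log


def address_space_bytes(pointer_size_bytes):
    """Total addressable bytes for a given pointer size (4 = 32-bit, 8 = 64-bit)."""
    return 2 ** (8 * pointer_size_bytes)


for label, pointer_size in (("32-bit", 4), ("64-bit", 8)):
    space = address_space_bytes(pointer_size)
    share = REQUEST_BYTES / space
    print(f"{label}: request is {share:.1%} of the address space")
```

In a 32-bit process the single allocation is roughly two thirds of the whole 4 GiB address space, and it has to be contiguous. Part of that space is already taken by the program, its libraries and the stack, so such a request can fail regardless of how much physical RAM the machine has.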

Ryan
