Bowtie, an ultrafast, memory-efficient, open source short read aligner

sahiilseth replied

08-22-2013, 11:52 AM
Originally posted by GenoMax

From Bowtie website:

They also say:
'If your computer has more than 3-4 GB of memory and you would like to exploit that fact to make index building faster, use a 64-bit version of the bowtie2-build binary. The 32-bit version of the binary is restricted to using less than 4 GB of memory. If a 64-bit pre-built binary does not yet exist for your platform on the sourceforge download site, you will need to build one from source.'

I thought a 64-bit binary should be able to handle more characters as well; is that not true?
GenoMax replied

08-22-2013, 11:49 AM
Originally posted by sahiilseth

Hi, I am using the latest builds of bowtie 1 and 2 with 64-bit support.

But they are still dying with the error:
Reading reference sizes
Error: Reference sequence has more than 2^32-1 characters! Please divide the
reference into batches or chunks of about 3.6 billion characters or less each

64-bit
Built on do-dmxp-mac.win.ad.jhu.edu
Tue Feb 26 13:33:50 EST 2013
Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)
Options: -O3 -m64 -msse2 -funroll-loops -g3
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

From Bowtie website:

Because bowtie2-build uses 32-bit pointers internally, it can handle up to a theoretical maximum of 2^32-1 (somewhat more than 4 billion) characters in an index, though, with other constraints, the actual ceiling is somewhat less than that. If your reference exceeds 2^32-1 characters, bowtie2-build will print an error message and abort. To resolve this, divide your reference sequences into smaller batches and/or chunks and build a separate index for each.
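
A minimal sketch of the "divide your reference into batches" advice, assuming you already know the length of each reference sequence. The function name `plan_batches` and the example sequence names are hypothetical; the default limit of 3.6 billion characters is taken from the error message quoted above, not from any bowtie option. It only plans which sequences go into which index; writing the FASTA files and running bowtie2-build on each batch is left out.

```python
MAX_INDEX_CHARS = 2**32 - 1  # 4,294,967,295: the 32-bit ceiling mentioned in the manual


def plan_batches(seq_lengths, limit=3_600_000_000):
    """Greedily group (name, length) pairs into batches whose total length stays within limit.

    A single sequence longer than limit cannot be placed whole, so it raises
    ValueError; such a sequence would have to be cut into chunks first.
    """
    batches, current, total = [], [], 0
    for name, length in seq_lengths:
        if length > limit:
            raise ValueError(f"{name} ({length} chars) exceeds the limit on its own")
        # Start a new batch when adding this sequence would cross the limit.
        if current and total + length > limit:
            batches.append(current)
            current, total = [], 0
        current.append(name)
        total += length
    if current:
        batches.append(current)
    return batches


# Hypothetical reference with three sequences totalling 4.4 billion characters.
example = [("seqA", 2_000_000_000), ("seqB", 1_500_000_000), ("seqC", 900_000_000)]
print(plan_batches(example))
```

Each resulting batch would then get its own index, and reads would be aligned against every index separately.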

BWA is able to handle genomes > 4 GB in size (individual chromosomes < 2 GB).

Last edited by GenoMax; 08-22-2013, 11:51 AM.
sahiilseth replied

08-22-2013, 11:09 AM
bowtie2

Hi, I am using the latest builds of bowtie 1 and 2 with 64-bit support.

But they are still dying with the error:
Reading reference sizes
Error: Reference sequence has more than 2^32-1 characters! Please divide the
reference into batches or chunks of about 3.6 billion characters or less each

64-bit
Built on do-dmxp-mac.win.ad.jhu.edu
Tue Feb 26 13:33:50 EST 2013
Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-54)
Options: -O3 -m64 -msse2 -funroll-loops -g3
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}
dpryan replied

06-06-2013, 06:05 AM
Originally posted by sdm

Thanks for your quick reply! Apparently, I can't use bowtie1 for paired-end alignment, then.

If this sort of thing is common in your library, then it'd be best not to (unless you use the trim_galore trick that I mentioned). I expect for most libraries this is a rare enough occurrence that bowtie1 works fine (I don't actually use bowtie1 these days, so I can't say I've checked).
sdm replied

06-06-2013, 05:53 AM
Originally posted by dpryan

You didn't miss anything; bowtie1 doesn't deal well with those (or with any case where the start/end coordinates of one read are found completely within the other). I believe that functions normally in bowtie2. Alternatively, trim_galore has an option to get around this in your read-trimming step.

Thanks for your quick reply! Apparently, I can't use bowtie1 for paired-end alignment, then.
dpryan replied

06-06-2013, 05:30 AM
Originally posted by sdm

Hi all,

I have used bowtie in paired-end mode. When I checked the output, I did not understand the following result:

These are two mates, as far as I understand two identical sequences (forward and reverse), which both map to chromosome 6 if I map them individually (bowtie -m1 -v2):

1.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 0 chr6 72678938 255 77M * 0 0 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT DDDDB9BFFFFF??C;ECFFFEHFFEEGHFGHGFHHHHHFFFGFCCACFHFHHHHHG-ECFBEEECE>>*5+CCHHH XA:i:1 MD:Z:76C0 NM:i:1

2.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 16 chr6 72678938 255 77M * 0 0 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT CAC-C5-,FFFCAA,C>+A5+-5-AA--CA+C7A9...A-A.EEA,FEAA../A..CC@+@+@@=<+@@==+<,,5, XA:i:1 MD:Z:76C0 NM:i:1

However, if I run
bowtie-0.12.7/bowtie --phred33-quals -X 2000 --fr --chunkmbs 300 -p 4 -a -v 2 --sam -q -1 1.fq -2 2.fq > paired.sam

the sequence pair is reported as either unmapped or with an insert size of -1109. Shouldn't it be 0 in this case?

paired.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 1:N:0: 77 * 0 0 * * 0 0 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT DDDDB9BFFFFF??C;ECFFFEHFFEEGHFGHGFHHHHHFFFGFCCACFHFHHHHHG-ECFBEEECE>>*5+CCHHH XM:i:0
raw.paired.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 147 chr6 72678938 255 77M = 72677906 -1109 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT CAC-C5-,FFFCAA,C>+A5+-5-AA--CA+C7A9...A-A.EEA,FEAA../A..CC@+@+@@=<+@@==+<,,5, XA:i:1 MD:Z:76C0 NM:i:1

If anybody has an idea what I have missed, I would be very grateful.

You didn't miss anything; bowtie1 doesn't deal well with those (or with any case where the start/end coordinates of one read are found completely within the other). I believe that functions normally in bowtie2. Alternatively, trim_galore has an option to get around this in your read-trimming step.
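
To make the two paired-mode records above easier to read, here is a small sketch that decodes the SAM FLAG values 77 and 147 and recomputes the fragment span from the reported positions. The bit meanings follow the SAM specification; the helper names `decode_flag` and `fragment_span` are made up for this example, and the span calculation assumes an ungapped alignment (the CIGAR here is 77M).

```python
# Bit meanings of the SAM FLAG field, per the SAM specification.
SAM_FLAG_BITS = {
    0x1: "paired",
    0x2: "proper_pair",
    0x4: "unmapped",
    0x8: "mate_unmapped",
    0x10: "reverse_strand",
    0x20: "mate_reverse_strand",
    0x40: "first_in_pair",
    0x80: "second_in_pair",
    0x100: "secondary",
    0x200: "qc_fail",
    0x400: "duplicate",
    0x800: "supplementary",
}


def decode_flag(flag):
    """Return the names of all bits set in a SAM FLAG value, lowest bit first."""
    return [name for bit, name in SAM_FLAG_BITS.items() if flag & bit]


def fragment_span(left_start, right_start, right_len):
    """Span from the leftmost base of one mate to the rightmost base of the other.

    Assumes 1-based positions and an ungapped alignment of length right_len.
    """
    right_end = right_start + right_len - 1
    return right_end - left_start + 1


print(decode_flag(77))    # the record reported as unmapped
print(decode_flag(147))   # the record reported with insert size -1109
print(fragment_span(72677906, 72678938, 77))
```

The span comes out as 1109, which matches the magnitude of the reported -1109. That suggests the second record was paired with a different alignment of its mate starting at 72677906, rather than with the alignment at 72678938 found in single-end mode.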
sdm replied

06-06-2013, 05:18 AM
Hi all,

I have used bowtie in paired-end mode. When I checked the output, I did not understand the following result:

These are two mates, as far as I understand two identical sequences (forward and reverse), which both map to chromosome 6 if I map them individually (bowtie -m1 -v2):

1.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 0 chr6 72678938 255 77M * 0 0 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT DDDDB9BFFFFF??C;ECFFFEHFFEEGHFGHGFHHHHHFFFGFCCACFHFHHHHHG-ECFBEEECE>>*5+CCHHH XA:i:1 MD:Z:76C0 NM:i:1

2.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 16 chr6 72678938 255 77M * 0 0 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT CAC-C5-,FFFCAA,C>+A5+-5-AA--CA+C7A9...A-A.EEA,FEAA../A..CC@+@+@@=<+@@==+<,,5, XA:i:1 MD:Z:76C0 NM:i:1

However, if I run
bowtie-0.12.7/bowtie --phred33-quals -X 2000 --fr --chunkmbs 300 -p 4 -a -v 2 --sam -q -1 1.fq -2 2.fq > paired.sam

the sequence pair is reported as either unmapped or with an insert size of -1109. Shouldn't it be 0 in this case?

paired.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 1:N:0: 77 * 0 0 * * 0 0 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT DDDDB9BFFFFF??C;ECFFFEHFFEEGHFGHGFHHHHHFFFGFCCACFHFHHHHHG-ECFBEEECE>>*5+CCHHH XM:i:0
raw.paired.sam:MISEQ:2:000000000-A26AB:1:1101:17175:1762 147 chr6 72678938 255 77M = 72677906 -1109 GGACAATTAAAAAGCAACAACCACAATTAATACGGTTTACACAGGCAAAACTCATTAAGTGTGGGTTGGGGCGCTCT CAC-C5-,FFFCAA,C>+A5+-5-AA--CA+C7A9...A-A.EEA,FEAA../A..CC@+@+@@=<+@@==+<,,5, XA:i:1 MD:Z:76C0 NM:i:1

If anybody has an idea what I have missed, I would be very grateful.
dpryan replied

05-28-2013, 03:01 AM
Originally posted by sinclaircooper

Hi, thanks for the advice. The problem that I'm working on isn't actually a bisulfite-treated sample, but now that you mention it, I think the matrix I'm trying to use is fairly similar. However, I need the matrix to allow TC/GA pairing in one 'direction' (i.e. database to query) but not in the other: a T in the DB sequence can align to either a T or a C in the query. Is this the same as a bisulfite alignment?

Thanks

Ah, treating the data as if it were bisulfite-converted might not be the best approach, then. There, one typically converts both the genome (database) and the read (query) in silico prior to alignment (for example, all C's become T's) to avoid biased alignments. Depending on the exact nature of your problem, this might still be the more correct way to go; a bit more detail on what you're really trying to do would help.
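
As a rough illustration of the in silico conversion described above, here is a sketch that converts every C to T in a sequence, which would be applied to both the reference and the reads before alignment. The function names are hypothetical, and the G-to-A variant is included on the assumption that the opposite strand is handled with the complementary conversion; this is not how any particular aligner or front-end implements it.

```python
# Translation tables that preserve case (soft-masked lowercase bases stay lowercase).
_C_TO_T = str.maketrans("Cc", "Tt")
_G_TO_A = str.maketrans("Gg", "Aa")


def c_to_t(seq):
    """Replace every C with T, leaving all other characters untouched."""
    return seq.translate(_C_TO_T)


def g_to_a(seq):
    """Replace every G with A (the complementary conversion for the other strand)."""
    return seq.translate(_G_TO_A)


reference = "ACGTCCGA"
read = "ATGTCTGA"  # differs from the reference only at C/T positions

# After conversion both sequences are identical, so the aligner sees no mismatches.
print(c_to_t(reference), c_to_t(read))
```

Note that converting both sides makes the matching symmetric: a T in the reference also matches a C in the read and vice versa. It therefore does not reproduce the one-directional behaviour asked about in the quoted post.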

Should that not prove to be the best option, presumably bowtie (or any other aligner) could be modified. I'm not particularly familiar with its internals, so I couldn't point you toward the right place in the code to start making changes.
sinclaircooper replied

05-28-2013, 02:30 AM
Hi, thanks for the advice. The problem that I'm working on isn't actually a bisulfite-treated sample, but now that you mention it, I think the matrix I'm trying to use is fairly similar. However, I need the matrix to allow TC/GA pairing in one 'direction' (i.e. database to query) but not in the other: a T in the DB sequence can align to either a T or a C in the query. Is this the same as a bisulfite alignment?

Thanks
dpryan replied

05-06-2013, 05:21 AM
Originally posted by sinclaircooper

Hi all, do any of you know if it is possible to change the matrix which bowtie2 uses for local alignment? If I actually have to alter the source code which part should I be looking at?

I'm trying to use a nucleotide identity matrix that counts T-C and G-A as being the same as T-T and G-G matches.

It sounds like you need to align bisulfite converted reads. If so, you can use bismark, which is a front-end for bowtie.

If you have access to a computer cluster and are comfortable compiling source code, I can also send you a program that I wrote that is similar to bismark, but 5-10x faster (just send me a message with your email address). I hope to post the bismark replacement that I wrote this week.
sinclaircooper replied

05-06-2013, 05:18 AM
Hi all, do any of you know if it is possible to change the matrix which bowtie2 uses for local alignment? If I actually have to alter the source code which part should I be looking at?

I'm trying to use a nucleotide identity matrix that counts T-C and G-A as being the same as T-T and G-G matches.
kumarS_27 replied

05-04-2013, 06:53 AM
Originally posted by kumarS_27

Hi,

I checked the memory consumption of bowtie-align in the wrapping-up stage, and it was using CPU% 67, VIRT 173MB and RES 52MB, which is quite a lot I would say. That run used the FASTQ format and finished successfully. But when I used the FASTA file, it didn't even appear in the terminal, and I got the error.

Anyway, I will try chopping the long contigs into smaller ones and see. If that works, it will be clear that Bowtie does have an upper limit on read length.

I shortened the FASTA contigs to 500-1000 and it worked well. I did not try to find the threshold, i.e. the maximum contig length that Bowtie accepts.

Thanks for suggestions.
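
For anyone wanting to try the same workaround, here is a minimal sketch of chopping long contigs into shorter pieces before alignment. The function names and the `_partN` naming scheme are made up for illustration, and the default piece length of 1000 characters is only taken from the 500-1000 range mentioned above (the post gives no unit). Reading and writing the FASTA file itself is not shown.

```python
def chop(seq, piece_len=1000):
    """Split seq into consecutive pieces of at most piece_len characters."""
    return [seq[i:i + piece_len] for i in range(0, len(seq), piece_len)]


def chop_records(records, piece_len=1000):
    """Yield (name, piece) pairs for every piece of every (name, seq) record.

    Pieces are numbered from 1 so that each one can be traced back to its contig.
    """
    for name, seq in records:
        for idx, piece in enumerate(chop(seq, piece_len), start=1):
            yield f"{name}_part{idx}", piece


# A hypothetical 2500-character contig becomes pieces of 1000, 1000 and 500.
pieces = list(chop_records([("contig1", "A" * 2500)]))
print([(name, len(piece)) for name, piece in pieces])
```

Non-overlapping pieces are the simplest choice; alignments that would span a cut point are lost, so overlapping windows may be worth considering if that matters.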
kumarS_27 replied

05-03-2013, 02:09 AM
Originally posted by mastal

The amount of memory would be plenty if you were mapping short reads to a 3 Mb genome, but with the very long contigs, I don't know.

Can you monitor how much memory your PC is using before it produces the error?

I know Bowtie2 is supposed to not have an upper limit for length of reads, but you might be better off using blast to map the contigs back to the reference genome.

Hi,

I checked the memory consumption of bowtie-align in the wrapping-up stage, and it was using CPU% 67, VIRT 173MB and RES 52MB, which is quite a lot I would say. That run used the FASTQ format and finished successfully. But when I used the FASTA file, it didn't even appear in the terminal, and I got the error.

Anyway, I will try chopping the long contigs into smaller ones and see. If that works, it will be clear that Bowtie does have an upper limit on read length.
mastal replied

05-02-2013, 11:01 AM

Originally posted by M4love

Hey, is there a small tutorial or a book that can teach me bowtie in general? I have read the tutorial that comes with the bowtie software, but it did not teach me the beginner things.
I would really appreciate a beginner's guide or something similar. Could you please help me out? Thanks a lot.

Have you tried working through the examples in the Getting Started section of the Bowtie website?

Bowtie: Tutorial

http://bowtie-bio.sourceforge.net/tutorial.shtml
M4love replied

05-02-2013, 08:12 AM
Originally posted by dpryan

In the terminal, bowtie has nothing to do with R.

Hey, is there a small tutorial or a book that can teach me bowtie in general? I have read the tutorial that comes with the bowtie software, but it did not teach me the beginner things.
I would really appreciate a beginner's guide or something similar. Could you please help me out? Thanks a lot.