Bowtie, an ultrafast, memory-efficient, open source short read aligner

What_Da_Seq replied

03-11-2009, 07:57 AM
How does Bowtie handle ambiguous bases in the reference genome?

Does anybody have experience preparing a Bowtie index in which certain reference bases have been replaced with ambiguity codes such as "Y" (which stands for "C" or "T")? If so, will these positions be called matches or mismatches when the Solexa read being aligned has either a "C" or a "T" at that position?

Thanks
Ben Langmead replied

03-11-2009, 06:02 AM
My worry is that I will lose alignments which have perfect alignments if I also have other near matches. I guess I can make -n smaller (1 say). But I suppose the caveat is that if there is an exact match and a near match, you cannot say which is the correct one - sequencing error etc... - and so it is conservative to reject the read as having multiple alignments?

I see your concern. In your case, you may want to consider running Bowtie with -a --nostrata (or -k <some int> --nostrata) and then postprocessing the results in whatever way you think is appropriate for your application. If you'd like to reject reads on the basis of the number of alignments found in the *best* match stratum (as opposed to all strata), you can do that with a script.
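
The post-processing step described above could be sketched roughly as follows. This is a minimal sketch, not part of Bowtie: it assumes the alignments from an -a --nostrata run have already been reduced to (read name, mismatch count) pairs, and the function name and input shape are hypothetical.

```python
from collections import defaultdict


def unique_in_best_stratum(alignments):
    """Return reads whose best (fewest-mismatch) stratum holds exactly one alignment.

    `alignments` is an iterable of (read_name, mismatch_count) pairs, one per
    reported alignment. This input shape is an assumption for illustration;
    it is not Bowtie's native output format.
    """
    # Count alignments per read and per mismatch stratum.
    strata = defaultdict(lambda: defaultdict(int))
    for read_name, mismatches in alignments:
        strata[read_name][mismatches] += 1

    kept = {}
    for read_name, counts in strata.items():
        best = min(counts)  # stratum with the fewest mismatches
        if counts[best] == 1:
            kept[read_name] = best
    return kept


if __name__ == "__main__":
    hits = [("r1", 0), ("r1", 1), ("r1", 1), ("r2", 1), ("r2", 1), ("r3", 2)]
    print(unique_in_best_stratum(hits))
```

Here a read with one exact hit and two 1-mismatch hits is kept (its best stratum is unambiguous), while a read with two equally good 1-mismatch hits is rejected.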

Another alternative is to do multiple Bowtie runs with decreasingly stringent alignment policies (e.g. -n 0, then -n 1, etc.). The input to each run might be the --unfq reads from the run before.
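
A staged run like this could be planned with a small script. The sketch below only builds and prints the command lines; it does not execute Bowtie. It assumes that --unfq takes an output file name and that the positional arguments follow the index, reads, hits order used elsewhere in this thread; the file names and the function name are hypothetical, so check the flags against your Bowtie version.

```python
def staged_commands(index, reads, n_values=(0, 1, 2)):
    """Build one Bowtie command per pass, from most to least stringent.

    Each pass writes its unaligned reads with --unfq, and that file becomes
    the input of the next pass. File names are hypothetical placeholders.
    """
    commands = []
    current_input = reads
    for n in n_values:
        unaligned = f"unaligned_n{n}.fq"
        hits = f"hits_n{n}.map"
        commands.append(
            ["bowtie", "-n", str(n), "--unfq", unaligned, index, current_input, hits]
        )
        # Reads that failed this pass feed the next, looser pass.
        current_input = unaligned
    return commands


if __name__ == "__main__":
    for cmd in staged_commands("h_sapiens", "reads.fq"):
        print(" ".join(cmd))
```

Each read is then reported only at the most stringent policy under which it aligns.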

I guess what I really want to know is: how stringent are the default alignment settings? Can I make them more stringent without losing a lot of 'true' (but imperfect) alignments?

The default alignment policy is -n 2 -l 28 -e 70, which mimics Maq's defaults (with the caveat that Maq actually lets through some alignments with 3 mismatches in the seed). Whether you can make the policy more stringent without losing true alignments depends on how different your query organism is from the reference. Intuitively, the default policy has no problem finding alignments where there are 2 SNPs very close together, but might have a problem finding alignments where there are 3 SNPs very close together. The same goes for -n 1 and 1 SNP vs. 2 SNPs. It's up to you to determine how well those policies fit your problem.
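
To make the 2-SNP versus 3-SNP intuition concrete, here is a simplified model of an -n/-l/-e style policy: at most n mismatches within the first l bases (the seed), and the sum of the quality values at all mismatched positions no greater than e. This is an illustrative sketch under those assumptions, with a hypothetical function name; it does not reproduce any quality rounding or other internals of Bowtie or Maq.

```python
def is_valid_alignment(mismatch_positions, qualities, n=2, l=28, e=70):
    """Check a Maq-like -n/-l/-e policy for one candidate alignment.

    mismatch_positions: 0-based read offsets (from the 5' end) that mismatch.
    qualities: Phred quality per read position.
    Simplified model for illustration only.
    """
    # Mismatches that fall inside the seed (first l bases).
    seed_mismatches = sum(1 for pos in mismatch_positions if pos < l)
    # Total quality at every mismatched position, seed or not.
    quality_sum = sum(qualities[pos] for pos in mismatch_positions)
    return seed_mismatches <= n and quality_sum <= e


if __name__ == "__main__":
    quals = [30] * 36
    print(is_valid_alignment([3, 5], quals))       # two mismatches in the seed
    print(is_valid_alignment([3, 5, 7], quals))    # three mismatches in the seed
    print(is_valid_alignment([3, 5], quals, n=1))  # stricter seed policy
```

Under this model, two close SNPs pass the default policy, three close SNPs do not, and -n 1 already rejects the two-SNP case.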

Hope that helps,
Ben
ieuanclay replied

03-11-2009, 05:33 AM
Hi Ben,

Thanks for your help so far - I am relatively new to mapping (but not so new that I am not impressed by bowtie!), so please excuse any dopey questions...

My worry is that I will lose alignments which have perfect alignments if I also have other near matches. I guess I can make -n smaller (1 say). But I suppose the caveat is that if there is an exact match and a near match, you cannot say which is the correct one - sequencing error etc... - and so it is conservative to reject the read as having multiple alignments?

For what I work on, it is really important to be very sure about where the reads map... so maybe it would be good to keep -n at 1 and be more confident about the reads? I don't want to have to refer back to alignment confidences in analyses later on; I would rather say that beyond a certain confidence threshold I am happy with them all. If I am going to reduce -n, should I also reduce -l to 20 or 25?

I guess what I really want to know is: how stringent are the default alignment settings? Can I make them more stringent without losing a lot of 'true' (but imperfect) alignments?

Thanks again,

Ieuan
Ben Langmead replied

03-10-2009, 11:01 AM
So if a read has multiple valid alignments, one of which is better than the others (fewer mismatches, though the others are still valid), and I specify -k 1 -m 1, will the best alignment be given, or will it be pumped into --maxfa?

In that situation, no alignments will be printed and the read will go into the --maxfa/--maxfq file(s).

If I am worried about this sort of situation, should I specify --best?

That won't help in this case because --best doesn't change which alignments are considered valid; rather, it changes which valid alignments are reported by Bowtie. The -v/-n/-l/-e options are the only ones that change which alignments are considered valid by Bowtie. If the set of valid alignments happens to be stratified (e.g., there's an exact hit and a bunch of 1-mismatch hits), the existence of the better alignments doesn't invalidate the worse ones.

If this poses a problem, I'd be interested to hear more about what you're looking for...

Thanks,
Ben
ieuanclay replied

03-10-2009, 10:48 AM
Sorry to keep on about this, I just want to get it clear.

By default:
-k is 1, so only one (the best according to -n/-v/-l/-e) alignment is reported.
-m is unlimited

So if a read has multiple valid alignments, one of which is better than the others (fewer mismatches, though the others are still valid), and I specify -k 1 -m 1, will the best alignment be given, or will it be pumped into --maxfa?

If I am worried about this sort of situation, should I specify --best?
ieuanclay replied

03-10-2009, 08:35 AM
Yes - great thank you!

Ieuan
Ben Langmead replied

03-10-2009, 08:31 AM
Yes - if you specify -k > 1 or -a, Bowtie will output the appropriate number of hits per read for reads with >=1 hit. If -m <int> is also specified, Bowtie will output no alignments for reads with > <int> alignments and, if --maxfa/--maxfq is specified, will dump those reads (the reads, not the alignments) to the specified file. For reads with <= <int> alignments, Bowtie behaves the same as if -m were not specified.
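
The interaction of -k and -m described above can be modelled for a single read as below. This is a toy model of the reporting rule only, with a hypothetical function name; which alignments are chosen when there are more valid ones than -k allows is arbitrary here (the first ones in the list), and the second return value only matters if --maxfa/--maxfq is given.

```python
def report(valid_alignments, k=1, m=None):
    """Model how -k and -m interact for a single read.

    valid_alignments: list of alignments already judged valid.
    k: maximum alignments to report (None models -a, i.e. report all).
    m: suppress the read entirely if it has more than m valid alignments
       (None models the default, unlimited).
    Returns (reported_alignments, read_goes_to_max_file).
    """
    if m is not None and len(valid_alignments) > m:
        # Too many valid alignments: report nothing, divert the read.
        return [], True
    return valid_alignments[:k], False


if __name__ == "__main__":
    hits = ["exact", "1-mismatch"]
    print(report(hits, k=1))         # default: one alignment, -m unlimited
    print(report(hits, k=1, m=1))    # read exceeds -m, nothing reported
    print(report(hits, k=None))      # -a: everything reported
```

Note that the -m test counts all valid alignments, so a read with one exact hit and one 1-mismatch hit is diverted under -m 1.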

I hope that helps.
ieuanclay replied

03-10-2009, 07:35 AM
Doh... Your patience obviously exceeds mine...

I was confused by the default for -m being unlimited - does this mean that without --maxfa being set, your mapped output will include sequences with multiple hits?

Last edited by ieuanclay; 03-10-2009, 08:02 AM.
Ben Langmead replied

03-10-2009, 07:15 AM
Thanks!

Have you checked out the -m and --maxfa/--maxfq options?
ieuanclay replied

03-10-2009, 06:22 AM
Hi Ben, Just to say bowtie is great work. Far outstrips any pipeline we have used previously!
One question though - is there a way to output reads with multiple hits to a separate file? We work on repetitive regions and with a little massaging, this data may still be useful to us.
danielsbrewer replied

03-05-2009, 02:49 AM
Is Bowtie suitable for miRNA detection?

I am just playing around with Bowtie along with other software (Maq, Novoalign) and was wondering whether Bowtie is suitable for use with an miRNA detection experiment. In a previous post Ben states that:

Originally posted by Ben Langmead

First, let me reemphasize that I think of Bowtie's target application as mammalian resequencing - that's how I characterize it in the manual and that's what we spend our time trying to optimize it for.

That hints to me that the default options might not be the best for experiments to compare miRNAs between samples. Does anyone have an opinion as to what the best options to use are?

I would think that you want to know all the alignments for each read above a certain quality threshold. At the moment I am thinking of using "--best -k 100", as if there are more than 100 hits then it is probably not a "real" alignment.

Any thoughts?
Ben Langmead replied

03-03-2009, 08:28 AM
That sounds like an issue with how the read file is formatted. Can you share that file with me, e.g. via email (langmead at umd dot edu)? I can take a quick look.
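
A quick way to look for formatting problems of this kind is to scan the file for records whose sequence line is shorter than expected. The sketch below is an assumption-laden helper, not a Bowtie tool: it assumes plain four-line FASTQ records, the function name is hypothetical, and it cannot say whether this is the actual cause of the error in question.

```python
import io


def short_reads(fastq_handle, min_length=2):
    """Yield (record_number, name, sequence) for reads shorter than min_length.

    Assumes plain four-line FASTQ records (@name, sequence, +, qualities);
    multi-line FASTQ is not handled.
    """
    record = 0
    while True:
        name = fastq_handle.readline()
        if not name:
            break  # end of file
        seq = fastq_handle.readline().rstrip("\r\n")
        fastq_handle.readline()  # '+' separator line
        fastq_handle.readline()  # quality line
        record += 1
        if len(seq) < min_length:
            yield record, name.rstrip("\r\n"), seq


if __name__ == "__main__":
    demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\n\n+\n\n")
    for hit in short_reads(demo):
        print(hit)
```

To check a real file, pass an open file handle, for example `short_reads(open("GDB1.fastq"))`. If records are reported at positions where the names look like sequence or quality lines, the four-line structure has probably slipped somewhere earlier in the file.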

danielsbrewer replied

03-03-2009, 07:31 AM

bowtie error

I am just starting out with bowtie and I am getting the following error:

Code:

$ ./bowtie -p 4  -t h_sapiens ../GDB1.fastq GDB1.map
Time loading forward index: 00:00:01
Time loading mirror index: 00:00:02
Error: Read (Error: Read (Error: Read (Error: Read (HHHHWWWWIIII----EEEEAAAASSSS222266669999BBBB::::5555::::1464::::711344088362:84:1::1818164431573) is less than 2 characters long7) is less than 2 characters long) is less than 2 characters long) is less than 2 characters long

Has anyone seen anything similar, or does anyone know what it's actually saying? I am pretty sure that the smallest read size is something like 10.

doxologist replied

03-02-2009, 08:30 AM
Originally posted by danielsbrewer

Does anyone know whether bowtie supports aligning multiple read lengths?

I am doing small RNA Solexa sequencing and so after the adapter has been removed I end up with variable length reads. With MAQ it appears that you have to run it multiple times for the different lengths, is bowtie the same?

We had a similar discussion in another thread: http://seqanswers.com/forums/showthread.php?p=3505

Basically, ELAND doesn't allow for different lengths and Bowtie does.
ewingad replied

03-02-2009, 08:07 AM
Originally posted by danielsbrewer

Does anyone know whether bowtie supports aligning multiple read lengths?

I am doing small RNA Solexa sequencing and so after the adapter has been removed I end up with variable length reads. With MAQ it appears that you have to run it multiple times for the different lengths, is bowtie the same?

Yes it does!

-Adam