Bowtie, an ultrafast, memory-efficient, open source short read aligner

Xi Wang replied

11-02-2009, 07:24 PM
Originally posted by Ben Langmead View Post

Hi Xi,

to consider qualities, use -n/-l/-e.

Thanks, Ben.
I am still wondering whether the seed region is used only for counting mismatches. If I want to use just the quality-score criterion and set -l to 0, will that work?

Best wishes,
Xi
Layla replied

11-02-2009, 09:16 AM
comparable parameters with maq

Hi Ben,

Excellent work with Bowtie; I am looking forward to cutting down data processing time. I am working on a project in which I have used maq, but for subsequent paired-end MeDIP-seq with 45-base reads I want to use Bowtie with parameters as close to maq's as possible.

Using maq, I eliminate reads with a maq quality < 10 (the same read mapped to >1 location and hence ambiguous) and write them to another file.
I also keep only reads with flags 18 and 130 (correctly paired reads).
Using an ad hoc script, I keep only one hit if the same read is mapped to the same start and stop location multiple times (PCR bias).
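
For readers who want to reproduce the last step, here is a minimal Python sketch of positional de-duplication. It is not the poster's script: the dictionary keys (`ref`, `strand`, `start`, `end`) are hypothetical field names, and for paired-end data one would probably key on both mates rather than on a single hit.

```python
def drop_positional_duplicates(hits):
    """Keep the first hit seen for each (ref, strand, start, end) combination."""
    seen = set()
    kept = []
    for hit in hits:
        key = (hit["ref"], hit["strand"], hit["start"], hit["end"])
        if key not in seen:
            seen.add(key)
            kept.append(hit)
    return kept


hits = [
    {"read": "r1", "ref": "chr1", "strand": "+", "start": 100, "end": 145},
    {"read": "r2", "ref": "chr1", "strand": "+", "start": 100, "end": 145},  # same span as r1
    {"read": "r3", "ref": "chr1", "strand": "-", "start": 100, "end": 145},  # other strand
]
print([h["read"] for h in drop_positional_duplicates(hits)])
```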

I'd like to apply the same criteria using Bowtie. Could you advise me? To begin with, the Bowtie defaults are fine: 2 mismatches in a 28-base seed region, with a maximum quality sum (-e) of 70.

thank you

Layla
Ben Langmead replied

11-02-2009, 05:45 AM
Originally posted by lindseyjane View Post

I cannot see an option for reporting an alignment for a read when its mate does not map? Is this possible?

Your best bet is to run Bowtie in paired-end mode while using --un to dump unaligned reads to files. Then run again in unpaired mode using the unaligned reads as input.
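
A rough sketch of that two-pass approach, written as a Python helper that only builds the two command lines (it does not run Bowtie). The output names `paired.map` and `unpaired.map` and the default `unaligned.fq` are hypothetical. The sketch also assumes that, in paired-end mode, --un writes the two mates to files with `_1` and `_2` inserted before the extension; check this against the manual for your Bowtie version.

```python
import os


def two_pass_commands(index, mate1, mate2, unaligned="unaligned.fq"):
    """Build argument lists for a paired-end pass followed by an unpaired pass."""
    root, ext = os.path.splitext(unaligned)
    # Assumed naming of the per-mate files produced by --un in paired-end mode.
    leftovers = f"{root}_1{ext},{root}_2{ext}"
    paired = ["bowtie", "--un", unaligned, index,
              "-1", mate1, "-2", mate2, "paired.map"]
    # Second pass: treat the leftover mates as ordinary unpaired reads.
    unpaired = ["bowtie", index, leftovers, "unpaired.map"]
    return paired, unpaired


paired_cmd, unpaired_cmd = two_pass_commands("my_index", "reads_1.fq", "reads_2.fq")
print(" ".join(paired_cmd))
print(" ".join(unpaired_cmd))
```

Each list could be passed to `subprocess.run` once the file names have been checked.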

Let me know if that doesn't solve your problem.

Thanks,
Ben
Ben Langmead replied

11-02-2009, 05:43 AM
Hi Xi,

Originally posted by Xi Wang View Post

I noticed that there are two parameters related to this issue. First, -n/--seedmms <int> sets the maximum number of mismatches in the seed, meaning that a hit with more mismatches than this cutoff will not be reported by Bowtie. Second, -e/--maqerr <int> sets the maximum sum of quality scores allowed at the mismatched bases (is that right?). However, I don't know whether the two criteria are the same or complementary.

They're complementary. If either limit is exceeded, the alignment is invalid.

Originally posted by Xi Wang View Post

Further, both measures of mismatches are counted in the seed region. Even though users can specify the seed length, I am wondering where the seed is located: at the leftmost end of a query (read) or in a random region of the query.

From the leftmost end of the read. -e applies to the entire alignment, not just the seed, exactly as in Maq.
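
As an illustration of how the two limits combine, here is a small Python sketch. It is a simplified model, not Bowtie's code: the function and argument names are made up for the example, and any rounding or capping of quality values is ignored.

```python
def is_valid_alignment(mismatch_positions, quals, seed_len, max_seed_mms, max_qual_sum):
    """Return True if an alignment passes both the seed limit and the quality-sum limit.

    mismatch_positions: 0-based offsets from the left end of the read.
    quals: per-base quality values for the whole read.
    """
    # -n style check: count only mismatches that fall inside the leftmost seed_len bases.
    seed_mms = sum(1 for p in mismatch_positions if p < seed_len)
    # -e style check: sum qualities at every mismatched position, seed or not.
    qual_sum = sum(quals[p] for p in mismatch_positions)
    return seed_mms <= max_seed_mms and qual_sum <= max_qual_sum


quals = [30] * 36
# Two seed mismatches, quality sum 60: within both limits.
print(is_valid_alignment([3, 10], quals, 28, 2, 70))
# Only two mismatches in the seed, but the quality sum of 90 exceeds 70.
print(is_valid_alignment([3, 10, 33], quals, 28, 2, 70))
```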

Originally posted by Xi Wang View Post

Besides, there is another parameter, -v <int>, which handles end-to-end mismatches but does not consider the quality scores. Is it possible to make it consider the quality scores?

No; to consider qualities, use -n/-l/-e.

Thanks,
Ben
lindseyjane replied

11-02-2009, 02:01 AM
Question regarding bwt paired end alignment

I am currently trying to align paired-end Illumina reads using Bowtie, and I want to compare the results to those from maq.

I cannot see an option for reporting an alignment for a read when its mate does not map? Is this possible?

The maq software still reports alignments for a read even if its mate does not map and I wanted to do the same thing with bowtie. A lot of pairs end up unaligned (significantly more than with maq) if this is not possible.

If anyone knows how to do this I would really appreciate it, thanks.
Xi Wang replied

10-29-2009, 11:48 PM
Hi Ben,

I am confused about how Bowtie deals with quality scores when counting mismatches.

I noticed that there are two parameters related to this issue. First, -n/--seedmms <int> sets the maximum number of mismatches in the seed, meaning that a hit with more mismatches than this cutoff will not be reported by Bowtie. Second, -e/--maqerr <int> sets the maximum sum of quality scores allowed at the mismatched bases (is that right?). However, I don't know whether the two criteria are the same or complementary.

Further, both measures of mismatches are counted in the seed region. Even though users can specify the seed length, I am wondering where the seed is located: at the leftmost end of a query (read) or in a random region of the query.

Besides, there is another parameter, -v <int>, which handles end-to-end mismatches but does not consider the quality scores. Is it possible to make it consider the quality scores?

Best regards!
Xi
axiom7 replied

10-27-2009, 08:24 AM
Originally posted by axiom7 View Post

Hi Ben,

I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

Thanks.
Susan

Sorry, I meant using -e 210 to simulate - not -3 210.

Susan
Ben Langmead replied

10-27-2009, 06:10 AM
Originally posted by amaer View Post

What is the update on Bowtie doing gapped alignments?

Hi amaer,

Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules. But by end-of-year is a reasonable guess.

Thanks,
Ben
amaer replied

10-26-2009, 02:18 PM
Hi Ben,

What is the update on Bowtie doing gapped alignments?

Thanks!
Ben Langmead replied

10-23-2009, 08:55 AM
Hi Susan,

Originally posted by axiom7 View Post

I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

I'm glad! As I said, if the Polonator can (eventually) be made to give you a longer stretch of unambiguous bases before the NNNNNN gap, then you can bump -l up accordingly and performance should improve quite a bit.

Thanks,
Ben
Ben Langmead replied

10-23-2009, 08:52 AM
Originally posted by para_seq View Post

The reads mapped to each copy show an approximately 3:1 ratio of + to - orientations.

Hi para_seq,

The bias you see may or may not be due to alignment. Bowtie does have options that seek to remove strand bias, e.g. the --best option. If you still see the bias using --best, then the bias is probably inherent in your reads.
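
One way to compare the strand balance with and without --best is to tally the strand column of the alignment output. The sketch below assumes tab-delimited output with the strand (`+` or `-`) in the second column; the function name and sample lines are illustrative only.

```python
from collections import Counter


def strand_counts(lines):
    """Count '+' and '-' alignments, reading the strand from the second tab-separated field."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) > 1 and fields[1] in ("+", "-"):
            counts[fields[1]] += 1
    return counts


sample = [
    "r1\t+\tgeneA\t10\tACGT\tIIII\t0\t",
    "r2\t+\tgeneA\t22\tACGT\tIIII\t0\t",
    "r3\t-\tgeneA\t31\tACGT\tIIII\t0\t",
    "# reads processed: 3",  # summary lines are skipped
]
counts = strand_counts(sample)
print(counts["+"], counts["-"])
```

In practice the lines would come from iterating over the open output file, once per run, so the two tallies can be compared.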

Hope that helps,
Ben
axiom7 replied

10-23-2009, 07:18 AM
Hi Ben,

I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

Thanks.
Susan
para_seq replied

10-23-2009, 07:03 AM
Hi, Ben,

I have a question about using Bowtie to map Illumina transcriptome reads to a prokaryote genome. There are two copies of an identical gene, one encoded on the '+' strand and one on the '-' strand. What I don't understand is that both copies attract a large number of unique RNA-seq reads (value = 0 in column 7 of the Bowtie output) in both '+' and '-' orientations. The reads mapped to each copy show an approximately 3:1 ratio of + to - orientations.

Did I do anything wrong?
Please help me clarify my understanding. Thank you.
axiom7 replied

10-16-2009, 06:08 AM
Ben and sparks,

Thanks for all the input. I will be working on this today and will respond back to you.

Susan
Ben Langmead replied

10-15-2009, 05:21 PM
Hi Susan,

Originally posted by axiom7 View Post

I have some output from a Polonator. This is paired-end data with gaps. For instance, the raw data is 26 base pairs. The researcher asserts this to be 2x15mers with a gap of two nucleotides between base 7 and 8, and between 20 and 21. He also asserts that the spacing between the two 15mers is between 500 and 1500 bases. I used a perl script to insert "NN" in the two gaps, and to create two mated fasta files. Ran the following:

bowtie -t -p 8 -v 3 -m 100 -I 500 -X 1500 -f --ff -a -1 mate1.fa -2 mate2.fa

This seemed to run reasonably and I /think/ I am asking for alignments with 1 additional mismatch beyond the 2 gapped nucleotides.

Yes, I agree that this should work. And I agree that, because of the NNs, you are effectively asking for alignments with 1 additional mismatch.
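
For anyone wanting to reproduce the preprocessing, here is a minimal Python sketch of the padding step. It is not Susan's Perl script: it assumes the 26 raw bases split into two 13-base halves, with the gap after the 7th base of each half, and it leaves mate orientation untouched. All names are hypothetical.

```python
def pad_mates(raw, mate_len=13, gap_after=7, gap_len=2):
    """Split a raw read into two mates and insert gap_len Ns into each one."""
    if len(raw) != 2 * mate_len:
        raise ValueError(f"expected {2 * mate_len} bases, got {len(raw)}")

    def pad(mate):
        # Put the Ns where the unsequenced gap sits inside the mate.
        return mate[:gap_after] + "N" * gap_len + mate[gap_after:]

    return pad(raw[:mate_len]), pad(raw[mate_len:])


mate1, mate2 = pad_mates("ACGTACGTACGTACGTACGTACGTAC")
print(mate1)
print(mate2)
```

Each padded mate would then be written to its own FASTA file (mate1.fa and mate2.fa in the command above).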

Problem occurs in the second set of data. The researcher asserts the 26 base pairs to have a 6 nucleotide gap, but when I attempt to run the above bowtie command (after processing the raw data with my perl script) with "-v 7" I get an error message: "-v arg must be at most 3". Am I out of luck here? Am I asking bowtie to do something for which it is not designed?
Thank you.
Susan

The answer to whether you're asking bowtie to do something it was not designed to do is "yes". But it is definitely still possible to use Bowtie. My suggestion would be to use, for instance, -n 1 -l X -e Y, where -l X is set so that the "seed" falls just short of the string of Ns, and -e Y is set according to the number of Ns plus the number of mismatches you would like to allow beyond the Ns. (Your input is fasta, so every mismatch incurs a quality penalty of 30. So for 6 Ns + 1 mismatch, -e 210 is appropriate.) Here is an example where I align a read of the format you describe to the human genome:

Code:

sycamore:~/research/bowtie $ cat tmp.fa
>r
CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT
sycamore:~/research/bowtie $ ./bowtie --best -n 1 -l 13 -e 210 -f /fs/szasmg/langmead/ebwts/h_sapiens_asm tmp.fa
r + gi|89161187|ref|NC_000010.9|NC_000010 135373946 CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII 5:G>T,13:G>N,14:C>N,15:G>N,16:A>N,17:A>N,18:G>N
# reads processed: 1
# reads with at least one reported alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1 alignments to 1 output stream(s)

That set of parameters is designed to effectively allow 1 mismatch beyond the mismatches forced by the Ns, as you can see in the above alignment.
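
The arithmetic behind -l 13 and -e 210 can be written as a small Python helper. This is only a sketch of the rule of thumb described above: the function name is made up, and the penalty of 30 per mismatch applies to fasta input as stated in this post.

```python
def params_for_gapped_read(read, extra_mismatches=1, penalty_per_mismatch=30):
    """Suggest (seed_length, max_quality_sum) for a read containing a run of Ns."""
    first_n = read.find("N")
    if first_n < 0:
        raise ValueError("read contains no N")
    # The seed should stop just short of the first N.
    seed_len = first_n
    # Every N is a forced mismatch; add the mismatches allowed beyond them.
    max_qual_sum = penalty_per_mismatch * (read.count("N") + extra_mismatches)
    return seed_len, max_qual_sum


seed_len, max_qual_sum = params_for_gapped_read("CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT")
print(f"-n 1 -l {seed_len} -e {max_qual_sum}")
```

For the read in the example this gives a seed length of 13 and a quality-sum limit of 30 x (6 + 1) = 210, matching the command used.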

It's worth noting that if you can (eventually) get the Polonator to give you an anchor of, say, 20bp instead of 13bp, bowtie run in this mode will be substantially faster.

I hope that's helpful; if it's still unclear, please feel free to email me.

Thanks,
Ben