Bowtie, an ultrafast, memory-efficient, open source short read aligner

Ben Langmead replied

02-16-2010, 06:17 PM
Originally posted by bloomfi1

Hi,

I just updated bowtie from version 0.11.3 to 0.12.2. With version 0.11.3 I was able to run the command "bowtie -m 25 -a -n 15 --un <file> -p 4 <ebwt> <infile> <outfile>". When I run this command in version 0.12.2, I get error "-n/--seedmms arg must be at least 0 and at most 3". Am I missing something in the change log about this parameter? Is the behavior of -n in version 0.11.3 accurate?

Thank you.

EDIT: I just realized that while version 0.11.3 will let me give -n greater than 3, it is still capped at -n 3. Is it possible to align with more than 3 mismatches? I am using bowtie to align 75bp reads to a genomic model (coding regions only) with the ultimate goal of calculating RPKM for each of the models. Is bowtie simply the wrong tool for this purpose?

Hi,

Yes, the problem was that versions < 0.12.2 were failing to check for a too-high input for -n and -v. The manual and the usage message both said max=3, but bowtie erroneously didn't enforce it.

Note that the -n option only constrains the number of mismatches in the seed, not in the entire alignment. The key is to set -n, -l and -e to reasonable numbers given your data. Since your reads are 75bp, I would suggest trying a few different settings, perhaps starting with -l 28 (the default), -n 2 and -e 180, and then adjusting all three until you're getting your desired mix of speed and sensitivity.
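One way to try "a few different settings" systematically is to generate one command line per combination of -l, -n and -e. The following is only a sketch: the function name `bowtie_sweep`, the output file naming, and the particular grid values are assumptions made for illustration, not recommendations from the bowtie manual. The only flags used are the ones discussed in this thread, and -n is checked against the 0-3 range that bowtie 0.12.2 enforces.

```python
import itertools
import shlex


def bowtie_sweep(index, reads, seed_lens=(28, 36), seed_mms=(2, 3),
                 max_quals=(120, 180, 240)):
    """Build one bowtie command string per (-l, -n, -e) combination.

    `index` and `reads` are placeholders for your own ebwt basename and
    read file. The grid values are illustrative assumptions only.
    """
    commands = []
    for seed_len, n, e in itertools.product(seed_lens, seed_mms, max_quals):
        # bowtie 0.12.2 rejects -n values outside 0-3 (see the error above).
        if not 0 <= n <= 3:
            raise ValueError("-n/--seedmms must be between 0 and 3")
        out = f"sweep_l{seed_len}_n{n}_e{e}.bt"  # hypothetical output name
        args = ["bowtie", "-n", str(n), "-l", str(seed_len), "-e", str(e),
                index, reads, out]
        commands.append(shlex.join(args))
    return commands


if __name__ == "__main__":
    # Print the commands so they can be inspected before running anything.
    for cmd in bowtie_sweep("my_index", "reads.fq"):
        print(cmd)
```

Comparing the fraction of reads aligned and the running time across the resulting runs should show where extra sensitivity stops being worth the cost.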

Thanks,
Ben
lcollado replied

02-16-2010, 05:57 PM
I guess that you should trim your data and try to align your sequences again. Also, I don't think that "bowtie figures it out", though I'm no expert.
thinkRNA replied

02-16-2010, 11:23 AM
If you are using reads of length 75, should you change the seed length, or does bowtie figure that out?

I can only align around 50% of my single-read Illumina data from this paper using bowtie's default settings: http://www.nature.com/nmeth/journal/...meth.1226.html

Does anyone know which parameters to tweak to get more sequences aligned?
bloomfi1 replied

02-16-2010, 09:38 AM
Both -v and -n have a maximum size of 3. What is the reason for this restriction?
Xi Wang replied

02-15-2010, 08:12 PM
-v <int> report end-to-end hits w/ <=v mismatches; ignore qualities
or
-n/--seedmms <int> max mismatches in seed (can be 0-3, default: -n 2)
-e/--maqerr <int> max sum of mismatch quals across alignment for -n (def: 70)
-l/--seedlen <int> seed length for -n (default: 28)

-v limits mismatches end-to-end, across the whole read.
-n limits mismatches in the seed region only, and you can set the seed length with '-l'.
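To make the difference between the two modes concrete, here is a simplified sketch of the two acceptance rules as described in the usage text above. It is not bowtie's actual implementation: the function names are made up for illustration, positions are assumed to be 0-based from the start of the read with the seed taken as the first `seed_len` bases, and details such as any rounding of quality values that bowtie may apply are ignored.

```python
def passes_v_mode(mismatch_positions, v):
    """-v mode: at most v mismatches anywhere in the read; qualities ignored."""
    return len(mismatch_positions) <= v


def passes_n_mode(mismatch_positions, quals, n=2, seed_len=28, max_qual_sum=70):
    """-n mode (simplified): at most n mismatches inside the seed, and the
    summed quality at all mismatched positions must not exceed -e."""
    seed_mismatches = sum(1 for pos in mismatch_positions if pos < seed_len)
    qual_sum = sum(quals[pos] for pos in mismatch_positions)
    return seed_mismatches <= n and qual_sum <= max_qual_sum


if __name__ == "__main__":
    # A hypothetical 75bp read with five mismatches, all outside a 28bp seed,
    # each at a base with quality 30 (summed quality 150).
    mismatches = [30, 40, 50, 60, 70]
    quals = [30] * 75

    print(passes_v_mode(mismatches, v=3))                          # False
    print(passes_n_mode(mismatches, quals, n=2, max_qual_sum=180)) # True
    print(passes_n_mode(mismatches, quals, n=2, max_qual_sum=70))  # False
```

Under these assumptions the read is rejected with -v 3 but accepted with -n 2 once -e is raised, which is why long reads with 3' errors often align better in -n mode.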
bloomfi1 replied

02-15-2010, 05:01 PM
Hi,

I just updated bowtie from version 0.11.3 to 0.12.2. With version 0.11.3 I was able to run the command "bowtie -m 25 -a -n 15 --un <file> -p 4 <ebwt> <infile> <outfile>". When I run this command in version 0.12.2, I get error "-n/--seedmms arg must be at least 0 and at most 3". Am I missing something in the change log about this parameter? Is the behavior of -n in version 0.11.3 accurate?

Thank you.

EDIT: I just realized that while version 0.11.3 will let me give -n greater than 3, it is still capped at -n 3. Is it possible to align with more than 3 mismatches? I am using bowtie to align 75bp reads to a genomic model (coding regions only) with the ultimate goal of calculating RPKM for each of the models. Is bowtie simply the wrong tool for this purpose?
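For reference, the RPKM goal mentioned here is usually defined as reads per kilobase of model per million mapped reads. A minimal sketch of that arithmetic follows; the function name and the example numbers are hypothetical, and how reads are counted per model (for example, how multi-mapping reads are treated) is left to the analysis.

```python
def rpkm(read_count, model_length_bp, total_mapped_reads):
    """Reads per kilobase of model per million mapped reads.

    Equivalent to read_count / (model_length_bp / 1e3) / (total_mapped_reads / 1e6).
    """
    if model_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("length and total mapped reads must be positive")
    return read_count * 1e9 / (model_length_bp * total_mapped_reads)


if __name__ == "__main__":
    # Hypothetical example: 500 reads on a 2,000bp model, 10 million mapped reads.
    print(rpkm(500, 2000, 10_000_000))  # 25.0
```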

Last edited by bloomfi1; 02-15-2010, 05:34 PM.
Chipper replied

02-05-2010, 12:51 AM
Originally posted by jlmlj

The result is as below:
Reads uniquely aligned: ~45%,
Reads aligned to multiple locations: ~6%,
Reads that failed to align: ~49%.

51% aligned is not too bad, but you could also try without the -v parameter to allow more mismatches at the 3' end.
Xi Wang replied

02-04-2010, 11:24 PM
Originally posted by jlmlj

Thanks a lot for the reply, Xi. I checked the human genome that I used; it was the unmasked version, and I did not mask repeat regions when mapping, so that may not be the cause...

I was also referring to the 'N's in the human reference genome. Our group has observed many cases where lots of reads pile up next to 'N' regions.
Hope this helps.
jlmlj replied

02-04-2010, 08:30 PM
"Maybe the rate is within the normal range. If the ChIP-seq reads are from repeat regions, and you masked the repeat regions when mapping, these reads will fail to map. And there are still quite a few 'N's in the human reference genome."

Thanks a lot for the reply, Xi. I checked the human genome that I used; it was the unmasked version, and I did not mask repeat regions when mapping, so that may not be the cause...

I am thinking of trying a couple of parameters, such as --strata; however, it looks a bit tricky and I am not sure how to handle it yet.
Xi Wang replied

01-29-2010, 10:30 PM
There are two questions that bother me:
1. My reads are 76bp long, but bowtie only allows a maximum of 3 mismatches, which I think is too stringent. Do you think bowtie will allow more mismatches (7-8) for 76bp or even longer 100bp reads?

There is another set of bowtie parameters for handling mismatches when mapping reads back to the reference genome: -n, -e and -l.

2. About 45-49% of my reads failed to align (no repeats included) to the human reference genome. I think that rate is too high to accept. Do you have any idea how this happens?

Maybe the rate is within the normal range. If the ChIP-seq reads are from repeat regions, and you masked the repeat regions when mapping, these reads will fail to map. And there are still quite a few 'N's in the human reference genome.
jlmlj replied

01-29-2010, 12:34 PM
Hi Dr. Langmead,

I am doing data analysis for ChIP-seq experiments on transcription factor binding sites. I have 5 million raw reads (76bp read length) per sample from the Illumina platform. I used bowtie 0.11.3 to align these reads to the human reference genome.

The command I used for one high-quality alignment was:
~/120809_ChiPseq/bowtie-0.11.3_linux_x86_64/bowtie --solexa1.3-quals -v 2 -a -m 1 -t -p 30 --un result_chipseq2/index2.hq.un --max result_chipseq2/index2.hq.max indexes_chipseq1/h_sapiens_asm reads/index2.fq > result_chipseq2/index2.hq.bt

The result is as below:
Reads uniquely aligned: ~45%,
Reads aligned to multiple locations: ~6%,
Reads that failed to align: ~49%.

Then I increased the number of mismatches to 3 (-v 3) and trimmed the low-quality end (--trim3 22). However, ~45% of reads still failed to align.

There are two questions that bother me:
1. My reads are 76bp long, but bowtie only allows a maximum of 3 mismatches, which I think is too stringent. Do you think bowtie will allow more mismatches (7-8) for 76bp or even longer 100bp reads?

2. About 45-49% of my reads failed to align (no repeats included) to the human reference genome. I think that rate is too high to accept. Do you have any idea how this happens?

Many thanks for your help,
jlmlj
Ben Langmead replied

01-28-2010, 11:42 AM
I'm working on this now. I don't have any time estimates.

Thanks,
Ben
amaer replied

01-28-2010, 11:04 AM
Originally posted by Ben Langmead

Hi amaer,

Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules. But by end-of-year is a reasonable guess.

Thanks,
Ben

Hi Ben,

What's the status of doing gapped alignments? Do you have an estimated date?

thanks, and keep up the great work!
malcook replied

01-22-2010, 10:35 AM
bowtie: should I mask the pseudoautosomal segments of the human genome?

What do you think of my plan to mask the pseudoautosomal segments of the human Y chromosome prior to running bowtie on an RNA-Seq project?

Since the pseudoautosomal portions of human chromosomes X and Y are identical in sequence, any alignment strategy that uses only unique alignments will discard all alignments to these regions, as each aligning read will have two matches. Thus the 24 known genes won't be counted.

I plan to use EMBOSS' `maskseq` to "hard mask" (replace with 'N') the following regions of chrY prior to building the bowtie indices:

chrY:10001-2649520
chrY:59034050-59363566
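To illustrate what hard masking means here, the following sketch replaces given regions of a sequence with 'N'. It is not EMBOSS `maskseq`; the function name `hard_mask` is made up for this example, and it assumes the coordinates are 1-based and inclusive, which should be checked against the convention of whatever source the coordinates came from.

```python
def hard_mask(seq, regions):
    """Return seq with each (start, end) region replaced by 'N'.

    Coordinates are assumed to be 1-based and inclusive. Regions that run
    past the end of the sequence are clipped to its length.
    """
    masked = bytearray(seq, "ascii")
    for start, end in regions:
        if start < 1 or end < start:
            raise ValueError(f"bad region: {start}-{end}")
        lo = min(start - 1, len(masked))  # convert to 0-based, clip to length
        hi = min(end, len(masked))
        masked[lo:hi] = b"N" * (hi - lo)
    return masked.decode("ascii")


if __name__ == "__main__":
    # Toy example; for chrY the regions would be the two listed above.
    print(hard_mask("ACGTACGTAC", [(3, 5)]))  # ACNNNCGTAC
```

Because the masked sequence keeps its original length, alignment coordinates on the rest of chrY stay unchanged after the index is rebuilt.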

Does anyone see a problem with this approach?

I see that the description of the `--ntoa` option in the bowtie manual explicitly states that "By default, Ns are simply excluded from the index and bowtie will not report alignments that overlap them." Does anyone know if the same is true for Xs?

Finally, do you agree that the ability to direct bowtie-build to ignore portions of <reference_in> would be a sensible feature to request?

Thanks for thinking!

Malcolm Cook
Stowers Institute for Medical Research
bekkari replied

01-21-2010, 01:45 PM
Hi Ben,
Can someone please let me know whether bowtie works with longer inserts (~20 kb) between mate pairs?

Thanks