Bowtie, an ultrafast, memory-efficient, open source short read aligner

mediator replied

12-14-2011, 04:43 PM
Hi All,
I am running bowtie on a paired-end alignment (Illumina HiSeq). Here is the command and the output I got:

bowtie -p 4 -v 2 -k 11 -m 10 -t --best /bowtie/indexes/hg19 -1 /data/rna_seq/0916_1.fq -2 /data/rna_seq/0916_2.fq /data/rna_seq/0916.SAM -S

Time loading forward index: 00:00:08
Time loading mirror index: 00:00:08
Time loading reference: 00:00:03
End-to-end 2/3-mismatch full-index search: 04:48:55
# reads processed: 114497412
# reads with at least one reported alignment: 64037326 (55.93%)
# reads that failed to align: 50290127 (43.92%)
# reads with alignments suppressed due to -m: 169959 (0.15%)
Reported 94715801 paired-end alignments to 1 output stream(s)
Time searching: 04:49:14
Overall time: 04:49:14

I am not sure why so many reads fail to align.
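The percentages in the summary above can be cross-checked with a few lines of Python. This is only a sanity check of the arithmetic, using the figures copied from the output; the variable names are made up for illustration.

```python
# Figures copied from the bowtie summary above.
processed = 114_497_412
aligned = 64_037_326      # reads with at least one reported alignment
failed = 50_290_127       # reads that failed to align
suppressed = 169_959      # reads with alignments suppressed due to -m

# The three categories should account for every processed read.
assert aligned + failed + suppressed == processed

for label, count in [("aligned", aligned),
                     ("failed", failed),
                     ("suppressed by -m", suppressed)]:
    print(f"{label}: {100 * count / processed:.2f}%")
```

The three counts add up to the number of reads processed, so the 43.92% is a genuine "no valid alignment found under -v 2" figure rather than a reporting artifact.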
rahilsethi replied

10-12-2011, 09:15 AM
Re: Extra parameter(s) specified error

The reason I did not give any value to -n and --maxbts is that I am trying to use their default values. If I didn't mention -n, how would bowtie know whether I want to map with the -n or the -v option? I will try giving numbers to all of them, but I don't think leaving out the values for -n and --maxbts should cause any problem.
westerman replied

10-12-2011, 05:54 AM
The error messages make me think that you are missing some parameter values. In particular, '-n' should have a number after it (e.g., '-n 2'), as should '--maxbts'.

What I think is happening in the successful line, where you have '-n --maxbts', is that the '-n' parameter is reading in '--maxbts' as the number to use. Thus there is no problem.

Whereas in the bad line you have '-n -l 20 --maxbts --chunkmbs', with the result that '-n' is swallowing (using) '-l', '20' is being skipped, and '--maxbts' is swallowing '--chunkmbs', which then throws off the rest of the command line.

Anyway that is my guess. Please try your command either with numbers after '-n' and '--maxbts' or just get rid of those two parameters.
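The guess above can be illustrated with a toy parser. This is a minimal sketch that assumes bowtie's argument handling behaves like a getopt-style parser in which a value-taking option always consumes the next token; the `TAKES_VALUE` table and the `parse` function are hypothetical and are not bowtie's actual code.

```python
# Toy model of a getopt-style parser: options listed in TAKES_VALUE always
# consume the next token, even when that token looks like another option.
# The option table is an assumption for illustration, not bowtie's real one.
TAKES_VALUE = {"-Q", "-n", "-l", "--maxbts", "--chunkmbs", "--al", "-p"}

def parse(tokens):
    options, positionals = {}, []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in TAKES_VALUE and i + 1 < len(tokens):
            options[tok] = tokens[i + 1]   # greedy: no check for a leading '-'
            i += 2
        elif tok.startswith("-"):
            options[tok] = True            # plain flag such as -C or --best
            i += 1
        else:
            positionals.append(tok)        # index, reads, output, ...
            i += 1
    return options, positionals

bad = ("-C -f -Q sample_QV.qual -a --best --strata -n -l 20 --maxbts "
       "--chunkmbs 1000 -t --al 50_mapped_reads.csfasta --sam -p 5 "
       "/bowtie-ref-build/hg19/hg19 sample.csfasta 50_mapping.sam").split()
good = ("-C -f -Q sample_QV.qual -a --best --strata -n --maxbts "
        "--chunkmbs 1000 -t --al 50_mapped_reads.csfasta --sam -p 5 "
        "/bowtie-ref-build/hg19/hg19 sample.csfasta 50_mapping.sam").split()

for name, argv in [("bad", bad), ("good", good)]:
    opts, pos = parse(argv)
    # Only three positionals (index, reads, output) are expected.
    print(name, "-n =", opts["-n"], "| extra:", pos[3:])
# bad -n = -l | extra: ['sample.csfasta', '50_mapping.sam']
# good -n = --maxbts | extra: []
```

Under this toy model the stray '20' and '1000' end up as positional arguments in the bad line, which pushes the two real file names past the three expected positionals, consistent with the "Extra parameter(s) specified" message.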
rahilsethi replied

10-12-2011, 05:20 AM
Extra parameter(s) specified error

I am running bowtie version 0.12.7 for mapping SOLiD (colorspace 50bp read length) data against human genome (hg19), on a linux platform (CentOS). When I run with the following parameters:

$bowtie -C -f -Q sample_QV.qual -a --best --strata -n -l 20 --maxbts --chunkmbs 1000 -t --al 50_mapped_reads.csfasta --sam -p 5 /bowtie-ref-build/hg19/hg19 sample.csfasta 50_mapping.sam

it gives me the following error

Extra parameter(s) specified: "sample.csfasta", "50_mapping.sam"

When I instead run with the default seed length by leaving out -l 20, i.e.:

$bowtie -C -f -Q sample_QV.qual -a --best --strata -n --maxbts --chunkmbs 1000 -t --al 50_mapped_reads.csfasta --sam -p 5 /bowtie-ref-build/hg19/hg19 sample.csfasta 50_mapping.sam

it runs successfully, generating the number of reads mapped and unmapped
details on the screen.

How, then, can I run bowtie with a different seed length, since, as seen above,
it does not run whenever I specify a seed length within the permissible range
(i.e. 20 > 5 for a 50 bp read length)?
oxydeepu replied

10-08-2011, 01:17 AM
Hi all,

I am running bowtie and have a question: is there any way to require that the mismatches fall at a particular end of the read, say the 3' end?
Waiting for a reply.
Thank you,

Deepak

Last edited by oxydeepu; 10-09-2011, 01:58 AM. Reason: did not get any reply
nemesis replied

10-03-2011, 12:53 AM
bowtie -e (--maqerr) parameter

Hi all,

According to the bowtie manual and some posts I've read here, the -e/--maqerr <int> option indicates the maximum sum of quality scores allowed at the mismatched bases throughout the entire alignment and as such can control the total number of mismatches over the entire read length.

I understand that the higher this option is set, the more alignments I will obtain. But I still have trouble understanding the logic behind this parameter. Let's say I set -e 70 with --nomaqround.
A read of overall high quality (for example, each of its bases has a Phred score of 38) with 3 mismatched bases relative to the reference sequence will be excluded from the alignment, since (38 * 3) > 70. Another read of overall poor quality (for instance, a Phred score of 10 for each of its bases) with 5 mismatches will be kept, since (10 * 5) < 70. But if we suppose that low-quality bases are more likely to be sequencing errors than true variations, I'd rather exclude the latter read and keep the former one... (No?)
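The arithmetic in this example can be written out as a short sketch. It models only the rule described above (sum of Phred scores at mismatched positions compared against the -e threshold, with no rounding, as under --nomaqround) and ignores any separate seed constraints; the function name `passes_maqerr` is made up for illustration.

```python
def passes_maqerr(mismatch_quals, max_sum=70):
    """True if the summed Phred quality at mismatched positions is within max_sum."""
    return sum(mismatch_quals) <= max_sum

high_quality_read = [38] * 3   # 3 mismatches, each at a Q38 base
low_quality_read = [10] * 5    # 5 mismatches, each at a Q10 base

print(sum(high_quality_read), passes_maqerr(high_quality_read))  # 114 False
print(sum(low_quality_read), passes_maqerr(low_quality_read))    # 50 True
```

Seen this way, the threshold acts as a budget: a mismatch at a low-quality base is cheap because it is plausibly a sequencing error at a true alignment site, while a mismatch at a high-quality base is expensive because it is strong evidence that the read differs from the reference at that location.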

If anyone could help me understand this parameter and its usage I would be very grateful.

Cheers
belmax replied

09-29-2011, 11:09 PM
bowtie 0.12.7 & SOLiD PE reads

Hi all,
There is a problem with bowtie 0.12.7 and SOLiD mate-pair reads.
bowtie (-C -f -I 1000 -X 4000 --ff <ebwt> -1 F3.csfasta -2 R3.csfasta) maps 0.0%, while SOLiD's Bioscope maps about 70%.
The insert size is about 2500.
The colorspace index is OK. Synthetic csfasta reads are mapped well by bowtie, and F3 or R3 reads are mapped well separately.
What could be wrong? Is the problem with bowtie or with the mate-pair reads?

cheers

Last edited by belmax; 09-30-2011, 12:50 AM.
phatjoe replied

09-21-2011, 08:45 PM
BOWTIE, shortreads with different length

Hi,

Just tried out BOWTIE today. May I know if BOWTIE supports mapping short reads of different lengths? (E.g., for r1/#1 I have 96 bp, whereas for r1/#2 I have 86 bp.) The short reads were trimmed with different software prior to the alignment.

My bowtie version is 0.12.7

Thanks in advance!
[mic] replied

09-15-2011, 08:18 AM
Originally posted by Xi Wang View Post

If you are good at programming, you can check the source code of bowtie-build.

I have already tried, but the code is deeply nested, which makes it difficult for me to get the overall picture. I would be grateful if someone could help me.

Last edited by [mic]; 09-19-2011, 05:55 AM.
Xi Wang replied

09-15-2011, 05:59 AM
Originally posted by [mic] View Post

Hi,

I am trying to analyse Bowtie for use on GPGPUs through CUDA. Apart from the limited hardware resources, I have one big problem. It seems that Bowtie relies on structs using C++ datatypes (please correct me if I'm wrong), but I need C-compatible datatypes to get them into device memory (the global memory of the graphics card) and to work with them there.
On my walkthrough I noticed that the first bytes are used to store some extra information for the ebwt_params struct, but:

How do I get the BWT?
How is it stored? (I think either uint32 or uint64)
How do I "read" the nc values (0,1,2,3) from that?

Is there any additional information available on how the files are built? (Any files, slides, etc. are welcome.)

The plan:
Read the index file with my own code, store it in C-compatible datatypes, copy it to the device, and try to perform an exact alignment on the GPU.

Thank you
mic

If you are good at programming, you can check the source code of bowtie-build.
[mic] replied

09-15-2011, 02:19 AM
Additional Index information

Hi,

I am trying to analyse Bowtie for use on GPGPUs through CUDA. Apart from the limited hardware resources, I have one big problem. It seems that Bowtie relies on structs using C++ datatypes (please correct me if I'm wrong), but I need C-compatible datatypes to get them into device memory (the global memory of the graphics card) and to work with them there.
On my walkthrough I noticed that the first bytes are used to store some extra information for the ebwt_params struct, but:

How do I get the BWT?
How is it stored? (I think either uint32 or uint64)
How do I "read" the nc values (0,1,2,3) from that?

Is there any additional information available on how the files are built? (Any files, slides, etc. are welcome.)

The plan:
Read the index file with my own code, store it in C-compatible datatypes, copy it to the device, and try to perform an exact alignment on the GPU.
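For orientation, the exact-alignment step of such a plan can be prototyped on the CPU with a textbook FM-index before worrying about file formats. The sketch below builds the BWT naively in memory and runs a standard backward search; it says nothing about how Bowtie's index files are actually laid out on disk, and the names `bwt_index` and `exact_match` are made up for illustration.

```python
def bwt_index(text):
    """Build a toy FM-index (suffix array, C table, occurrence table)."""
    text += "$"                                   # sentinel, sorts before A/C/G/T
    sa = sorted(range(len(text)), key=lambda i: text[i:])   # naive suffix array
    bwt = "".join(text[i - 1] for i in sa)        # character preceding each suffix
    counts = {}
    for ch in text:
        counts[ch] = counts.get(ch, 0) + 1
    # C[c] = number of characters in text that sort strictly before c
    C, total = {}, 0
    for ch in sorted(counts):
        C[ch] = total
        total += counts[ch]
    # occ[c][i] = number of occurrences of c in bwt[:i]
    occ = {ch: [0] * (len(bwt) + 1) for ch in counts}
    for i, b in enumerate(bwt):
        for ch in occ:
            occ[ch][i + 1] = occ[ch][i] + (b == ch)
    return sa, C, occ

def exact_match(pattern, sa, C, occ):
    """Backward search: return sorted start positions of exact occurrences."""
    lo, hi = 0, len(sa)
    for ch in reversed(pattern):
        if ch not in C:
            return []
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return []
    return sorted(sa[i] for i in range(lo, hi))

sa, C, occ = bwt_index("ACGTACGT")
print(exact_match("ACG", sa, C, occ))  # [0, 4]
print(exact_match("TT", sa, C, occ))   # []
```

The inner loop of `exact_match` touches only integer tables, which is the part that would need a flat, C-compatible representation on the device.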

Thank you
mic
vebaev replied

08-11-2011, 09:06 AM
Hi again,
As I said before, I'm trying to map my cleaned reads to hg19.

If I use -a -v 0, my output is about 2 GB, and I see that many sequences with low read counts, like 1 or 2, can align tens of thousands of times to the human genome?! It is messy...

I can use the options -k 100 -v 0, but if I want to know how many times a sequence maps in the genome, how can I be sure, given that I have artificially imposed a threshold?
As I also want to annotate repeat-associated and other RNAs, how can I do that and escape the mess above?

Or is it better to discard these with -m 100?
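The difference between the two options can be sketched with toy numbers. The model below follows the documented behaviour (-k reports at most k valid alignments per read; -m suppresses all alignments for a read that has more than m reportable alignments); the hit counts and the function name `reported` are hypothetical.

```python
def reported(n_valid, k=None, m=None):
    """Number of alignments reported for a read with n_valid valid alignments."""
    if m is not None and n_valid > m:
        return 0                 # -m: the read is suppressed entirely
    if k is not None:
        return min(n_valid, k)   # -k: report at most k alignments
    return n_valid               # no cap, as with -a

# Hypothetical numbers of valid genome hits per sequence.
hits = {"seqA": 1, "seqB": 40, "seqC": 25000}

print({s: reported(n, k=100) for s, n in hits.items()})
# {'seqA': 1, 'seqB': 40, 'seqC': 100}
print({s: reported(n, m=100) for s, n in hits.items()})
# {'seqA': 1, 'seqB': 40, 'seqC': 0}
```

With -k 100, a sequence that shows exactly 100 alignments can only be read as "at least 100 hits", whereas with -m 100 such sequences disappear from the alignment output altogether.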

Best

Last edited by vebaev; 08-11-2011, 09:45 AM.
vebaev replied

08-10-2011, 04:01 PM
Hi cswarth,
You are quite right!
My main concern is, for example, this case:
I want to annotate where in the genome 2 reads map. If I do not allow mismatches, the first read will have 1 hit in an intron and the second will not align to the genome at all. With 1 mismatch allowed, the first read will map perfectly in the intron and with 1 mismatch in an intergenic region; on the other hand, the second read can now map to the genome in one place, since mismatches are allowed.
In the second scenario we are happy because the second read can align, but then how do we annotate the first read, whose number of hits has increased?

If you followed me, my point is that if I want to map more reads that cannot map with zero mismatches, I will lose the "sensitivity" of my reads that are already mapped.

I hope you got it.

Last edited by vebaev; 08-10-2011, 04:06 PM.
cswarth replied

08-10-2011, 03:43 PM
I am new to this, but it seems to me that if you allow mismatches, you absolutely can get alignments that aren't real. You can also get alignments that aren't real if you don't allow mismatches!

There are several sources of false-positive and false-negative alignments. The reference sequence you are aligning to is the consensus from probably many replicates of a particular lineage of organism. Your experimental sequences may come from a slightly different lineage of organism with a slightly different genome. If you do not allow mismatches, you will miss valid alignments that differ only by an expected polymorphic site.

There are also several sources of error in the sequencing itself. If you're using an Illumina machine, there are at least four sources of error that may mis-call a base in the sequence. If you don't allow mismatches, those reads that have an error in sequencing might not align to your genome at all.

On the other hand, if you allow mismatches, your reads may align to several places on the genome, and how do you know which one is valid? There is really no good answer. You could do some further processing and only consider reads that land inside exons of known genes. Or maybe you want to allow mismatches but only use those reads that match a single place on the genome.

In our experiment we are starting with the most conservative assumptions and slowly loosening the criteria as we gain more confidence in our methodology. So we only consider reads that match perfectly against mm9 genome and which fall inside of known exons with a coverage of at least 10 reads. We'll start to loosen the criteria and see how that affects our results.
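A filter along these lines might look like the following sketch. All data here are made up, and "coverage of at least 10 reads" is interpreted as at least 10 perfectly matching reads per exon, which is an assumption about what was meant; the names `exons`, `alignments`, and `exon_of` are hypothetical.

```python
from collections import Counter

# Hypothetical toy data: exon intervals and (read_id, chrom, pos, mismatches).
exons = {"exon1": ("chr1", 100, 200), "exon2": ("chr1", 500, 600)}
alignments = [(f"r{i}", "chr1", 110 + i, 0) for i in range(12)]  # 12 perfect hits in exon1
alignments += [("r12", "chr1", 520, 0),   # perfect, but exon2 has too few reads
               ("r13", "chr1", 530, 1),   # one mismatch
               ("r14", "chr1", 900, 0)]   # perfect, but outside any exon

def exon_of(chrom, pos):
    """Return the name of the exon containing pos, or None."""
    for name, (c, start, end) in exons.items():
        if c == chrom and start <= pos < end:
            return name
    return None

# Step 1: keep only perfect matches and note which exon (if any) they hit.
perfect = [(rid, exon_of(c, p)) for rid, c, p, mm in alignments if mm == 0]
# Step 2: count perfect reads per exon.
coverage = Counter(exon for _, exon in perfect if exon is not None)
# Step 3: keep reads in exons that reach the coverage threshold.
kept = [rid for rid, exon in perfect if exon is not None and coverage[exon] >= 10]

print(len(kept), dict(coverage))  # 12 {'exon1': 12, 'exon2': 1}
```

Loosening the criteria then amounts to relaxing one condition at a time (the mismatch limit, the exon requirement, or the coverage threshold) and comparing which reads are gained.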