Bowtie, an ultrafast, memory-efficient, open source short read aligner

betty replied

04-07-2010, 01:39 AM
mismatches in color space

Hi,
I have a question about color-space mismatches, and I'm not sure whether my understanding is correct.

I set -C -n 2 -l 20 for SOLiD data alignment, so it permits at most 2 color-space mismatches in the first 20 characters of the read (after trimming the tag "T" and the first base).
In the output file, a single (isolated) mismatch is treated as a system error and ignored; only adjacent mismatches that can be explained by a SNP are reported.

So could there be many single mismatches that Bowtie ignores as system errors? Is anything else considered besides single and adjacent mismatches?

Any suggestions would be appreciated.

Regards,

Betty
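Betty's description is consistent with how 2-base (color-space) encoding works. Below is a minimal Python sketch, assuming the standard SOLiD encoding in which each color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). It is not Bowtie's code, and the function names are hypothetical. It shows that a single base substitution inside a read changes two adjacent colors by the same XOR amount, while an isolated color mismatch cannot be explained by a single base substitution, which is why it looks like a measurement error.

```python
# Standard SOLiD 2-base encoding (assumption for illustration): each color is
# the XOR of the 2-bit codes of two adjacent bases.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq):
    """Convert a base-space sequence into its colors (len(seq) - 1 values)."""
    codes = [BASE_CODE[b] for b in seq.upper()]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def classify_color_mismatches(ref_colors, read_colors):
    """Label color mismatches as SNP-consistent adjacent pairs or isolated errors.

    Returns (kind, index) tuples: ("snp", base_index) for an adjacent pair that
    one base substitution explains, ("color_error", color_index) otherwise.
    Pairs are matched greedily, left to right (a simplification).
    """
    if len(ref_colors) != len(read_colors):
        raise ValueError("color strings must have equal length")
    mismatches = [i for i, (r, q) in enumerate(zip(ref_colors, read_colors)) if r != q]
    events = []
    i = 0
    while i < len(mismatches):
        pos = mismatches[i]
        if i + 1 < len(mismatches) and mismatches[i + 1] == pos + 1:
            # Substituting base b -> b ^ d at base index pos + 1 flips both
            # flanking colors by the same XOR delta d.
            d1 = ref_colors[pos] ^ read_colors[pos]
            d2 = ref_colors[pos + 1] ^ read_colors[pos + 1]
            if d1 == d2:
                events.append(("snp", pos + 1))
                i += 2
                continue
        events.append(("color_error", pos))
        i += 1
    return events


ref = to_colors("ACGT")
print(classify_color_mismatches(ref, to_colors("AAGT")))  # [('snp', 1)]
print(classify_color_mismatches(ref, [1, 0, 1]))          # [('color_error', 1)]
```

A substitution at the very first or last base of a read changes only one color, so this simplified check would label it a color error.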
Lien replied

04-06-2010, 05:14 AM
Originally posted by JimC

Ben,
I've tried to read all the posts, but I may have missed this answer if posted.

I'm having a problem running bowtie on a mouse genome dataset.
A large number of reads give the warning:

Warning: Exhausted best-first chunk memory for read .....

current command line: bowtie -S -p 1 --solexa1.3-quals --un unmapped.fq -m 10 --max maxmapped.fq -n 3 -X 600 /ccmb/CoreBA/Data/BowtieData/mm9 -1 ../s_7_1_sequence.txt -2 ../s_7_2_sequence.txt mm9_align.sam

version: bowtie --version
bowtie version 0.12.3
64-bit
Built on ccmb-comp1.umms.med.umich.edu
Tue Mar 2 12:33:36 EST 2010
Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
Options: -O3
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

Any suggestions would be helpful as I feel that I'm not getting the level of alignment I should be seeing with this data.

Thanks!
Jim

Hello,

I have the same problem using a 64-bit computer ('Warning: Exhausted best-first chunk memory for read HWUSI_ ...; skipping read'). My paired-end data are in .txt format. Could this have anything to do with the problem? Otherwise, as I'm only starting to work with these data, I have no idea what else could be causing this problem.

Thanks a lot!
Lien
Xi Wang replied

04-01-2010, 08:42 AM
Originally posted by betty

hello everyone,
I am using Bowtie for SOLiD-based metagenomic mapping. The references are millions of 1-kb fragments, and I don't know whether that is why index building took so long (Total time for backward call to driver() for mirror index: 01:56:46).

My earlier analysis using AB software indicated that most of the reads could align to many references, and that there were not many mappable reads.

For my case, could you give some suggestions on index building and alignment parameter settings?

Thanks a lot,

Betty

I think it can take that long. For the human genome (~3 Gb, 25 chromosomes), it takes about 3-4 hours.
betty replied

04-01-2010, 12:06 AM
Bowtie for meta data

hello everyone,
I am using Bowtie for SOLiD-based metagenomic mapping. The references are millions of 1-kb fragments, and I don't know whether that is why index building took so long (Total time for backward call to driver() for mirror index: 01:56:46).

My earlier analysis using AB software indicated that most of the reads could align to many references, and that there were not many mappable reads.

For my case, could you give some suggestions on index building and alignment parameter settings?

Thanks a lot,

Betty
varunkilaru replied

03-30-2010, 01:00 PM
Out of Memory allocating the offs[] array for bowtie index

Hi Ben,
First of all, Bowtie is great. Thanks for that. I am running into a few problems when working with the human index.

The command I used was bowtie -C hg19_c reads/e_coli_1000.fq. This fails with the message: Out of Memory allocating the offs[] array for bowtie index.

I am using a 64-bit Windows machine with 8 GB of RAM and a 3 GHz quad-core processor. I don't understand why it is running out of memory. Can you kindly let me know what the problem could be? Thanks a lot, and sorry for the inconvenience.
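The offs[] array named in the error holds the reference offsets that bowtie-build marks on a sample of the Burrows-Wheeler rows. The bowtie-build manual describes the default --offrate of 5 as marking every 32nd row and notes that, for the human genome, these annotations occupy about 340 megabytes. A rough Python estimate, assuming 4 bytes per stored offset and an illustrative genome length of 3 billion positions (the function name is hypothetical):

```python
def offs_bytes(bwt_rows, offrate, bytes_per_offset=4):
    """Approximate size in bytes of the sampled offsets (offs[]).

    Assumes one offset is kept for every 2**offrate BWT rows and that each
    offset is stored in bytes_per_offset bytes (4 is an assumption).
    """
    sampled = -(-bwt_rows // (1 << offrate))  # ceiling division
    return sampled * bytes_per_offset


genome = 3_000_000_000  # illustrative human-sized length, not hg19's exact size
for rate in (5, 6, 7):
    print(f"offrate {rate}: ~{offs_bytes(genome, rate) / 2**20:.0f} MiB")
```

Each increment of the offrate halves the estimate. Bowtie's -o/--offrate alignment option can raise the offrate of an already-built index when it is loaded, discarding some of the offsets.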
Xi Wang replied

03-30-2010, 07:19 AM
Originally posted by James

Hi guys,

I'm new to seqanswers and to alignment, so I started using bowtie, as it seems to have a good reputation, manual, and tutorial.

I am working through the tutorial and have got to the point where I am supposed to build a new index using the E. coli strain O157:H7 downloaded from NCBI. However, when I run the command I get this error:

could not open NC_002127.fna

I'm using Terminal in Mac OS X 10.5.8. Has anybody had the same problem? Am I not putting the NC_002127.fna file in the correct directory?

I would move on in the tutorial, but I need to use the build option to create an index for the organism I work on.

Thanks in advance.

Is your file NC_002127.fna in the working directory, i.e., the current directory where you type the command? If not, please give the path to the file so the program can find it.
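bowtie-build takes the reference FASTA file(s) followed by the index base name (bowtie-build <reference_in> <ebwt_base>), and a relative file name is looked up in the current working directory. A small Python sketch that resolves the path first and fails early with a clearer message; the helper name and the index base name in the comment are hypothetical:

```python
import os


def bowtie_build_argv(fasta_path, index_base):
    """Return the argument list for `bowtie-build <reference_in> <ebwt_base>`.

    A relative path is resolved against the current working directory (the
    directory the shell command would run in), and a missing file is
    reported before bowtie-build is ever started.
    """
    abs_path = os.path.abspath(fasta_path)
    if not os.path.isfile(abs_path):
        raise FileNotFoundError(
            f"{fasta_path!r} not found (looked in {os.getcwd()}); "
            "pass the full path to the FASTA file"
        )
    return ["bowtie-build", abs_path, index_base]


# e.g. subprocess.run(bowtie_build_argv("NC_002127.fna", "e_coli_O157_H7"), check=True)
```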
James replied

03-30-2010, 12:38 AM
Tutorial - Build a new index

Hi guys,

I'm new to seqanswers and to alignment, so I started using bowtie, as it seems to have a good reputation, manual, and tutorial.

I am working through the tutorial and have got to the point where I am supposed to build a new index using the E. coli strain O157:H7 downloaded from NCBI. However, when I run the command I get this error:

could not open NC_002127.fna

I'm using Terminal in Mac OS X 10.5.8. Has anybody had the same problem? Am I not putting the NC_002127.fna file in the correct directory?

I would move on in the tutorial, but I need to use the build option to create an index for the organism I work on.

Thanks in advance.
JimC replied

03-29-2010, 05:34 PM
best-first chunk memory problem

Ben,
I've tried to read all the posts, but I may have missed this answer if posted.

I'm having a problem running bowtie on a mouse genome dataset.
A large number of reads give the warning:

Warning: Exhausted best-first chunk memory for read .....

current command line: bowtie -S -p 1 --solexa1.3-quals --un unmapped.fq -m 10 --max maxmapped.fq -n 3 -X 600 /ccmb/CoreBA/Data/BowtieData/mm9 -1 ../s_7_1_sequence.txt -2 ../s_7_2_sequence.txt mm9_align.sam

version: bowtie --version
bowtie version 0.12.3
64-bit
Built on ccmb-comp1.umms.med.umich.edu
Tue Mar 2 12:33:36 EST 2010
Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
Options: -O3
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

Any suggestions would be helpful as I feel that I'm not getting the level of alignment I should be seeing with this data.

Thanks!

Jim
mattanswers replied

03-19-2010, 09:49 AM
Originally posted by dukevn

bowtie simply does not search/map on those extra bases and hence there is no mismatches applied there.
D.

So, if I put in sequences that are 36 bases long, bowtie is only looking at the first 28, and I might just as well have put in sequences of only 28?

I don't think that is the case, because I have had sequences where tiles 28 and 32 were inadvertently left out, and alignment was only 17% due to the sequence shifting (base 29 becomes 28, 30 becomes 29, etc.). However, when I use the -3/--trim3 option, the % alignment gradually increases with each base trimmed, until with 6 bases trimmed from the 3' end I get the same % alignment as when tiles 28 and 32 are present. With 6 bases trimmed, that would leave 30 bases, and room for two mismatches (28, 29).

So that would suggest that bowtie does look beyond the 28th base and uses those bases for alignment, but does it allow mismatches there? (I don't want to control the behavior; I just want to know what is going on.)
Xi Wang replied

03-19-2010, 09:04 AM
Originally posted by dukevn

I don't think there is any option to control that. If you are in -n <int> mode, then the maximum of <int> mismatches is applied only to the length specified by -l, and everything after that is ignored: bowtie simply does not search/map on those extra bases, and hence no mismatch limit is applied there.

In the other case, if you use -v, then the mismatch limit applies to the whole read's length, and -l is ignored.

Cheers,

D.

Please note that there is another option:

-e/--maqerr <int>
Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
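A simplified Python model of how the -n, -l and -e limits described above combine for one candidate alignment. It is a sketch of the acceptance rule only, not of bowtie's search; the function names are hypothetical, ties in the rounding (e.g. quality 25) are assumed to round up, and the defaults (-n 2, -l 28, -e 70) are the ones mentioned in this thread. The bowtie manual states that -e and -l are ignored in -v mode, so this applies to -n mode only.

```python
def maq_round(q):
    """Round a Phred quality to the nearest 10 and saturate at 30 (ties round up: assumption)."""
    return min(30, (q + 5) // 10 * 10)


def passes_n_mode(mismatch_positions, quals, n=2, seed_len=28, maqerr=70, round_quals=True):
    """Sketch of the -n / -l / -e acceptance test for one candidate alignment.

    mismatch_positions: 0-based read offsets (from the 5' end) that mismatch.
    quals: Phred quality for every read position.
    """
    # -n / -l: at most n mismatches inside the seed (the first seed_len bases).
    seed_mms = sum(1 for p in mismatch_positions if p < seed_len)
    if seed_mms > n:
        return False
    # -e: quality sum over ALL mismatched positions, seed or not (--nomaqround -> round_quals=False).
    penalty = sum(maq_round(quals[p]) if round_quals else quals[p] for p in mismatch_positions)
    return penalty <= maqerr
```

With these defaults, a 36-bp read whose bases all have quality 40 can still carry two mismatches after position 28 (2 × 30 = 60 ≤ 70) but not three (90 > 70); lower-quality mismatches cost less.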
dukevn replied

03-18-2010, 09:22 PM
Originally posted by mattanswers

Thanks for your reply. I can use your suggestions to control mismatches, but for the sake of understanding my results with the default settings, I was interested in knowing how many mismatches, if any, were allowed after the 28th base.

I don't think there is any option to control that. If you are in -n <int> mode, then the maximum of <int> mismatches is applied only to the length specified by -l, and everything after that is ignored: bowtie simply does not search/map on those extra bases, and hence no mismatch limit is applied there.

In the other case, if you use -v, then the mismatch limit applies to the whole read's length, and -l is ignored.

Cheers,

D.
mattanswers replied

03-18-2010, 12:15 PM
Originally posted by dukevn

Isn't the -l option (http://bowtie-bio.sourceforge.net/ma...wtie-options-l) for that purpose? Why can't you try -l 36 -n 2 (or -v 2)?

Thanks for your reply. I can use your suggestions to control mismatches, but for the sake of understanding my results with the default settings, I was interested in knowing how many mismatches, if any, were allowed after the 28th base.
dukevn replied

03-18-2010, 11:19 AM
Originally posted by mattanswers

I think if you use -m 1 and then --max filename, you will select only sequences with 1 match and then in the file 'filename', specified after --max, will be all sequences that were filtered out, i.e. sequences with more than 1 match.

Yeah, I thought of using -m 1 and filtering, but I have a feeling that doing so will filter out a lot of valuable information. I am not sure; maybe more advanced and experienced people will have good advice about this.

Originally posted by mattanswers

I also had a question. I believe the default behavior of bowtie is 2 mismatches in the first 28 bases. Are mismatches allowed after the 28th base? So, if my sequences are 36 bases, there can be up to two mismatches in the first 28 bases, but how many mismatches are allowed from 28 to 36?

Isn't the -l option (http://bowtie-bio.sourceforge.net/ma...wtie-options-l) for that purpose? Why can't you try -l 36 -n 2 (or -v 2)?
mattanswers replied

03-18-2010, 11:04 AM
Originally posted by dukevn

Hi collective brains,

I am getting confused by some of Bowtie's options. I read the manual carefully and also most of the posts in this thread, but I'm still confused.

* What is the difference between -m and -M? From the manual, it seems to me that -M is equivalent to -m --best --strata. Is that right?

* What option should I use to filter out matches from reads that map to repeats?

Thanks,

D.

I think if you use -m 1 and then --max filename, you will select only sequences with 1 match and then in the file 'filename', specified after --max, will be all sequences that were filtered out, i.e. sequences with more than 1 match.

I also had a question. I believe the default behavior of bowtie is 2 mismatches in the first 28 bases. Are mismatches allowed after the 28th base? So, if my sequences are 36 bases, there can be up to two mismatches in the first 28 bases, but how many mismatches are allowed from 28 to 36?
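The -m/--max suggestion above amounts to a simple rule: a read with more than -m reportable alignments has all of its alignments suppressed, and --max collects those reads in a separate file. A Python sketch of that rule over hypothetical per-read alignment counts (how many alignments bowtie actually finds depends on the search options, which this does not model):

```python
def apply_m_filter(alignment_counts, m):
    """Split aligned reads into those reported and those suppressed by -m <m>.

    alignment_counts maps read name -> number of reportable alignments.
    Reads with 0 alignments appear in neither list.
    """
    reported, suppressed = [], []
    for name, count in alignment_counts.items():
        if count == 0:
            continue  # unaligned: not affected by -m
        if count <= m:
            reported.append(name)
        else:
            suppressed.append(name)  # these are the reads --max would collect
    return reported, suppressed


counts = {"r1": 1, "r2": 3, "r3": 0, "r4": 2}  # hypothetical reads
print(apply_m_filter(counts, 1))  # (['r1'], ['r2', 'r4'])
```

With -m 1 only reads that have exactly one reportable alignment are kept, which is why it doubles as a repeat filter and also why it discards information about multi-mapping reads.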
dukevn replied

03-17-2010, 05:43 AM
Hi again,

It looks like I missed some very important posts at the beginning of the thread, especially Heng Li's post quoted below:

Originally posted by lh3

2. My main concern about bowtie is actually related to column 7. I think by default (no --best), bowtie just outputs the first group of hits it meets. Users would not know whether it is the best or whether it is a repeat or not. I think (maybe wrong) this behaviour is only useful for screening human contamination. With "--best", the user would know the output is the best hit, but whether it is a repeat is still unknown in some cases. I know the "unknown" cases should be rare, but it would be necessary to convince users that the rare cases would not affect accuracy. Only with "--best -k 2" may a user know whether it is a repeat or not, although he/she would not know the number of occurrences. I think "--best -k 2" is the most desirable behaviour and should become the default. Bowtie is fast enough. Slowing it down by a factor of 3 will still make most users quite happy (see also below). Also, quoting the speed under the default options would be unfair to others.

I don't get this at all. I think --best -k 2 means report at most the 2 best alignments, doesn't it? How does one know whether it is a repeat or not by using this option?

Originally posted by lh3

5. Back to how the alignments are reported. I think the bwa behaviour is useful if people do not care too much about speed. Knowing the number of suboptimal hits would help us to decide which alignments are reliable. I know this is important to some (not all) SV detection algorithms. If you think the bwa behaviour is costly (possibly it is), I would recommend soap2's. Frequently, we may want to know the exact number of occurrences (no need to output the detailed alignments). I am sure having the soap2 behaviour would make bowtie more popular.

Heng Li, do you mean bwa does this by default? Can you elaborate a little?

Also, does anybody know which bowtie option I should use to achieve this behavior?

Thanks,

D.
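One reading of Heng Li's point about --best -k 2 (an interpretation, not something the thread confirms): --best reports alignments best first, and -k 2 lets a second alignment through whenever one exists, so comparing the two tells you whether the top hit is ambiguous. A Python sketch, assuming each reported alignment is summarized by its mismatch count (its stratum); the function name and labels are hypothetical:

```python
def classify_best_k2(stratum_mismatches):
    """Interpret up to two alignments reported best-first (as with --best -k 2).

    stratum_mismatches: mismatch counts of the reported alignments, in the
    order they were reported.
    """
    if not stratum_mismatches:
        return "unaligned"
    if len(stratum_mismatches) == 1:
        return "unique"        # no other alignment within the mismatch limits
    best, second = stratum_mismatches[0], stratum_mismatches[1]
    if second == best:
        return "repeat"        # two equally good hits: best position is ambiguous
    return "unique best"       # a worse, suboptimal hit also exists


for hits in ([], [0], [0, 0], [0, 2]):
    print(hits, classify_best_k2(hits))
```

As Heng Li notes, this still does not give the number of occurrences, only whether there is more than one equally good one.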