
**mskonan** · 03-11-2010, 06:30 PM

The difference between sequence and quality might not matter for the error

Originally posted by Ben Langmead

Hi Rich,

Another user just contacted me via email and described something similar. When I ran their reads through bowtie, I realized that part of the problem is that Bowtie is printing the wrong error message. In their case, the error message should have been something more like "Too many quality values for read..." because they had a fastq entry where the quality string was 2 characters longer than the sequence string. Do you notice any inconsistencies like that in your input?

I'll fix the error-message bug.

Thanks,
Ben

Hi All,
I got the same error "Reads file contained a pattern with more than 1024 quality values." with Bowtie 0.12.3
My data have 76bp / the same length in quality:
ILLUMINA-1A5BF1 1 8 61 12450 2086 0 1 TGCTGCGCTGTGATTTCTCGCTGGCAGACTTGGGTTGGCTTTGCTGAGGGGACGTGAGACATTGTATCAGGGGCCA bbbbbbbbbbbbbbbbbbbbbbbbbbbcbcbbbbbbbbbbbbbbbbbbb`bbbIbbbb_bbbbabbb]bbbbbbbb 1

After I convert them to fastq format (76/76) like this:
@ILLUMINA-1A5BF1:8:1:1303:18887#0/1
TAGGAGGGTGACCTGAAGAGTGGAAGGAAGAGTCAGGAATACTCAGAAGAACCTGTGCATATAGGCCAGGCCCGAC
+ILLUMINA-1A5BF1:8:1:1303:18887#0/1
aaaa_aaaaaaaaaa]aaYaaaaaaaa_`a_a_a_aaXaa_a`[_aa_`N`aa_`]a]`aXHVV]a`^X]YQHYVa

I got the error.
I guess that the count difference between sequence and quality might not matter for the error.

It'd be greatly appreciated if someone can help me.

Cheers,

KJ
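One way to rule out the mismatch Ben describes is to scan the whole FASTQ file for records whose sequence and quality strings differ in length, since a single example record may not be representative. The sketch below is not part of Bowtie; `find_length_mismatches` is a hypothetical helper and it assumes plain four-line FASTQ records (no wrapped sequence lines).

```python
import io


def find_length_mismatches(handle):
    """Yield (read_name, seq_len, qual_len) for 4-line FASTQ records
    whose sequence and quality strings have different lengths."""
    while True:
        header = handle.readline()
        if not header:  # end of file
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line, ignored
        qual = handle.readline().rstrip("\n")
        if len(seq) != len(qual):
            yield header.rstrip("\n")[1:], len(seq), len(qual)


# Tiny in-memory example: read2 has two extra quality characters.
sample = io.StringIO(
    "@read1\nACGT\n+\nIIII\n"
    "@read2\nACGT\n+\nIIIIII\n"
)
for name, n_seq, n_qual in find_length_mismatches(sample):
    print(name, n_seq, n_qual)  # prints: read2 4 6
```

For a real file, pass an open text handle instead of the `io.StringIO` object.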

**Xi Wang** · 03-11-2010, 06:39 PM

Originally posted by mskonan

Hi All,
I got the same error "Reads file contained a pattern with more than 1024 quality values." with Bowtie 0.12.3
My data have 76bp / the same length in quality:
ILLUMINA-1A5BF1 1 8 61 12450 2086 0 1 TGCTGCGCTGTGATTTCTCGCTGGCAGACTTGGGTTGGCTTTGCTGAGGGGACGTGAGACATTGTATCAGGGGCCA bbbbbbbbbbbbbbbbbbbbbbbbbbbcbcbbbbbbbbbbbbbbbbbbb`bbbIbbbb_bbbbabbb]bbbbbbbb 1

After I convert them to fastq format (76/76) like this:
@ILLUMINA-1A5BF1:8:1:1303:18887#0/1
TAGGAGGGTGACCTGAAGAGTGGAAGGAAGAGTCAGGAATACTCAGAAGAACCTGTGCATATAGGCCAGGCCCGAC
+ILLUMINA-1A5BF1:8:1:1303:18887#0/1
aaaa_aaaaaaaaaa]aaYaaaaaaaa_`a_a_a_aaXaa_a`[_aa_`N`aa_`]a]`aXHVV]a`^X]YQHYVa

I got the error.
I guess that the count difference between sequence and quality might not matter for the error.

It'd be greatly appreciated if someone can help me.

Cheers,

KJ

I guess the program just does not recognize the carriage-return character?
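Whether Bowtie 0.12.3 actually mishandles carriage returns is only a guess here, but it is easy to check whether the reads file contains Windows-style line endings at all. A minimal sketch, with `count_crlf_lines` as a hypothetical helper that takes any iterable of byte strings (for example a file opened in binary mode):

```python
def count_crlf_lines(lines):
    """Count the lines that end with CR+LF (Windows line endings)."""
    return sum(1 for line in lines if line.endswith(b"\r\n"))


# In-memory example: only the first line carries a carriage return.
example = [b"@read1\r\n", b"ACGT\n", b"+\n", b"IIII\n"]
print(count_crlf_lines(example))  # prints: 1
```

With a real file this would be called as `count_crlf_lines(open(path, "rb"))`; a non-zero count suggests converting the file to Unix line endings before aligning.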

**kevpar** · 03-15-2010, 12:57 PM

Ben,

I have just downloaded Bowtie and can't get it to run. A window opens for bowtie.exe, but then quickly closes down again. This occurs in both Ubuntu and Windows. I suspect I am missing something simple, but would appreciate your help.

**RichEast** · 03-16-2010, 10:08 AM

Originally posted by mskonan

Hi All,
I got the same error "Reads file contained a pattern with more than 1024 quality values." with Bowtie 0.12.3
My data have 76bp / the same length in quality:
ILLUMINA-1A5BF1 1 8 61 12450 2086 0 1 TGCTGCGCTGTGATTTCTCGCTGGCAGACTTGGGTTGGCTTTGCTGAGGGGACGTGAGACATTGTATCAGGGGCCA bbbbbbbbbbbbbbbbbbbbbbbbbbbcbcbbbbbbbbbbbbbbbbbbb`bbbIbbbb_bbbbabbb]bbbbbbbb 1

After I convert them to fastq format (76/76) like this:
@ILLUMINA-1A5BF1:8:1:1303:18887#0/1
TAGGAGGGTGACCTGAAGAGTGGAAGGAAGAGTCAGGAATACTCAGAAGAACCTGTGCATATAGGCCAGGCCCGAC
+ILLUMINA-1A5BF1:8:1:1303:18887#0/1
aaaa_aaaaaaaaaa]aaYaaaaaaaa_`a_a_a_aaXaa_a`[_aa_`N`aa_`]a]`aXHVV]a`^X]YQHYVa

I got the error.
I guess that the count difference between sequence and quality might not matter for the error.

It'd be greatly appreciated if someone can help me.

Cheers,

KJ

mskonan,

We actually found that we missed some reads in our initial filtering for removing reads with multiple uncalled bases (denoted with "."). It seems that if the read has multiple uncalled bases this is a problem for bowtie and it gives the "Reads file contained a pattern with more than 1024 quality values" error. Once these are removed the program works fine with the same file command.

rich
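The filtering step described above could look roughly like the following. This is a sketch under assumptions, not the script that was actually used: the helper names are hypothetical, "multiple" is taken to mean more than one uncalled base, and both "." and "N" are treated as uncalled.

```python
def too_many_uncalled(seq, max_uncalled=1, symbols=".N"):
    """Return True if seq contains more than max_uncalled uncalled bases."""
    return sum(seq.count(s) for s in symbols) > max_uncalled


def filter_fastq(lines, max_uncalled=1):
    """Yield 4-line FASTQ records (as lists of strings) that pass the filter."""
    record = []
    for line in lines:
        record.append(line.rstrip("\n"))
        if len(record) == 4:
            if not too_many_uncalled(record[1], max_uncalled):
                yield record
            record = []


reads = ["@a", "AC.T", "+", "IIII", "@b", "A..T", "+", "IIII"]
for rec in filter_fastq(reads):
    print(rec[0])  # prints: @a
```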

**Thomas Doktor** · 03-16-2010, 10:22 AM

kevpar, you need to start Bowtie from a terminal.
In Windows, press Windows key+R, type "cmd", and hit Enter; in Ubuntu, start a terminal via Programs>Accessories>Terminal.
If you have installed Bowtie in your path, you can simply type "bowtie" and hit Enter; otherwise, change to the directory where you installed Bowtie first.

Running Bowtie with no arguments will give you a manual page describing which options are available. An easier way is to examine the online Bowtie manual at http://bowtie-bio.sourceforge.net/manual.shtml .

**dukevn** · 03-16-2010, 08:37 PM

bowtie options

Hi collective brains,

I am getting confused by some of Bowtie's options. I did read the manual carefully and also read most of the posts in this thread, but I am still confused.

* What is the difference between -m and -M? From the manual, it seems to me that -M is equivalent to -m --best --strata?

* What option should I use to filter out matches from repeated reads?

Thanks,

D.

**dukevn** · 03-17-2010, 05:43 AM

Hi again,

Looks like I already missed some very important posts in the beginning of the thread, especially Heng Li's post quoted below:

Originally posted by lh3

2. My main concern about bowtie is actually related to the column 7. I think by default (no --best), bowtie just outputs the first group of hits it meets. Users would not know whether it is the best or whether it is a repeat or not. I think (maybe wrong) this behaviour is only useful for screening human contaminations. With "--best", user would know the output is the best hit, but whether it is a repeat is still unknown in some cases. I know the "unknown" cases should be rare, but it would be necessary to convince users that the rare cases would not affect accuracy. Only with "--best -k 2", a user may know whether it is a repeat or not, although he/she would not know the number of occurrences. I think the "--best -k2" is the most desired behaviour and should become the default. Bowtie is fast enough. Slowing it down by a factor 3 will still make most users quite happy (see also below). Also quoting the speed under the default option would be unfair to others.

I totally don't get this. I think --best -k 2 means to report at most 2 best alignments, doesn't it? How does one know whether a read is a repeat or not by using this option?

Originally posted by lh3

5. back to how the alignments are reported. I think the bwa behaviour is useful if people do not care too much about speed. Knowing the number of suboptimal hits would help us to decide which alignments are reliable. I know this is important to some (not all) SV detection algorithms. If you think the bwa behaviour is costly (possibly it is), I would recommend the soap2's one. Frequently, we may want to know the exact number of occurences (no need to output the detailed aligments). I am sure having the soap2 behaviour would make bowtie more popular.

Heng Li, do you mean bwa's default is doing this? Can you elaborate a little more?

Also, does anybody know which bowtie option I should use to achieve this behavior?

Thanks,

D.
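On the `--best -k 2` question, one possible reading of Heng Li's point is this: with `-k 2`, bowtie reports up to two valid alignments per read, so a read that appears twice in the output has at least two valid alignments under the current mismatch policy (a repeat), while a read that appears once has only one. The sketch below illustrates that idea; it assumes single-end reads and bowtie's default tab-separated output with the read name in the first column, and `classify_reads` is a hypothetical helper.

```python
from collections import Counter


def classify_reads(bowtie_lines):
    """Map read name -> 'repeat' (two or more reported alignments) or 'unique'."""
    hits = Counter(
        line.split("\t", 1)[0] for line in bowtie_lines if line.strip()
    )
    return {name: ("repeat" if n >= 2 else "unique") for name, n in hits.items()}


# Made-up output lines for illustration only.
output = [
    "r1\t+\tchr1\t100\tACGT\tIIII\t0\t",
    "r1\t-\tchr2\t500\tACGT\tIIII\t0\t",
    "r2\t+\tchr1\t900\tACGT\tIIII\t0\t",
]
print(classify_reads(output))  # prints: {'r1': 'repeat', 'r2': 'unique'}
```

This tells you whether a second valid alignment exists, not how many occurrences there are in total, which matches the caveat in the quoted post.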

**mattanswers** · 03-18-2010, 11:04 AM

Originally posted by dukevn

Hi collective brains,

I am getting confused by some of Bowtie's options. I did read the manual carefully and also read most of the posts in this thread, but I am still confused.

* What is the difference between -m and -M? From the manual, it seems to me that -M is equivalent to -m --best --strata?

* What option should I use to filter out matches from repeated reads?

Thanks,

D.

I think if you use -m 1 and then --max filename, you will select only sequences with 1 match and then in the file 'filename', specified after --max, will be all sequences that were filtered out, i.e. sequences with more than 1 match.
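As a sketch of that suggestion, the command line could be assembled like this. Only `-m` and `--max` come from the discussion above; the index name, file names, and the helper `unique_only_command` are placeholders made up for illustration.

```python
def unique_only_command(index, reads, multi_out, hits_out):
    """Build a bowtie argument list that keeps reads with exactly one valid
    alignment and writes reads exceeding the -m limit to multi_out."""
    return ["bowtie", "-m", "1", "--max", multi_out, index, reads, hits_out]


cmd = unique_only_command("my_index", "reads.fq", "multi.fq", "hits.map")
print(" ".join(cmd))
# prints: bowtie -m 1 --max multi.fq my_index reads.fq hits.map
```

The list can be passed to `subprocess.run(cmd, check=True)` once the placeholder names are replaced with real paths.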

I also had a question. I believe the default behavior of bowtie is 2 mismatches in the first 28 bases. Are mismatches allowed after the 28th base ? So, if my sequences are 36 bases, there can be up to two mismatches in the first 28 bases, but how many mismatches are allowed from 28 to 36 ?

**dukevn** · 03-18-2010, 11:19 AM

Originally posted by mattanswers

I think if you use -m 1 and then --max filename, you will select only sequences with 1 match and then in the file 'filename', specified after --max, will be all sequences that were filtered out, i.e. sequences with more than 1 match.

Yeah, I thought of using -m 1 and filtering out. But I have a feeling that doing so will filter out a lot of valuable information. I am not sure; maybe more advanced and experienced people will have good advice about this.

Originally posted by mattanswers

I also had a question. I believe the default behavior of bowtie is 2 mismatches in the first 28 bases. Are mismatches allowed after the 28th base ? So, if my sequences are 36 bases, there can be up to two mismatches in the first 28 bases, but how many mismatches are allowed from 28 to 36 ?

Isn't the -l option (http://bowtie-bio.sourceforge.net/ma...wtie-options-l) for that purpose? Why can't you try -l 36 -n 2 (or -v 2)?

**mattanswers** · 03-18-2010, 12:15 PM

Isn't the -l option (http://bowtie-bio.sourceforge.net/ma...wtie-options-l) for that purpose? Why can't you try -l 36 -n 2 (or -v 2)?

Thanks for your reply. I can use your suggestions to control mismatches, but I was interested in knowing for the sake of understanding my results using the default how many, if any, mismatches were allowed after the 28th base.

**dukevn** · 03-18-2010, 09:22 PM

Originally posted by mattanswers

Thanks for your reply. I can use your suggestions to control mismatches, but I was interested in knowing for the sake of understanding my results using the default how many, if any, mismatches were allowed after the 28th base.

I don't think there is any option to control that. If you are in -n <int> mode, then the <int> maximum mismatches will be applied over the length specified by -l only, and everything after that will be ignored: bowtie simply does not search/map on those extra bases and hence there are no mismatches applied there.

In the other case, if you use -v, then the mismatch limit will be applied over the whole read's length, and -l will be ignored.

Cheers,

D.

**Xi Wang** · 03-19-2010, 09:04 AM

Originally posted by dukevn

I don't think there is any option to control that. If you are in -n <int> mode, then the <int> maximum mismatches will be applied over the length specified by -l only, and everything after that will be ignored: bowtie simply does not search/map on those extra bases and hence there are no mismatches applied there.

In the other case, if you use -v, then the mismatch limit will be applied over the whole read's length, and -l will be ignored.

Cheers,

D.

Please note that there is another option:

-e/--maqerr <int>
Maximum permitted total of quality values at all mismatched read positions throughout the entire alignment, not just in the "seed". The default is 70. Like Maq, bowtie rounds quality values to the nearest 10 and saturates at 30; rounding can be disabled with --nomaqround.
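
To make the `-e/--maqerr` rule concrete, here is a small worked sketch of the arithmetic described in that manual text: each mismatched position contributes its quality value, rounded to the nearest 10 and capped at 30, and the total must not exceed the limit (default 70). The function names are hypothetical, and rounding halves upward (for example 25 to 30) is an assumption, since the manual text does not spell out tie-breaking.

```python
def maq_round(q):
    """Round a quality value to the nearest 10 (halves round up, an
    assumption) and saturate at 30."""
    return min(30, ((q + 5) // 10) * 10)


def within_maqerr(mismatch_quals, maqerr=70):
    """Return True if the rounded qualities at all mismatched positions
    sum to no more than maqerr."""
    return sum(maq_round(q) for q in mismatch_quals) <= maqerr


print(within_maqerr([30, 30, 8]))   # 30 + 30 + 10 = 70 -> prints: True
print(within_maqerr([40, 40, 25]))  # 30 + 30 + 30 = 90 -> prints: False
```

Under this reading, mismatches outside the seed are possible in -n mode as long as the quality total over the whole alignment stays within the `-e` limit.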

**mattanswers** · 03-19-2010, 09:49 AM

Originally posted by dukevn

bowtie simply does not search/map on those extra bases and hence there are no mismatches applied there.
D.

So, if I put in sequences that are 36 bases, bowtie is only looking at the first 28 and I might just as well have put in sequences with only 28 ?

I don't think that is the case, because I have had sequences where tiles 28 and 32 were inadvertently left out and alignment was only 17% due to the sequence shifting (base 29 becomes 28, 30 becomes 29, etc.). However, when I use the -3/--trim3 trim option, the % alignment gradually increases with each base trimmed until, at 6 bases trimmed from the 3' end, I get the same % alignment as when tiles 28 and 32 are present. With 6 bases trimmed, that would leave 30, and room for two mismatches (28, 29).

So, that would suggest that bowtie does look beyond the 28th base and use these bases for alignment, but is it allowing for mismatches there? (I don't want to control the behavior, I just want to know what is going on.)

**JimC** · 03-29-2010, 05:34 PM

best-first chunk memory problem

Ben,
I've tried to read all the posts, but I may have missed this answer if posted.

I'm having a problem running bowtie on a mouse genome dataset.
A large number of reads give this warning:

Warning: Exhausted best-first chunk memory for read .....

current command line: bowtie -S -p 1 --solexa1.3-quals --un unmapped.fq -m 10 --max maxmapped.fq -n 3 -X 600 /ccmb/CoreBA/Data/BowtieData/mm9 -1 ../s_7_1_sequence.txt -2 ../s_7_2_sequence.txt mm9_align.sam

version: bowtie --version
bowtie version 0.12.3
64-bit
Built on ccmb-comp1.umms.med.umich.edu
Tue Mar 2 12:33:36 EST 2010
Compiler: gcc version 4.1.2 20080704 (Red Hat 4.1.2-46)
Options: -O3
Sizeof {int, long, long long, void*, size_t, off_t}: {4, 8, 8, 8, 8, 8}

Any suggestions would be helpful as I feel that I'm not getting the level of alignment I should be seeing with this data.

Thanks !

Jim
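
One thing that may be worth trying for this warning is raising bowtie's `--chunkmbs` setting, which controls how much memory each thread gets for the best-first search (the default is 64 MB, as far as I know). The sketch below only shows where the option would go in an argument list; `with_chunkmbs` is a hypothetical helper, the shortened argument list is a placeholder rather than the full command above, and 256 is an arbitrary example value, not a recommendation.

```python
def with_chunkmbs(argv, megabytes):
    """Return a copy of a bowtie argument list with --chunkmbs inserted
    right after the program name."""
    return [argv[0], "--chunkmbs", str(megabytes)] + list(argv[1:])


original = ["bowtie", "-S", "-p", "1", "mm9", "-1", "r1.txt", "-2", "r2.txt", "out.sam"]
print(" ".join(with_chunkmbs(original, 256)))
# prints: bowtie --chunkmbs 256 -S -p 1 mm9 -1 r1.txt -2 r2.txt out.sam
```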

**James** · 03-30-2010, 12:38 AM

Tutorial - Build a new index

Hi guys,

I'm new to seqanswers and to alignment, so I started with bowtie, as it seems to have a good reputation, manual, and tutorial.

I am working through the tutorial and have got to the point where I am supposed to build a new index using the E. coli strain O157:H7 downloaded from NCBI. However, when I run the command I get this error:

could not open NC_002127.fna

I'm using Terminal on Mac OS X 10.5.8. Has anybody had the same problem? Am I not putting the NC_002127.fna file in the correct directory?

I would move on in the tutorial however I need to use the build option to create an index for the organism I work on.

Thanks in advance.
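
A "could not open" error like this usually means the file is not in the directory the command is run from, since a bare file name is resolved relative to the current working directory. A minimal check, assuming nothing about the tutorial beyond the file name quoted above (`missing_inputs` is a hypothetical helper):

```python
from pathlib import Path


def missing_inputs(filenames, workdir="."):
    """Return the names that do not exist as regular files under workdir."""
    base = Path(workdir)
    return [name for name in filenames if not (base / name).is_file()]


missing = missing_inputs(["NC_002127.fna"])
if missing:
    print("Not found in", Path(".").resolve(), ":", ", ".join(missing))
```

If the file is reported as missing, either change to the directory that contains it or pass its full path to the build command.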
