**Ben Langmead** · 07-13-2009, 07:57 AM

Originally posted by apostrophe:

...does Bowtie support FASTA nucleic acid codes that code for two bases, such as Y = T or C for the genome? Thanks in advance.

Bowtie will index and align against references containing non-A/C/G/T characters, but alignments overlapping non-A/C/G/T characters in the reference are invalid and won't be reported.

Out of curiosity, what's the behavior you would like? E.g. if a C in a read were to align against a Y in the genome, would you like that to be considered a match, incurring no penalty against the alignment?

Thanks,
Ben

**apostrophe** · 07-13-2009, 08:13 AM

I was hoping to use Bowtie in order to align a large amount of reads against a genome that has SNPs in the stated format above. If not, I suppose I'll have to figure out some other method of alignment.

Thanks for your quick reply!

**seq_GA** · 07-14-2009, 01:51 AM

Hi Ben,

Thanks for your support.

I am trying to compare the eland and Bowtie results. Many reads are not mapped by Bowtie, whereas eland reports them as unique tags without any mismatch. An example is as follows:

Code:

>read1 AGTCTGTTTATGTTCAGCACAATTTTTTTTTTTTG  U0  1   0  0  chr8.fa 37178235  R DD

The Bowtie result for the above read is as follows:

Code:

./bowtie -a -m 10 -n 2 --strata --best -p 15 ../Genome/hg18/hg18 -c AGTCTGTTTATGTTCAGCACAATTTTTTTTTTTTG
No results

I built the index for the reference genome with default parameters.

Code:

./bowtie-build <reference_in> <index_basename>

Why is Bowtie not reporting the mapping?
Please let me know whether any of the parameters need to be changed.

Also, how does Bowtie handle "N"s in the query reads?

Thanks.

**Ben Langmead** · 07-14-2009, 05:14 AM

Hi seq_GA,

Originally posted by seq_GA:

I am trying to compare the eland and Bowtie results. Many reads are not mapped by Bowtie, whereas eland reports them as unique tags without any mismatch. An example is as follows:

Code:

>read1 AGTCTGTTTATGTTCAGCACAATTTTTTTTTTTTG  U0  1   0  0  chr8.fa 37178235  R DD

The Bowtie result for the above read is as follows:

Code:

./bowtie -a -m 10 -n 2 --strata --best -p 15 ../Genome/hg18/hg18 -c AGTCTGTTTATGTTCAGCACAATTTTTTTTTTTTG
No results

Can you confirm that it ought to align by looking at the reference? I don't have the hg18 index lying around, but in the h_sapiens_asm index, your example aligns uniquely with 3 mismatches:

Code:

./bowtie -a -v 3 /fs/szasmg/langmead/ebwts/h_sapiens_asm -c AGTCTGTTTATGTTCAGCACAATTTTTTTTTTTTG
0	-	gi|51511724|ref|NC_000008.9|NC_000008	37178227	CAAAAAAAAAAAATTGTGCTGAACATAAACAGACT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	0	31:G>A,33:C>A,34:T>C
Reported 1 alignments to 1 output stream(s)
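The alignment above is on the "-" strand, so the read sequence in the output is the reverse complement of the query passed with -c. A minimal Python sketch to check this by hand (the function name `reverse_complement` is made up for illustration and is not part of Bowtie):

```python
# Reverse-complement a read so it can be compared with a "-" strand alignment.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of an upper-case A/C/G/T/N sequence."""
    return seq.translate(COMPLEMENT)[::-1]


query = "AGTCTGTTTATGTTCAGCACAATTTTTTTTTTTTG"      # sequence given with -c
reported = "CAAAAAAAAAAAATTGTGCTGAACATAAACAGACT"   # sequence in the Bowtie output

# True when the reported sequence is the reverse complement of the query.
print(reverse_complement(query) == reported)
```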

Also, if you want the output to look like Eland, you should use -v 2 instead of -n 2. -n 2 activates a Maq-like alignment policy.

Also, how does Bowtie handle "N"s in the query reads?

An N in a read always counts as a mismatch in the alignment.
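As a rough illustration of the two rules stated in this thread (an N in a read is always a mismatch, and an alignment overlapping a non-A/C/G/T reference character is not valid), here is a small Python sketch. It assumes an ungapped comparison of a read against a same-length, same-strand reference window; `count_mismatches` is a hypothetical helper, not Bowtie's actual code.

```python
from typing import Optional

VALID_REF_BASES = set("ACGT")


def count_mismatches(read: str, ref_window: str) -> Optional[int]:
    """Count mismatches between a read and an equally long reference window.

    Returns None when the window contains a non-A/C/G/T character, since
    such an alignment would not be reported.
    """
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have the same length")
    if any(base not in VALID_REF_BASES for base in ref_window):
        return None
    # An N in the read can never equal A, C, G or T, so it always mismatches.
    return sum(1 for r, g in zip(read, ref_window) if r == "N" or r != g)


print(count_mismatches("ACNT", "ACGT"))  # one mismatch, caused by the N
print(count_mismatches("ACGT", "ACYT"))  # None: reference contains a Y
```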

Thanks,
Ben

**plattsa** · 07-14-2009, 07:21 AM

>read1 AGTCTGTTTATGTTCAGCACAATTTTTTTTTTTTG U0 1 0 0 chr8.fa 37178235 R DD

I'm not sure, but didn't the earlier eland only report mismatches over the first 32 bases? Hence mismatches in the final bases of the read would still allow a U0?

**seq_GA** · 07-14-2009, 11:01 PM

Hi Ben,

Thanks for your prompt response.

With -v 3, Bowtie also reports one mapping location.

I want to use a seed length of 28 (the default) with 2 mismatches, hence I used -n 2, since I am comparing eland_28 and Bowtie results.

But why is Bowtie still not reporting it?

**seq_GA** · 07-15-2009, 12:31 AM

Hi Ben,
I did a quick comparison of -v 2 and -n 2.

The reads are 35 bp long, and I used -3 6 to trim the 3' ends so that the mappable reads would be 28 bp, allowing a comparison with the eland_28 results.

Code:

 bowtie -a -m 10 -v 2 --strata --best --solexa-quals  -p 15 -3 ../../Genome/hg18/hg18 ../s_1_sequence.txt out_aln.txt

The number of uniquely mapped tags is higher with -v 2 than with -n 2.

Can you please explain why there are more mappings with -v 2?

Thanks.

**Ben Langmead** · 07-15-2009, 06:53 AM

Originally posted by seq_GA:

With -v 3, Bowtie also reports one mapping location.

I want to use a seed length of 28 (the default) with 2 mismatches, hence I used -n 2, since I am comparing eland_28 and Bowtie results.

But why is Bowtie still not reporting it?

Probably because the -e limit is disqualifying that alignment. If you'd like Bowtie to report alignments like that, try setting a higher -e than the default (70). -e is described in the Maq-like Policy section of the manual.
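To see how the -e limit could rule out the example read, here is a hedged Python sketch of the Maq-like policy as the Bowtie manual describes it: at most -n mismatches in the first -l bases (the seed), and the sum of quality values at all mismatched positions must not exceed -e. It assumes the manual's default behaviour of rounding qualities to the nearest 10 and capping them at 30; rounding halves upward is a guess on my part. The mismatch offsets 31, 33 and 34 and the all-'I' qualities (Phred 40) come from the -v 3 output earlier in the thread. The function names are hypothetical and this is not Bowtie's code.

```python
def maq_round(q: int) -> int:
    """Round a Phred quality to the nearest 10 (halves up) and cap it at 30."""
    return min(30, (q + 5) // 10 * 10)


def passes_maq_like_policy(mismatch_offsets, quals, n=2, seed_len=28, e=70):
    """Check the seed mismatch limit (-n, -l) and the quality-sum limit (-e)."""
    seed_mismatches = sum(1 for off in mismatch_offsets if off < seed_len)
    quality_sum = sum(maq_round(quals[off]) for off in mismatch_offsets)
    return seed_mismatches <= n and quality_sum <= e


quals = [40] * 35             # 'I' at every position of the 35 bp read
mismatches = [31, 33, 34]     # 0-based offsets from the 5' end of the read

# No mismatch falls inside the 28-base seed, but 3 * 30 = 90 exceeds 70.
print(passes_maq_like_policy(mismatches, quals))        # default -e 70
print(passes_maq_like_policy(mismatches, quals, e=90))  # raised -e
```

Under these assumptions the first call prints False and the second prints True, which would be consistent with the read aligning under -v 3 but not under -n 2 with the default -e.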

Ben

**Ben Langmead** · 07-15-2009, 06:54 AM

Originally posted by seq_GA:

Can you please explain why there are more mappings with -v 2?

Probably the -e limit again. See my previous post.

Ben

**frozenlyse** · 07-16-2009, 02:46 AM

Originally posted by Ben Langmead:

Bowtie will index and align against references containing non-A/C/G/T characters, but alignments overlapping non-A/C/G/T characters in the reference are invalid and won't be reported.

Out of curiosity, what's the behavior you would like? E.g. if a C in a read were to align against a Y in the genome, would you like that to be considered a match, incurring no penalty against the alignment?

Thanks,
Ben

The reason our group would like this functionality is that we are investigating DNA methylation analysis via Illumina bisulfite sequencing. In this case, C nucleotides in the normal genome will be either C or T nucleotides in the bisulfite-converted genome.

So our preferred behavior would be to not penalise either the C or the T (if the reference contained a Y at this position).
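The requested behavior can be sketched in a few lines of Python. This is only an illustration of the matching rule being asked for, using the standard IUPAC nucleotide codes; it is not something Bowtie does (as noted earlier in the thread, Bowtie does not report alignments that overlap such characters), and the names `IUPAC`, `base_matches` and `count_mismatches_iupac` are invented for this example.

```python
# Standard IUPAC nucleotide codes mapped to the bases each one stands for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def base_matches(read_base: str, ref_code: str) -> bool:
    """True if the read base is one of the bases the reference code allows."""
    return read_base in IUPAC[ref_code]


def count_mismatches_iupac(read: str, ref_window: str) -> int:
    """Count mismatches, treating ambiguity codes in the reference as matches."""
    if len(read) != len(ref_window):
        raise ValueError("read and reference window must have the same length")
    return sum(1 for r, g in zip(read, ref_window) if not base_matches(r, g))


# Both C and T in the read are accepted against a Y in the reference.
print(count_mismatches_iupac("ACTG", "AYYG"))
# An A against a Y is still penalised.
print(count_mismatches_iupac("AATG", "AYYG"))
```

An N in the read is not contained in any of the allowed sets, so under this sketch it still counts as a mismatch.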

Anyway I find bowtie very useful, thanks for all your work!

**Ben Langmead** · 07-23-2009, 04:58 AM

Hi Chuck,

Originally posted by chuck:

I tried bowtie remade with extraflags but it just did the same thing. Would there be a log file somewhere or something in the map file? I can't seem to find any additional output.

If you have a moment, could you try your run again using the latest version of Bowtie (0.10.1, released on Monday)?

Thanks,
Ben

**seq_GA** · 07-27-2009, 07:48 PM

Originally posted by Ben Langmead:

Probably the -e limit again. See my previous post.

Ben

Hi Ben,

I am trying to get as many mappings as eland reports, so I have been experimenting with Bowtie's parameters.
As you suggested earlier, I tried raising -e up to 2000 to bring the number of mappings closer to eland's, but Bowtie still misses a lot of mappings compared to eland.

The -v option gives results comparable to eland's (I tested with read length 28, which is also the seed length), but with the increasing number of Ns at the 3' end, it would be good to use the -n option and allow any number of mismatches beyond the seed length.

So, do you have any suggestions for increasing the mapping rate of Bowtie using the -n option?

Thanks.

**Ben Langmead** · 07-29-2009, 07:38 AM

Originally posted by seq_GA:

So, do you have any suggestions for increasing the mapping rate of Bowtie using the -n option?

The main options used to adjust the sensitivity of mapping in Maq-like alignment mode are -n, -l, -e, --maxbts/-y. If there is a particular alignment you think Bowtie should be finding but isn't, please let me know and I can take a look.

Thanks,
Ben

**chuck** · 07-30-2009, 01:49 AM

Hi Ben,

I've been teaching and not working on the data lately. I will give it a try soon.

I have a question for you about assembly quality evaluation, in two contexts.

1) to simply evaluate the quality of the assembly of the short reads against the reference sequences, beyond simple coverage
2) when there are actual differences between the sequenced genome and the reference genome, in finding indels and whatnot

I am looking at AMOS, which seems to be one of the few that provide some kind of quality score for the assembly. Are you aware of others?

I am trying to quickly narrow my analysis down to those de novo contigs with good assembly scores. I proposed a simple metric in a manuscript and the reviewer suggested I use other 'standard' measures but gave no pointers as to which ones I should be using. Things are changing so fast it is hard to keep track of the 'standard'...

Thanks,
Chuck

**chuck** · 07-31-2009, 01:49 PM

Ben,

I used the latest version 0.10.1 and it still hangs. It seems to complete the job (or almost, I haven't verified that fact yet) and stops writing to the output file but then it never closes.

Do you want me to run the debug version or try the extra flags again?

Chuck
