**jmj1091** · 10-06-2009, 09:56 AM

Hi Ben,
I was playing around with the new bowtie version, and I noticed that specifying the --best flag resulted in an output that had a different number of mappings than when the flag is omitted. However, in the bowtie documentation, it says that --best does not change which alignments are considered valid. I tried this with multiple sets of reads, and each time the number of alignments differed. Is this a bug or am I misunderstanding the effects of --best? The other flags I was using (in both cases) were -S, -p, --solexa1.3-quals, --al, and --un.
Thanks!

**kraigrs** · 10-08-2009, 12:53 PM

Trouble with bowtie printing SAM output

When I attempt to align mated paired-end sequence reads and output the file
in SAM format, I receive a segmentation fault. If I try the same thing
without the -S/--sam option, it works fine. Here is what I am getting:

EEB-WITT5:Bowtie wittkopp-lab$ ./bowtie -q -k 1 --sam --best --solexa1.3-quals dmel-all-CDS-r5.21 -1 ./mel_sim_data/Hybrids/s_2_1_sequence.txt -2 ./mel_sim_data/Hybrids/s_2_2_sequence.txt > s_2_sequence.sam

Segmentation fault

Any help in this matter would be greatly appreciated! Again, I would like
this output to be in SAM format. I tried converting the bowtie output to
SAM but the bowtie2sam.pl script doesn't do that.

**Ben Langmead** · 10-08-2009, 01:32 PM

Originally posted by kraigrs

When I attempt to align mated paired-end sequence reads and output the file
in SAM format, I receive a segmentation fault.

Hi kraigrs. I'll definitely take a look. Is it possible for me to get those input files from you? I can't reproduce that with the reads and indexes that I've tried on my end.

Thanks,
Ben

**Ben Langmead** · 10-08-2009, 01:34 PM

Originally posted by jmj1091

I was playing around with the new bowtie version, and I noticed that specifying the --best flag resulted in an output that had a different number of mappings than when the flag is omitted.

I'll take a look. You used --al and --un. Is the number of reads different in those files, or in the alignment output? Or both?

**axiom7** · 10-15-2009, 11:19 AM

Polonator data with gaps

Hi Ben,
I have some output from a Polonator. This is paired-end data with gaps. For instance, the raw data is 26 base pairs. The researcher asserts this to be 2x15mers with a gap of two nucleotides between base 7 and 8, and between 20 and 21. He also asserts that the spacing between the two 15mers is between 500 and 1500 bases. I used a perl script to insert "NN" in the two gaps, and to create two mated fasta files. Ran the following:

bowtie -t -p 8 -v 3 -m 100 -I 500 -X 1500 -f --ff -a -1 mate1.fa -2 mate2.fa

This seemed to run reasonably and I /think/ I am asking for alignments with 1 additional mismatch beyond the 2 gapped nucleotides.

Problem occurs in the second set of data. The researcher asserts the 26 base pairs to have a 6 nucleotide gap, but when I attempt to run the above bowtie command (after processing the raw data with my perl script) with "-v 7" I get an error message: "-v arg must be at most 3". Am I out of luck here? Am I asking bowtie to do something for which it is not designed?
Thank you.
Susan
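
The preprocessing step described above (splitting the raw read into two mates and padding the gap with "NN") could be sketched roughly as follows. This is not Susan's perl script. It assumes the 26 raw bases are two 13-base halves and that the gap sits after base 7 of each half, which is one reading of "between base 7 and 8, and between 20 and 21". The function name and the sample read are made up for illustration, and whether mate 2 needs reverse-complementing is not stated in the post, so it is left as-is here.

```python
def split_and_pad(raw, gap_after=7, gap_len=2):
    """Split a raw read into two equal halves and insert Ns at the gap in each.

    Assumption (not confirmed in the thread): the gap falls after
    `gap_after` bases of each half.
    """
    if len(raw) % 2 != 0:
        raise ValueError("expected an even-length raw read")
    half = len(raw) // 2
    pad = "N" * gap_len
    halves = (raw[:half], raw[half:])
    return tuple(h[:gap_after] + pad + h[gap_after:] for h in halves)


raw = "ACGTACGTACGTAGGCTAGCTAGCTA"  # made-up 26-base read
mate1, mate2 = split_and_pad(raw)

# Emit one FASTA record per mate (read names are placeholders).
print(">read1/1\n" + mate1)
print(">read1/2\n" + mate2)
```

Each 13-base half becomes a 15-mer with the two Ns at positions 8 and 9, which matches the "2x15mers" description.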

**sparks** · 10-15-2009, 03:42 PM

Hi Susan,
I think you could align this using Novoalign as the N's won't count as full mismatches, only P=0.25 of mismatch and hence penalty of 6. Building the index with a k-mer length of 7 might improve performance. If you'd like to discuss further you can contact me via email at colin at novocraft <.>com

Colin

**Ben Langmead** · 10-15-2009, 05:21 PM

Hi Susan,

Originally posted by axiom7

I have some output from a Polonator. This is paired-end data with gaps. For instance, the raw data is 26 base pairs. The researcher asserts this to be 2x15mers with a gap of two nucleotides between base 7 and 8, and between 20 and 21. He also asserts that the spacing between the two 15mers is between 500 and 1500 bases. I used a perl script to insert "NN" in the two gaps, and to create two mated fasta files. Ran the following:

bowtie -t -p 8 -v 3 -m 100 -I 500 -X 1500 -f --ff -a -1 mate1.fa -2 mate2.fa

This seemed to run reasonably and I /think/ I am asking for alignments with 1 additional mismatch beyond the 2 gapped nucleotides.

Yes, I agree that this should work. And I agree that, because of the NNs, you are effectively asking for alignments with 1 additional mismatch.

Problem occurs in the second set of data. The researcher asserts the 26 base pairs to have a 6 nucleotide gap, but when I attempt to run the above bowtie command (after processing the raw data with my perl script) with "-v 7" I get an error message: "-v arg must be at most 3". Am I out of luck here? Am I asking bowtie to do something for which it is not designed?
Thank you.
Susan

The answer to whether you're asking bowtie to do something it was not designed to do is "yes". But it is definitely still possible to use Bowtie. My suggestion would be to use, for instance, -n 1 -l X -e Y, where -l X is set so that the "seed" falls just short of the string of Ns, and -e Y is set according to the number of Ns + the number of mismatches you would like to allow beyond the Ns. (Your input is fasta, so every mismatch incurs a quality penalty of 30. So for 6 Ns + 1 mismatch, -e 210 is appropriate.) Here is an example where I align a read of the format you describe to the human genome:

Code:

sycamore:~/research/bowtie $ cat tmp.fa
>r
CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT
sycamore:~/research/bowtie $ ./bowtie --best -n 1 -l 13 -e 210 -f /fs/szasmg/langmead/ebwts/h_sapiens_asm tmp.fa
r	+	gi|89161187|ref|NC_000010.9|NC_000010	135373946	CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	5:G>T,13:G>N,14:C>N,15:G>N,16:A>N,17:A>N,18:G>N
# reads processed: 1
# reads with at least one reported alignment: 1 (100.00%)
# reads that failed to align: 0 (0.00%)
Reported 1 alignments to 1 output stream(s)

That set of parameters is designed to effectively allow 1 mismatch beyond the mismatches forced by the Ns, as you can see in the above alignment.
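
One way to check this on a larger output file is to split the mismatch column into mismatches forced by Ns and everything else. The sketch below assumes the descriptor format seen in the alignment above (comma-separated `offset:ref>read` items); the function name is hypothetical and this is not part of bowtie itself.

```python
def split_mismatches(descriptor):
    """Return (forced_by_n, other) lists of (offset, ref_base, read_base)."""
    forced_by_n, other = [], []
    for item in descriptor.split(","):
        if not item:
            continue  # tolerate an empty descriptor (no mismatches)
        offset, change = item.split(":")
        ref_base, read_base = change.split(">")
        entry = (int(offset), ref_base, read_base)
        # A mismatch is "forced" when the read base is an N.
        (forced_by_n if read_base == "N" else other).append(entry)
    return forced_by_n, other


# Mismatch column copied from the alignment shown above.
desc = "5:G>T,13:G>N,14:C>N,15:G>N,16:A>N,17:A>N,18:G>N"
forced, other = split_mismatches(desc)
print(len(forced), "forced by Ns;", len(other), "other")
```

For the alignment above this separates the six N positions from the single real mismatch at offset 5.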

It's worth noting that if you can (eventually) get the Polonator to give you an anchor of, say, 20bp instead of 13bp, bowtie run in this mode will be substantially faster.

I hope that's helpful; if it's still unclear, please feel free to email me.

Thanks,
Ben

**axiom7** · 10-16-2009, 06:08 AM

Ben and sparks,

Thanks for all the input. I will be working on this today and will respond back to you.

Susan

**para_seq** · 10-23-2009, 07:03 AM

Hi, Ben,

I have a question about using Bowtie to map Illumina transcriptome reads to a prokaryote genome. There are two copies of an identical gene, encoded on the '+' and '-' strands. What I don't understand is that both copies are mapped by a large number of unique RNA-seq reads (value = 0 in column 7 of the bowtie output) in both '+' and '-' orientations. The mapped reads to each copy have approximately 3 to 1 ratio in + and - orientations.

Did I do anything wrong?
Please help me to clarify my understanding. Thank you.

**axiom7** · 10-23-2009, 07:18 AM

Hi Ben,

I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

Thanks.
Susan

**Ben Langmead** · 10-23-2009, 08:52 AM

Originally posted by para_seq

The mapped reads to each copy have approximately 3 to 1 ratio in + and - orientations.

Hi para_seq,

The bias you see may or may not be due to alignment. Bowtie does have options that seek to remove strand bias, e.g. the --best option. If you still see the bias using --best, then the bias is probably inherent in your reads.

Hope that helps,
Ben
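
To compare the strand ratio between a run with --best and a run without it, the alignments can be tallied per reference and strand. This is a minimal sketch that assumes bowtie's default tab-separated output, with the strand in column 2 and the reference name in column 3 as in the alignment line shown earlier in the thread. The function name, read names, and gene names below are placeholders, not data from the thread.

```python
from collections import Counter


def strand_counts(lines):
    """Count alignments per (reference, strand) from tab-separated lines."""
    counts = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue  # skip blank or malformed lines
        strand, reference = fields[1], fields[2]
        counts[(reference, strand)] += 1
    return counts


# Placeholder alignment lines (not real data).
sample = [
    "r1\t+\tgene_copy_A\t10\tACGT\tIIII\t0\t",
    "r2\t+\tgene_copy_A\t42\tACGT\tIIII\t0\t",
    "r3\t-\tgene_copy_A\t77\tACGT\tIIII\t0\t",
    "r4\t-\tgene_copy_B\t12\tACGT\tIIII\t0\t",
]
counts = strand_counts(sample)
for (reference, strand), n in sorted(counts.items()):
    print(reference, strand, n)
```

Running this on both output files and comparing the per-copy + and - counts would show whether the ratio changes with --best.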

**Ben Langmead** · 10-23-2009, 08:55 AM

Hi Susan,

Originally posted by axiom7

I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

I'm glad! As I say, if the Polonator can (eventually) be made to give you a longer stretch of unambiguous bases before the NNNNN gap, then you can bump -l up accordingly and performance should improve quite a bit.

Thanks,
Ben

**amaer** · 10-26-2009, 02:18 PM

Hi Ben,

What is the update on Bowtie doing gapped alignments?

Thanks!

**Ben Langmead** · 10-27-2009, 06:10 AM

Originally posted by amaer

What is the update on Bowtie doing gapped alignments?

Hi amaer,

Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules. But by end-of-year is a reasonable guess.

Thanks,
Ben

**axiom7** · 10-27-2009, 08:24 AM

Originally posted by axiom7

Hi Ben,

I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

Thanks.
Susan

Sorry, I meant using -e 210 to simulate - not -3 210.

Susan
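
The arithmetic behind the -l and -e values used in this thread can be sketched as below. It follows Ben's rule of thumb: the seed stops just short of the first N, and -e is 30 times (number of Ns + extra mismatches) for FASTA input. The function name is made up, the penalty of 30 is taken from Ben's post rather than checked against the bowtie source, and the second read is an invented example of the two-N layout.

```python
FASTA_MISMATCH_PENALTY = 30  # per-mismatch penalty for FASTA input, per Ben's post


def seed_and_e(read, extra_mismatches=1, penalty=FASTA_MISMATCH_PENALTY):
    """Suggest (-l, -e) values for a read containing a run of Ns."""
    first_n = read.find("N")
    if first_n == -1:
        raise ValueError("read contains no N")
    n_count = read.count("N")
    seed_len = first_n  # number of bases before the first N
    max_penalty = penalty * (n_count + extra_mismatches)
    return seed_len, max_penalty


# Ben's example read with a six-N gap.
print(seed_and_e("CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT"))
# An invented 15-mer with a two-N gap after base 7.
print(seed_and_e("ACGTACGNNTACGTA"))
```

The first call gives the -l 13 -e 210 used in Ben's example, and the second gives the -l 7 -e 90 that Susan compared against -v 3.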
