
**chuck** · 06-06-2009, 02:08 AM

Using PET files as SET files in bowtie

Hello - thanks for bowtie - I like it and the output is handy for me to analyse.

I have a bit of odd behavior to report that I can't understand or figure out. I have lots of little contigs (100-1000 bp) that I am aligning against and I have both SET and PET files.

When I align the SET against the short contigs, everything works great. <example command follows>

./bowtie -f shortcontigs_index lane1.fa lane1vreference.map

When I align both files for the PET data, everything works great but obviously my results are strongly biased towards those pairs which are very close together and many of the alignments are rejected because one of the pairs is sticking out into 'space'...

./bowtie -f shortcontigs_index -1 lane1_1.fa -2 lane1_2.fa lane1vreference.map

When I try to use one of the PET files as a singles file, bowtie runs for just a second, usually reporting that one of my reads is less than 2 base pairs long and then quits.

./bowtie -f shortcontigs_index lane1_1.fa lane1vreference.map

Does bowtie somehow detect that the original file is a PET file and will not let me run it by itself?

**chuck** · 06-06-2009, 02:53 AM

more on using PET as SET files in bowtie

Hi - I just stripped all of the >tags off the reads and used one of the PET pairs as a -r raw file and it works fine...

so, I guess that bowtie is detecting that the data is supposed to be PET from the >tag info?

**Ben Langmead** · 06-06-2009, 04:24 AM

Hi Chuck,

When running in unpaired mode, Bowtie doesn't try to detect whether a file is part of a pair or not. It simply treats it as a plain-old unpaired fasta file. Have you checked to see whether any of the mates really are 1-bp in that file? Are there any other peculiarities in how that file is formatted?

If neither of those is the issue, could you send me that file so I can try to diagnose the problem myself?

Thanks,
Ben

**chuck** · 06-06-2009, 02:24 PM

PET as SET

Hi Ben,

I've tried this for a number of different files and the result is always the same.

Yes, there are reads that have only a single base, but in PET mode bowtie skips them. It prints a long list of errors as it rejects the short reads, but it still completes the alignment.

In singles mode, it seems to hit the first error and quit.

Perhaps that is the difference? How it deals with the error?

What's the best way to send them to you? I guess I could just take the first few thousand reads of each pair along with a reference? That should do it and avoid sending massive data files.

Chuck

**Ben Langmead** · 06-08-2009, 06:48 PM

Hi Chuck,

OK - so you do have 1-bp reads. That explains the error in unpaired mode. Given that, would you rather Bowtie rejected your 1-bp reads in paired-end mode (as it currently does in unpaired mode), or would you rather Bowtie accepted (but skipped) your 1-bp reads in unpaired mode? My feeling is that Bowtie should at least print a warning by default in both cases, since 1-bp reads are usually a sign that something went wrong upstream of the aligner. If there's a good reason why 1-bp reads should be tolerated, then maybe Bowtie should also provide a command-line option that suppresses the warning in cases where the user would like to tolerate it.

Ben

**-daf-** · 06-09-2009, 03:33 AM

Hello, thanks for bowtie
I have a problem downloading the bowtie index for the human genome from ftp://ftp.cbcb.umd.edu/pub/data/bowt...s_asm.ebwt.zip. I have no problem with smaller indexes such as g_gallus.ebwt.zip.
Is it possible to split the file for downloading?

**polsum** · 06-09-2009, 10:32 AM

Originally posted by Ben Langmead

For now, the way to do that is via options like -k/-a/--nostrata/-m. You can count the number of alignments from the output bowtie generates.

Bowtie aligns the entire read with a certain number of mismatches.

Bowtie's job is to find legal alignments subject to the constraints imposed by the alignment and reporting policies specified by the user (see manual for info about -k/-m/-a/--nostrata, etc). Any additional filtering you might want to perform will have to be done externally, say, in a script.

No - you'll have to do vector trimming ahead of time.

Hope that helps,
Ben

Thanks a lot for the replies.
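A minimal sketch of the kind of external script Ben mentions for counting alignments, written in Python. It assumes bowtie's output is tab-separated with the read name in the first column; check that against the manual for your version. The function names and the example records are hypothetical and only illustrate the idea.

```python
from collections import Counter
import io


def count_alignments(lines):
    """Count how many alignment records each read name has.

    Assumption: one alignment per line, tab-separated, read name first.
    """
    counts = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        read_name = line.split("\t", 1)[0]
        counts[read_name] += 1
    return counts


def reads_with_at_most(counts, max_hits):
    """Return the sorted names of reads with no more than max_hits records."""
    return sorted(name for name, n in counts.items() if n <= max_hits)


# Made-up records standing in for an alignment output file.
example = io.StringIO(
    "read1\t+\tcontig7\t12\tACGT\n"
    "read1\t-\tcontig9\t40\tACGT\n"
    "read2\t+\tcontig7\t88\tTTGA\n"
)
alignment_counts = count_alignments(example)
unique_reads = reads_with_at_most(alignment_counts, 1)
```

To run it on a real output file, pass an open file handle instead of the `io.StringIO` example, for instance `count_alignments(open("lane1vreference.map"))`.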

**polsum** · 06-09-2009, 11:35 AM

hey Ben, another question. When I try to execute "/bowtie-0.9.9.3/bowtie e_coli reads/e_coli_1000.fq" on my Mac, I get a response like this: "Warning: Could not open file "reads/e_coli_1000.fq" for reading". What could be the reason for this? I downloaded "bowtie-0.9.9.3-bin-macos-10.5-i386.zip" and my Mac runs OS X 10.5.6 on Intel.

thanks in advance.

**chuck** · 06-10-2009, 01:04 AM

PET as SET

Originally posted by Ben Langmead

Given that, would you rather Bowtie rejected your 1-bp reads in paired-end mode (as it currently does in unpaired mode), or would you rather Bowtie accepted (but skipped) your 1-bp reads in unpaired mode? My feeling is that Bowtie should at least print a warning by default in both cases, since 1-bp reads are usually a sign that something went wrong upstream of the aligner. If there's a good reason why 1-bp reads should be tolerated, then maybe Bowtie should also provide a command-line option that suppresses the warning in cases where the user would like to tolerate it.

Ben

Ben, thanks for the reply. I agree with you - no, there is no compelling reason that 1-bp reads should be accepted. They do not add anything to the alignment of these short reads, but it would be useful if they were just skipped and a warning was printed. Currently, the alignment fails completely.

Oh, one more thing I forgot to mention: when I converted the PET files to a 'raw' format, I also replaced every "." in the original fa file with "N". This might also be the reason it worked, if bowtie counts an N as a base (just an unknown one) but treats "." as a missing position.

Thanks again!

Chuck
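Until bowtie skips such reads itself, one workaround is to clean a FASTA file before aligning it in unpaired mode. The sketch below is only an illustration in Python: the function names and the `min_length` parameter are hypothetical, and the "." to "N" replacement mirrors what Chuck describes rather than anything bowtie requires. It is meant for a single file used on its own, because dropping reads from one file of a pair would break the mate order.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        yield header, "".join(chunks)


def clean_reads(lines, min_length=2):
    """Replace '.' with 'N' and separate out reads shorter than min_length.

    Returns (kept, skipped): kept is a list of (header, sequence) pairs,
    skipped is a list of headers so a warning can be printed for each.
    """
    kept, skipped = [], []
    for header, seq in read_fasta(lines):
        seq = seq.replace(".", "N")
        if len(seq) < min_length:
            skipped.append(header)
        else:
            kept.append((header, seq))
    return kept, skipped


# Made-up records: one normal read, one 1-bp read, one read with '.' calls.
example = [">r1", "ACGTACGT", ">r2", "A", ">r3", "AC..GT"]
kept, skipped = clean_reads(example)
for name in skipped:
    print(f"Warning: skipping read {name} (shorter than 2 bp)")
```

Writing `kept` back out as `>header` and sequence lines gives a file that can be passed to `./bowtie -f` as in the commands earlier in the thread.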

**-daf-** · 06-10-2009, 04:03 AM

Originally posted by -daf-

Hello, thanks for bowtie
I have a problem downloading the bowtie index for the human genome from ftp://ftp.cbcb.umd.edu/pub/data/bowt...s_asm.ebwt.zip. I have no problem with smaller indexes such as g_gallus.ebwt.zip.
Is it possible to split the file for downloading?

Sorry for the inconvenience; I succeeded using the Linux ftp command.

**Ben Langmead** · 06-10-2009, 01:04 PM

Originally posted by -daf-

Sorry for the inconvenience; I succeeded using the Linux ftp command.

Hi daf,

I've heard that complaint from others as well. I think that the unzip programs on some platforms (e.g., Mac) cannot always extract archives larger than 2 GB. I went ahead and split each of the large archives into two parts. See the Bowtie page for changes.

Thanks,
Ben

**Ben Langmead** · 06-10-2009, 01:05 PM

Originally posted by polsum

hey Ben, another question. When I try to execute "/bowtie-0.9.9.3/bowtie e_coli reads/e_coli_1000.fq" on my Mac, I get a response like this: "Warning: Could not open file "reads/e_coli_1000.fq" for reading". What could be the reason for this? I downloaded "bowtie-0.9.9.3-bin-macos-10.5-i386.zip" and my Mac runs OS X 10.5.6 on Intel.

thanks in advance.

Hi polsum,

Does the "reads/e_coli_1000.fq" file exist, relative to your current working directory when you issue that command?

Ben
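One quick way to check this is to resolve the relative path against the current working directory and see whether a file is there. A small sketch in Python; the helper name `describe_path` is hypothetical.

```python
import os


def describe_path(relative_path, cwd=None):
    """Resolve relative_path against cwd and report whether a file exists there."""
    if cwd is None:
        cwd = os.getcwd()  # the directory the command is being run from
    resolved = os.path.normpath(os.path.join(cwd, relative_path))
    return resolved, os.path.isfile(resolved)


if __name__ == "__main__":
    resolved, exists = describe_path("reads/e_coli_1000.fq")
    print(f"{resolved}: {'found' if exists else 'not found'}")
```

If the file is reported as not found, running the command from the directory that contains `reads/`, or passing an absolute path, would be the first things to try.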

**inesdesantiago** · 06-12-2009, 04:44 PM

Why is Bowtie Fast?

I am very impressed with Bowtie!
It is mega-ultra-fast, and runs on my [windows] laptop!

Does anyone know why it is so fast compared with Eland and MAQ, which do the same job?
These informatics 'tricks' are exactly what we need to handle such amounts of data.
I would like to apply the principles of bowtie to my own scripts, but have no idea what makes it so fast!

Any comments?
Thanks
Ines de Santiago

**Ben Langmead** · 06-12-2009, 07:27 PM

Hi Ines,

The Bowtie paper has details about the algorithm. You can find more visual discussions in the slides linked to from the Bowtie website (see Other Documentation section in the right-hand sidebar).

Thanks,
Ben

**inesdesantiago** · 06-13-2009, 07:24 AM

Bowtie BWT indexing

Thanks Ben!
I see that the BWT-based indexing of the reference genome is a great advantage. It allows Bowtie to do its searches with a very small memory footprint. But does that mean that, because it uses less memory to index the reference genome, it will be faster? Is less memory == Fast Search?
Ines
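One way to look at this question is with a toy version of exact matching on a Burrows-Wheeler transform. The Python sketch below is not Bowtie's implementation: it builds the index naively, stores full occurrence tables (so it is not memory-efficient at all), and handles exact matches only. The function names are hypothetical. What it does show is that the search loop takes one step per character of the read, regardless of how long the reference is, so search speed and memory footprint are separate properties of the index.

```python
def bwt_index(text):
    """Build a toy BWT index for text: (first, occ, n).

    first[c]  = number of characters in text + '$' that sort before c
    occ[c][i] = number of occurrences of c in the first i BWT characters
    """
    text += "$"  # sentinel; sorts before A, C, G, T
    order = sorted(range(len(text)), key=lambda i: text[i:] + text[:i])
    bwt = "".join(text[i - 1] for i in order)  # last column of sorted rotations
    alphabet = sorted(set(text))
    first, total = {}, 0
    for c in alphabet:
        first[c] = total
        total += text.count(c)
    occ = {c: [0] * (len(bwt) + 1) for c in alphabet}
    for i, ch in enumerate(bwt):
        for c in alphabet:
            occ[c][i + 1] = occ[c][i] + (1 if ch == c else 0)
    return first, occ, len(bwt)


def count_exact_matches(index, pattern):
    """Count occurrences of pattern using backward search over the BWT."""
    first, occ, n = index
    lo, hi = 0, n  # half-open range of matching rows in the sorted rotations
    for c in reversed(pattern):  # one step per pattern character
        if c not in first:
            return 0
        lo = first[c] + occ[c][lo]
        hi = first[c] + occ[c][hi]
        if lo >= hi:
            return 0
    return hi - lo


reference = "ACGTACGTTACG"  # made-up reference sequence
index = bwt_index(reference)
hits_acg = count_exact_matches(index, "ACG")
hits_ggg = count_exact_matches(index, "GGG")
```

The Bowtie paper and the slides Ben points to describe how the real index keeps these tables compact and how mismatches are handled.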
