  • chuck
    Member
    • Apr 2009
    • 13

    Using PET files as SET files in bowtie

    Hello - thanks for bowtie - I like it and the output is handy for me to analyse.

    I have a bit of odd behavior to report that I can't understand or figure out. I have lots of little contigs (100-1000 bp) that I am aligning against and I have both SET and PET files.

    When I align the SET against the short contigs, everything works great. <example command follows>

    ./bowtie -f shortcontigs_index lane1.fa lane1vreference.map

    When I align both files for the PET data, everything works great but obviously my results are strongly biased towards those pairs which are very close together and many of the alignments are rejected because one of the pairs is sticking out into 'space'...

    ./bowtie -f shortcontigs_index -1 lane1_1.fa -2 lane1_2.fa lane1vreference.map

    When I try to use one of the PET files as a singles file, bowtie runs for just a second, usually reporting that one of my reads is less than 2 base pairs long and then quits.

    ./bowtie -f shortcontigs_index lane1_1.fa lane1vreference.map

    Does bowtie somehow detect that the original file is a PET file and will not let me run it by itself?


    • chuck
      Member
      • Apr 2009
      • 13

      more on using PET as SET files in bowtie

      Hi - I just stripped all of the >tags off the reads and used one of the PET pairs as a -r raw file and it works fine...

      so, I guess that bowtie is detecting that the data is supposed to be PET from the >tag info?


      • Ben Langmead
        Senior Member
        • Sep 2008
        • 200

        Hi Chuck,

        When running in unpaired mode, Bowtie doesn't try to detect whether a file is part of a pair or not. It simply treats it as a plain old unpaired FASTA file. Have you checked to see whether any of the mates really are 1-bp in that file? Are there any other peculiarities in how that file is formatted?

        If neither of those is the issue, could you send me that file so I can try to diagnose the problem myself?

        Thanks,
        Ben


        • chuck
          Member
          • Apr 2009
          • 13

          PET as SET

          Hi Ben,

          I've tried this for a number of different files and the result is always the same.

          Yes, there are reads that have only a single base, but in PET mode bowtie skips them. It prints a long list of errors as it rejects the short reads, but it still completes the alignment.

          In singles mode, it seems to hit the first error and quit.

          Perhaps that is the difference? How it deals with the error?

          What's the best way to send them to you? I guess I could just take the first few thousand reads of each pair along with a reference? That should do it and avoid sending massive data files.

          Chuck


          • Ben Langmead
            Senior Member
            • Sep 2008
            • 200

            Hi Chuck,

            OK - so you do have 1-bp reads. That explains the error in unpaired mode. Given that, would you rather Bowtie rejected your 1-bp reads in paired-end mode (as it currently does in unpaired mode), or would you rather Bowtie accepted (but skipped) your 1-bp reads in unpaired mode? My feeling is that Bowtie should at least print a warning by default in both cases, since 1-bp reads are usually a sign that something went wrong upstream of the aligner. If there's a good reason why 1-bp reads should be tolerated, then maybe Bowtie should also provide a command-line option that suppresses the warning in cases where the user would like to tolerate it.

            Ben
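Until bowtie handles this itself, one way to get skip-and-warn behaviour in the unpaired case is to drop the short reads before alignment. Below is a minimal Python sketch, assuming plain FASTA input. The function name `filter_short_reads` and the `min_len=2` default (taken from the "less than 2 base pairs" message reported above) are illustrative choices, not part of bowtie.

```python
import io


def filter_short_reads(fasta_in, fasta_out, min_len=2):
    """Copy FASTA records with at least min_len bases; return (kept, skipped)."""
    kept = skipped = 0
    header, seq_lines = None, []

    def flush():
        # Write out the record collected so far, if it is long enough.
        nonlocal kept, skipped
        if header is None:
            return
        seq = "".join(seq_lines)
        if len(seq) >= min_len:
            fasta_out.write(f"{header}\n{seq}\n")
            kept += 1
        else:
            skipped += 1

    for line in fasta_in:
        line = line.strip()
        if line.startswith(">"):
            flush()
            header, seq_lines = line, []
        elif line:
            seq_lines.append(line)
    flush()  # do not forget the last record
    return kept, skipped


if __name__ == "__main__":
    # Made-up reads for illustration only.
    demo = io.StringIO(">r1\nACGT\n>r2\nA\n>r3\nGG\n")
    out = io.StringIO()
    print(filter_short_reads(demo, out))  # (2, 1)
```

Filtering only one mate file would leave the two PET files out of step, so this is meant for the case where a single file is aligned on its own.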


            • -daf-
              Junior Member
              • Feb 2009
              • 6

              Hello, thanks for bowtie
              I have a problem downloading the bowtie index for the human genome from ftp://ftp.cbcb.umd.edu/pub/data/bowt...s_asm.ebwt.zip. I have no problem with smaller indexes such as g_gallus.ebwt.zip.
              Is it possible to split the file for downloading?


              • polsum
                Member
                • May 2009
                • 32

                Originally posted by Ben Langmead
                For now, the way to do that is via options like -k/-a/--nostrata/-m. You can count the number of alignments from the output bowtie generates.



                Bowtie aligns the entire read with a certain number of mismatches.



                Bowtie's job is to find legal alignments subject to the constraints imposed by the alignment and reporting policies specified by the user (see manual for info about -k/-m/-a/--nostrata, etc). Any additional filtering you might want to perform will have to be done externally, say, in a script.



                No - you'll have to do vector trimming ahead of time.

                Hope that helps,
                Ben

                Thanks a lot for the replies.
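The external counting and filtering that Ben describes can be done in a few lines of script. Here is a minimal Python sketch, assuming bowtie's default output is tab-delimited with the read name in the first column. The function names and the sample lines are made up for illustration.

```python
from collections import Counter


def count_alignments(map_lines):
    """Count reported alignments per read name (assumed to be the first tab field)."""
    counts = Counter()
    for line in map_lines:
        line = line.rstrip("\n")
        if not line:
            continue
        read_name = line.split("\t", 1)[0]
        counts[read_name] += 1
    return counts


def uniquely_aligned(counts):
    """Return the names of reads that were reported exactly once, sorted."""
    return sorted(name for name, n in counts.items() if n == 1)


if __name__ == "__main__":
    # Hypothetical output lines, not real bowtie results.
    sample = [
        "read1\t+\tcontig7\t12\tACGT\tIIII\t0\t\n",
        "read2\t-\tcontig3\t40\tGGCA\tIIII\t1\t\n",
        "read2\t+\tcontig9\t5\tTGCC\tIIII\t1\t\n",
    ]
    counts = count_alignments(sample)
    print(counts["read2"], uniquely_aligned(counts))
```

The counts are only informative if bowtie was asked to report more than one alignment per read, for example with the -k or -a options mentioned in the quoted reply.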


                • polsum
                  Member
                  • May 2009
                  • 32

                  hey Ben, another question. When I try to execute "/bowtie-0.9.9.3/bowtie e_coli reads/e_coli_1000.fq" on my Mac, I get a response like this: "Warning: Could not open file "reads/e_coli_1000.fq" for reading". What could be the reason for this? I downloaded "bowtie-0.9.9.3-bin-macos-10.5-i386.zip" and my Mac runs OS X 10.5.6 on Intel.

                  thanks in advance.


                  • chuck
                    Member
                    • Apr 2009
                    • 13

                    PET as SET

                    Originally posted by Ben Langmead

                    Given that, would you rather Bowtie rejected your 1-bp reads in paired-end mode (as it currently does in unpaired mode), or would you rather Bowtie accepted (but skipped) your 1-bp reads in unpaired mode? My feeling is that Bowtie should at least print a warning by default in both cases, since 1-bp reads are usually a sign that something went wrong upstream of the aligner. If there's a good reason why 1-bp reads should be tolerated, then maybe Bowtie should also provide a command-line option that suppresses the warning in cases where the user would like to tolerate it.

                    Ben

                    Ben, thanks for the reply. I agree with you - no, there is no compelling reason that 1 bp reads should be accepted. They do not add anything to the alignment of these short reads but it would be useful if they were just skipped and a warning was printed. Currently, the alignment fails completely.

                    Oh, one more thing I forgot to mention: when I converted the PET files to 'raw' format, I also replaced every "." in the original fa file with "N". This might also be the reason it worked, if bowtie counts an N as a base (just an unknown one) but treats "." as a missing position.

                    Thanks again!

                    Chuck
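The conversion described here (dropping the > header lines and writing "N" in place of ".") might look like the following Python sketch. It assumes plain FASTA input and produces one sequence per line, which is what the post calls 'raw' format. The function name `fasta_to_raw` is hypothetical.

```python
def fasta_to_raw(fasta_lines):
    """Yield one sequence string per FASTA record, with '.' replaced by 'N'.

    Records that have a header but no sequence are silently dropped.
    """
    seq = []
    for line in fasta_lines:
        line = line.strip()
        if line.startswith(">"):
            if seq:
                yield "".join(seq)
            seq = []
        elif line:
            seq.append(line.replace(".", "N"))
    if seq:
        yield "".join(seq)


if __name__ == "__main__":
    # Made-up records for illustration only.
    demo = [">pair1/1", "AC.T", ">pair2/1", "..", ">pair3/1", "G"]
    for raw in fasta_to_raw(demo):
        print(raw)
```

Note that a one-character record such as "G" survives the conversion unchanged, so this step alone does not remove 1-bp reads.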


                    • -daf-
                      Junior Member
                      • Feb 2009
                      • 6

                      Originally posted by -daf-
                      Hello, thanks for bowtie
                      I have a problem downloading the bowtie index for the human genome from ftp://ftp.cbcb.umd.edu/pub/data/bowt...s_asm.ebwt.zip. I have no problem with smaller indexes such as g_gallus.ebwt.zip.
                      Is it possible to split the file for downloading?

                      Sorry for the inconvenience, I have achieved success with the Linux ftp command.


                      • Ben Langmead
                        Senior Member
                        • Sep 2008
                        • 200

                        Originally posted by -daf-
                        Sorry for the inconvenience, I have achieved success with the Linux ftp command.

                        Hi daf,

                        I've heard that complaint from others as well. I think that the unzip programs on some platforms (e.g. Mac) cannot necessarily extract archives larger than 2 GB. I went ahead and split each of the large archives into two parts. See the Bowtie page for the changes.

                        Thanks,
                        Ben


                        • Ben Langmead
                          Senior Member
                          • Sep 2008
                          • 200

                          Originally posted by polsum
                          hey Ben, another question. When I try to execute "/bowtie-0.9.9.3/bowtie e_coli reads/e_coli_1000.fq" on my Mac, I get a response like this: "Warning: Could not open file "reads/e_coli_1000.fq" for reading". What could be the reason for this? I downloaded "bowtie-0.9.9.3-bin-macos-10.5-i386.zip" and my Mac runs OS X 10.5.6 on Intel.

                          thanks in advance.

                          Hi polsum,

                          Does the "reads/e_coli_1000.fq" file exist, relative to your current working directory when you issue that command?

                          Ben
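A quick way to check this is to resolve the relative path against the current working directory and see whether a file is really there. Here is a minimal Python sketch; the helper name `describe_path` is made up for illustration.

```python
import os


def describe_path(path):
    """Return (absolute path, True if a regular file exists there)."""
    resolved = os.path.abspath(path)  # relative paths resolve against os.getcwd()
    return resolved, os.path.isfile(resolved)


if __name__ == "__main__":
    resolved, exists = describe_path("reads/e_coli_1000.fq")
    print("working directory:", os.getcwd())
    print("looking for:", resolved)
    print("found" if exists else "not found")
```

If the result is "not found", changing into the directory that contains `reads/` before running the command, or passing an absolute path to the reads file, should resolve the path as intended.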


                          • inesdesantiago
                            Member
                            • Jan 2009
                            • 44

                            Why is Bowtie Fast?

                            I am very impressed with Bowtie!
                            It is mega-ultra-fast, and runs on my [windows] laptop!

                            Does anyone know why it is so fast compared with Eland and MAQ, which do exactly the same job?
                            These informatics 'tricks' are exactly what we need to handle such an amount of data.
                            I would like to apply the principles of bowtie to my own scripts, but have no idea what makes it so fast!

                            Any comments?
                            Thanks
                            Ines de Santiago
                            Last edited by inesdesantiago; 06-12-2009, 04:46 PM. Reason: typo


                            • Ben Langmead
                              Senior Member
                              • Sep 2008
                              • 200

                              Hi Ines,

                              The Bowtie paper has details about the algorithm. You can find more visual discussions in the slides linked to from the Bowtie website (see Other Documentation section in the right-hand sidebar).

                              Thanks,
                              Ben


                              • inesdesantiago
                                Member
                                • Jan 2009
                                • 44

                                Bowtie BWT indexing

                                Thanks Ben!
                                I see that the BWT-based indexing of the reference genome is a great advantage. It allows Bowtie to do its searches with a very small memory footprint. But does that mean that, because it uses less memory to index the reference genome, it will be faster? Is less memory == Fast Search?
                                Ines
                                Last edited by inesdesantiago; 06-13-2009, 07:26 AM. Reason: typo
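To make the question concrete, here is a toy Python sketch of exact matching over a BWT-based index (the textbook FM-index backward search). It is not Bowtie's actual implementation and ignores mismatches entirely. The function names are illustrative, and the naive rotation sort is only suitable for tiny inputs. What it shows is that the search loop runs once per character of the read, regardless of how long the reference is.

```python
from collections import Counter


def bwt(text):
    """Burrows-Wheeler transform via sorted rotations (toy inputs only)."""
    text += "$"  # sentinel; sorts before A, C, G and T
    rotations = sorted(text[i:] + text[:i] for i in range(len(text)))
    return "".join(rot[-1] for rot in rotations)


def build_index(bwt_str):
    """Return (C, occ).

    C[c]      = number of characters in the text that sort before c
    occ[c][i] = number of occurrences of c in bwt_str[:i]
    """
    counts = Counter(bwt_str)
    C, total = {}, 0
    for c in sorted(counts):
        C[c] = total
        total += counts[c]
    occ = {c: [0] for c in counts}
    for ch in bwt_str:
        for c in occ:
            occ[c].append(occ[c][-1] + (ch == c))
    return C, occ


def count_occurrences(pattern, C, occ, n):
    """Backward search: one range update per pattern character."""
    lo, hi = 0, n  # half-open range of rows in the sorted rotation matrix
    for ch in reversed(pattern):
        if ch not in C:
            return 0
        lo = C[ch] + occ[ch][lo]
        hi = C[ch] + occ[ch][hi]
        if lo >= hi:
            return 0
    return hi - lo


if __name__ == "__main__":
    transformed = bwt("GATTACATTA")  # made-up reference
    C, occ = build_index(transformed)
    print(count_occurrences("TTA", C, occ, len(transformed)))  # 2
```

In this sketch the small footprint and the speed are separate properties: the compact index is what lets a whole genome fit in memory, while the speed comes from each lookup costing a number of steps proportional to the read length rather than a scan of the reference.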
