  • jmj1091
    Junior Member
    • Sep 2009
    • 8

    Hi Ben,
    I was playing around with the new bowtie version, and I noticed that specifying the --best flag resulted in an output that had a different number of mappings than when the flag is omitted. However, in the bowtie documentation, it says that --best does not change which alignments are considered valid. I tried this with multiple sets of reads, and each time the number of alignments differed. Is this a bug or am I misunderstanding the effects of --best? The other flags I was using (in both cases) were -S, -p, --solexa1.3-quals, --al, and --un.
    Thanks!

    • kraigrs
      Junior Member
      • Oct 2009
      • 3

      Trouble with bowtie printing SAM output

      When I attempt to align mated paired-end sequence reads and output the file
      in SAM format, I receive a segmentation fault. If I try the same thing
      without the -S/--sam option, it works fine. Here is what I am getting:

      EEB-WITT5:Bowtie wittkopp-lab$ ./bowtie -q -k 1 --sam --best
      --solexa1.3-quals dmel-all-CDS-r5.21 -1
      ./mel_sim_data/Hybrids/s_2_1_sequence.txt -2
      ./mel_sim_data/Hybrids/s_2_2_sequence.txt > s_2_sequence.sam

      Segmentation fault

      Any help in this matter would be greatly appreciated! Again, I would like
      this output to be in SAM format. I tried converting the bowtie output to
      SAM but the bowtie2sam.pl script doesn't do that.

      • Ben Langmead
        Senior Member
        • Sep 2008
        • 200

        Originally posted by kraigrs:
        When I attempt to align mated paired-end sequence reads and output the file
        in SAM format, I receive a segmentation fault.

        Hi kraigrs. I'll definitely take a look. Is it possible for me to get those input files from you? I can't reproduce that with the reads and indexes that I've tried on my end.

        Thanks,
        Ben

        • Ben Langmead
          Senior Member
          • Sep 2008
          • 200

          Originally posted by jmj1091:
          I was playing around with the new bowtie version, and I noticed that specifying the --best flag resulted in an output that had a different number of mappings than when the flag is omitted.

          I'll take a look. You used --al and --un. Is the number of reads different in those files, or in the alignment output? Or both?

          • axiom7
            Member
            • Aug 2009
            • 14

            Polonator data with gaps

            Hi Ben,
            I have some output from a Polonator. This is paired-end data with gaps. For instance, the raw data is 26 base pairs. The researcher asserts this to be 2x15mers with a gap of two nucleotides between base 7 and 8, and between 20 and 21. He also asserts that the spacing between the two 15mers is between 500 and 1500 bases. I used a perl script to insert "NN" in the two gaps, and to create two mated fasta files. Ran the following:

            bowtie -t -p 8 -v 3 -m 100 -I 500 -X 1500 -f --ff -a -1 mate1.fa -2 mate2.fa

            This seemed to run reasonably and I /think/ I am asking for alignments with 1 additional mismatch beyond the 2 gapped nucleotides.
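
The NN-padding step described above might look roughly like this in Python (the original was a Perl script that is not shown in the thread). This is only a sketch: it assumes each 26-base raw read holds the two mates back to back as 13-base halves, with the 2-base gap falling after base 7 of each half, and the function name `pad_mates` is hypothetical.

```python
def pad_mates(raw, half=13, gap_after=7, gap_len=2):
    """Split a raw read into two mates and fill each gap with Ns.

    Assumption: the raw read is two `half`-base mates back to back, and
    each mate has `gap_len` unsequenced bases after its base `gap_after`.
    """
    if len(raw) != 2 * half:
        raise ValueError(f"expected {2 * half} bases, got {len(raw)}")
    filler = "N" * gap_len
    mates = []
    for start in (0, half):
        mate = raw[start:start + half]
        mates.append(mate[:gap_after] + filler + mate[gap_after:])
    return tuple(mates)


# Made-up 26-base read, used only to show the shape of the output.
mate1, mate2 = pad_mates("ACGTACGTACGTACGTACGTACGTAC")
print(mate1)  # ACGTACGNNTACGTA
print(mate2)  # CGTACGTNNACGTAC
```

Each returned mate is 15 bases long and would be written to mate1.fa and mate2.fa respectively.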

            Problem occurs in the second set of data. The researcher asserts the 26 base pairs to have a 6 nucleotide gap, but when I attempt to run the above bowtie command (after processing the raw data with my perl script) with "-v 7" I get an error message: "-v arg must be at most 3". Am I out of luck here? Am I asking bowtie to do something for which it is not designed?
            Thank you.
            Susan

            • sparks
              Senior Member
              • Mar 2008
              • 126

              Hi Susan,
              I think you could align this using Novoalign as the N's won't count as full mismatches, only P=0.25 of mismatch and hence penalty of 6. Building the index with a k-mer length of 7 might improve performance. If you'd like to discuss further you can contact me via email at colin at novocraft <.>com

              Colin

              • Ben Langmead
                Senior Member
                • Sep 2008
                • 200

                Hi Susan,

                Originally posted by axiom7:
                I have some output from a Polonator. This is paired-end data with gaps. For instance, the raw data is 26 base pairs. The researcher asserts this to be 2x15mers with a gap of two nucleotides between base 7 and 8, and between 20 and 21. He also asserts that the spacing between the two 15mers is between 500 and 1500 bases. I used a perl script to insert "NN" in the two gaps, and to create two mated fasta files. Ran the following:

                bowtie -t -p 8 -v 3 -m 100 -I 500 -X 1500 -f --ff -a -1 mate1.fa -2 mate2.fa

                This seemed to run reasonably and I /think/ I am asking for alignments with 1 additional mismatch beyond the 2 gapped nucleotides.

                Yes, I agree that this should work. And I agree that, because of the NNs, you are effectively asking for alignments with 1 additional mismatch.

                Problem occurs in the second set of data. The researcher asserts the 26 base pairs to have a 6 nucleotide gap, but when I attempt to run the above bowtie command (after processing the raw data with my perl script) with "-v 7" I get an error message: "-v arg must be at most 3". Am I out of luck here? Am I asking bowtie to do something for which it is not designed?
                Thank you.
                Susan

                The answer to whether you're asking bowtie to do something it was not designed to do is "yes". But it is definitely still possible to use Bowtie. My suggestion would be to use, for instance, -n 1 -l X -e Y, where -l X is set so that the "seed" falls just short of the string of Ns, and -e Y is set according to the number of Ns plus the number of mismatches you would like to allow beyond the Ns. (Your input is fasta, so every mismatch incurs a quality penalty of 30. So for 6 Ns + 1 mismatch, (6 + 1) x 30 = 210, and -e 210 is appropriate.) Here is an example where I align a read of the format you describe to the human genome:

                Code:
                sycamore:~/research/bowtie $ cat tmp.fa
                >r
                CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT
                sycamore:~/research/bowtie $ ./bowtie --best -n 1 -l 13 -e 210 -f /fs/szasmg/langmead/ebwts/h_sapiens_asm tmp.fa
                r	+	gi|89161187|ref|NC_000010.9|NC_000010	135373946	CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	5:G>T,13:G>N,14:C>N,15:G>N,16:A>N,17:A>N,18:G>N
                # reads processed: 1
                # reads with at least one reported alignment: 1 (100.00%)
                # reads that failed to align: 0 (0.00%)
                Reported 1 alignments to 1 output stream(s)
                That set of parameters is designed to effectively allow 1 mismatch beyond the mismatches forced by the Ns, as you can see in the above alignment.
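
The arithmetic behind these settings can be sketched in a few lines of Python. This is only an illustration, not part of bowtie: the helper name `seed_and_e` is made up, and it assumes FASTA input, so that each mismatch (including each N) costs 30 toward -e as described above.

```python
def seed_and_e(read, extra_mismatches=1, penalty=30):
    """Suggest -l and -e values for a read containing Ns.

    -l: number of bases before the first N, so the seed stops just
        short of the Ns.
    -e: (number of Ns + extra mismatches) * per-mismatch penalty.
    The penalty of 30 is the FASTA-input value quoted in this thread.
    """
    first_n = read.find("N")
    if first_n == -1:
        raise ValueError("read contains no N")
    seed_len = first_n
    max_e = (read.count("N") + extra_mismatches) * penalty
    return seed_len, max_e


seed_len, max_e = seed_and_e("CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT")
print(f"-n 1 -l {seed_len} -e {max_e}")  # -n 1 -l 13 -e 210
```

Applied to a 15-mer with NN after base 7, the same rule gives -l 7 and -e 90.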

                It's worth noting that if you can (eventually) get the Polonator to give you an anchor of, say, 20bp instead of 13bp, bowtie run in this mode will be substantially faster.

                I hope that's helpful; if it's still unclear, please feel free to email me.

                Thanks,
                Ben

                • axiom7
                  Member
                  • Aug 2009
                  • 14

                  Ben and sparks,

                  Thanks for all the input. I will be working on this today and will respond back to you.

                  Susan

                  • para_seq
                    Member
                    • Aug 2009
                    • 12

                    Hi, Ben,

                    I have a question about using Bowtie to map Illumina transcriptome reads to a prokaryote genome. There are two copies of an identical gene, encoded on the '+' and '-' strands. What I don't understand is that both copies are mapped by a large number of unique RNA-seq reads (value = 0 in column 7 of the bowtie output) in both '+' and '-' orientations. The reads mapped to each copy have an approximately 3 to 1 ratio of + to - orientations.

                    Did I do something wrong?
                    Please help me to clarify my understanding. Thank you.

                    • axiom7
                      Member
                      • Aug 2009
                      • 14

                      Hi Ben,

                      I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

                      Thanks.
                      Susan

                      • Ben Langmead
                        Senior Member
                        • Sep 2008
                        • 200

                        Originally posted by para_seq:
                        The mapped reads to each copy have approximately 3 to 1 ratio in + and - orientations.

                        Hi para_seq,

                        The bias you see may or may not be due to alignment. Bowtie does have options that seek to remove strand bias, e.g. the --best option. If you still see the bias using --best, then the bias is probably inherent in your reads.
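
One way to check for the bias is to tally strands per reference sequence from the alignment output. The sketch below is only an illustration: it assumes bowtie's default tab-separated output, in which the second column is the strand and the third is the reference name, and the function name `strand_counts` is hypothetical.

```python
from collections import Counter, defaultdict


def strand_counts(lines):
    """Count '+' and '-' alignments per reference name.

    Expects tab-separated lines whose 2nd field is the strand and whose
    3rd field is the reference name; other lines are skipped.
    """
    counts = defaultdict(Counter)
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3 or fields[1] not in ("+", "-"):
            continue
        counts[fields[2]][fields[1]] += 1
    return {ref: dict(c) for ref, c in counts.items()}


# Made-up records, only to show the call. Real input would be the lines
# of an alignment file opened by the caller.
example = [
    "r1\t+\tgeneA\t10\tACGT\tIIII",
    "r2\t+\tgeneA\t12\tACGT\tIIII",
    "r3\t-\tgeneA\t15\tACGT\tIIII",
]
print(strand_counts(example))  # {'geneA': {'+': 2, '-': 1}}
```

Comparing the tallies from a run with --best against a run without it shows whether the ratio changes.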

                        Hope that helps,
                        Ben

                        • Ben Langmead
                          Senior Member
                          • Sep 2008
                          • 200

                          Hi Susan,

                          Originally posted by axiom7:
                          I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

                          I'm glad! As I say, if the Polonator can (eventually) be made to give you a longer stretch of unambiguous bases before the NNNNN gap, then you can bump -l up accordingly and performance should improve quite a bit.

                          Thanks,
                          Ben

                          • amaer
                            Member
                            • Oct 2009
                            • 15

                            Hi Ben,

                            What is the update on Bowtie doing gapped alignments?

                            Thanks!

                            • Ben Langmead
                              Senior Member
                              • Sep 2008
                              • 200

                              Originally posted by amaer:
                              What is the update on Bowtie doing gapped alignments?

                              Hi amaer,

                              Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules. But by end-of-year is a reasonable guess.

                              Thanks,
                              Ben

                              • axiom7
                                Member
                                • Aug 2009
                                • 14

                                Originally posted by axiom7:
                                Hi Ben,

                                I just wanted to follow up on my original question regarding using -v n, where n>3 which I posted 10/16/09. I am satisfied with your solution. In fact I find that the results of using -v 3, vs, -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

                                Thanks.
                                Susan

                                Sorry, I meant using -e 210 to simulate -v 7, not -3 210.

                                Susan
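
The comparison described above (-v 3 versus -n 1 -l 7 -e 90) can be made concrete by treating each run's output as a set of alignments. This is a rough sketch, not the method actually used in the thread: it assumes bowtie's default tab-separated output with read name, strand, reference name and offset in the first four columns, and `alignment_keys` and `overlap` are hypothetical names.

```python
def alignment_keys(lines):
    """Return the set of (read, strand, reference, offset) tuples."""
    keys = set()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) >= 4:
            keys.add(tuple(fields[:4]))
    return keys


def overlap(run_a, run_b):
    """Summarise how many alignments two runs share."""
    a, b = alignment_keys(run_a), alignment_keys(run_b)
    return {"shared": len(a & b), "only_a": len(a - b), "only_b": len(b - a)}


# Made-up records standing in for the two output files.
run_v = ["r1\t+\tchr1\t100\tACGT\tIIII", "r2\t-\tchr1\t200\tACGT\tIIII"]
run_n = ["r1\t+\tchr1\t100\tACGT\tIIII", "r3\t+\tchr2\t5\tACGT\tIIII"]
print(overlap(run_v, run_n))  # {'shared': 1, 'only_a': 1, 'only_b': 1}
```

Small "only_a" and "only_b" counts relative to "shared" would correspond to the "pretty much identical" outcome reported above.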
