  • Hi Ben,
    I was playing around with the new bowtie version, and I noticed that specifying the --best flag resulted in an output that had a different number of mappings than when the flag is omitted. However, in the bowtie documentation, it says that --best does not change which alignments are considered valid. I tried this with multiple sets of reads, and each time the number of alignments differed. Is this a bug or am I misunderstanding the effects of --best? The other flags I was using (in both cases) were -S, -p, --solexa1.3-quals, --al, and --un.
    Thanks!



    • Trouble with bowtie printing SAM output

      When I attempt to align mated paired-end sequence reads and output the file
      in SAM format, I receive a segmentation fault. If I try the same thing
      without the -S/--sam option, it works fine. Here is what I am getting:

      EEB-WITT5:Bowtie wittkopp-lab$ ./bowtie -q -k 1 --sam --best \
        --solexa1.3-quals dmel-all-CDS-r5.21 \
        -1 ./mel_sim_data/Hybrids/s_2_1_sequence.txt \
        -2 ./mel_sim_data/Hybrids/s_2_2_sequence.txt > s_2_sequence.sam

      Segmentation fault

      Any help in this matter would be greatly appreciated! Again, I would like
      this output to be in SAM format. I tried converting the bowtie output to
      SAM but the bowtie2sam.pl script doesn't do that.



      • Originally posted by kraigrs:
        When I attempt to align mated paired-end sequence reads and output the file
        in SAM format, I receive a segmentation fault.
        Hi kraigrs. I'll definitely take a look. Is it possible for me to get those input files from you? I can't reproduce that with the reads and indexes that I've tried on my end.

        Thanks,
        Ben



        • Originally posted by jmj1091:
          I was playing around with the new bowtie version, and I noticed that specifying the --best flag resulted in an output that had a different number of mappings than when the flag is omitted.
          I'll take a look. You used --al and --un. Is the number of reads different in those files, or in the alignment output? Or both?
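To answer Ben's question, one could count the reads in the --al/--un files and the distinct reads with at least one mapped record in the SAM output, for the run with --best and the run without it. The sketch below assumes 4-line FASTQ records in the --al/--un files and SAM output from -S; the function names are hypothetical helpers, not part of bowtie.

```python
"""Hypothetical helpers for comparing bowtie runs with and without --best.

Assumptions: --al/--un files hold strict 4-line FASTQ records, and the
alignment output is SAM (bowtie -S). Names are illustrative only.
"""
from typing import Iterable, Set


def count_fastq_reads(lines: Iterable[str]) -> int:
    """Count records in a FASTQ stream, assuming strict 4-line records."""
    non_blank = sum(1 for line in lines if line.strip())
    return non_blank // 4


def aligned_read_names(sam_lines: Iterable[str]) -> Set[str]:
    """Return names of reads that have at least one mapped SAM record."""
    names: Set[str] = set()
    for line in sam_lines:
        if not line.strip() or line.startswith("@"):
            continue  # skip blank lines and SAM header lines
        fields = line.rstrip("\n").split("\t")
        qname, flag = fields[0], int(fields[1])
        if not flag & 0x4:  # FLAG bit 0x4 means "segment unmapped"
            names.add(qname)
    return names
```

Both functions accept an open file handle (e.g. `open("run_best.sam")`, a placeholder name), so the two runs can be compared with `len(aligned_read_names(...))`. Paired mates share a name, so the SAM count is per read pair.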



          • Polonator data with gaps

            Hi Ben,
            I have some output from a Polonator. This is paired-end data with gaps. For instance, each raw read is 26 bases. The researcher asserts this to be 2 x 15-mers with a gap of two nucleotides between bases 7 and 8, and between bases 20 and 21. He also asserts that the spacing between the two 15-mers is between 500 and 1500 bases. I used a Perl script to insert "NN" in the two gaps and to create two mated FASTA files, then ran the following:

            bowtie -t -p 8 -v 3 -m 100 -I 500 -X 1500 -f --ff -a -1 mate1.fa -2 mate2.fa

            This seemed to run reasonably and I /think/ I am asking for alignments with 1 additional mismatch beyond the 2 gapped nucleotides.

            Problem occurs in the second set of data. The researcher asserts the 26 base pairs to have a 6 nucleotide gap, but when I attempt to run the above bowtie command (after processing the raw data with my perl script) with "-v 7" I get an error message: "-v arg must be at most 3". Am I out of luck here? Am I asking bowtie to do something for which it is not designed?
            Thank you.
            Susan
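The Perl script itself isn't shown, so here is a hypothetical Python equivalent of the step it describes. It assumes the 26 raw bases split into halves at base 13 (inferred from "2 x 15-mers" with one 2-nt gap each), with the gaps after bases 7 and 20 filled with "NN". The function names are illustrative.

```python
"""Hypothetical Python equivalent of the Perl preprocessing step.

Assumed layout, inferred from "26 raw bases = 2 x 15-mers with 2-nt gaps":
  mate 1 = raw bases 1-7 + gap + bases 8-13
  mate 2 = raw bases 14-20 + gap + bases 21-26
Each 2-nt gap is filled with "NN" so both mates are 15 characters long.
"""
from typing import Iterable, TextIO, Tuple

GAP = "NN"


def split_polonator_read(raw: str) -> Tuple[str, str]:
    """Split one 26-base raw read into two 15-base mates with NN gaps."""
    if len(raw) != 26:
        raise ValueError(f"expected 26 bases, got {len(raw)}")
    mate1 = raw[0:7] + GAP + raw[7:13]     # bases 1-7, gap, bases 8-13
    mate2 = raw[13:20] + GAP + raw[20:26]  # bases 14-20, gap, bases 21-26
    return mate1, mate2


def write_mate_fasta(reads: Iterable[Tuple[str, str]],
                     out1: TextIO, out2: TextIO) -> None:
    """Write matching FASTA records to the two files given to bowtie -1/-2."""
    for name, raw in reads:
        mate1, mate2 = split_polonator_read(raw)
        out1.write(f">{name}\n{mate1}\n")
        out2.write(f">{name}\n{mate2}\n")
```

Writing the same record name to both files keeps the mates paired in order, which is what bowtie's -1/-2 options expect.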



            • Hi Susan,
              I think you could align this using Novoalign, as the N's won't count as full mismatches, only as P=0.25 of a mismatch, and hence a penalty of 6 each. Building the index with a k-mer length of 7 might improve performance. If you'd like to discuss further, you can contact me via email at colin at novocraft <.>com

              Colin
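The "penalty of 6" is consistent with Phred scaling of a probability of 0.25, since -10*log10(0.25) is about 6.02. The sketch below only reproduces that arithmetic under that assumption; it does not model Novoalign's actual scoring, and the names are hypothetical.

```python
"""Arithmetic behind a per-N penalty of about 6 (a sketch, not Novoalign code)."""
import math


def phred(p: float) -> float:
    """Phred scale: -10 * log10(p)."""
    if not 0.0 < p <= 1.0:
        raise ValueError("p must be in (0, 1]")
    return -10.0 * math.log10(p)


PER_N = phred(0.25)        # about 6.02 per N
GAP_OF_SIX = 6 * PER_N     # about 36.1 for a 6-N gap
```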



              • Hi Susan,

                Originally posted by axiom7:
                I have some output from a Polonator. This is paired-end data with gaps. For instance, each raw read is 26 bases. The researcher asserts this to be 2 x 15-mers with a gap of two nucleotides between bases 7 and 8, and between bases 20 and 21. He also asserts that the spacing between the two 15-mers is between 500 and 1500 bases. I used a Perl script to insert "NN" in the two gaps and to create two mated FASTA files, then ran the following:

                bowtie -t -p 8 -v 3 -m 100 -I 500 -X 1500 -f --ff -a -1 mate1.fa -2 mate2.fa

                This seemed to run reasonably and I /think/ I am asking for alignments with 1 additional mismatch beyond the 2 gapped nucleotides.
                Yes, I agree that this should work. And I agree that, because of the NNs, you are effectively asking for alignments with 1 additional mismatch.

                Problem occurs in the second set of data. The researcher asserts the 26 base pairs to have a 6 nucleotide gap, but when I attempt to run the above bowtie command (after processing the raw data with my perl script) with "-v 7" I get an error message: "-v arg must be at most 3". Am I out of luck here? Am I asking bowtie to do something for which it is not designed?
                Thank you.
                Susan
                The answer to whether you're asking bowtie to do something it was not designed to do is "yes", but it is definitely still possible to use Bowtie. My suggestion would be to use, for instance, -n 1 -l X -e Y, where X is set so that the "seed" ends just short of the string of Ns, and Y is set according to the number of Ns plus the number of mismatches you would like to allow beyond the Ns. (Your input is FASTA, so every mismatch incurs a quality penalty of 30. So for 6 Ns + 1 mismatch, 7 x 30 = 210, and -e 210 is appropriate.) Here is an example where I align a read of the format you describe to the human genome:

                Code:
                sycamore:~/research/bowtie $ cat tmp.fa
                >r
                CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT
                sycamore:~/research/bowtie $ ./bowtie --best -n 1 -l 13 -e 210 -f /fs/szasmg/langmead/ebwts/h_sapiens_asm tmp.fa
                r	+	gi|89161187|ref|NC_000010.9|NC_000010	135373946	CTTCGTGGGTATTNNNNNNGCGGAGCAGAGTT	IIIIIIIIIIIIIIIIIIIIIIIIIIIIIIII	5:G>T,13:G>N,14:C>N,15:G>N,16:A>N,17:A>N,18:G>N
                # reads processed: 1
                # reads with at least one reported alignment: 1 (100.00%)
                # reads that failed to align: 0 (0.00%)
                Reported 1 alignments to 1 output stream(s)
                That set of parameters is designed to effectively allow 1 mismatch beyond the mismatches forced by the Ns, as you can see in the above alignment.
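The last column of the alignment above lists mismatches as comma-separated `offset:reference>read` entries, so the N-forced mismatches can be separated from the genuine one. A minimal sketch, assuming that format as it appears in this output (0-based offsets into the read as printed, checked here only for this + strand example); the function names are hypothetical.

```python
"""Split bowtie mismatch descriptors into N-forced and genuine mismatches.

Assumes the format seen in the output above: "offset:ref>read" entries
joined by commas.
"""
from typing import List, Tuple

Mismatch = Tuple[int, str, str]  # (offset, reference base, read base)


def parse_mismatches(field: str) -> List[Mismatch]:
    """Parse e.g. '5:G>T,13:G>N' into [(5, 'G', 'T'), (13, 'G', 'N')]."""
    result: List[Mismatch] = []
    for desc in filter(None, field.strip().split(",")):
        offset, change = desc.split(":")
        ref_base, read_base = change.split(">")
        result.append((int(offset), ref_base, read_base))
    return result


def count_n_and_real(field: str) -> Tuple[int, int]:
    """Return (mismatches at N positions in the read, all other mismatches)."""
    mismatches = parse_mismatches(field)
    n_forced = sum(1 for _, _, read_base in mismatches if read_base == "N")
    return n_forced, len(mismatches) - n_forced
```

For the alignment above this gives 6 N-forced mismatches and 1 genuine one, matching the "1 mismatch beyond the Ns" the parameters were chosen to allow.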

                It's worth noting that if you can (eventually) get the Polonator to give you an anchor of, say, 20bp instead of 13bp, bowtie run in this mode will be substantially faster.

                I hope that's helpful; if it's still unclear, please feel free to email me.

                Thanks,
                Ben
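Ben's rule of thumb can be written down as a small calculation: the seed length (-l) is the number of bases before the first N, and -e is 30 times (number of Ns + extra mismatches allowed), using the per-mismatch penalty of 30 he gives for FASTA input. This is a sketch with hypothetical function names. For a 15-mer with NN after base 7 it yields -l 7 -e 90, the values Susan reports using later in the thread.

```python
"""Derive bowtie -l / -e values for reads with an internal run of N's.

Encodes the rule of thumb from the post: the seed (-l) ends just before the
first N, and -e allows 30 per N plus 30 per extra mismatch (FASTA input).
Function names are hypothetical.
"""
from typing import List, Tuple

PENALTY_PER_MISMATCH = 30  # per the post, for FASTA input


def gapped_params(read: str, extra_mismatches: int = 1) -> Tuple[int, int]:
    """Return (seed_length, max_quality_sum) for one gapped read layout."""
    first_n = read.find("N")
    if first_n < 0:
        raise ValueError("read has no N gap")
    n_count = read.count("N")
    return first_n, PENALTY_PER_MISMATCH * (n_count + extra_mismatches)


def bowtie_args(read: str, extra_mismatches: int = 1) -> List[str]:
    """Format the -n/-l/-e options suggested in the post for this layout."""
    seed, max_err = gapped_params(read, extra_mismatches)
    return ["-n", "1", "-l", str(seed), "-e", str(max_err)]
```

Because -l and -e are set once per bowtie run, this assumes every read in a file shares the same gap layout, as the Polonator data described here does.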



                • Ben and sparks,

                  Thanks for all the input. I will be working on this today and will respond back to you.

                  Susan



                  • Hi, Ben,

                    I have a question about using Bowtie to map Illumina transcriptome reads to a prokaryote genome. There are two identical copies of a gene, one encoded on the '+' strand and one on the '-' strand. What I don't understand is that both copies have a large number of uniquely mapped RNA-seq reads (value = 0 in column 7 of the bowtie output) in both the '+' and '-' orientations. The reads mapped to each copy show approximately a 3:1 ratio of + to - orientation.

                    Did I do anything wrong?
                    Please help me to clarify my understanding. Thank you.



                    • Hi Ben,

                      I just wanted to follow up on my original question, posted 10/16/09, about using -v n where n>3. I am satisfied with your solution. In fact, I find that -v 3 and -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

                      Thanks.
                      Susan



                      • Originally posted by para_seq:
                        The mapped reads to each copy have approximately 3 to 1 ratio in + and - orientations.
                        Hi para_seq,

                        The bias you see may or may not be due to alignment. Bowtie does have options that seek to remove strand bias, e.g. the --best option. If you still see the bias using --best, then the bias is probably inherent in your reads.

                        Hope that helps,
                        Ben
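One way to act on Ben's suggestion is to tally + and - alignments over each gene copy in runs with and without --best and compare the ratios. The sketch assumes bowtie's default tab-separated output, with the strand in column 2, the reference name in column 3 and the 0-based leftmost offset in column 4; the region coordinates and function name are hypothetical. Also, if I read the bowtie 1 manual correctly, column 7 counts other instances aligning to the same reference characters, not other equally good locations, so a 0 there does not by itself show that a read is unique.

```python
"""Tally + and - alignments over a region from bowtie's default output.

Assumes default (non-SAM) tab-separated output: column 2 = strand,
column 3 = reference name, column 4 = 0-based leftmost offset.
Region coordinates are placeholders supplied by the user.
"""
from collections import Counter
from typing import Iterable


def strand_counts(lines: Iterable[str], ref: str, start: int, end: int) -> Counter:
    """Count strands of alignments whose leftmost offset lies in [start, end)."""
    counts: Counter = Counter()
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4 or fields[2] != ref:
            continue  # skip blank/short lines and other references
        if start <= int(fields[3]) < end:
            counts[fields[1]] += 1
    return counts
```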



                        • Hi Susan,

                          Originally posted by axiom7:
                          I just wanted to follow up on my original question, posted 10/16/09, about using -v n where n>3. I am satisfied with your solution. In fact, I find that -v 3 and -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.
                          I'm glad! As I say, if the Polonator can (eventually) be made to give you a longer stretch of unambiguous bases before the NNNNN gap, then you can bump -l up accordingly and performance should improve quite a bit.

                          Thanks,
                          Ben



                          • Hi Ben,

                            What is the update on Bowtie doing gapped alignments?

                            Thanks!



                            • Originally posted by amaer:
                              What is the update on Bowtie doing gapped alignments?
                              Hi amaer,

                              Perhaps by end-of-year. It's very hard to say because most of my time goes to collaborators, and they don't have predictable schedules. But by end-of-year is a reasonable guess.

                              Thanks,
                              Ben



                              • Originally posted by axiom7:
                                Hi Ben,

                                I just wanted to follow up on my original question, posted 10/16/09, about using -v n where n>3. I am satisfied with your solution. In fact, I find that -v 3 and -n 1 -l 7 -e 90 yield pretty much identical results, and so I am comfortable using -3 210 to "simulate" -v 7.

                                Thanks.
                                Susan
                                Sorry, I meant using -e 210 to simulate -v 7, not -3 210.

                                Susan
