  • Hi dara,

    Yes, you have to build separate index files and query them separately. You'll have to synthesize the per-index results into an overall set of results, e.g., with some scripts. Bowtie doesn't currently know how to query multiple indexes as part of a single alignment run.

    Thanks,
    Ben
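The "synthesize the per-index results" step mentioned above could look roughly like the following. This is only a sketch, not something shipped with Bowtie. It assumes Bowtie's default tab-separated output, with the read name in column 1 and comma-separated mismatch descriptors in column 8; `merge_best` and `mismatch_count` are hypothetical helper names.

```python
from typing import Dict, Iterable, List


def mismatch_count(fields: List[str]) -> int:
    """Count mismatch descriptors in column 8 (assumed layout); 0 if empty or absent."""
    if len(fields) < 8 or not fields[7].strip():
        return 0
    return len(fields[7].split(","))


def merge_best(per_index_lines: Iterable[Iterable[str]]) -> Dict[str, List[List[str]]]:
    """For each read, keep the alignments with the fewest mismatches across all indexes."""
    best: Dict[str, List[List[str]]] = {}
    for lines in per_index_lines:
        for line in lines:
            line = line.rstrip("\n")
            if not line:
                continue
            fields = line.split("\t")
            read = fields[0]
            mm = mismatch_count(fields)
            current = best.get(read)
            if current is None or mm < mismatch_count(current[0]):
                best[read] = [fields]       # strictly better hit replaces earlier ones
            elif mm == mismatch_count(current[0]):
                current.append(fields)      # tie: keep both hits
    return best
```

Each element of `per_index_lines` would be the open output file from one per-index run. A read that ends up with more than one entry has equally good hits in more than one place, so it should not be treated as unique.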

    Comment


    • Now that paired-end is substantially done, we'll be embarking on gapped alignment soon. I'll probably start on that in June. Hopefully by the end of the summer you'll see at least initial gapped-alignment support. That's a guess, though.

      Thanks,
      Ben

      Originally posted by dara View Post
      Also another question for you:

      Any updates on plans for bowtie supporting gapped alignment?

      thanks

      Comment


      • Hello Ben,

        Thank you for your quick response. However, I'm a little puzzled, because I was looking at the script that comes with the genome index on the Bowtie website (make_h_sapiens_asm.sh), and it seems to build just one index by providing all the chunks to the bowtie-build executable at once. Here are the lines I'm talking about:

        INPUTS=hs_ref_chr1.fa,hs_ref_chr2.fa,hs_ref_chr3.fa,hs_ref_chr4.fa,hs_ref_chr5.fa,hs_ref_chr6.fa,hs_ref_chr7.fa,hs_ref_chr8.fa,hs_ref_chr9.fa,hs_ref_chr10.fa,hs_ref_chr11.fa,hs_ref_chr12.fa,hs_ref_chr13.fa,hs_ref_chr14.fa,hs_ref_chr15.fa,hs_ref_chr16.fa,hs_ref_chr17.fa,hs_ref_chr18.fa,hs_ref_chr19.fa,hs_ref_chr20.fa,hs_ref_chr21.fa,hs_ref_chr22.fa,hs_ref_chrMT.fa,hs_ref_chrX.fa,hs_ref_chrY.fa

        ${BOWTIE_BUILD_EXE} ${INPUTS} h_sapiens_asm

        I tried the same thing, providing individual chromosome splits to the indexer, and it complained.

        Thanks again

        Comment


        • Hi dara,

          It complained that the total sequence length of all the reference strings was too big to fit in a single index, right? I didn't mean to imply that you can't feed multiple fasta files to bowtie-build; you certainly can. But if the total length of all the sequence you're supplying is too big, you'll have to break the input up into chunks somehow and build a separate index for each chunk. You might try feeding the fasta files in smaller bundles, or you might redistribute sequences throughout the fasta files, or both. If you've got chromosomes, you probably just want to try bundling together as many chromosome fasta files as you can get away with in a single invocation of bowtie-build.

          Does that make sense?

          Thanks,
          Ben
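One way to do the bundling described above is a greedy first-fit pass over the chromosome files, largest first. The sketch below is an illustration only. `max_total` is an assumed parameter: the actual per-index limit is not stated in this thread and has to come from your own bowtie-build runs. `fasta_length` and `bundle` are hypothetical helper names.

```python
from typing import Dict, List, Tuple


def fasta_length(path: str) -> int:
    """Total number of sequence characters in a FASTA file (header lines skipped)."""
    total = 0
    with open(path) as handle:
        for line in handle:
            if not line.startswith(">"):
                total += len(line.strip())
    return total


def bundle(lengths: Dict[str, int], max_total: int) -> List[List[str]]:
    """Greedy first-fit: place each file (largest first) in the first bundle with room."""
    bundles: List[Tuple[int, List[str]]] = []
    for name, length in sorted(lengths.items(), key=lambda kv: kv[1], reverse=True):
        if length > max_total:
            raise ValueError(f"{name} alone exceeds the assumed limit of {max_total}")
        for i, (used, names) in enumerate(bundles):
            if used + length <= max_total:
                bundles[i] = (used + length, names + [name])
                break
        else:
            # No existing bundle has room, so start a new one.
            bundles.append((length, [name]))
    return [names for _, names in bundles]
```

Each returned bundle can be joined with commas and passed to bowtie-build the same way as the INPUTS line quoted earlier, with a different index name per bundle.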

          Comment


          • Yes, that makes sense. Thank you.

            Comment


            • This has probably been answered already, so apologies in advance.

              Does anyone know if Bowtie by default filters the input on the basis of quality? I'm getting a strange result. When I perfectly sample random 32mers from the mouse genome, and then align them back to the same genome, most aligners align ~83% uniquely. However, Bowtie is only aligning ~77%.

              Where are the missing reads going? It can't be mismatch qualities, since there are no mismatches in the sampled 'reads'. These are the options I'm using:

              ./bowtie -q --solexa-quals -m 2 --best -p 2

              Comment


              • Hi Shaun,

                No, Bowtie does not filter on the basis of quality by default. Can you pick an example 32-mer that you think should align but that doesn't? There are a few possibilities for why it's happening.

                Thanks,
                Ben

                Comment


                • Hi Ben,
                  Here's one, but I can send you a whole file if you like:

                  >Test:chr5:15656372:15656404
                  CTGAGCAAGGGGACCCCAATGGAAAAGTTAGG

                  This is aligned uniquely (and correctly) by most aligners, but is not aligned by Bowtie with the above arguments. I just noticed that when I remove the "-m 2" option, this read is aligned uniquely. This is counter-intuitive.

                  What arguments do you recommend if I just want to report the unique alignments? I have been using -m 2.

                  Comment


                  • I want to ask how I can get Bowtie to produce the same alignments as ELAND, i.e., what the equivalent of ELAND's options is. I tried -v 2 -l 32, which as I understand it means at most 2 mismatches in the first 32 bases (the seed), as in ELAND's default parameters, but I am still getting about 20% more aligned reads with Bowtie.

                    Comment


                    • I need some splaining (prolly cuz I don't understand all the inside baseball terms).

                      Does bowtie detect short insertions? 1bp? 2bp? what limits are there?

                      Does bowtie detect short deletions? 1bp deletion, 2bp? etc.?

                      thanks.

                      Comment


                      • Re: insertions and deletions: no support yet. It's on the TODO list. We'll probably tackle it this summer.

                        Thanks,
                        Ben

                        Comment


                        • I'm new to next-gen sequencing and have started playing around with different alignment tools. I have used Bowtie and it works very well. I used the pre-built M. musculus index. I have a quick question about the output file, particularly the chromosome location. Because you used the NCBI genome database, the output gives a gi accession (gi|149233633|ref|NT_039169.7|Mm1_39209_37) as the location instead of a chromosome number (chr1...). Do you have any tool or option to map between them, or do I have to write my own?

                          Thanks

                          Arnaud.

                          Comment


                          • No, we have not written such a converter. If you write one and think that others may benefit from it (and don't mind sharing it), perhaps we can include it in a future release.

                            Thanks,
                            Ben
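For anyone who wants to write such a converter, a minimal starting point might look like this. It is a sketch under assumptions: the reference name is taken to be column 3 of Bowtie's default output, and the accession-to-chromosome table is something you supply yourself. The "chr1" entry in the usage note is an unverified placeholder, and `accession_from_gi` and `rename_reference` are hypothetical names.

```python
from typing import Dict


def accession_from_gi(ref_name: str) -> str:
    """Pull the accession (e.g. NT_039169.7) out of a gi|...|ref|ACCESSION|... name."""
    parts = ref_name.split("|")
    if len(parts) >= 4 and parts[0] == "gi":
        return parts[3]
    return ref_name


def rename_reference(line: str, accession_to_chrom: Dict[str, str]) -> str:
    """Replace the reference name (assumed to be column 3) using a user-supplied table."""
    fields = line.rstrip("\n").split("\t")
    if len(fields) > 2:
        accession = accession_from_gi(fields[2])
        # Unknown accessions are left untouched rather than guessed.
        fields[2] = accession_to_chrom.get(accession, fields[2])
    return "\t".join(fields)
```

For example, `rename_reference(line, {"NT_039169.7": "chr1"})` rewrites only the name. NT_ accessions are contigs, so the offset column stays contig-relative; turning it into a chromosome coordinate would also need each contig's start position on its chromosome, which this sketch does not handle.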

                            Comment


                            • Hi, a few questions about bowtie output:

                              1. Is there a way to know how many times a particular sequence is mapped to the reference genome?

                              2. How do I specify the minimum length of matching? For example, I want only >20 nt mappings of input sequences to the reference genome. Is there a way to specify that number?

                              3. How do I control the quality of mapping? For example, how do I eliminate a match of a sequence such as TAAAAAAAAAAAAAAAAAAAAGC to the reference genome, since it is not a specific match?

                              4. Finally, is there a way to trim Solexa adapters from the input sequences?

                              I am a newbie in this field, so please pardon me if the questions seem stupid.

                              Many thanks in advance.

                              Comment


                              • Originally posted by polsum View Post
                                1. Is there a way to know how many times a particular sequence is mapped to the reference genome?
                                For now, the way to do that is via options like -k/-a/--nostrata/-m. You can count the number of alignments from the output bowtie generates.
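Counting alignments from the output, as suggested above, can be done with a few lines of script. This sketch assumes the default tab-separated output with the read name in column 1 and one line per reported alignment; `alignments_per_read` is a hypothetical name.

```python
from collections import Counter
from typing import Iterable


def alignments_per_read(lines: Iterable[str]) -> Counter:
    """Count reported alignments per read name (column 1 of each output line)."""
    counts: Counter = Counter()
    for line in lines:
        if line.strip():
            counts[line.split("\t", 1)[0]] += 1
    return counts
```

The counts are only as complete as the reporting options allow: with -k N, no read can show more than N alignments.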

                                Originally posted by polsum View Post
                                2. How do I specify the minimum length of matching? For example, I want only >20 nt mappings of input sequences to the reference genome. Is there a way to specify that number?
                                Bowtie aligns the entire read with a certain number of mismatches.

                                Originally posted by polsum View Post
                                3. How do I control the quality of mapping? For example, how do I eliminate a match of a sequence such as TAAAAAAAAAAAAAAAAAAAAGC to the reference genome, since it is not a specific match?
                                Bowtie's job is to find legal alignments subject to the constraints imposed by the alignment and reporting policies specified by the user (see manual for info about -k/-m/-a/--nostrata, etc). Any additional filtering you might want to perform will have to be done externally, say, in a script.
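As one example of such external filtering, a script could drop reads that are dominated by a single base before or after alignment. The thresholds below (`max_run`, `max_base_fraction`) are arbitrary assumptions chosen for illustration, not recommended values, and the function names are hypothetical.

```python
from collections import Counter
from itertools import groupby


def longest_homopolymer(seq: str) -> int:
    """Length of the longest run of a single repeated base."""
    return max((len(list(group)) for _, group in groupby(seq.upper())), default=0)


def is_low_complexity(seq: str, max_run: int = 10, max_base_fraction: float = 0.8) -> bool:
    """Flag reads with a long homopolymer run or dominated by one base."""
    if not seq:
        return True
    seq = seq.upper()
    top_count = Counter(seq).most_common(1)[0][1]
    return longest_homopolymer(seq) > max_run or top_count / len(seq) > max_base_fraction
```

With these settings the poly-A example from the question is flagged, while an ordinary mixed-base read is not.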

                                Originally posted by polsum View Post
                                4. Finally, is there a way to trim Solexa adapters from the input sequences?
                                No - you'll have to do vector trimming ahead of time.
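A very simple way to do that trimming ahead of time is sketched below. It is an illustration under assumptions: it uses exact matching only (no mismatches or quality awareness), and the adapter string in the usage note is an arbitrary placeholder, not a real Solexa adapter sequence. `trim_adapter` is a hypothetical name.

```python
def trim_adapter(read: str, adapter: str, min_overlap: int = 3) -> str:
    """Remove a 3' adapter: a full copy anywhere, or a prefix of it at the read's end."""
    # Case 1: the whole adapter occurs in the read; cut from its first occurrence.
    idx = read.find(adapter)
    if idx != -1:
        return read[:idx]
    # Case 2: the read ends partway through the adapter; try the longest prefix first.
    for k in range(min(len(adapter), len(read)) - 1, min_overlap - 1, -1):
        if read.endswith(adapter[:k]):
            return read[:-k]
    return read
```

For instance, with the placeholder adapter "AGCTTGCA", the read "ACGTACGTAGCTTGCAGG" is trimmed to "ACGTACGT". Reads that become very short after trimming are probably best discarded before alignment.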

                                Hope that helps,
                                Ben

                                Comment
