  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    Oh, that's kind of irritating. BBMap as currently structured has a maximum reference sequence length of 500Mbp. I designed it that way because I was unaware of any chromosomes longer than that, and I believed the reason to be that 500Mbp was above the maximum stable length of an individual chromosome... looks like I may have been wrong!

    I'll have to think about how to resolve this; there's no simple setting for it. Thanks for bringing it to my attention.

    Comment

    • parulagwl
      Junior Member
      • Jul 2014
      • 6

      Thank you Brian for the quick response.
      We would really appreciate your thoughts/inputs on how to work around our issue.

      Comment

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        Purely speculating. I don't know where the centromere is in this chromosome, but you could split it in a region with long stretches of N's (so that each piece remains smaller than 500 Mbp). That way the chances of reads needing to map across the break would be small.
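A minimal sketch of that splitting idea, assuming the chromosome is held in memory as a plain string. The function name `split_at_longest_gap` and the `max_len` parameter are hypothetical and not part of BBMap; the cut is placed in the middle of the longest run of N's that leaves both pieces within the limit.

```python
import re

def split_at_longest_gap(seq, max_len):
    """Split seq in the middle of an N run so both pieces are <= max_len.

    Among all N runs that allow a valid cut, the longest one is chosen.
    Raises ValueError if no N run gives two pieces within max_len.
    """
    best = None  # (run_length, cut_position)
    for m in re.finditer(r"[Nn]+", seq):
        cut = (m.start() + m.end()) // 2
        if cut <= max_len and len(seq) - cut <= max_len:
            run_len = m.end() - m.start()
            if best is None or run_len > best[0]:
                best = (run_len, cut)
    if best is None:
        raise ValueError("no N run gives two pieces within max_len")
    cut = best[1]
    return seq[:cut], seq[cut:]

# Toy example: 30 bases with a 6-base gap, limit of 20 per piece.
toy = "ACGT" * 3 + "N" * 6 + "ACGT" * 3
left, right = split_at_longest_gap(toy, 20)
```

Coordinates of anything mapped to the second piece would need `len(left)` added to get back to the original chromosome coordinates.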

        Comment

        • Thias
          Member
          • Mar 2013
          • 45

          I ask because I sometimes stumble over this issue in tutorials (which don't seem to care about it) and saw it again in a recent question...

          I was once taught (and lost points in a test for not knowing it) that using even k-mer sizes is frowned upon. The rationale is that only odd k-mer sizes ensure a k-mer can never be its own reverse complement in the de Bruijn graph. The ambiguity created by palindromic k-mers supposedly makes the graph difficult to resolve.

          So, to settle that question once and for all: does it really affect mapping efficiency whether I choose an even k-mer size or its neighboring odd one?

          Comment

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            No. The longer the kmer, the greater the speed (and memory consumption); even versus odd is not important.

            Additionally, I don't see that even-length kmers cause problems in assembly, either. Genomic palindromes of kmer length or longer cause problems whether you are using an even or odd kmer length. These palindromes always have an even length, but - say you have a genomic palindrome of length 22. Using K=22, you will not (trivially) be able to resolve it. Nor will you with K=21. You will with K=23, and you will with K=24. It's not clear to me in this situation why K=23 would be preferable to K=24 with regard to palindromes, but K=24 can resolve longer repeats than K=23.

            Comment

            • HESmith
              Senior Member
              • Oct 2009
              • 512

              Actually, an odd k-mer ensures that the strand orientation can be determined, since the central nucleotide cannot be identical due to complementarity (an even k-mer can be a perfect palindrome in both orientations).

              But the point about longer k-mers is spot-on.
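The even-versus-odd point can be checked by brute force for small k. The sketch below (helper names `revcomp` and `count_self_revcomp` are illustrative only) counts k-mers that equal their own reverse complement. For odd k the middle base would have to be its own complement, which is impossible, so the count is zero; for even k the first half can be chosen freely and fixes the second half, giving 4^(k/2) such k-mers.

```python
from itertools import product

COMP = str.maketrans("ACGT", "TGCA")

def revcomp(kmer):
    """Reverse complement of a DNA string over A, C, G, T."""
    return kmer.translate(COMP)[::-1]

def count_self_revcomp(k):
    """Count k-mers that are identical to their own reverse complement."""
    count = 0
    for bases in product("ACGT", repeat=k):
        kmer = "".join(bases)
        if kmer == revcomp(kmer):
            count += 1
    return count

for k in range(1, 7):
    print(k, count_self_revcomp(k))
```

Enumeration is exponential in k, so this is only meant for small values; it illustrates the argument rather than anything a mapper or assembler does internally.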

              Comment

              • Thias
                Member
                • Mar 2013
                • 45

                Thanks a lot for your answers! The examples in your replies were really helpful and gave me more insight.

                Comment

                • darthsequencer
                  Member
                  • Feb 2012
                  • 35

                  Hi I have a couple questions on the terminology used for retaining ambiguous sites using bbmap.

                  Does "ambiguous=best" mean that if there are a bunch of reads all with the same score, only the first match will be retained? Or does it mean that of all the reads mapping above a score cutoff, the first one will be picked?

                  Along the same lines - for "ambiguous=all" does this mean that if say 5 locations all share the same highest score that they will be reported or does it mean that all locations above the score cutoff will be retained?

                  Comment

                  • Brian Bushnell
                    Super Moderator
                    • Jan 2014
                    • 2709

                    "ambiguous=best" is a bit misleading, but it means the genomically first location with a maximum score will be used. "ambiguous=all" will report all locations within the ambiguity threshold of the first. This does not mean they need exactly the same score; it means that they are very close, so much so that none can be confidently determined to be the correct mapping location. Normally they're identical, but if for example one mapping had a single 1bp deletion and another mapping had two 1bp substitutions, the scores would be different, but would be close enough to be both reported. But if there was a third potential mapping with, say, 5 substitutions, that would be excluded. This can be controlled with the "secondarysitescoreratio" flag; if you set it to 1.0, only mappings with identical scores to the best score will be reported.
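A toy model of the two behaviours described above. This is not BBMap's code: the function `select_sites`, the scores, and the 0.95 ratio are made up for illustration, and the threshold is assumed to act as a simple fraction of the best score.

```python
def select_sites(sites, mode="best", score_ratio=0.95):
    """Pick mapping sites from a list of (position, score) tuples.

    mode="best": the genomically first site among those with the top score.
    mode="all":  every site scoring at least score_ratio * top score
                 (assumed threshold rule, for illustration only).
    """
    if not sites:
        return []
    top = max(score for _, score in sites)
    if mode == "best":
        return [min((pos, score) for pos, score in sites if score == top)]
    if mode == "all":
        return sorted((pos, score) for pos, score in sites
                      if score >= score_ratio * top)
    raise ValueError("mode must be 'best' or 'all'")

# Hypothetical candidate sites for one read: (position, score).
candidates = [(500, 98), (120, 100), (900, 100), (300, 80)]
print(select_sites(candidates, "best"))       # first top-scoring site
print(select_sites(candidates, "all"))        # near-best sites too
print(select_sites(candidates, "all", 1.0))   # only exact ties with the best
```

With a ratio of 1.0 only sites tied with the best score survive, which matches the described effect of setting "secondarysitescoreratio" to 1.0.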

                    Comment

                    • lankage
                      Member
                      • Oct 2014
                      • 20

                      Hi, Brian

                      We recently increased our PacBio amplicon size from ~1100 bp to 3 kb. With the smaller amplicon size we were able to map reads to our reference library of non-full-length allele sequences using "semiperfectmode" to allow for soft-clipping. I'm now looking to map ~3 kb read sequences obtained from gDNA sequencing to exon reference sequences of ~270 bases apiece, and I am not able to tune the settings to get any mapping results. Is there a way to tune mapPacBio.sh to get hits for regions within long reads that perfectly match short exon sequences?

                      Comment

                      • susanklein
                        Senior Member
                        • Feb 2014
                        • 116

                        Hi,

                        Couldn't you just do it the other way around: have the PacBio reads as the reference and map your short refs to them?

                        Although I don't understand why your refs are so short.


                        S.

                        Comment

                        • Brian Bushnell
                          Super Moderator
                          • Jan 2014
                          • 2709

                          I agree with Susan. BBMap is a global aligner, and not really designed to map reads to substantially shorter reference sequences. But you could try with the flags "minid=0 local", which might work. Note that "semiperfectmode" will not allow a single mismatch or indel, so it's really only useful in special situations; "local" is more appropriate in this situation.
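For the narrow case of exons that match perfectly inside a long read, Susan's "other way around" idea can be sketched without an aligner at all, by treating each read as the reference and searching it for each exon on both strands. This is plain exact string search, not what BBMap or mapPacBio.sh do; `find_exact_hits` and the toy sequences are hypothetical, and a single sequencing error inside the exon would make the hit disappear.

```python
COMP = str.maketrans("ACGTN", "TGCAN")

def revcomp(seq):
    """Reverse complement of a DNA string over A, C, G, T, N."""
    return seq.translate(COMP)[::-1]

def find_exact_hits(reads, exons):
    """Return (read_id, exon_id, strand, start) for every exact occurrence.

    reads and exons are dicts mapping an ID to an uppercase sequence.
    start is the 0-based offset of the match within the read.
    """
    hits = []
    for read_id, read in reads.items():
        for exon_id, exon in exons.items():
            for strand, query in (("+", exon), ("-", revcomp(exon))):
                start = read.find(query)
                while start != -1:
                    hits.append((read_id, exon_id, strand, start))
                    start = read.find(query, start + 1)
    return hits

# Toy data: one exon embedded forward in r1 and reverse-complemented in r2.
exons = {"e1": "ACGGTC"}
reads = {"r1": "TTTT" + "ACGGTC" + "AAAA",
         "r2": "CC" + revcomp("ACGGTC") + "GG"}
print(find_exact_hits(reads, exons))
```

For reads with errors, an aligner run with the "local" flag as suggested above is the more realistic route; the sketch only shows why swapping the roles of read and reference makes the problem tractable.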

                          Comment

                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            @lankage: You don't have to align to the short amplicon regions. You could align to the genome (and find out if you have any non-specific amplification along the way).

                            Comment

                            • GenoMax
                              Senior Member
                              • Feb 2008
                              • 7142

                              @moistplus: If you were to use bbmap.sh to do the alignments then you would get that information in the alignment report along with the bam file (as long as you have samtools available in $PATH).

                              Comment

                              • Shini Sunagawa
                                Junior Member
                                • Jan 2016
                                • 8

                                Hi Brian,

                                Since I saw increased activity lately again, I was wondering if you might have thought about the issue we discussed back in January (~post #300). It was about dedupe not writing out exact matched and contained sequence identifiers.

                                As mentioned before, solving this would make this tool very competitive with existing ones, due to the immense speed-up.

                                Thanks for your consideration!

                                Best wishes,
                                Shini

                                Comment
