Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • jweger1988
    Member
    • Apr 2017
    • 37

    Hi Brian,

    I'm using BBtools to call indels from small RNA virus genomes using both bbmap and callvariants. It's doing a great job of calling these. We are interested in the possibility of also calling inversion events from this data. I know this is somewhat common in transcriptomics data. Do these programs have this functionality?

    To be more specific, is it possible to identify reads that are aligning to both the sense and antisense of the given reference?

    Thanks in advance!

    James

    Comment

    • mcmc
      Member
      • Jan 2016
      • 14

      bbmap local + idfilter

      Brian et al., I am using the 'local' option with bbmap because I sometimes have a couple bases of adapter remaining (after bbduk mink=6) and my understanding is that 'local' will trim the ends to achieve a better score.

      My question: does the idfilter option work on the local or global alignment? I have reads with 100% ID after trimming ends but they are not passing the idfilter. I'm curious whether there is another way to filter based on the local alignment percent id?

      thanks!
      MCMC

      Comment

      • sdriscoll
        I like code
        • Sep 2009
        • 436

        mcmc- what are you setting the idfilter value to? Because reads still take a hit to their alignment score for soft-clipping. That is to say a read that's aligned with no mismatches but it was soft-clipped should have a lower score than a read that was mapped perfectly without soft-clipping. This strategy doesn't make sense when you're read is being soft-clipped due to adapter sequence at the end of the read...but for biological purposes it does make sense. Maybe a more appropriate question should be how to get rid of those "couple bases of adapter remaining" after trimming them.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */

        Comment

        • mcmc
          Member
          • Jan 2016
          • 14

          Originally posted by sdriscoll View Post
          mcmc- what are you setting the idfilter value to? Because reads still take a hit to their alignment score for soft-clipping. That is to say a read that's aligned with no mismatches but it was soft-clipped should have a lower score than a read that was mapped perfectly without soft-clipping. This strategy doesn't make sense when you're read is being soft-clipped due to adapter sequence at the end of the read...but for biological purposes it does make sense. Maybe a more appropriate question should be how to get rid of those "couple bases of adapter remaining" after trimming them.
          yes I know the scores may be different... but I'm simply wondering whether the idfilter and subfilter post-filtering options work on the soft-clipped alignments or the non-clipped alignments. I want to be able to filter with, say, 2 MM or 98% ID *after* the soft clipping.

          Thanks,
          MCMC

          Comment

          • kimng
            Junior Member
            • Dec 2014
            • 5

            Bug in call variants with relation to sam files created from minimap2

            Hello, I've been using different parts of the BBMap suite and mixing them with other softwares for WGS of bacterial samples (primarily for the purpose of automated QC). I'm a fan of callvariants.sh as it works quickly and doesn't require me to sort .sam files before hand and it's simple to clearfilters to return all values. Unfortunately when using callvariants.sh with .sam files created from minimap2 (https://github.com/lh3/minimap2) using illumina nextseq reads. I receive the error seen below. I was curious if this was a minor bug that could be corrected?

            I'm running callvariants.sh from bbmap v37.90 obtained from conda (https://anaconda.org/bioconda/bbmap). Appreciate any assistance given.

            -Kim

            Error:
            java -ea -Xmx48779m -Xms48779m -cp /home/ssi.ad/kimn/.conda/envs/env_kim/opt/bbmap-37.90/current/ var2.CallVariants in=aln.sam ref=contigs.fasta vcf=minimap.vcf ploidy=1 clearfilters
            Executing var2.CallVariants [in=aln.sam, ref=contigs.fasta, vcf=minimap.vcf, ploidy=1, clearfilters]

            Loading reference.
            Time: 0.304 seconds.
            Processing input files.
            Exception in thread "Thread-3" Exception in thread "Thread-4" Exception in thread "Thread-5" java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
            at stream.SamLine.toShortMatch(SamLine.java:1200)
            at stream.SamLine.toRead(SamLine.java:1972)
            at stream.SamLine.toRead(SamLine.java:1832)
            at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
            at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
            java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
            at stream.SamLine.toShortMatch(SamLine.java:1200)
            at stream.SamLine.toRead(SamLine.java:1972)
            at stream.SamLine.toRead(SamLine.java:1832)
            at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
            at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
            java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
            at stream.SamLine.toShortMatch(SamLine.java:1200)
            at stream.SamLine.toRead(SamLine.java:1972)
            at stream.SamLine.toRead(SamLine.java:1832)
            at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
            at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)

            Comment

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              You can try reformatting your alignment file using "reformat.sh in=you.bam out=new.bam sam=1.4" to see if that helps.

              Comment

              • kimng
                Junior Member
                • Dec 2014
                • 5

                Appreciate the reply but unfortunately it leads to the same error:

                java -ea -Xmx200m -cp /home/ssi.ad/kimn/.conda/envs/env_kim/opt/bbmap-37.90/current/ jgi.ReformatReads in=minimap.sam out=new.bam sam=1.4
                Executing jgi.ReformatReads [in=minimap.sam, out=new.bam, sam=1.4]

                Input is being processed as unpaired
                java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
                java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
                at stream.SamLine.toShortMatch(SamLine.java:1200)
                at stream.SamLine.toRead(SamLine.java:1972)
                at stream.SamLine.toRead(SamLine.java:1832)
                at stream.SamReadInputStream.toReadList(SamReadInputStream.java:118)
                at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:89)
                at stream.SamReadInputStream.hasMore(SamReadInputStream.java:53)
                at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:664)
                at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:653)

                Comment

                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  It was worth a try.

                  Based on the `TODO` tag I see perhaps this is something @Brian intends to work on. Unfortunately he has not been participating in forums of late so there is no guarantee as to when this question may be addressed. You could try creating a ticket at BBMap SF repository for this question.

                  Comment

                  • kimng
                    Junior Member
                    • Dec 2014
                    • 5

                    Thanks for the info didn't realise that was the site for tickets so appreciate it, I'll create a ticket there.

                    Comment

                    • HESmith
                      Senior Member
                      • Oct 2009
                      • 512

                      You can try the reformat.sh command per @GenoMax but with 'sam=1.3' flag. That worked for me with a different variant caller (FreeBayes).

                      Comment

                      • kimng
                        Junior Member
                        • Dec 2014
                        • 5

                        Thanks for the input but same issue using sam=1.3, I think Geno's comment on something he intended to do is on the mark.

                        Comment

                        • darthsequencer
                          Member
                          • Feb 2012
                          • 35

                          Hi Brian - What are the right flags to set if I want all of the top scoring mappings for paired end reads?

                          Thanks

                          Comment

                          • GenoMax
                            Senior Member
                            • Feb 2008
                            • 7142

                            Have you tried ambig=all?

                            Comment

                            • darthsequencer
                              Member
                              • Feb 2012
                              • 35

                              Originally posted by GenoMax View Post
                              Have you tried ambig=all?
                              Yeah - I have but from what I can tell it maxes out at 6 reports unless maxsites is increased. I am wondering if there is a setting that avoids maxsites and reports everything above some score.

                              Comment

                              • SNPsaurus
                                Registered Vendor
                                • May 2013
                                • 525

                                Originally posted by darthsequencer View Post
                                Yeah - I have but from what I can tell it maxes out at 6 reports unless maxsites is increased. I am wondering if there is a setting that avoids maxsites and reports everything above some score.
                                You might try the align2.BBMapPacBioSkimmer mapper with ambig=all and expectedsites= some high number. Not exactly what you were asking but I get dozens of hits with that.
                                Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Pathogen Surveillance with Advanced Genomic Tools
                                  by seqadmin




                                  The COVID-19 pandemic highlighted the need for proactive pathogen surveillance systems. As ongoing threats like avian influenza and newly emerging infections continue to pose risks, researchers are working to improve how quickly and accurately pathogens can be identified and tracked. In a recent SEQanswers webinar, two experts discussed how next-generation sequencing (NGS) and machine learning are shaping efforts to monitor viral variation and trace the origins of infectious...
                                  03-24-2025, 11:48 AM
                                • seqadmin
                                  New Genomics Tools and Methods Shared at AGBT 2025
                                  by seqadmin


                                  This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                                  The Headliner
                                  The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                                  03-03-2025, 01:39 PM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, 03-20-2025, 05:03 AM
                                0 responses
                                41 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-19-2025, 07:27 AM
                                0 responses
                                50 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-18-2025, 12:50 PM
                                0 responses
                                38 views
                                0 reactions
                                Last Post seqadmin  
                                Started by seqadmin, 03-03-2025, 01:15 PM
                                0 responses
                                192 views
                                0 reactions
                                Last Post seqadmin  
                                Working...