Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Hi Brian,

    I'm using BBtools to call indels from small RNA virus genomes using both bbmap and callvariants. It's doing a great job of calling these. We are interested in the possibility of also calling inversion events from this data. I know this is somewhat common in transcriptomics data. Do these programs have this functionality?

    To be more specific, is it possible to identify reads that are aligning to both the sense and antisense of the given reference?

    Thanks in advance!

    James

    Comment


    • bbmap local + idfilter

      Brian et al., I am using the 'local' option with bbmap because I sometimes have a couple bases of adapter remaining (after bbduk mink=6) and my understanding is that 'local' will trim the ends to achieve a better score.

      My question: does the idfilter option work on the local or global alignment? I have reads with 100% ID after trimming ends but they are not passing the idfilter. I'm curious whether there is another way to filter based on the local alignment percent id?

      thanks!
      MCMC

      Comment


      • mcmc- what are you setting the idfilter value to? Because reads still take a hit to their alignment score for soft-clipping. That is to say a read that's aligned with no mismatches but it was soft-clipped should have a lower score than a read that was mapped perfectly without soft-clipping. This strategy doesn't make sense when you're read is being soft-clipped due to adapter sequence at the end of the read...but for biological purposes it does make sense. Maybe a more appropriate question should be how to get rid of those "couple bases of adapter remaining" after trimming them.
        /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
        Salk Institute for Biological Studies, La Jolla, CA, USA */

        Comment


        • Originally posted by sdriscoll View Post
          mcmc- what are you setting the idfilter value to? Because reads still take a hit to their alignment score for soft-clipping. That is to say a read that's aligned with no mismatches but it was soft-clipped should have a lower score than a read that was mapped perfectly without soft-clipping. This strategy doesn't make sense when you're read is being soft-clipped due to adapter sequence at the end of the read...but for biological purposes it does make sense. Maybe a more appropriate question should be how to get rid of those "couple bases of adapter remaining" after trimming them.
          yes I know the scores may be different... but I'm simply wondering whether the idfilter and subfilter post-filtering options work on the soft-clipped alignments or the non-clipped alignments. I want to be able to filter with, say, 2 MM or 98% ID *after* the soft clipping.

          Thanks,
          MCMC

          Comment


          • Bug in call variants with relation to sam files created from minimap2

            Hello, I've been using different parts of the BBMap suite and mixing them with other softwares for WGS of bacterial samples (primarily for the purpose of automated QC). I'm a fan of callvariants.sh as it works quickly and doesn't require me to sort .sam files before hand and it's simple to clearfilters to return all values. Unfortunately when using callvariants.sh with .sam files created from minimap2 (https://github.com/lh3/minimap2) using illumina nextseq reads. I receive the error seen below. I was curious if this was a minor bug that could be corrected?

            I'm running callvariants.sh from bbmap v37.90 obtained from conda (https://anaconda.org/bioconda/bbmap). Appreciate any assistance given.

            -Kim

            Error:
            java -ea -Xmx48779m -Xms48779m -cp /home/ssi.ad/kimn/.conda/envs/env_kim/opt/bbmap-37.90/current/ var2.CallVariants in=aln.sam ref=contigs.fasta vcf=minimap.vcf ploidy=1 clearfilters
            Executing var2.CallVariants [in=aln.sam, ref=contigs.fasta, vcf=minimap.vcf, ploidy=1, clearfilters]

            Loading reference.
            Time: 0.304 seconds.
            Processing input files.
            Exception in thread "Thread-3" Exception in thread "Thread-4" Exception in thread "Thread-5" java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
            at stream.SamLine.toShortMatch(SamLine.java:1200)
            at stream.SamLine.toRead(SamLine.java:1972)
            at stream.SamLine.toRead(SamLine.java:1832)
            at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
            at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
            java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
            at stream.SamLine.toShortMatch(SamLine.java:1200)
            at stream.SamLine.toRead(SamLine.java:1972)
            at stream.SamLine.toRead(SamLine.java:1832)
            at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
            at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)
            java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
            at stream.SamLine.toShortMatch(SamLine.java:1200)
            at stream.SamLine.toRead(SamLine.java:1972)
            at stream.SamLine.toRead(SamLine.java:1832)
            at stream.SamReadStreamer$ProcessThread.makeReads(SamReadStreamer.java:206)
            at stream.SamReadStreamer$ProcessThread.run(SamReadStreamer.java:135)

            Comment


            • You can try reformatting your alignment file using "reformat.sh in=you.bam out=new.bam sam=1.4" to see if that helps.

              Comment


              • Appreciate the reply but unfortunately it leads to the same error:

                java -ea -Xmx200m -cp /home/ssi.ad/kimn/.conda/envs/env_kim/opt/bbmap-37.90/current/ jgi.ReformatReads in=minimap.sam out=new.bam sam=1.4
                Executing jgi.ReformatReads [in=minimap.sam, out=new.bam, sam=1.4]

                Input is being processed as unpaired
                java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
                java.lang.AssertionError: TODO: Encountered a read with 'M' in cigar string but no MD tag.
                at stream.SamLine.toShortMatch(SamLine.java:1200)
                at stream.SamLine.toRead(SamLine.java:1972)
                at stream.SamLine.toRead(SamLine.java:1832)
                at stream.SamReadInputStream.toReadList(SamReadInputStream.java:118)
                at stream.SamReadInputStream.fillBuffer(SamReadInputStream.java:89)
                at stream.SamReadInputStream.hasMore(SamReadInputStream.java:53)
                at stream.ConcurrentGenericReadInputStream$ReadThread.readLists(ConcurrentGenericReadInputStream.java:664)
                at stream.ConcurrentGenericReadInputStream$ReadThread.run(ConcurrentGenericReadInputStream.java:653)

                Comment


                • It was worth a try.

                  Based on the `TODO` tag I see perhaps this is something @Brian intends to work on. Unfortunately he has not been participating in forums of late so there is no guarantee as to when this question may be addressed. You could try creating a ticket at BBMap SF repository for this question.

                  Comment


                  • Thanks for the info didn't realise that was the site for tickets so appreciate it, I'll create a ticket there.

                    Comment


                    • You can try the reformat.sh command per @GenoMax but with 'sam=1.3' flag. That worked for me with a different variant caller (FreeBayes).

                      Comment


                      • Thanks for the input but same issue using sam=1.3, I think Geno's comment on something he intended to do is on the mark.

                        Comment


                        • Hi Brian - What are the right flags to set if I want all of the top scoring mappings for paired end reads?

                          Thanks

                          Comment


                          • Have you tried ambig=all?

                            Comment


                            • Originally posted by GenoMax View Post
                              Have you tried ambig=all?
                              Yeah - I have but from what I can tell it maxes out at 6 reports unless maxsites is increased. I am wondering if there is a setting that avoids maxsites and reports everything above some score.

                              Comment


                              • Originally posted by darthsequencer View Post
                                Yeah - I have but from what I can tell it maxes out at 6 reports unless maxsites is increased. I am wondering if there is a setting that avoids maxsites and reports everything above some score.
                                You might try the align2.BBMapPacBioSkimmer mapper with ambig=all and expectedsites= some high number. Not exactly what you were asking but I get dozens of hits with that.
                                Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Choosing Between NGS and qPCR
                                  by seqadmin



                                  Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                                  10-18-2024, 07:11 AM
                                • seqadmin
                                  Non-Coding RNA Research and Technologies
                                  by seqadmin




                                  Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                                  Nobel Prize for MicroRNA Discovery
                                  This week,...
                                  10-07-2024, 08:07 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 05:31 AM
                                0 responses
                                10 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-24-2024, 06:58 AM
                                0 responses
                                20 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-23-2024, 08:43 AM
                                0 responses
                                48 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-17-2024, 07:29 AM
                                0 responses
                                58 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X