  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #31
    Thanks for noting this. Some tools (like reformat) support the "extin" and "extout" flags, which let you override the format that would otherwise be inferred from the file extension, so you could do this:

    reformat.sh in=file.dat out=file2.dat extin=.sam extout=.fq

    But BBMap doesn't support that right now; I'll add it. Also, I don't particularly recommend sam -> fastq conversion, because the read names change: in sam format, read 1 and read 2 must have identical names, whereas in fastq format they typically carry "/1" and "/2" or similar suffixes to tell them apart. You can still do that conversion if you want, though.
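To make the naming difference concrete, here is a minimal Python sketch (not BBTools code) that turns one SAM record back into a FASTQ record. It relies only on the SAM specification's FLAG bits (0x1 paired, 0x10 reverse-complemented, 0x40 first in pair, 0x80 second in pair). The function names `fastq_name` and `sam_line_to_fastq` are hypothetical, and the sketch ignores secondary/supplementary alignments and records whose quality field is `*`.

```python
# Hypothetical helpers, not BBTools code. FLAG bit values come from the SAM spec.
FLAG_PAIRED = 0x1
FLAG_REVERSE = 0x10  # SEQ/QUAL are stored reverse-complemented
FLAG_READ1 = 0x40    # first segment of the template
FLAG_READ2 = 0x80    # last segment of the template

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def fastq_name(qname: str, flag: int) -> str:
    """Append /1 or /2 to a paired read's shared SAM name; leave unpaired names alone."""
    if flag & FLAG_PAIRED:
        if flag & FLAG_READ1:
            return f"{qname}/1"
        if flag & FLAG_READ2:
            return f"{qname}/2"
    return qname


def sam_line_to_fastq(line: str) -> str:
    """Convert one tab-separated SAM alignment line into a 4-line FASTQ record."""
    fields = line.rstrip("\n").split("\t")
    qname, flag, seq, qual = fields[0], int(fields[1]), fields[9], fields[10]
    if flag & FLAG_REVERSE:
        # Restore the read's original orientation before writing FASTQ.
        seq = seq[::-1].translate(_COMPLEMENT)
        qual = qual[::-1]
    return f"@{fastq_name(qname, flag)}\n{seq}\n+\n{qual}\n"


if __name__ == "__main__":
    # Both mates share the SAM name "pair7"; the FASTQ names differ.
    print(sam_line_to_fastq("pair7\t99\tchr1\t100\t60\t4M\t=\t300\t204\tACGT\tIIII"), end="")
    print(sam_line_to_fastq("pair7\t147\tchr1\t300\t60\t4M\t=\t100\t-204\tAACC\tABCD"), end="")
```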

    I have not used Galaxy and don't know what's possible, but until I make this change, I would suggest one of these:

    1) Use BBDuk for this filtering; its default output format is fastq and it's probably faster than BBMap anyway in this case. The syntax is very similar. On the command line, it would be something like "bbduk.sh in=reads.fq outu=clean.fq ref=ecoli.fasta".

    2) Tell BBMap "outu=stdout.fq" and pipe that to a file, if Galaxy supports pipes.

    As for your question about pairing, the normal behavior in paired-mapping mode is:

    "out=" will get everything.
    "outm=" will get all pairs in which either of the reads mapped to the reference.
    "outu=" will get all pairs in which neither read mapped to the reference.
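As a rough illustration of the routing rules listed above, this hypothetical Python function sorts pairs into the three streams given a mapped/unmapped status for each read. It is a sketch of the behavior described in this post, not BBMap's implementation; the `(read1, read2, read1_mapped, read2_mapped)` tuple layout is an assumption made for the example.

```python
def route_pairs(pairs):
    """Split pairs into (out, outm, outu) streams; a pair is never split up.

    Each item is a hypothetical (read1, read2, read1_mapped, read2_mapped) tuple.
    """
    out, outm, outu = [], [], []
    for r1, r2, mapped1, mapped2 in pairs:
        out.append((r1, r2))        # out= gets everything
        if mapped1 or mapped2:
            outm.append((r1, r2))   # outm= gets pairs where either read mapped
        else:
            outu.append((r1, r2))   # outu= gets pairs where neither read mapped
    return out, outm, outu
```

Because the decision is made per pair, both mates always land in the same stream, which is why paired input yields paired output.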

    For BBDuk, it's slightly different but essentially the same:
    "out=" is the same as "outu=".
    "outu", aka "out", will get all pairs in which neither read had a kmer match to the reference.
    "outm" will get all pairs in which either read had a kmer match to the reference.
    For BBDuk, this behavior can be changed with the "removeifeitherbad" flag (abbreviated "rieb"). The flag's name assumes that the reference consists of contaminants being filtered out, so the default "rieb=t" means any pair in which either read matches the contaminant is removed.
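A similar sketch for the BBDuk case, assuming that turning the flag off means a pair goes to outm only when both reads have a kmer match. `bbduk_like_route` and its `remove_if_either_bad` parameter are hypothetical stand-ins for the real flag, and the per-read match results are assumed to be already known.

```python
def bbduk_like_route(pairs, remove_if_either_bad=True):
    """Split pairs into (outu, outm) from per-read kmer-match results.

    Each item is a hypothetical (read1, read2, read1_hit, read2_hit) tuple.
    remove_if_either_bad=True:  either read matching sends the pair to outm.
    remove_if_either_bad=False: (assumed) both reads must match to reach outm.
    """
    outu, outm = [], []
    for r1, r2, hit1, hit2 in pairs:
        matched = (hit1 or hit2) if remove_if_either_bad else (hit1 and hit2)
        (outm if matched else outu).append((r1, r2))  # pairs stay together
    return outu, outm
```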

    So, for both tools, if the input data is paired, the output data will also be paired - pairs are always kept together in all streams.


    • Thowell
      Junior Member
      • Nov 2011
      • 3

      #32
      Thank you for the quick reply Brian.

      I was able to get things working with a pipe. I'm guessing the reads have to be interleaved with this method, but that will work fine for me until you can implement the alternate output flag.

      Thanks again!


      • mendezg
        Junior Member
        • Oct 2014
        • 2

        #33
        How would you like us to cite bbduk in papers?


        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #34
          Originally posted by mendezg
          How would you like us to cite bbduk in papers?
          In the past Brian has asked people to link to the project site on SourceForge.


          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #35
            Hi, sorry I somehow missed this question! Yes, as GenoMax stated, please just cite it something like this (adjusted to the journal's format):

            "BBDuk - Bushnell B. - sourceforge.net/projects/bbmap/"


            • vmikk
              Junior Member
              • Jun 2015
              • 3

              #36
              Hello!
              Is it possible to use cutprimers.sh to cut out the sequence between the primers AND preserve the primer sites on either side?


              • Brian Bushnell
                Super Moderator
                • Jan 2014
                • 2709

                #37
                Not currently... but I plan to add a flag for that.


                • Brian Bushnell
                  Super Moderator
                  • Jan 2014
                  • 2709

                  #38
                  I added the "include" flag to cutprimers. Default is "include=f". If you set "include=t" the primers will be retained for the output.
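Conceptually, the flag only changes which slice of the read is kept. A minimal Python sketch, assuming the primer hits are already known as half-open (start, end) coordinates on the read; `cut_between_primers` is a hypothetical helper for illustration, not cutprimers.sh's actual code.

```python
def cut_between_primers(seq, left, right, include=False):
    """Return the region between two primer hits.

    left and right are hypothetical half-open (start, end) coordinates.
    include=False keeps only the sequence strictly between the primers;
    include=True also keeps the primer sites themselves.
    """
    (left_start, left_end), (right_start, right_end) = left, right
    if not (0 <= left_start <= left_end <= right_start <= right_end <= len(seq)):
        raise ValueError("primer hits must be ordered and lie inside the sequence")
    if include:
        return seq[left_start:right_end]
    return seq[left_end:right_start]
```

For example, with primers at (3, 6) and (10, 13) on "AAAGGGCCCCTTTAAA", the default returns "CCCC", while include=True returns "GGGCCCCTTT".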


                  • vmikk
                    Junior Member
                    • Jun 2015
                    • 3

                    #39
                    Hello Brian! Thanks a lot for the implementation of this feature!
                    In the meantime I had considered modifying the sam files from msa.sh, but the out-of-the-box functionality is much more convenient!
                    Thanks again!


                    • dkainer
                      Junior Member
                      • May 2015
                      • 9

                      #40
                      Brian,

                      is there a way with the BB Suite to demultiplex paired-end reads based on inline barcodes, like Flexbar does?

                      I can see that it can be done one barcode at a time by outputting the reads whose leftmost 6 bases match. But can it be done in a single command to demultiplex multiple barcodes?

                      cheers
                      DK


                      • Brian Bushnell
                        Super Moderator
                        • Jan 2014
                        • 2709

                        #41
                        It is almost possible to do this with Seal, which outputs reads into bins based on kmer matching.

                        seal.sh in=reads.fq pattern=%.fq k=6 restrictleft=6 mm=f ref=barcodes.fa rcomp=f

                        That would require a file "barcodes.fa" like this:
                        >AACTGA
                        AACTGA
                        >GGCCTT
                        GGCCTT

                        etc., with one fasta entry per barcode, so the output reads would be in file AACTGA.fq and so forth. This is sort of a common request, so maybe I will make it unnecessary to provide a fasta file of the barcodes. Does that matter to you either way?
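Writing the barcode file by hand is easy to avoid. The Python sketch below builds the file contents in the layout shown above and, for comparison, mimics the intended binning with an exact match on the first 6 bases (roughly what k=6 restrictleft=6 mm=f rcomp=f asks Seal to do). Both function names are hypothetical; Seal itself matches kmers against the reference rather than running anything like this.

```python
def barcodes_fasta(barcodes):
    """Return FASTA text with one entry per barcode, each named after its own sequence."""
    return "".join(f">{bc}\n{bc}\n" for bc in barcodes)


def bin_by_prefix(reads, barcodes, k=6):
    """Bin reads by an exact, forward-strand match of their first k bases to a barcode."""
    bins = {bc: [] for bc in barcodes}
    unmatched = []
    for read in reads:
        prefix = read[:k]
        if prefix in bins:
            bins[prefix].append(read)   # would end up in e.g. AACTGA.fq
        else:
            unmatched.append(read)
    return bins, unmatched
```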

                        However, BBDuk has the flags "skipr1" and "skipr2", which allow it to only do kmer operations on one read or the other. Seal currently lacks this, but it's essential for processing inline barcodes. I'll add it for the next release.
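For paired reads with the barcode only on read 1, the goal is to pick each pair's bin from read 1 alone, which is what skipping kmer operations on read 2 would achieve. A hypothetical Python sketch of that assignment, assuming exact 6-base barcodes at the start of read 1:

```python
def demux_pairs_on_r1(pairs, barcodes, k=6):
    """Assign each (read1, read2) pair to a bin using only read 1's first k bases.

    Read 2 is never inspected, so a barcode-like sequence at its start is ignored.
    """
    bins = {bc: [] for bc in barcodes}
    unmatched = []
    for r1, r2 in pairs:
        barcode = r1[:k]
        (bins[barcode] if barcode in bins else unmatched).append((r1, r2))
    return bins, unmatched
```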


                        • dkainer
                          Junior Member
                          • May 2015
                          • 9

                          #42
                          I hadn't noticed the Seal command. Thanks for responding so fast!

                          So I assume that if I input paired-end reads to Seal with barcodes.fa as the ref, it would try to match the barcodes in both the R1 and R2 reads? Hence the need for skipr1 and skipr2?

                          Also, would Seal let you trim the barcode bases off the left end of the R1 read?


                          • Brian Bushnell
                            Super Moderator
                            • Jan 2014
                            • 2709

                            #43
                            Originally posted by dkainer
                            I hadn't noticed the Seal command. Thanks for responding so fast!

                            So I assume that if I input paired-end reads to Seal with barcodes.fa as the ref, it would try to match the barcodes in both the R1 and R2 reads? Hence the need for skipr1 and skipr2?
                            That's correct.

                            Also, would Seal let you trim the barcode bases off the left end of the R1 read?
                            Yes, it has a flag "ftl" (forcetrimleft) for doing that... "ftl=6" would remove the first 6 bases of all reads. Unfortunately, it would do that to both read 1 and read 2. So if your reads are in 2 files, that's fine; you just process the read 1 file with "ftl=6". If they are interleaved, it's more complicated: you'd have to split them first (for example, reformat.sh in=reads.fq out=read#.fq, where "#" is replaced by 1 and 2). I'll consider adding the ability to restrict all operations to only the left or only the right reads... it seems useful.
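The split-then-trim workaround for interleaved input boils down to the steps below. This is a plain-Python sketch on bare sequence strings with hypothetical helpers `split_interleaved` and `trim_left`, standing in for reformat.sh and "ftl=6" rather than reproducing them.

```python
def split_interleaved(records):
    """Split an interleaved list [r1, r2, r1, r2, ...] into (read1s, read2s)."""
    if len(records) % 2:
        raise ValueError("interleaved input must contain an even number of reads")
    return records[0::2], records[1::2]


def trim_left(seqs, n):
    """Remove the first n bases from every sequence (a forced left trim)."""
    return [s[n:] for s in seqs]


if __name__ == "__main__":
    read1s, read2s = split_interleaved(["AACTGAGATTACA", "TTGGCCAA"])
    read1s = trim_left(read1s, 6)  # only read 1 carries the 6-base barcode
    print(read1s, read2s)
```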


                            • habib
                              Junior Member
                              • Feb 2012
                              • 3

                              #44
                              Hi Brian,

                              I produced the following graph using khist.sh for my 100bp PE Illumina reads. Could you help me interpret the graph please? What is the difference between raw_count and unique_kmers?

                              Habib

                              Life does have an instruction...


                              • westerman
                                Rick Westerman
                                • Jun 2008
                                • 1104

                                #45
                                Try dividing raw_count by the depth; you should see that the result equals unique_kmers. That might give you a clue as to what everything means.
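To spell out the hint: if unique_kmers at a given depth is the number of distinct kmers seen exactly that many times, then raw_count = depth * unique_kmers, the total number of kmer occurrences at that depth. The Python sketch below builds such a table from exact counts. It is an illustration under that reading of the columns, not khist.sh's code, and unlike typical kmer counters it does not merge a kmer with its reverse complement.

```python
from collections import Counter


def kmer_histogram(seqs, k):
    """Return (depth, raw_count, unique_kmers) rows from exact forward-strand kmer counts."""
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    per_depth = Counter(counts.values())  # depth -> number of distinct kmers at that depth
    return [(depth, depth * n, n) for depth, n in sorted(per_depth.items())]
```

For "ACGTAC" with k=2, the kmer AC occurs twice and CG, GT, TA once each, giving the rows (1, 3, 3) and (2, 2, 1); in both, raw_count / depth = unique_kmers.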
