
  • #31
    Thanks for noting this. Some tools (like reformat) support the "extin" and "extout" flags which let you override the default, so you could do this:

    reformat.sh in=file.dat out=file2.dat extin=.sam extout=.fq

    BBMap doesn't support that right now, but I'll add it. I don't particularly recommend sam -> fastq conversion, though, because the names change: in sam format read 1 and read 2 must have identical names, whereas in fastq format they will typically have "/1" and "/2" or similar to differentiate them. You can still do that conversion if you want.

    I have not used Galaxy and don't know what's possible, but until I make this change, I would suggest one of these:

    1) Use BBDuk for this filtering; its default output format is fastq and it's probably faster than BBMap anyway in this case. The syntax is very similar. On the command line, it would be something like "bbduk.sh in=reads.fq outu=clean.fq ref=ecoli.fasta".

    2) Tell BBMap "outu=stdout.fq" and pipe that to a file, if Galaxy supports pipes.

    As for your question about pairing, the normal behavior in paired-mapping mode is:

    "out=" will get everything.
    "outm=" will get all pairs in which either of the reads mapped to the reference.
    "outu=" will get all pairs in which neither read mapped to the reference.

    For BBDuk, it's slightly different but essentially the same:
    "out=" is the same as "outu=".
    "outu", aka "out", will get all pairs in which neither had a kmer match to the reference.
    "outm" will get all pairs in which either had a kmer match to the reference.
    For BBDuk, this behavior can be changed with the "reib" (removeIfEitherBad) flag. The flag's name assumes that the reference consists of contaminants to be filtered out, so the default "reib=true" means any pair in which either read matches the contaminant is removed.

    So, for both tools, if the input data is paired, the output data will also be paired - pairs are always kept together in all streams.
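The routing rules above can be sketched in a few lines of Python. This is only a model of the behavior described in this post, not BBMap or BBDuk code; the function name `route_pair` is hypothetical, and the handling of `reib=False` (send a pair to "outm" only if both reads match) is my understanding of the flag rather than something stated above.

```python
def route_pair(r1_hit: bool, r2_hit: bool, reib: bool = True) -> str:
    """Return the output stream ("outm" or "outu") for one read pair.

    r1_hit / r2_hit: whether each read matched the reference
    (mapped for BBMap, shared a kmer for BBDuk).
    reib: models BBDuk's removeIfEitherBad flag; with reib=False the pair
    is treated as matching only if BOTH reads match (an assumption).
    """
    if reib:
        matched = r1_hit or r2_hit
    else:
        matched = r1_hit and r2_hit
    # Both reads of a pair always go to the same stream, so pairs stay together.
    return "outm" if matched else "outu"


if __name__ == "__main__":
    for hits in [(False, False), (True, False), (True, True)]:
        print(hits, "->", route_pair(*hits))
```

Because the decision is made once per pair, a paired input always yields paired output in every stream.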

    • #32
      Thank you for the quick reply Brian.

      I was able to get things working with a pipe. I'm guessing the reads have to be interleaved with this method, but that will work fine for me until you can implement the alternate output flag.

      Thanks again!

      • #33
        How would you like us to cite bbduk in papers?

        • #34
          Originally posted by mendezg
          How would you like us to cite bbduk in papers?
          In the past Brian has asked people to link to the project site on SourceForge.

          • #35
            Hi - sorry I somehow missed this question! Yes, as Genomax stated, please just cite it something like this (altered according to the format of the journal):

            "BBDuk - Bushnell B. - sourceforge.net/projects/bbmap/"

            • #36
              Hello!
              Is it possible to use cutprimers.sh to cut the sequence AND to preserve the primer sites around?

              • #37
                Not currently... but I'll plan to add a flag for that.

                • #38
                  I added the "include" flag to cutprimers. The default is "include=f". If you set "include=t", the primers will be retained in the output.

                  • #39
                    Hello Brian! Thanks a lot for the implementation of this feature!
                    In the meantime I had considered modifying the sam files from msa.sh, but the out-of-the-box functionality is much more convenient!
                    Thanks again!

                    • #40
                      Brian,

                      is there a way with the BB Suite to demultiplex paired-end reads based on inline barcodes, like Flexbar does?

                      I can see it can be done one barcode at a time by outputting matching reads based on the first 6 left bases. But can it be done in one command to demultiplex for multiple barcodes?

                      cheers
                      DK

                      • #41
                        It is almost possible to do this with Seal, which outputs reads into bins based on kmer matching.

                        seal.sh in=reads.fq pattern=%.fq k=6 restrictleft=6 mm=f ref=barcodes.fa rcomp=f

                        That would require a file "barcodes.fa" like this:
                        >AACTGA
                        AACTGA
                        >GGCCTT
                        GGCCTT

                        etc., with one fasta entry per barcode, so the output reads would be in file AACTGA.fq and so forth. This is sort of a common request, so maybe I will make it unnecessary to provide a fasta file of the barcodes. Does that matter to you either way?
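For anyone with a long barcode list, a file in the layout shown above can be generated rather than typed. The following is a minimal Python sketch, assuming the barcodes are available as plain strings; the helper name `barcodes_to_fasta` is hypothetical and not part of any BBTools package.

```python
def barcodes_to_fasta(barcodes):
    """Build FASTA text with one entry per barcode, named after its own sequence."""
    lines = []
    for barcode in barcodes:
        barcode = barcode.strip().upper()
        if not barcode:
            continue  # ignore blank entries
        lines.append(">" + barcode)  # header line: the barcode itself
        lines.append(barcode)        # sequence line: the barcode again
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Write the two example barcodes from the post to barcodes.fa.
    with open("barcodes.fa", "w") as handle:
        handle.write(barcodes_to_fasta(["AACTGA", "GGCCTT"]))
```

Naming each entry after its sequence is what makes the "pattern=%.fq" output files come out as AACTGA.fq, GGCCTT.fq and so on.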

                        However, BBDuk has the flags "skipr1" and "skipr2", which allow it to only do kmer operations on one read or the other. Seal currently lacks this, but it's essential for processing inline barcodes. I'll add it for the next release.

                        • #42
                          I hadn't noticed the Seal command. Thanks for responding so fast!

                          So I assume that if I were to input paired-end reads to Seal with a barcodes.fa as the ref, it would try to match the barcodes in both the R1 and R2 reads? Hence the need for skipr1 and skipr2...?

                          Additionally, would seal let you left trim off the barcode bases from the R1 read?

                          • #43
                            Originally posted by dkainer
                            I hadn't noticed the Seal command. Thanks for responding so fast!

                            So I assume that if I were to input paired-end reads to Seal with a barcodes.fa as the ref, it would try to match the barcodes in both the R1 and R2 reads? Hence the need for skipr1 and skipr2...?
                            That's correct.

                            Additionally, would seal let you left trim off the barcode bases from the R1 read?
                            Yes, it has a flag "ftl" (forcetrimleft) for doing that... "ftl=6" would remove the first 6 bases of all reads. Unfortunately it would do that for both read 1 and read 2. So... if you have reads in 2 files, that's fine; you just process the read1 file with "ftl=6". If they are interleaved it's more complicated - you'd have to split them first (for example, reformat.sh in=reads.fq out=read#.fq). I'll consider adding the ability to restrict all operations to only the left or right reads... it seems useful.
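To make the interleaved case concrete, here is a minimal Python sketch of trimming the barcode from read 1 only while leaving read 2 untouched. It is not BBTools code: the function name `trim_read1_left` is hypothetical, and it assumes strictly interleaved FASTQ with exactly four lines per read (so eight lines per pair).

```python
def trim_read1_left(interleaved_lines, n=6):
    """Remove the first n bases (and qualities) from read 1 of every pair.

    interleaved_lines: FASTQ lines in the order R1, R2, R1, R2, ...
    Returns a new list of lines; read 2 records pass through unchanged.
    """
    lines = [line.rstrip("\n") for line in interleaved_lines]
    if len(lines) % 8 != 0:
        raise ValueError("expected 8 lines per pair (4 per read)")
    out = []
    for i in range(0, len(lines), 8):
        header1, seq1, plus1, qual1 = lines[i:i + 4]
        read2 = lines[i + 4:i + 8]
        # Trim bases and qualities together so they stay the same length.
        out.extend([header1, seq1[n:], plus1, qual1[n:]])
        out.extend(read2)
    return out


if __name__ == "__main__":
    pair = ["@r/1", "AACTGATTTT", "+", "IIIIIIHHHH",
            "@r/2", "CCCCGGGG", "+", "HHHHHHHH"]
    print("\n".join(trim_read1_left(pair, n=6)))
```

With the reads already split into two files, none of this is needed; running "ftl=6" on the read 1 file alone does the same job.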

                            • #44
                              Hi Brian,

                              I produced the following graph using khist.sh for my 100bp PE Illumina reads. Could you help me interpret the graph please? What is the difference between raw_count and unique_kmers?

                              https://raw.githubusercontent.com/ha...hist.sh_output
                              Habib

                              Life does have an instruction...

                              • #45
                                Try dividing the raw count by the depth and see that the result equals unique_kmers. That might give you a clue as to what everything means.
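The relationship hinted at above can be checked on a toy example. This is a minimal Python sketch of a kmer depth histogram, not khist.sh itself: the function name `kmer_histogram` is hypothetical, and unlike real kmer counters it does not merge a kmer with its reverse complement.

```python
from collections import Counter


def kmer_histogram(reads, k):
    """Return rows of (depth, raw_count, unique_kmers), sorted by depth.

    unique_kmers: number of distinct kmers seen exactly `depth` times.
    raw_count:    total occurrences of those kmers, i.e. depth * unique_kmers.
    """
    kmer_counts = Counter()
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer_counts[read[i:i + k]] += 1
    # Count how many distinct kmers share each depth.
    unique_by_depth = Counter(kmer_counts.values())
    return [(depth, depth * unique, unique)
            for depth, unique in sorted(unique_by_depth.items())]


if __name__ == "__main__":
    for depth, raw_count, unique_kmers in kmer_histogram(["ACGTA", "ACGTT"], k=4):
        print(depth, raw_count, unique_kmers)
```

In this toy input, ACGT occurs twice and CGTA and CGTT once each, so the depth-1 row has 2 unique kmers and the depth-2 row has 1; in each row the raw count divided by the depth gives the unique kmer count.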
