  • #31
    Thanks for noting this. Some tools (like reformat) support the "extin" and "extout" flags which let you override the default, so you could do this:

    reformat.sh in=file.dat out=file2.dat extin=.sam extout=.fq

    BBMap doesn't support that right now, but I'll add it. I don't particularly recommend sam -> fastq conversion, though, because the names change: in sam format read 1 and read 2 must have identical names, whereas in fastq format they typically carry "/1" and "/2" or similar to differentiate them. You can still do that conversion if you want.
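To illustrate the naming point, here is a minimal sketch, assuming the common convention of appending "/1" and "/2". The helper name `fastq_name` is hypothetical and is not part of BBTools. In SAM, the mate is identified by the FLAG bits 0x40 (first in pair) and 0x80 (second in pair) rather than by the read name.

```python
FIRST_IN_PAIR = 0x40   # SAM FLAG bit: first segment in the template
SECOND_IN_PAIR = 0x80  # SAM FLAG bit: last segment in the template

def fastq_name(qname: str, flag: int) -> str:
    """Return a fastq-style name for a SAM record (hypothetical helper)."""
    if flag & FIRST_IN_PAIR:
        return qname + "/1"
    if flag & SECOND_IN_PAIR:
        return qname + "/2"
    return qname  # unpaired read: name is left unchanged

# Both mates share one QNAME in SAM but end up with different fastq names.
print(fastq_name("read7", 0x40 | 0x1))
print(fastq_name("read7", 0x80 | 0x1))
```

So a round trip through sam and back to fastq will not necessarily reproduce the original read names.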

    I have not used Galaxy and don't know what's possible, but until I make this change, I would suggest one of these:

    1) Use BBDuk for this filtering; its default output format is fastq and it's probably faster than BBMap anyway in this case. The syntax is very similar. On the command line, it would be something like "bbduk.sh in=reads.fq outu=clean.fq ref=ecoli.fasta".

    2) Tell BBMap "outu=stdout.fq" and pipe that to a file, if Galaxy supports pipes.

    As for your question about pairing, the normal behavior in paired-mapping mode is:

    "out=" will get everything.
    "outm=" will get all pairs in which either of the reads mapped to the reference.
    "outu=" will get all pairs in which neither read mapped to the reference.

    For BBDuk, it's slightly different but essentially the same:
    "out=" is the same as "outu=".
    "outu", aka "out", will get all pairs in which neither had a kmer match to the reference.
    "outm" will get all pairs in which either had a kmer match to the reference.
    For BBDuk, this behavior can be changed with the "reib" (removeIfEitherBad) flag. The flag's name assumes that the reference consists of contaminants being filtered out, so the default "reib=true" means any pair in which either read matches the contaminant is removed.

    So, for both tools, if the input data is paired, the output data will also be paired - pairs are always kept together in all streams.
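The routing rules above can be sketched as a small function. This is only a model of the described behavior, not BBMap or BBDuk code; the name `route_pair` is hypothetical, and the "both reads must match" behavior when the flag is false is an assumption inferred from the flag's name.

```python
def route_pair(r1_hit: bool, r2_hit: bool, remove_if_either_bad: bool = True) -> str:
    """Return the output stream ("outm" or "outu") for a read pair.

    r1_hit / r2_hit: whether each read matched the reference.
    remove_if_either_bad: True models the default described above (either
    match sends the pair to outm); False is an assumed "both must match" mode.
    """
    if remove_if_either_bad:
        matched = r1_hit or r2_hit
    else:
        matched = r1_hit and r2_hit
    # Both reads always travel together, so pairing is preserved.
    return "outm" if matched else "outu"

for hits in [(False, False), (True, False), (True, True)]:
    print(hits, route_pair(*hits), route_pair(*hits, remove_if_either_bad=False))
```

Because the function returns a single stream per pair, a pair can never be split between outm and outu, which is the point about pairs being kept together.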

    • #32
      Thank you for the quick reply Brian.

      I was able to get things working with a pipe. I'm guessing the reads have to be interleaved with this method, but that will work fine for me until you can implement the alternate output flag.

      Thanks again!

      • #33
        How would you like us to cite bbduk in papers?

        • #34
          Originally posted by mendezg
          How would you like us to cite bbduk in papers?
          In the past Brian has asked people to link to the project site on SourceForge.

          • #35
            Hi - sorry I somehow missed this question! Yes, as Genomax stated, please just cite it something like this (altered according to the format of the journal):

            "BBDuk - Bushnell B. - sourceforge.net/projects/bbmap/"

            • #36
              Hello!
              Is it possible to use cutprimers.sh to cut the sequence AND preserve the flanking primer sites?

              • #37
                Not currently... but I'll plan to add a flag for that.

                • #38
                  I added the "include" flag to cutprimers. Default is "include=f". If you set "include=t" the primers will be retained for the output.
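As a rough illustration of what "include=f" versus "include=t" means for the output, here is a minimal sketch. It assumes exact string matches for both primers and a hypothetical helper named `cut_between`; cutprimers.sh itself works from primer locations in sam files, so this is not how the tool finds the primers.

```python
def cut_between(seq: str, left_primer: str, right_primer: str, include: bool = False):
    """Return the region flanked by two primers, or None if either is missing.

    Exact string matching is an assumption made for brevity.
    """
    start = seq.find(left_primer)
    if start < 0:
        return None
    end = seq.find(right_primer, start + len(left_primer))
    if end < 0:
        return None
    if include:
        return seq[start:end + len(right_primer)]   # primers retained
    return seq[start + len(left_primer):end]        # primers removed

seq = "TTTACGTGGGGCCCCTTGAAAA"
print(cut_between(seq, "ACGT", "TTGA"))                # GGGGCCCC
print(cut_between(seq, "ACGT", "TTGA", include=True))  # ACGTGGGGCCCCTTGA
```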

                  • #39
                    Hello Brian! Thanks a lot for the implementation of this feature!
                    In the meantime I had thought about modifying the sam files from msa.sh, but the out-of-the-box functionality is much more convenient!
                    Thanks again!

                    • #40
                      Brian,

                      is there a way with the BB Suite to demultiplex paired-end reads based on inline barcodes, like Flexbar does?

                      I can see it can be done one barcode at a time by outputting matching reads based on the first 6 left bases. But can it be done in one command to demultiplex for multiple barcodes?

                      cheers
                      DK

                      • #41
                        It is almost possible to do this with Seal, which outputs reads into bins based on kmer matching.

                        seal.sh in=reads.fq pattern=%.fq k=6 restrictleft=6 mm=f ref=barcodes.fa rcomp=f

                        That would require a file "barcodes.fa" like this:
                        >AACTGA
                        AACTGA
                        >GGCCTT
                        GGCCTT

                        etc., with one fasta entry per barcode, so the output reads would be in file AACTGA.fq and so forth. This is sort of a common request, so maybe I will make it unnecessary to provide a fasta file of the barcodes. Does that matter to you either way?
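If you only have a plain list of barcodes, a file in that layout can be generated with a few lines. This is a minimal sketch; the function name `barcodes_to_fasta` is hypothetical, and the two barcodes are the ones from the example above.

```python
def barcodes_to_fasta(barcodes):
    """Build fasta text with one entry per barcode, named after the barcode."""
    lines = []
    for bc in barcodes:
        bc = bc.strip().upper()
        if bc:  # skip blank entries
            lines.append(">" + bc)
            lines.append(bc)
    return "".join(line + "\n" for line in lines)

fasta_text = barcodes_to_fasta(["AACTGA", "GGCCTT"])
print(fasta_text, end="")
# To write the file: open("barcodes.fa", "w").write(fasta_text)
```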

                        However, BBDuk has the flags "skipr1" and "skipr2", which allow it to only do kmer operations on one read or the other. Seal currently lacks this, but it's essential for processing inline barcodes. I'll add it for the next release.
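To make the inline-barcode case concrete, here is a minimal sketch of the binning logic, assuming exact matching on the first 6 bases of read 1 only. It is a model of the idea, not Seal's implementation; `demultiplex` and the "unmatched" bin name are hypothetical.

```python
from collections import defaultdict

def demultiplex(pairs, barcodes, barcode_len=6):
    """Group (r1_seq, r2_seq) pairs by the leading bases of read 1.

    Only read 1 is inspected (the role skipr2 would play); matching is exact,
    with no mismatches and no reverse-complement lookup.
    """
    known = set(barcodes)
    bins = defaultdict(list)
    for r1, r2 in pairs:
        prefix = r1[:barcode_len]
        key = prefix if prefix in known else "unmatched"
        bins[key].append((r1, r2))  # mates stay together
    return dict(bins)

pairs = [
    ("AACTGATTTT", "CCCCAACTGA"),
    ("GGCCTTACGT", "TTTTTTTTTT"),
    ("NNNNNNACGT", "AACTGAAAAA"),  # barcode-like prefix on read 2 is ignored
]
result = demultiplex(pairs, ["AACTGA", "GGCCTT"])
for name, members in result.items():
    print(name, len(members))
```

The third pair shows why skipping read 2 matters: without it, a chance barcode match at the start of read 2 could send the pair to the wrong bin.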

                        • #42
                          I hadn't noticed the Seal command. Thanks for responding so fast!

                          So I assume that if I were to input paired-end reads to Seal with a barcodes.fa as the ref, it would try to match the barcodes in both the R1 and R2 reads? Hence the need for skipr1 and skipr2...?

                          Additionally, would seal let you left trim off the barcode bases from the R1 read?

                          • #43
                            Originally posted by dkainer
                            I hadn't noticed the Seal command. Thanks for responding so fast!

                            So I assume that if I were to input paired-end reads to Seal with a barcodes.fa as the ref, it would try to match the barcodes in both the R1 and R2 reads? Hence the need for skipr1 and skipr2...?
                            That's correct.

                            Additionally, would seal let you left trim off the barcode bases from the R1 read?
                            Yes, it has a flag "ftl" (forcetrimleft) for doing that... "ftl=6" would remove the first 6 bases of all reads. Unfortunately it would do that for both read 1 and read 2. So... if you have reads in 2 files, that's fine; you just process the read1 file with "ftl=6". If they are interleaved it's more complicated - you'd have to split them first (for example, reformat.sh in=reads.fq out=read#.fq). I'll consider adding the ability to apply operations to only the left or only the right reads... it seems useful.
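The split-then-trim workaround can be sketched as follows. This is a minimal model, assuming the interleaved records strictly alternate read 1, read 2; `split_and_trim` is a hypothetical helper, not a BBTools function.

```python
def split_and_trim(records, trim_left=6):
    """Split interleaved fastq records into read 1 / read 2 lists.

    records: list of (name, bases, quals) tuples alternating R1, R2.
    Only read 1 loses its first trim_left bases (and matching qualities).
    """
    if len(records) % 2 != 0:
        raise ValueError("interleaved input must contain an even number of records")
    read1, read2 = [], []
    for i in range(0, len(records), 2):
        name, bases, quals = records[i]
        read1.append((name, bases[trim_left:], quals[trim_left:]))
        read2.append(records[i + 1])  # read 2 is passed through untouched
    return read1, read2

interleaved = [
    ("pair1/1", "AACTGATTTTGG", "IIIIIIHHHHHH"),
    ("pair1/2", "CCCCGGGGAAAA", "HHHHHHHHHHHH"),
]
r1, r2 = split_and_trim(interleaved)
print(r1)
print(r2)
```

The quality string is trimmed together with the bases so that the two stay the same length.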

                            • #44
                              Hi Brian,

                              I produced the following graph using khist.sh for my 100bp PE Illumina reads. Could you help me interpret the graph please? What is the difference between raw_count and unique_kmers?

                              https://raw.githubusercontent.com/ha...hist.sh_output
                              Habib

                              Life does have an instruction...

                              • #45
                                Try dividing the raw count by the depth and see that the result equals unique_kmers. That might give you a clue as to what everything means.
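A toy example may make that relationship easier to see. This is a minimal sketch that counts kmers exactly and ignores reverse complements, which is an assumption and probably differs from how khist.sh counts; the column names are taken from the question and `kmer_histogram` is a hypothetical helper.

```python
from collections import Counter

def kmer_histogram(seqs, k):
    """Return rows of (depth, raw_count, unique_kmers) from exact kmer counts."""
    counts = Counter()
    for s in seqs:
        for i in range(len(s) - k + 1):
            counts[s[i:i + k]] += 1
    # unique_kmers: number of distinct kmers observed exactly `depth` times.
    unique_by_depth = Counter(counts.values())
    # raw_count: total kmer occurrences at that depth = depth * unique_kmers.
    return [(d, d * u, u) for d, u in sorted(unique_by_depth.items())]

rows = kmer_histogram(["ACGTACGT", "ACGTTT"], k=4)
for depth, raw_count, unique_kmers in rows:
    print(depth, raw_count, unique_kmers)
```

In this model, unique_kmers at a given depth is the number of distinct kmers seen that many times, and raw_count is the total number of kmer occurrences they account for.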
