
  • Looking for a trimming software that does these things

    Hello,

    I'm looking for a trimming/filtering software that can do the following:

    1) Trim both ends until there's at least a certain number of consecutive bases higher than a specific quality score.

    2) Remove the 3'-regions of a certain length if they contained a certain percentage of bp below a specific quality score. For example, remove 3' ends of 200 bp if they were made of more than 10% of bp below 20 phred score.

    3) Filter out reads with a certain percentage of bp below a specific quality score.

    4) Remove reads with a certain number of consecutive Ns.

    5) Be paired-end-aware, i.e. if one read was removed, remove its pair (there're several of these available, but without the other features).

    6) If a read was identical to the reverse complement of its pair, remove it.

    I'd really appreciate your help.
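
For reference, criteria 3 and 5 above can be sketched in a few lines. This is only an illustration under stated assumptions, not the behaviour of any tool named in this thread: `frac_below` and `filter_pairs` are hypothetical helper names, qualities are assumed to be already decoded to integer Phred scores, and a pair is dropped as soon as either mate fails.

```python
from typing import List, Tuple

Quals = List[int]


def frac_below(quals: Quals, min_q: int) -> float:
    """Fraction of bases whose Phred score is below min_q (hypothetical helper)."""
    if not quals:
        return 1.0  # treat an empty read as entirely low quality
    return sum(q < min_q for q in quals) / len(quals)


def filter_pairs(pairs: List[Tuple[Quals, Quals]], min_q: int,
                 max_frac: float) -> List[Tuple[Quals, Quals]]:
    """Keep a pair only if BOTH mates pass, so pairing is never broken."""
    kept = []
    for q1, q2 in pairs:
        if frac_below(q1, min_q) <= max_frac and frac_below(q2, min_q) <= max_frac:
            kept.append((q1, q2))
    return kept
```

With `min_q=20` and `max_frac=0.10`, a pair is removed when more than 10% of the bases in either mate fall below Q20.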

  • #2
    BBduk.sh (part of BBMap), Trimmomatic, and Cutadapt (and perhaps others that I am missing) should fit the bill. They may not check every box on your list, but they should get the job done.


    • #3
      Thanks. I tried Trimmomatic but not the other two. BBduk.sh seems promising (so does the BBMap package), but it will take me a while to understand its syntax. I'll post back if it does what I want.


      • #4
        Originally posted by antifolate
        Hello,

        I'm looking for a trimming/filtering software that can do the following:

        1) Trim both ends until there's at least a certain number of consecutive bases higher than a specific quality score.
        BBDuk used to use this strategy, but it's not optimal so I don't really recommend it. I was able to demonstrate empirically that it was not too good, either. So, BBDuk currently uses the Phred algorithm for quality trimming, which is optimal, though it's technically possible to disable that with a flag and use the old method instead. BBDuk also supports windowed trimming (trim until the average in a sliding window exceeds some threshold).
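
To make the difference between the two strategies concrete, here is a minimal Python sketch. It is not BBDuk's source code: `trim_consecutive` and `trim_max_segment` are hypothetical names, and the second function assumes the "Phred algorithm" means the usual maximum-scoring-segment approach, where each base scores (cutoff error probability minus its own error probability) and the highest-scoring contiguous stretch is kept.

```python
from typing import List, Optional, Tuple


def trim_consecutive(quals: List[int], min_q: int, run: int) -> Tuple[int, int]:
    """Trim each end until `run` consecutive bases have quality >= min_q.

    Returns the kept region as a half-open (start, end) interval.
    """
    def first_run_start(seq: List[int]) -> Optional[int]:
        count = 0
        for i, q in enumerate(seq):
            count = count + 1 if q >= min_q else 0
            if count == run:
                return i - run + 1
        return None

    left = first_run_start(quals)
    if left is None:
        return (0, 0)  # no acceptable run anywhere: nothing is kept
    right = first_run_start(quals[::-1])
    return (left, len(quals) - right)


def trim_max_segment(quals: List[int], cutoff_q: int) -> Tuple[int, int]:
    """Keep the contiguous segment maximizing sum(cutoff_p - error_p)."""
    cutoff_p = 10 ** (-cutoff_q / 10)
    best_sum, best = 0.0, (0, 0)
    cur_sum, cur_start = 0.0, 0
    for i, q in enumerate(quals):
        cur_sum += cutoff_p - 10 ** (-q / 10)
        if cur_sum < 0:
            cur_sum, cur_start = 0.0, i + 1  # restart after a bad stretch
        elif cur_sum > best_sum:
            best_sum, best = cur_sum, (cur_start, i + 1)
    return best


quals = [2, 2, 30, 30, 30, 2, 30, 30, 30, 30, 2, 2]
print(trim_consecutive(quals, min_q=10, run=3))  # (2, 10)
print(trim_max_segment(quals, cutoff_q=10))      # (6, 10)
```

In this toy read the consecutive-run method stops at the first good run on each side and so keeps the Q2 base in the middle, while the segment-based method weighs the whole read and drops it.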

        3) Filter out reads with a certain percentage of bp below a specific quality score.
        The "maq" flag filters by average quality, where average quality is calculated by transforming the quality scores into probabilities, so basically if you set "maq=20" it removes reads with an expected error rate greater than 1%. I don't recommend setting it that high, though.
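
The probability-based average described here can be sketched as follows. This is an illustration of the arithmetic only, not BBDuk's implementation; `mean_quality` and `passes_maq` are hypothetical names, and the pass condition (average at or above the threshold) is an assumption.

```python
import math
from typing import List


def mean_quality(quals: List[int]) -> float:
    """Average quality computed in probability space, reported as a Phred score."""
    if not quals:
        return 0.0
    mean_p = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_p)


def passes_maq(quals: List[int], maq: float) -> bool:
    """Assumed semantics: keep the read if its average quality is >= maq."""
    return mean_quality(quals) >= maq


quals = [40, 40, 40, 10]
print(sum(quals) / len(quals))        # arithmetic mean of the scores: 32.5
print(round(mean_quality(quals), 1))  # probability-based mean: 16.0
```

A single Q10 base dominates the expected error rate, which is why the probability-based average (about Q16) is far lower than the arithmetic mean of the scores (32.5), and why a threshold of 20 is stricter than it first looks.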

        4) Remove reads with a certain number of consecutive Ns.
        The "maxns=X" flag will filter reads with at least X Ns, but it doesn't care whether they are consecutive.
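
The distinction between total and consecutive Ns is easy to see in a short sketch. These helpers are hypothetical and only illustrate the two counting rules; they are not part of BBDuk.

```python
import re


def total_ns(seq: str) -> int:
    """Count every N in the read, wherever it occurs."""
    return seq.upper().count("N")


def longest_n_run(seq: str) -> int:
    """Length of the longest stretch of consecutive Ns (0 if there are none)."""
    return max((len(m.group()) for m in re.finditer(r"N+", seq.upper())), default=0)


read = "ACNGTNNA"
print(total_ns(read))       # 3
print(longest_n_run(read))  # 2
```

A filter on the total count would treat this read as having three Ns, while a filter on consecutive Ns, as requested in the original question, would see a longest run of two.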

        5) Be paired-end-aware, i.e. if one read was removed, remove its pair (there're several of these available, but without the other features).
        Check.

        6) If a read was identical to the reverse complement of its pair, remove it.
        You can do this with BBMerge, by running it but telling it not to join overlapping reads (using the "join=f" flag), and using the "maxlength" flag plus the "out" and "outu" streams. "maxlength=X" will send reads with insert sizes longer than X to outu rather than out. So:

        bbmerge.sh in=reads.fq out=short.fq outu=long.fq join=f maxlen=150

        (this command assumes pairs are interleaved in one file)
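
For comparison, the literal check asked for in item 6 can be sketched directly. This is a hypothetical helper, not what BBMerge does internally: BBMerge works from the estimated insert size, whereas this sketch only tests for an exact reverse-complement match between mates.

```python
# Complement table covering upper- and lower-case bases plus N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_rc_of_mate(read1: str, read2: str) -> bool:
    """True if read1 is exactly the reverse complement of read2 (case-insensitive)."""
    return read1.upper() == reverse_complement(read2).upper()


print(reverse_complement("AACG"))     # CGTT
print(is_rc_of_mate("AACG", "CGTT"))  # True
```

An exact match is strict: one sequencing error in either mate makes it fail, which is one reason an overlap-based estimate of insert size is more forgiving.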


        • #5
          I just got around to trying these commands and, although they're not exactly what I'm trying to do, they worked pretty well. bbmerge would merge my reads so I avoided it.

          Thank you!


          • #6
            try skewer

            Another option is skewer. Good luck!

            Originally posted by antifolate
            I just got around to trying these commands and, although they're not exactly what I'm trying to do, they worked pretty well. bbmerge would merge my reads so I avoided it.

            Thank you!


            • #7
              @Brian

              "... though it's technically possible to disable that with a flag and use the old method instead."

              How can I do this?


              • #8
                Originally posted by antifolate
                @Brian

                "... though it's technically possible to disable that with a flag and use the old method instead."

                How can I do this?
                Add the flag "otm=f" (otm stands for "optimal trimming mode").


                • #9
                  otm=f (outputtrimmedtomatch) Output reads trimmed to shorter
                  than minlength to outm rather than discarding.


                  Which bbduk are you talking about?


                  • #10
                    Oops, looks like I have an overloaded flag. Thanks for spotting that! I'll rename that one to "ottm" in the next release. Currently, "otm" acts on the quality trimming, so "outputtrimmedtomatch" has to be fully spelled out in order to function according to that description. To be specific for now: use the flag "optitrim=f" to turn off optimal trimming, and "outputtrimmedtomatch" to dictate whether trimmed reads shorter than minlen go to outm.


                    • #11
                      I didn't know bbduk was your work. Thanks for the help and the tool!
