  • Looking for a trimming software that does these things

    Hello,

    I'm looking for a trimming/filtering software that can do the following:

    1) Trim both ends until there's at least a certain number of consecutive bases higher than a specific quality score.

    2) Remove a 3' region of a given length if more than a certain percentage of its bases fall below a specific quality score. For example, remove the last 200 bp of a read if more than 10% of those bases have a Phred score below 20.

    3) Filter out reads with a certain percentage of bp below a specific quality score.

    4) Remove reads with a certain number of consecutive Ns.

    5) Be paired-end-aware, i.e. if one read was removed, remove its pair (there're several of these available, but without the other features).

    6) If a read is identical to the reverse complement of its pair, remove it.

    I'd really appreciate your help.

  • #2
    BBDuk (bbduk.sh, part of BBMap), Trimmomatic, and Cutadapt (and perhaps others that I am missing) should fit the bill. They may not check every box you have up there, but they should get the job done.

    • #3
      Thanks. I've tried Trimmomatic but not the other two. BBDuk looks promising (as does the rest of the BBMap package), but it'll take me a while to understand its syntax. I'll post back if it does what I want.

      • #4
        Originally posted by antifolate
        Hello,

        I'm looking for a trimming/filtering software that can do the following:

        1) Trim both ends until there's at least a certain number of consecutive bases higher than a specific quality score.
        BBDuk used to use this strategy, but it's not optimal, and my own empirical tests showed it did not perform well, so I don't recommend it. BBDuk currently uses the Phred algorithm for quality trimming, which is optimal, though it's technically possible to disable that with a flag and use the old method instead. BBDuk also supports windowed trimming (trim until the average quality in a sliding window exceeds some threshold).
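
For anyone curious what Phred-style trimming looks like in practice, here is a minimal Python sketch of the general idea: score each base as (cutoff error probability minus that base's error probability) and keep the contiguous stretch with the highest total score. This is only an illustration of the technique, not BBDuk's actual code; the function name `phred_trim` and the `trimq` default are assumptions made for the example.

```python
def phred_trim(quals, trimq=10):
    """Return (start, end) of the region to keep, end exclusive.

    quals is a list of integer Phred scores. Each base contributes
    (cutoff error probability - its own error probability); the kept
    region is the contiguous run with the largest total contribution.
    Hypothetical helper for illustration, not BBDuk's implementation.
    """
    cutoff = 10 ** (-trimq / 10)
    best_sum, best_start, best_end = 0.0, 0, 0
    run_sum, run_start = 0.0, 0
    for i, q in enumerate(quals):
        run_sum += cutoff - 10 ** (-q / 10)
        if run_sum <= 0:
            # The running segment is no longer worth keeping; restart after this base.
            run_sum, run_start = 0.0, i + 1
        elif run_sum > best_sum:
            best_sum, best_start, best_end = run_sum, run_start, i + 1
    return best_start, best_end


if __name__ == "__main__":
    quals = [2, 2, 30, 30, 30, 30, 2, 2]
    start, end = phred_trim(quals, trimq=10)
    print(start, end)  # low-quality bases at both ends fall outside the kept region
```

Unlike the "N consecutive good bases" rule, a single mediocre base in the middle of a good stretch does not stop the trimming decision; only the overall balance of the segment matters.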

        3) Filter out reads with a certain percentage of bp below a specific quality score.
        The "maq" flag filters by average quality, where average quality is calculated by transforming the quality scores into probabilities, so basically if you set "maq=20" it removes reads with an expected error rate greater than 1%. I don't recommend setting it that high, though.
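
To make the "average via probabilities" point concrete, here is a small Python sketch, assuming the averaging works as described above: convert each Phred score to an error probability, average those, and convert back. The helper names `average_quality` and `passes_maq` are made up for the example, and reads are assumed to be non-empty.

```python
import math


def average_quality(quals):
    """Average Phred score computed from the mean error probability."""
    mean_error = sum(10 ** (-q / 10) for q in quals) / len(quals)
    return -10 * math.log10(mean_error)


def passes_maq(quals, maq):
    """True if the probability-based average quality is at least maq."""
    return average_quality(quals) >= maq


if __name__ == "__main__":
    quals = [40, 40, 2]
    # The plain arithmetic mean of these scores is about 27, but a single
    # Q2 base dominates the expected error rate, so the probability-based
    # average is far lower.
    print(round(average_quality(quals), 2))
    print(passes_maq(quals, 20))
```

This is why a threshold like maq=20 is stricter than it looks: one or two very poor bases can sink an otherwise good read.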

        4) Remove reads with a certain number of consecutive Ns.
        The "maxns=X" flag will filter reads with at least X Ns, but it doesn't care whether they are consecutive.
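
If the consecutive part matters to you, the difference between the two counts is easy to check yourself. A rough Python sketch (the helper names are hypothetical and this is not part of BBDuk):

```python
def total_ns(seq):
    """Count every N in the read, consecutive or not."""
    return seq.upper().count("N")


def longest_n_run(seq):
    """Length of the longest stretch of consecutive Ns in the read."""
    longest = current = 0
    for base in seq.upper():
        current = current + 1 if base == "N" else 0
        longest = max(longest, current)
    return longest


if __name__ == "__main__":
    read = "ACNNNGTNA"
    print(total_ns(read), longest_n_run(read))
```

A filter on `longest_n_run` keeps reads with a few scattered Ns and only drops those with a long uncalled stretch, which is what the original question asked for.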

        5) Be paired-end-aware, i.e. if one read was removed, remove its pair (there're several of these available, but without the other features).
        Check.

        6) If a read is identical to the reverse complement of its pair, remove it.
        You can do this with BBMerge, by running it but telling it not to join overlapping reads (using the "join=f" flag), and using the "maxlength" flag plus the "out" and "outu" streams. "maxlength=X" will send reads with insert sizes longer than X to outu rather than out. So:

        bbmerge.sh in=reads.fq out=short.fq outu=long.fq join=f maxlen=150

        (this command assumes pairs are interleaved in one file)
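
If you only want the literal check from point 6 (read 1 exactly equal to the reverse complement of read 2) without running a merger, it can be sketched in a few lines of Python. This is a strict string comparison, so it is narrower than the insert-size approach above; the function names are hypothetical.

```python
# Translation table for complementing bases; N stays N.
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def is_revcomp_pair(read1, read2):
    """True if read1 is exactly the reverse complement of read2 (case-insensitive)."""
    return read1.upper() == reverse_complement(read2).upper()


if __name__ == "__main__":
    print(reverse_complement("AACG"))
    print(is_revcomp_pair("AACG", "CGTT"))
```

An exact match will miss pairs that differ by a single sequencing error, so the overlap-based approach is more forgiving on real data.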

        • #5
          I just got around to trying these commands and, although they're not exactly what I'm trying to do, they worked pretty well. bbmerge would merge my reads, so I avoided it.

          Thank you!

          • #6
            try skewer

            Another option is skewer. Good luck!

            Originally posted by antifolate
            I just got around to trying these commands and, although they're not exactly what I'm trying to do, they worked pretty well. bbmerge would merge my reads, so I avoided it.

            Thank you!

            • #7
              @Brian

              "... though it's technically possible to disable that with a flag and use the old method instead."

              How can I do this?

              • #8
                Originally posted by antifolate
                @Brian

                "... though it's technically possible to disable that with a flag and use the old method instead."

                How can I do this?
                Add the flag "otm=f" (otm stands for "optimal trimming mode").

                • #9
                  otm=f (outputtrimmedtomatch) Output reads trimmed to shorter
                  than minlength to outm rather than discarding.


                  Which bbduk are you talking about?

                  • #10
                    Oops, looks like I have an overloaded flag. Thanks for spotting that! I'll rename that one to "ottm" in the next release. Currently, "otm" acts on the quality trimming, so "outputtrimmedtomatch" has to be spelled out in full to behave as that description says. To be unambiguous for now, use "optitrim=f" to turn off optimal trimming, and "outputtrimmedtomatch" to control whether trimmed reads shorter than minlen go to outm.

                    • #11
                      I didn't know bbduk was your work. Thanks for the help and the tool!
