  • Please Help: What are the differences between standard trimming and adaptive trimming?

    Hi All,

    When I do RNA-seq quality trimming using Perl scripts in the terminal, these options appear:

    --type <num> 0=standard trimming, 1=adaptive trimming, 2=windowed adaptive trimming. Default 0

    --qual-threshold <num> quality threshold for trimming, default 20
    --length-threshold <num> length threshold for trimming, default 20
    ... ...

    Could anybody explain the differences between 0=standard trimming, 1=adaptive trimming, and 2=windowed adaptive trimming, and the criteria for setting length-threshold?

    Thanks a lot in advance.
    Last edited by byou678; 08-19-2011, 11:34 AM.

  • #2
    Is 'RNAsq' a program? If so (and I cannot find it on the web), what does the program's documentation say? I am sure that we could hazard a guess, but the program itself is your best bet.

    Oh ... I just found what you are probably using. 'Trim.pl' by Nik Joshi. That would have been nice to know. Anyway, yeah, there isn't much documentation to that program, is there? I suspect that you don't read "Perl" and Nik obviously believes that "good code is self-documenting" (e.g., his lack of comments about the basics is appalling although, unfortunately, I've seen worse) so it might take someone to dig into the code to give a definitive answer.

    • #3
      For anyone who wants to dig:



      Or you could write to Nik Joshi.

      • #4
        Sorry for the confusion. Actually, I am working with RNA-seq data here. The data come from an Illumina Genome Analyzer II. Yes, I use this script: 'Trim.pl' http://wiki.bioinformatics.ucdavis.e...ex.php/Trim.pl

        westerman, Thanks for your nice reply!!!
        Last edited by byou678; 08-19-2011, 11:44 AM.

        • #5
          So from reading the code, "standard trimming" means that it will trim off a defined number of bases (as given by the "length-threshold" flag) from all reads, regardless of quality. In "adaptive trimming" mode it will use the quality scores to assess each read individually, by finding the first position which has a quality below cutoff (as given by the "qual-threshold" flag) and then trimming away this base and all following bases (unless the remaining read is shorter than the length threshold, in which case it will discard the whole read).

          So the adaptive method is slightly more sophisticated than the standard, though it might not always do what you'd want: if a read has a single poor-quality base early on but is otherwise high-quality, this method will throw away the good part of the read (possibly the whole read). The script has a third method which is slightly more sophisticated still, the "windowed adaptive trimming", which tries to combat this problem by running a sliding window over the read and looking at the average quality in this window, rather than at a single base.
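To make the adaptive mode described above concrete, here is a minimal Python sketch. It is not taken from Trim.pl: the function name `adaptive_trim`, its signature, and the use of already-decoded integer quality scores are assumptions made for illustration. Only the two default thresholds (20 and 20) come from the option listing in the first post.

```python
def adaptive_trim(seq, quals, qual_threshold=20, length_threshold=20):
    """Hypothetical sketch of per-base adaptive trimming.

    seq   -- read sequence as a string
    quals -- list of integer quality scores, one per base (assumed decoded)

    Cuts at the first base whose quality is below qual_threshold and drops
    that base and everything after it. Returns None if what remains is
    shorter than length_threshold, i.e. the read is discarded.
    """
    cut = len(seq)  # default: nothing to trim
    for i, q in enumerate(quals):
        if q < qual_threshold:
            cut = i
            break
    if cut < length_threshold:
        return None
    return seq[:cut], quals[:cut]


# A read whose quality only drops near the 3' end keeps its good prefix.
late_drop = adaptive_trim("A" * 30, [30] * 25 + [10] * 5)

# A single poor base early on causes the whole read to be discarded,
# which is the weakness pointed out above.
early_drop = adaptive_trim("A" * 30, [30] * 3 + [10] + [30] * 26)
```

In the second call the kept prefix would be only 3 bases long, so the sketch discards the read even though 26 good bases follow the single poor one.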

          • #6
            Thanks for the reply

            Hi gaffa, thank you very much for the reply. For "standard trimming", from which end of the reads will the 20 bases (if I use the default number) be trimmed off? And if "standard trimming" ignores quality scores, it is probably not used often, am I right?

            In addition, could you send me papers or resources related to my question? I need to take a deeper look because this project is really important to me.

            Thanks again! Have a great weekend!


            • #7
              Can anybody point me to papers or resources related to my urgent question? Thanks!

              • #8
                Originally posted by byou678
                Can anybody point me to papers or resources related to my urgent question? Thanks!
                I doubt if there are any papers. As far as I can tell the terms used and the algorithm used by the program are internal to the program. In other words if the author of the program got his idea from somewhere he did not cite those sources. The ideas behind his code are not that unique and have probably been implemented many times.

                • #9
                  I think the two adaptive trimming modes check the quality scores from the 5' end to the 3' end and trim when a poor-quality base or window is found. Standard trimming directly trims off the defined number of bases (like 10 or 15) from the 3' end, regardless of whether the quality scores are good or bad (because most modern sequencing technologies produce reads whose quality deteriorates towards the 3' end).

                  Please correct me if I am wrong. Below is a related resource; all other ideas and help will be greatly appreciated!

                  Most modern sequencing technologies produce reads that have deteriorating quality towards the 3'-end. Incorrectly called bases here negatively impact assemblies, mapping, and downstream bioinformatics analyses.

                  Sickle is a tool that uses sliding windows along with quality and length thresholds to determine when quality is sufficiently low to trim the 3'-end of reads. It will also discard reads based upon the length threshold. It takes the quality values and slides a window across them whose length is 0.1 times the length of the read. If this length is less than 1, then the window is set to be equal to the length of the read. Otherwise, the window slides along the quality values until the average quality in the window drops below the threshold. At that point the algorithm determines where in the window the drop occurs and cuts both the read and quality strings there. However, if the cut point is less than the minimum length threshold, then the read is discarded entirely.
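The sliding-window procedure in the description above could be sketched in Python roughly as follows. This is an illustration only, not Sickle's actual code: the function name `windowed_trim`, the default thresholds, and in particular the rule used to decide "where in the window the drop occurs" (here, the first base in the failing window that is itself below the threshold) are assumptions.

```python
def windowed_trim(seq, quals, qual_threshold=20, length_threshold=20):
    """Hypothetical sketch of sliding-window 3'-end trimming.

    seq   -- read sequence as a string
    quals -- list of integer quality scores, one per base (assumed decoded)

    Returns the trimmed (seq, quals), or None if the read is discarded.
    """
    n = len(quals)
    if n == 0:
        return None
    # Window length is 0.1 times the read length; if that is less than 1,
    # the window is the whole read.
    window = int(0.1 * n)
    if window < 1:
        window = n
    cut = n  # default: keep the whole read
    for start in range(n - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < qual_threshold:
            # Assumed rule: cut at the first below-threshold base in the
            # window. One must exist, otherwise the average could not be
            # below the threshold.
            cut = start + next(
                i for i, q in enumerate(win) if q < qual_threshold
            )
            break
    if cut < length_threshold:
        return None
    return seq[:cut], quals[:cut]


# 50-base read with a poor 3' tail: trimmed back to the 40 good bases.
tail = windowed_trim("G" * 50, [30] * 40 + [5] * 10)

# One isolated poor base does not pull any 5-base window average below 20,
# so the read is kept whole.
blip = windowed_trim("G" * 50, [30] * 5 + [2] + [30] * 44)
```

Averaging over a window is what makes this approach tolerant of a single poor base, which a per-base cutoff would trim at.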

                  Thanks westerman.
