  • kga1978
    Senior Member
    • Nov 2010
    • 100

    Trimmomatic quality trimming

    I have been using Trimmomatic to trim adapters and low-quality bases. In general, I have been pleased with the performance, but I just ran some low-quality samples through and Trimmomatic doesn't appear to be trimming correctly based on quality. In certain cases the per-base quality is even worse after trimming than before. I have set my cutoff to '10', so I would expect everything below that to be cut off. Furthermore, I have specified a sliding-window minimum quality of '18'. A couple of examples:

    [Images not preserved: per-base quality plots before and after trimming, for two samples.]

    In each case I ran the following commands:
    Code:
    java -Xmx2g -classpath /usr/local/bin/trimmomatic/trimmomatic.jar org.usadellab.trimmomatic.TrimmomaticSE -phred33 sample.fastq.gz sample.trimmed.fastq ILLUMINACLIP:/Volumes/Storage_1/Sequencing_1/References/Contaminants/contaminants.fasta:2:40:12 LEADING:10 TRAILING:10 SLIDINGWINDOW:4:18 MINLEN:18
  • kga1978
    Senior Member
    • Nov 2010
    • 100

    #2
    I tried running these samples through PrinSeq and cutadapt as well with very similar results. This means that the problem isn't specific to Trimmomatic, but I'm still interested to hear if anybody knows what is causing this? I guess it only happens on really low-quality reads?

    • tonybolger
      Senior Member
      • Feb 2010
      • 156

      #3
      Originally posted by kga1978
      I have been using Trimmomatic to trim adapters and quality scores. In general, I have been pleased with the performance, but I just ran some low quality samples through and Trimmomatic doesn't appear to be trimming correctly based on quality?
      Strange indeed.

      Is your data really phred33 as suggested in the command line? Illumina 1.5 is normally phred64.

      • kga1978
        Senior Member
        • Nov 2010
        • 100

        #4
        To be perfectly honest, I'm not sure - the quality score thing is doing my head in (damn you, Illumina!). I assumed if it was phred64, my maximum score would be higher than 40, no? I'll try and rerun with phred64 and see what happens.

        • tonybolger
          Senior Member
          • Feb 2010
          • 156

          #5
          Originally posted by kga1978
          To be perfectly honest, I'm not sure - the quality score thing is doing my head in (damn you, Illumina!). I assumed if it was phred64, my maximum score would be higher than 40, no? I'll try and rerun with phred64 and see what happens.
          If the data really is phred-64 but trimmomatic is told that it is phred33, trimmomatic will interpret each score as 31 higher than it really is - thus not really trimming much since the quality appears 'excellent'. I really should add a warning if the quality scores are outside the expected range, as this is nearly always caused by wrong phred-33/phred-64 selection, and results in either no trimming, or almost everything trimmed, depending on the direction of the mistake.
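To make the offset arithmetic concrete, here is a minimal Python sketch. The `decode_quals` helper and the sample quality string are made up for illustration and are not part of Trimmomatic. It decodes the same quality characters under both encodings and shows the constant difference of 64 - 33 = 31.

```python
def decode_quals(qual_string, offset):
    """Convert a FASTQ quality string into a list of phred scores."""
    return [ord(ch) - offset for ch in qual_string]

# Hypothetical phred64-encoded quality string: 'h' is Q40, 'T' is Q20, 'B' is Q2.
qual = "hhhhTTBB"

as_phred64 = decode_quals(qual, 64)  # correct interpretation
as_phred33 = decode_quals(qual, 33)  # wrong interpretation

# Every score is inflated by exactly 31 under the wrong offset.
inflation = [wrong - right for wrong, right in zip(as_phred33, as_phred64)]

print(as_phred64)
print(as_phred33)
print(inflation)
```

With the wrong offset, even the Q2 bases decode as 33, so thresholds such as LEADING:10 or SLIDINGWINDOW:4:18 never trigger.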

          In any case, you really shouldn't see a significant percentage of the reads with base calls much below the sliding window threshold - e.g. in fastQC, the yellow bars should mostly be above, but the whiskers will tend to be below. On really bad data, you might also see the yellow bars drop in the last few bases, an artefact of 'under-testing' as the sliding window runs off the end of the reads - this is to be expected.
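The 'under-testing' at the end of a read can be illustrated with a simplified sliding-window sketch in Python. This is an assumption-laden illustration of the general idea, not Trimmomatic's actual implementation: the function name, the choice to cut at the start of the first failing window, and the handling of reads shorter than the window are all made up here.

```python
def sliding_window_trim(quals, window=4, min_mean=15):
    """Return the number of leading bases to keep.

    Scans from the 5' end and cuts at the start of the first window
    whose mean quality falls below min_mean (simplified behaviour).
    """
    if len(quals) < window:
        # Assumption for this sketch: reads too short to test are kept whole.
        return len(quals)
    for start in range(len(quals) - window + 1):
        win = quals[start:start + window]
        if sum(win) / window < min_mean:
            return start
    return len(quals)

# A single very poor base at the 3' end sits in only one window,
# and three good neighbours are enough to carry it over the threshold.
good_then_one_bad = [30] * 7 + [2]
kept_a = sliding_window_trim(good_then_one_bad)

# A sustained drop in quality is caught early.
steady_decline = [30, 30, 10, 8, 5, 2, 2, 2]
kept_b = sliding_window_trim(steady_decline)

print(kept_a, kept_b)
```

In the first example all 8 bases are kept even though the last one has quality 2, because the final window still averages 23. This is the kind of residual low-quality tail that can show up in the last few positions of a per-base quality plot.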

          Here's an example of some really low quality data pre/post trimming, using sliding window 4 wide, quality 15.

          [Plots not preserved. Panels: Untrimmed Forward, Untrimmed Reverse, Trimmed Forward Paired, Trimmed Forward Unpaired, Trimmed Reverse Paired, Trimmed Reverse Unpaired.]

          • kga1978
            Senior Member
            • Nov 2010
            • 100

            #6
            Hi Tony,

            Got it. I reran some of the reads and most of them got better with phred64 (I mostly use trimming for adapters though - my aligner takes into consideration quality). However, as you said, really bad reads still fall off dramatically in the end - probably due to the sliding window. So, just to be clear - am I correct in the following?

            Casava 1.3 - 1.7: Use Phred64
            Casava 1.8+: Use Phred33
            454 data (although Trimmomatic can't do this right now): Use Phred33

            Thanks for following up.

            • tonybolger
              Senior Member
              • Feb 2010
              • 156

              #7
              Originally posted by kga1978
              Got it. I reran some of the reads and most of them got better with phred64 (I mostly use trimming for adapters though - my aligner takes into consideration quality). However, as you said, really bad reads still fall off dramatically in the end - probably due to the sliding window.
              How far in do you see the low bases, i.e. below the threshold cut-off? Just the last few? Do your new plots look anything like the ones I posted?

              Originally posted by kga1978
              So, just to be clear - am I correct in the following?

              Casava 1.3 - 1.7: Use Phred64
              Casava 1.8+: Use Phred33
              454 data (although Trimmomatic can't do this right now): Use Phred33
              I believe so - though generally I verify by looking at the scores by eye, and checking here. Occasionally I've seen data in the 'wrong' phred because someone decided to be 'helpful'.
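Checking the scores "by eye" can be roughly automated by looking at the range of ASCII characters in the quality lines. The Python sketch below is a heuristic under stated assumptions, not a definitive detector: `guess_phred_offset` is a hypothetical helper, and the cutoffs assume phred33 data rarely exceeds 'J' (Q41) and that phred64/Solexa data never goes below ';' (ASCII 59).

```python
def guess_phred_offset(quality_lines):
    """Guess the quality offset (33 or 64) from non-empty FASTQ quality strings.

    Returns None when the observed characters fit both encodings.
    """
    codes = [ord(ch) for line in quality_lines for ch in line]
    lowest, highest = min(codes), max(codes)
    if lowest < 59:
        # Characters this low cannot occur in phred64 or Solexa-64 data.
        return 33
    if highest > 74:
        # Above 'J' (Q41 in phred33): assumed here to indicate phred64.
        return 64
    return None  # ambiguous: inspect more reads

print(guess_phred_offset(["IIII#"]))   # '#' is ASCII 35
print(guess_phred_offset(["hhhhB"]))   # 'h' is ASCII 104
print(guess_phred_offset(["FFFF"]))    # 'F' fits both ranges
```

A handful of high-quality reads can be ambiguous, so it is worth feeding in quality lines from many reads, including low-quality ones, before trusting the guess.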

              • kga1978
                Senior Member
                • Nov 2010
                • 100

                #8
                Actually, it's all good - the one that had a dramatic drop-off in the end, I had forgotten to change to phred64!

                This is what the data looks like now:

                • aforntacc
                  Member
                  • Jun 2011
                  • 48

                  #9
                  Originally posted by tonybolger
                  If the data really is phred-64 but trimmomatic is told that it is phred33, trimmomatic will interpret each score as 31 higher than it really is - thus not really trimming much since the quality appears 'excellent'. I really should add a warning if the quality scores are outside the expected range, as this is nearly always caused by wrong phred-33/phred-64 selection, and results in either no trimming, or almost everything trimmed, depending on the direction of the mistake.

                  In any case, you really shouldn't see a significant percentage of the reads with base calls much below the sliding window threshold - e.g. in fastQC, the yellow bars should mostly be above, but the whiskers will tend to be below. On really bad data, you might also see the yellow bars drop in the last few bases, an artefact of 'under-testing' as the sliding window runs off the end of the reads - this is to be expected.

                  Here's an example of some really low quality data pre/post trimming, using sliding window 4 wide, quality 15.

                  [Plots not preserved. Panels: Untrimmed Forward, Untrimmed Reverse, Trimmed Forward Paired, Trimmed Forward Unpaired, Trimmed Reverse Paired, Trimmed Reverse Unpaired.]

                  OK, I get this part very well, but my question is: if I want to use TopHat for mapping, which of these files should I use (forward paired and reverse paired)? What about the unpaired files? I am new to Trimmomatic and TopHat, so sorry if this seems a stupid question.
                  Thanks in advance.

                  • mastal
                    Senior Member
                    • Mar 2009
                    • 666

                    #10
                    Trimmomatic quality trimming

                    I don't think TopHat and Bowtie will let you use paired reads and unpaired reads in the same run, so you would have to do 2 runs: one with the R1_paired.fastq and R2_paired.fastq files, and another with the R1_unpaired.fastq and R2_unpaired.fastq files.

                    • ebioman
                      Member
                      • Aug 2013
                      • 41

                      #11
                      How is the quality score evaluated?

                      Hi,
                      I wondered whether anybody can explain to me how the quality scores used by the program are actually calculated.
                      E.g. for the leading-base trimming, an often-cited value is 3 - obviously that won't be a phred score. So what is it?

                      • tonybolger
                        Senior Member
                        • Feb 2010
                        • 156

                        #12
                        Originally posted by ebioman
                        Hi,
                        I wondered whether anybody can explain to me how the quality scores used by the program are actually calculated.
                        E.g. for the leading-base trimming, an often-cited value is 3 - obviously that won't be a phred score. So what is it?
                        It's a phred score

                        Historically, the illumina pipeline occasionally created reads with one (or more rarely two) N base-calls at the start, and more often, a set of trailing B phred quality scores at the end. N-base calls are treated as zero phred score, and B are quality 2, so by trimming both ends for all scores below 3, these artefacts are removed.
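As a rough illustration of why a threshold of 3 removes these artefacts, here is a small Python sketch of end trimming. It is a simplified stand-in for the idea behind LEADING:3 and TRAILING:3, not Trimmomatic's code; `decode_quals`, `trim_ends`, and the example quality string are hypothetical.

```python
def decode_quals(qual_string, offset):
    """Convert a FASTQ quality string into a list of phred scores."""
    return [ord(ch) - offset for ch in qual_string]

def trim_ends(quals, leading=3, trailing=3):
    """Return (start, end) of the slice left after trimming both ends.

    Bases are removed from each end while their quality is below the
    respective threshold.
    """
    start = 0
    while start < len(quals) and quals[start] < leading:
        start += 1
    end = len(quals)
    while end > start and quals[end - 1] < trailing:
        end -= 1
    return start, end

# Hypothetical phred64 read: 'B' decodes to quality 2, 'h' to quality 40.
quals = decode_quals("BBhhhhBB", 64)
start, end = trim_ends(quals)
print(quals[start:end])
```

Because 'B' decodes to 2 under phred64, a cutoff of 3 is just high enough to strip the run of 'B' scores while leaving every base of quality 3 or more untouched.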

                        • ebioman
                          Member
                          • Aug 2013
                          • 41

                          #13
                          Thanks, that was as short as it was informative! I always thought it might be some other internal score, and tried desperately to work out how it was calculated.

                          • Laine
                            Junior Member
                            • Mar 2015
                            • 1

                            #14
                            Originally posted by ebioman
                            Thanks, that was as short as it was informative! I always thought it might be some other internal score, and tried desperately to work out how it was calculated.
                            I had exactly the same doubt! Very informative indeed.

                            • trimmoMe
                              Junior Member
                              • Sep 2015
                              • 2

                              #15
                              Need help with Trimmomatic

                              Hi everyone,

                              I have been having some issues with my command line for Trimmomatic.

                              This is what I've been using:
                              java -jar /Users/omriadini/Desktop/Trimmomatic-0.33/trimmomatic-0.33.jar SE -threads 4 -trimlog /Users/omriadini/Desktop/156\ L001/L002.trimLog /Volumes/omri\ hard\ drive/Ally\'s\ stuff/Liron\'s\ Project/Raw\ Data/NRF2-1_S17/NRF2-1_S17_L001_R1_001.fastq trimmed.NRF2-1_S17_L001_R1_001.fastq ILLUMINACLIP:/Users/omriadini/Desktop/156\ L001/Truseq_NEBnext_adapter_sequences\ \(1\).txt:2:30:10 HEADCROP:12 MAXINFO:0:40:0.5 MINLEN:36

                              However, this is the response I get every time:
                              ILLUMINACLIP: Using 0 prefix pairs, 48 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
                              Quality encoding detected as phred33
                              Input Reads: 7877072 Surviving: 0 (0.00%) Dropped: 7877072 (100.00%)
                              TrimmomaticSE: Completed successfully

                              I am not sure why it keeps dropping all of my reads. Any ideas?

                              Thanks in advance.
