  • amango
    Member
    • Dec 2009
    • 17

    Problem with trimmomatic

    I ran Trimmomatic on my 100 bp paired-end Illumina HiSeq dataset using the command below, but the output files were all empty when unzipped (4 KB when zipped). The program ran for around 8 hours (each of the input fastq files is around 33 GB) and the trimlog file was populated (11.19 GB total), but every line appears to report a surviving sequence length of 0 for its read (e.g. HS4_80:8:2308:9999:99196/2 0 0 0 0). I don't think the problem is with the input files, since they pass QC.

    Any ideas on what's going wrong?

    nohup java -classpath trimmomatic-0.17.jar org.usadellab.trimmomatic.TrimmomaticPE -trimlog trimlog C08DRACXX.8_1.fastq C08DRACXX.8_2.fastq forward_paired.fq.gz forward_unpaired.fq.gz reverse_paired.fq.gz reverse_unpaired.fq.gz ILLUMINACLIP:adapters+RC_NOindexes_for_trimmomatic.fastq:2:40:15 LEADING:10 TRAILING:10 SLIDINGWINDOW:4:20 MINLEN:70 &
  • westerman
    Rick Westerman
    • Jun 2008
    • 1104

    #2
    Hum. That is indeed strange. The '0 0 0 0' indicates a surviving sequence length of 0 yet nothing was trimmed.

    Four suggestions:

    First, perhaps you encountered a system error (out of memory, out of time, etc.). Try running with a reduced input, say a 'head --lines=40000' (the first 10,000 reads) from each of the files.

    Second, perhaps you have malformed PE reads? Try running Trimmomatic as single-end on one of your input files.

    Third, just to make sure that you have proper looking input files please respond with the output of a 'head --lines=8' from one of your input files.

    And fourth, when you say that they pass QC, what quality control measurement are you using?
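
For anyone who prefers a script to 'head', the reduced-input test suggested above could be sketched in Python as follows. This is only an illustration: the function names `head_fastq` and `head_pair` are made up here, and it assumes uncompressed fastq files with exactly four lines per read (no wrapped sequence lines).

```python
from itertools import islice


def head_fastq(src_path, dst_path, n_reads=10000):
    """Copy the first n_reads records (4 lines each) and return how many were copied."""
    with open(src_path) as src, open(dst_path, "w") as dst:
        lines = list(islice(src, 4 * n_reads))
        dst.writelines(lines)
    return len(lines) // 4


def head_pair(fwd, rev, out_fwd, out_rev, n_reads=10000):
    """Take the same number of records from both mates so the pairs stay in sync."""
    return (head_fastq(fwd, out_fwd, n_reads),
            head_fastq(rev, out_rev, n_reads))
```

With the default of 10,000 reads this copies the same 40,000 lines as the 'head' command above, and a Trimmomatic test run on the two small files should finish in seconds rather than hours.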

    • amango
      Member
      • Dec 2009
      • 17

      #3
      Originally posted by westerman
      Third, just to make sure that you have proper looking input files please respond with the output of a 'head --lines=8' from one of your input files.

      And fourth, when you say that they pass QC, what quality control measurement are you using?
      Thanks for the input. I received another tip that since my data was generated on a HiSeq, the quality scores are probably phred33, whereas the Trimmomatic default is phred64. I will specify phred33 and run the command again; if that doesn't work, I will follow your first two suggestions.

      Below are the first 8 lines of one of my input fastq files.

      I checked quality using fastqc, and also used picard commands CollectAlignmentSummaryMetrics and ValidateSamFile (our core provided our raw reads in a bam file).


      @HS4_80:8:1101:10000:100155/1
      CATTCGTGTGAAAATGATAGTGAACCTCTGATAAGCAGTACGGACTCCAAAGAAGTGAAAGATAATAAAAAAAATAGGAAAGCACTGGGGTGCATTAAAA
      +
      ?@@FFFFFFHFHHJIJIIJEFHHGHGGGIJIIIIIGIDFHGIIJJGIIIJHIJGIBF=F@FCAGGHGCDHGHCDDCDDDDDDCDDDDDDB59B>CDCC>A
      @HS4_80:8:1101:10000:101570/1
      TTGCGATCGGACGTCAAACATGAAGGTGTATTTATGACCATCGAGGGCACAGTCGACTTACAGATCAGCGCACAGAACGTAGGTGCTTTTGACGCTTTCT
      +
      @CCFFFFFDFHHHIIHJJJIJJJJJJ?DBGHGHEIJIJJJJIJBABGGEGGECEHBBBCCEEEECDDDDD@BBDCDDDD?BDDCCDDDDCCDDBD>BDCD

      • westerman
        Rick Westerman
        • Jun 2008
        • 1104

        #4
        Originally posted by amango
        I received another tip that since my data was generated on a Hiseq, quality scores are probably phred33, whereas the Trimmomatic default is phred64.
        That is a good point. I have my Trimmomatic script set up to use phred33 automatically, so I completely forgot that this is the most likely culprit.

        Your file looks fine. Good to hear that you ran fastqc and the other programs. I would still reduce your dataset to a smaller number of reads. This would allow you to test out trimmomatic quickly.

        • tonybolger
          Senior Member
          • Feb 2010
          • 156

          #5
          Originally posted by amango
          Any ideas on what's going wrong?
          The most likely explanation is a phred33 vs phred64 mismatch: HiSeq data is typically Phred-33, while Trimmomatic uses phred64 by default (which was the 'standard' when it was implemented). If you get this wrong in one direction, you get no trimming; in the other direction, everything gets trimmed. I really need to implement a warning for this, since it's relatively common.

          Incidentally, you can just 'head' a few thousand lines from the input files for test purposes. Normally you get a higher 'casualty' rate from the start/end of the file (since these tiles are at the edge of the flow cell), but some reads should still pass.
          Last edited by tonybolger; 02-04-2012, 03:03 AM.
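
The phred33/phred64 mismatch described above can be checked from the quality lines themselves. The sketch below is a rough heuristic, not something Trimmomatic 0.17 does for you: `guess_phred_offset` and `decode` are hypothetical helper names, and the cut-offs (ASCII 59 and 74) assume typical Illumina score ranges. The quality string is the first read posted earlier in this thread.

```python
def guess_phred_offset(quality_lines):
    """Return 33, 64, or None (ambiguous) based on the ASCII range of the quality strings."""
    codes = [ord(c) for line in quality_lines for c in line.strip()]
    if not codes:
        return None
    if min(codes) < 59:   # characters below ';' should not occur with a +64 offset
        return 33
    if max(codes) > 74:   # characters above 'J' (Q41 at +33) are unusual for +33 Illumina data
        return 64
    return None


def decode(quality, offset):
    """Convert a quality string to numeric scores for the given ASCII offset."""
    return [ord(c) - offset for c in quality]


# Quality line of the first read shown in post #3.
qual = ("?@@FFFFFFHFHHJIJIIJEFHHGHGGGIJIIIIIGIDFHGIIJJGIIIJHIJGIBF=F@FCAGGH"
        "GCDHGHCDDCDDDDDDCDDDDDDB59B>CDCC>A")

offset = guess_phred_offset([qual])   # 33, because '5' and '9' fall below ';'
as33 = decode(qual, 33)               # scores between 20 and 41
as64 = decode(qual, 64)               # scores between -11 and 10
```

Read with the wrong +64 offset, no base in this read scores above 10, which would be consistent with SLIDINGWINDOW:4:20 and MINLEN:70 discarding every read in the original run.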

          • amango
            Member
            • Dec 2009
            • 17

            #6
            I did re-run trimmomatic on my fastq files, this time specifying -phred33, and it seems to have worked.

            However, when I tried to assess the resulting output files using FastQC, I ran into errors. It seems the fastq files output by trimmomatic are not what FastQC expects. Below are A) the first few lines of one output file, trimmed_forward_paired.fq, and B) the first and last few lines of the log file produced by FastQC, documenting the errors encountered when this same file was run. Errors similar to those described for lines 1 and 21 were reported for many lines throughout the file.

            I haven't seen this type of problem with FastQC before, though I have used it with fastq files. And I don't know enough about the fastq format to tell from the files alone whether the problem here is with trimmomatic or FastQC. Any pointers would be appreciated.

            A)
            @HS4_80:8:1101:10000:100155/1
            CATTCGTGTGAAAATGATAGTGAACCTCTGATAAGCAGTACGGACTCCAAAGAAGTGAAAGATAATAAAAAAAATAGGAAAGCACTGGGGTGCATTAAAA
            +
            ?@@FFFFFFHFHHJIJIIJEFHHGHGGGIJIIIIIGIDFHGIIJJGIIIJHIJGIBF=F@FCAGGHGCDHGHCDDCDDDDDDCDDDDDDB59B>CDCC>A
            @HS4_80:8:1101:10000:104061/1
            CGAGATTGTAGTGTCCACCGCATTTGCTGACACCAAGCCGGCAGATAAGAACGAGAAGAAAAGGGCCATTTTATCCAACCCATTATTCTCATTTGGAGCC
            +
            CCCFFFFFHHHHHJIJJJJIJJJJJGGIJIJJJIIJIJJJJJJJIJJJJJIIJHHGFFFFDEEDDDDDDDEEEDDDCDDDDBCCCDEEEDEEEEECDDCD
            @HS4_80:8:1101:10000:105586/1
            ATGGCTTTTTTCATCCAAGATGAGGACGATAAATGCCAACCAATCTGTGAAAATCCCCGATGGCATTGATGTCACAGTCAATAAGAGGATCATAGTTGTC
            +
            @CCFFFFFHHGHHJJJJJJIJEHHIJIJIJJIJJIJJIIJJGGJIJJIJJIJIJJJJJJGHFFFCEEEEDEEDEDDCDDDDFEDDDD@BDCDDDDDEDDD
            @HS4_80:8:1101:10000:107366/1
            GAGCAATGTTAAAGTTAGGTGTCTTAAAGAATGCAACCAAATATCATATTCGCAACACTTGTCTGCAGCCTGTTTAGATGCCACAGAAGTTATATTGTAC
            +
            CCCFFFFFHHHHHJJJJJJIIIIJJIJEHIIJJJJJJJJIJJJJHIJJJJJJIIJJJJIJJJJJJJIJJJJIHHHHHHFFFFFFEEEEECCDDDFEEEEE
            @HS4_80:8:1101:10000:107743/1
            ATGGCTTCTACTGGCCACTGCACCGGTTGCGTGCGGATCTGCTCGTGCACCGCCAGTACCGTATCCGCGGTGTACGGCAGCCGACCGTAGACAA
            +
            CCCFFFFFHHHHHJJIJJJJJIIJIJDHHGIFHGHJIJIIJIGGHEHHEFFFDD:BCCDDDBDDDDBDBD9@5ACBBBBDDB>>BD@52<<C:?
            @HS4_80:8:1101:10000:11073/1
            TTACGATCTTCACGTCCACGTCATCGTCCTGGACCAGAGATTCGTGGAAAGCACTATGAACGGCCGCCACGCTAAACATCTTAATATCGATATTATAATC
            +
            CCCFFFFFHHHHHJJJJIJJJJJJJJJJJJJJJJJJJJJIJJJJHHJHIJJJJJJJJJJJJHHFFDDDDDDDDDDDDDDDDDDEEEEEDDDDDEEEDDED


            B)
            Semicolon seems to be missing at /Volumes/pichia/aman/data/trimmed_forward_paired.fq line 1.
            Array found where operator expected at /Volumes/pichia/aman/data/trimmed_forward_paired.fq line 21, at end of line
            (Might be a runaway multi-line ?? string starting on line 4)
            (Missing semicolon on previous line?)
            Semicolon seems to be missing at /Volumes/pichia/aman/data/trimmed_forward_paired.fq line 21.
            [...]
            syntax error at trimmed_forward_paired.fq line 1, near "@HS4_80:"
            syntax error at trimmed_forward_paired.fq line 84, near "@ACCDC@>"
            syntax error at trimmed_forward_paired.fq line 200, near "F@GBH"
            syntax error at trimmed_forward_paired.fq line 212, near "?DDDDD<FBFFDFFAGA=FF9CEGFIFE;CF>GEEDGBBFABFFF<B?FF"
            syntax error at trimmed_forward_paired.fq line 252, near "@@<DD;D?FFF"
            syntax error at trimmed_forward_paired.fq line 252, near "?DD1DD?FGDHGFB"
            syntax error at trimmed_forward_paired.fq line 252, near "C>"
            BEGIN not safe after errors--compilation aborted at trimmed_forward_paired.fq line 300.
            Last edited by amango; 02-04-2012, 02:59 PM.

            • tonybolger
              Senior Member
              • Feb 2010
              • 156

              #7
              Originally posted by amango
              I did re-run trimmomatic on my fastq files, this time specifying -phred33, and it seems to have worked.
              Excellent.

              Originally posted by amango
              However, when I tried to assess the resulting output files using FastQC, I ran into errors. It seems the fastq files output by trimmomatic are not what FastQC expects. Below are A) the first few lines of one output file, trimmed_forward_paired.fq, and B) the first and last few lines of the log file produced by FastQC, documenting the errors encountered when this same file was run. Errors similar to those described for lines 1 and 21 were reported for many lines throughout the file.
              Strange. I've just tested those 6 records on FastQC, and it seems relatively happy to parse them.

              BTW those errors you show seem very like what would happen if perl tried to parse a fastq file.
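
If the errors do come from Perl trying to run the fastq file as a script, the file itself may be fine. One way to rule the file out is a quick structural check, sketched below. `check_fastq` is a hypothetical helper, it assumes plain four-line records, and the two records in the usage lines are toy data rather than reads from this thread.

```python
def check_fastq(lines):
    """Return a list of (record_number, problem) tuples for 4-line fastq records."""
    lines = [ln.rstrip("\n") for ln in lines]
    problems = []
    leftover = len(lines) % 4
    if leftover:
        problems.append((len(lines) // 4 + 1, "incomplete final record"))
    for i in range(0, len(lines) - leftover, 4):
        header, seq, sep, qual = lines[i:i + 4]
        rec = i // 4 + 1
        if not header.startswith("@"):
            problems.append((rec, "header does not start with '@'"))
        if not sep.startswith("+"):
            problems.append((rec, "separator does not start with '+'"))
        if len(seq) != len(qual):
            problems.append((rec, "sequence and quality lengths differ"))
    return problems


# Toy records for illustration only.
good = ["@r1", "ACGT", "+", "IIII"]
bad = ["@r2", "ACGT", "+", "III"]
```

Here `check_fastq(good)` returns an empty list and `check_fastq(bad)` reports a length mismatch for record 1. Passing an open file handle works as well, since the function only iterates over lines. A clean result would point at how FastQC was launched rather than at Trimmomatic's output.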

              • mparida
                Member
                • Mar 2012
                • 15

                #8
                Trimmomatic 0.32 error

                Hi
                I am running Trimmomatic 0.32 and I couldn't figure out why the trim log file shows the following for some reads in the reverse pair file:

                phred 33
                Illumina sequencing reads

                HWI-ST1122:289:C38D1ACXX:8:1101:1200:2205 1:N:0:ATCACG 98 0 98 3 => Read 1
                HWI-ST1122:289:C38D1ACXX:8:1101:1200:2205 2:N:0:ATCACG 0 0 0 0 => Read 2

                HWI-ST1122:289:C38D1ACXX:8:1101:4410:2059 1:N:0:ATCACG 101 0 101 0 => Read 1
                HWI-ST1122:289:C38D1ACXX:8:1101:4410:2059 2:N:0:ATCACG 0 0 0 0 => Read 2

                HWI-ST1122:289:C38D1ACXX:8:1101:4892:2178 1:N:0:ATCACG 101 0 101 0 => Read 1
                HWI-ST1122:289:C38D1ACXX:8:1101:4892:2178 2:N:0:ATCACG 0 0 0 0 => Read 2

                For Read 2 it looks to me like 0 bases were trimmed and 0 bases survived.

                • mastal
                  Senior Member
                  • Mar 2009
                  • 666

                  #9
                  In the excerpt from your log file, it looks like the Read2 reads were dropped by trimmomatic.

                  Did you have a look at those reads to see why? For example, they might have been very low quality, or shorter than a minimum length you specified.

                  What parameters did you run trimmomatic with?

                  • mparida
                    Member
                    • Mar 2012
                    • 15

                    #10
                    Reply

                    Trimmomatic parameters:
                    -phred33
                    ILLUMINACLIP:/Users/mparida/Software/Trimmomatic-0.32/adapters/:2:40:12 SLIDINGWINDOW:5:20 LEADING:10 TRAILING:12 MINLEN:90

                    Thanks for your reply. I figured out what Trimmomatic is doing. After I changed the MINLEN parameter to 40, it started giving me different stats:
                    for example:
                    Read 1 101 0 101 0
                    Read 2 37 0 37 64
                    So when it trims a read and the remaining length doesn't pass the MINLEN threshold, it reports these odd-looking stats for that pair:
                    Read 1 101 0 101 0
                    Read 2 0 0 0 0
                    This makes sense.
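
To find such dropped reads in a large trimlog, the lines can be parsed with a few lines of Python. This is a sketch under assumptions: `TrimlogEntry` and `parse_trimlog_line` are names invented here, and the meaning of the four numbers (surviving length, bases trimmed from the start, position of the last surviving base, bases trimmed from the end) follows my reading of the Trimmomatic documentation.

```python
from typing import NamedTuple


class TrimlogEntry(NamedTuple):
    name: str
    surviving: int     # surviving sequence length
    start: int         # bases trimmed from the start
    last: int          # position of the last surviving base in the original read
    trimmed_end: int   # bases trimmed from the end

    @property
    def dropped(self):
        """True when nothing survived, e.g. the read fell below MINLEN."""
        return self.surviving == 0


def parse_trimlog_line(line):
    # Read names can contain spaces, so split the four numbers off from the right.
    name, *fields = line.strip().rsplit(None, 4)
    return TrimlogEntry(name, *map(int, fields))


# Two lines taken from post #8, without the "=> Read" annotations.
kept = parse_trimlog_line(
    "HWI-ST1122:289:C38D1ACXX:8:1101:1200:2205 1:N:0:ATCACG 98 0 98 3")
lost = parse_trimlog_line(
    "HWI-ST1122:289:C38D1ACXX:8:1101:1200:2205 2:N:0:ATCACG 0 0 0 0")
```

Here `kept.dropped` is False and `lost.dropped` is True. For a surviving read the three counts add back up to the original length: in the MINLEN:40 example above, 0 + 37 + 64 = 101.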
