Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • mgbfx9
    Member
    • Sep 2015
    • 25

    Hi GeoMax,

    Thank you so much for all the help.

    Yes, I saw that and for that reason I posted the percent surviving report.
    Yes, I ran FastQC before processing, but after processing FastQC was giving me lots of troubles. Anyway,
    Before Timming, total sequence was 3032230, and it didn’t pass per base sequence quality and the adapter content. From the graph, the adapter content (which was Nextera Transposase sequence) which was started at 40bp position and went little over 60% till 229 bp position. I am not sure how to interpret this without sending the graph. But I am not sure how to paste this graph. I tried printScrn, it is still not pasting.
    After trimming, total sequence is 1084634, but this sample passed both per base sequence quality and adapter content.

    How to check the adapter contamination? Is there a way?

    I am not clear about your last question: "How long were the reads to begin with (you have asked for minlength 36)?"
    Yes, I asked for minimum length 36, so it would drop the read if it is below 36. And my code at the end was:
    ILLUMINACLIP:/home/mydir/Trimmomatic-0.33/adapters/NexteraPE-PE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
    Output was :
    Using PrefixPair: 'AGATGTGTATAAGAGACAG' and 'AGATGTGTATAAGAGACAG'
    Using Long Clipping Sequence: 'GTCTCGTGGGCTCGGAGATGTGTATAAGAGACAG'
    Using Long Clipping Sequence: 'TCGTCGGCAGCGTCAGATGTGTATAAGAGACAG'
    Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTGACGCTGCCGACGA'
    Using Long Clipping Sequence: 'CTGTCTCTTATACACATCTCCGAGCCCACGAGAC'
    ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
    Input Read Pairs: 3032230 Both Surviving: 1084634 (35.77%) Forward Only Surviving: 1933475 (63.76%) Reverse Only Surviving: 606 (0.02%) Dropped: 13515 (0.45%)
    TrimmomaticPE: Completed successfully.

    Comment

    • apredeus
      Senior Member
      • Jul 2012
      • 151

      Originally posted by Brian Bushnell View Post
      Hi all,

      I'm testing Trimmomatic's performance on adapter removal, and the results are mysteriously bad. So I'd like to make sure I'm not doing anything wrong. This is my command line (modified from the website):

      java -Xmx8g -jar trimmomatic-0.32.jar SE -phred33 dirty.fq tclean.fq ILLUMINACLIP:gruseq.fa:2:30:10

      ...where dirty.fq is a file containing reads with adapter sequences and gruseq.fa is a file containing the adapter sequences. The adapters are inserted synthetically and the reads are tagged, so I know precisely what the correct results should be, and what I'm getting is not really close. Any suggestions?
      Hello Brian,

      did you figure out what was the issue? I'm perplexed - I wanted to use Trimmomatic and BBduk and compare the results, however, Trimmomatic just would not remove the adapter that is absolutely certainly there and could be found using simply using grep. And they are precisely the case for the palindrome mode.

      So I have my test reads that are just two reads - one R1, one R2.

      Code:
      @E00513:47:HF757ALXX:1:1101:17208:2047 1:N:0:ATCACG
      CNAAAAAAAAGATTGCGACCTCGATGTTGGATTAAAATGAACTTTTGGCGCAAAAGTTAAAAGGGTTAGGTCTGTTCGACCTTTAAAATTTTAGATCGGAAGAGCACACGTCTGAACTCCAGTCACATCACGATCACGTATGCCGTCTTC
      +
      <#AAFAJJJJJFJFJFJJJJF<AFJJJJJJFFFJJJJJ<JF-FJJJJJFJA<-FAF-FFJAJFJFAFJJJAAF<<7<JJJ<JJJAJJJJJJJJJFJFJFFFJJFA<FJ<FJ7F<7--7F<JAFAAAFFFJJJ<-FA7FAF<J77AFJJ<-
      @E00513:47:HF757ALXX:1:1101:17208:2047 2:N:0:ATCACG
      AAAATTTTAAAGGTCGAACAGACCTAACCCTTTTAACTTTTGCGCCCAAAGTTCATTTTAATCCAACATCGAGGTCGCAATCTTTTTTTTCGAGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTAGATCTCGGTGGTCGACGTATCAAT
      +
      AAF<FFAFAJJJJAJFJJFJJFJJJAJJJJJFJFFFJJJJJFAF<<-A<-<FFFJJJJFJJ<FFFFJ7<F-77A--7<<FJJ7-<AFF<F7AFJ-7--7--<-77-77-A)7-FJ<7A-7FA<-----77--A))7)7)<)))7<-----
      Then I take the following adapter file:
      Code:
      >PrefixPE/1
      AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
      >PrefixPE/2
      AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
      Both of these could be found in above reads - first one in R1, second one in R2. Both start at position 93.

      then I run

      Code:
      java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:test.fa:0:30:10
      And the adapter is not recognized or clipped. IT IS, however, successfully clipped in simple mode (e.g. when adapters are renamed)!

      Even more curiously, the adapters I've posted above are actually reverse-complements of the TrueSeq3-PE.fa. When I use the rev-comp, they are recognized!

      I think this might be due to the library construction peculiarities of RNA-seq, but I am not sure. At any rate that's a strange behaviour for the trimmer, is it not?

      Comment

      • apredeus
        Senior Member
        • Jul 2012
        • 151

        Originally posted by tonybolger View Post
        Enjoy and happy trimming.
        Dear Tony,

        few small suggestions that can make the program a lot easier to use -

        1) just require 1 tag/prefix for the output, and use that to construct the output file names. Four file names make the command unnecessary long and unreadable
        2) make a simple wrapper around the jar, sort of like FastQC has. This would also allow to "install" the program and thus have the adapter fasta at a fixed location for the program.

        Comment

        • apredeus
          Senior Member
          • Jul 2012
          • 151

          I've also found a strange issue with the "keepBothEnds" option.

          In the example I have described two posts above (but with bigger files, 25K reads each), the following command
          Code:
          java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:0:30:10
          gives the following output:

          Code:
          Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
          ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
          Quality encoding detected as phred33
          Input Read Pairs: 25000 Both Surviving: 18039 (72.16%) Forward Only Surviving: 6948 (27.79%) Reverse Only Surviving: 0 (0.00%) Dropped: 13 (0.05%)
          TrimmomaticPE: Completed successfully
          while adding the option to keep both reads leads to failure:

          Code:
          java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:0:30:10:2:TRUE
          output:

          Code:
          Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
          ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
          Quality encoding detected as phred33
          Input Read Pairs: 25000 Both Surviving: 24987 (99.95%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 13 (0.05%)
          TrimmomaticPE: Completed successfully
          This should not happen. My trimmomatic version is 0.36.

          Comment

          • mastal
            Senior Member
            • Mar 2009
            • 666

            Originally posted by apredeus View Post
            then I run

            Code:
            java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:test.fa:0:30:10
            And the adapter is not recognized or clipped. IT IS, however, successfully clipped in simple mode (e.g. when adapters are renamed)!

            Even more curiously, the adapters I've posted above are actually reverse-complements of the TrueSeq3-PE.fa. When I use the rev-comp, they are recognized!

            I think this might be due to the library construction peculiarities of RNA-seq, but I am not sure. At any rate that's a strange behaviour for the trimmer, is it not?
            One point to note might be that the thresholds you are using in the ILLUMINACLIP command are 30 for the palindrome mode and 10 for the simple mode, and you are not allowing for any mismatches. So it's quite possible that in palindrome mode the score is not reaching the threshold of 30 that trimmomatic needs in order to identify the presence of the adapter, whereas the score of 10 requires fewer matching bases and is more easily reached. Each matching base contributes about 0.6 to the score, according to the trimmomatic manual.

            Note that the adapter sequences used for palindrome mode are the reverse complements of the sequences used in simple mode.


            Edit: The adapter fasta sequences you have above appear to be the reverse complements of the sequences I have in my version of the TruSeqv3-PE adapter fasta file, which may explain why they didn't work until you used the reverse complements. My version of the TruSeqv3 file is:

            >PrefixPE/1
            TACACTCTTTCCCTACACGACGCTCTTCCGATCT
            >PrefixPE/2
            GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT

            It's also possible that at least one of the 2 adapters is different for the RNA-Seq kits.
            Last edited by mastal; 04-29-2017, 03:14 PM.

            Comment

            • mastal
              Senior Member
              • Mar 2009
              • 666

              Originally posted by apredeus View Post
              Dear Tony,

              few small suggestions that can make the program a lot easier to use -

              1) just require 1 tag/prefix for the output, and use that to construct the output file names. Four file names make the command unnecessary long and unreadable
              Unless you are using a very old version of trimmomatic, you can use the -basein and -baseout options to specify the prefix of the input and output file names.

              Comment

              • mastal
                Senior Member
                • Mar 2009
                • 666

                Originally posted by apredeus View Post
                I've also found a strange issue with the "keepBothEnds" option.

                while adding the option to keep both reads leads to failure:

                Code:
                java -jar $JAR PE -trimlog trim.log test.R1.fastq test.R2.fastq output_forward_paired.fq.gz output_forward_unpaired.fq.gz output_reverse_paired.fq.gz output_reverse_unpaired.fq.gz ILLUMINACLIP:TruSeq3-PE.fa:0:30:10:2:TRUE
                output:

                Code:
                Using PrefixPair: 'TACACTCTTTCCCTACACGACGCTCTTCCGATCT' and 'GTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT'
                ILLUMINACLIP: Using 1 prefix pairs, 0 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
                Quality encoding detected as phred33
                Input Read Pairs: 25000 Both Surviving: 24987 (99.95%) Forward Only Surviving: 0 (0.00%) Reverse Only Surviving: 0 (0.00%) Dropped: 13 (0.05%)
                TrimmomaticPE: Completed successfully
                This should not happen. My trimmomatic version is 0.36.
                Why do you think this indicates failure? You asked trimmomatic to keep both reads of a pair when it finds adapter in palindrome mode, and it appears to have done that.
                This does not mean that trimmomatic has not trimmed the adapters.

                The trimlog tells you for each read how many bases have been trimmed. Check the trimlog entries for some of the reads that were ending up in the forward_unpaired file previously, and check whether adapters have been trimmed from both the forward and reverse reads, although both reads of the pair have been kept, rather than the reverse read being discarded.

                Comment

                • apredeus
                  Senior Member
                  • Jul 2012
                  • 151

                  Code:
                  Note that the adapter sequences used for palindrome mode are the reverse complements of the sequences used in simple mode.
                  Ok, I understand now. And it's the rev-comp of /2 that is present in read 1, and vise versa - so that's why the adapters are "switched". Which I guess makes sense since R1 read-through would lead us to reading adapter ligated to R2.

                  However, it makes it very confusing to make your own file. It should definitely be switched to sequences matching the simple mode.

                  Edit: The adapter fasta sequences you have above appear to be the reverse complements of the sequences I have in my version of the TruSeqv3-PE adapter fasta file, which may explain why they didn't work until you used the reverse complements. My version of the TruSeqv3 file is:
                  Yes, I know - I've written before that it works OK with the rev-comp adapters that are also switched. I was just not sure why that was the case.

                  One remaining question is why does it not work for with the extra option to keep the output paired-ended?

                  Comment

                  • apredeus
                    Senior Member
                    • Jul 2012
                    • 151

                    Originally posted by mastal View Post
                    Why do you think this indicates failure? You asked trimmomatic to keep both reads of a pair when it finds adapter in palindrome mode, and it appears to have done that.
                    This does not mean that trimmomatic has not trimmed the adapters.
                    You are right, thank you! The message confused me as well.

                    Comment

                    • yip
                      Junior Member
                      • Oct 2011
                      • 5

                      I assume that Trimmomatic processes the reads sequentially, so I don't understand why a run with paired-end fastq input could take >10GB of memory easily. Does anyone know what I am missing?

                      I'm using trimmomatic-0.36 with 4 threads for a job. The input fastq files are about 4GB in size each.

                      Thank you!
                      yip

                      Comment

                      Latest Articles

                      Collapse

                      • SEQadmin2
                        From Collection to Sequencing: Why Sample Preparation and Preservation Define Sequencing Data
                        by SEQadmin2


                        Data variability is still an issue in sequencing technologies despite the advances in reproducibility and accuracy of these platforms. But the problem does not originate in the sequencing itself, but in the previous steps, before the sample reaches the sequencer.


                        The first step is collection, followed by preservation and sample preparation for analysis. Most scientists overlook those steps, but not being careful might just be skewing the experiment’s results.
                        ...
                        06-02-2026, 10:05 AM
                      • SEQadmin2
                        Single-Cell Sequencing at an Inflection Point: Early Impacts of New Platforms and Emerging Trends
                        by SEQadmin2


                        With the launch of new single-cell sequencing platforms in 2026, the field stands at an exciting inflection point. This article surveys the most impactful advances in the field and discusses how they’re reshaping research in cancer, immunology, and beyond.


                        Introduction

                        Single-cell sequencing technologies have undergone remarkable advances over the past decade, transitioning from low-throughput experimental approaches to highly scalable platforms capable of...
                        05-22-2026, 06:42 AM
                      • SEQadmin2
                        Environmental Genomics in the Age of NGS: From Microbes to Conservation Strategies
                        by SEQadmin2

                        Studying ecosystems means dealing with complex, multi-species communities that are hard to observe at scale. This complexity, however, hides many important questions to be answered, from how biogeochemical cycles work and how climate change can affect species distribution to how conservation strategies can work best.


                        Genomics, particularly since the expansion of NGS, has transformed ecosystem ecology. By sequencing environmental DNA, we can now assess biodiversity without direct...
                        05-06-2026, 09:04 AM

                      ad_right_rmr

                      Collapse

                      News

                      Collapse

                      Topics Statistics Last Post
                      Started by SEQadmin2, 06-02-2026, 12:03 PM
                      0 responses
                      21 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 06-02-2026, 11:40 AM
                      0 responses
                      14 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 05-28-2026, 11:40 AM
                      0 responses
                      29 views
                      0 reactions
                      Last Post SEQadmin2  
                      Started by SEQadmin2, 05-26-2026, 10:12 AM
                      0 responses
                      31 views
                      0 reactions
                      Last Post SEQadmin2  
                      Working...