  • kmcarr
    Senior Member
    • May 2008
    • 1181

    #31
    Originally posted by alisrpp View Post
    Thanks a lot!!!

    Do you recommend me to do an individual FASTA file for each index?
    No, I use a single file with all of the Illumina TruSeq sequences in it. I also don't bother designating any of them for "Palindrome" search. I've read the description of Palindrome search several times on the Trimmomatic site and frankly still don't understand it. I just do simple searches for them and it seems to work fine for me.

    I have attached the file I use with Trimmomatic.

    • alisrpp
      Member
      • Dec 2010
      • 40

      #32
      Thanks for the answers!

      About the palindrome clipping: a few days ago I wrote to one of the creators of Trimmomatic asking for an alternative explanation to the one on the website (I couldn't understand it either).
      Here is the answer, which I found useful:

      Simple clipping is just finding a contaminant sequence somewhere within a read. Conceptually, you get contaminant and read, and you slide them across each other, until you get a perfect or close enough match. So, with R being read bases, and C being contaminant, you check

      1)
      RRRRRRRRRRR
      CCCC

      2)
      RRRRRRRRRRR
       CCCC ->

      etc.

      Palindrome clipping is a bit more complex - and related to actual palindromes only in a twisted mind like mine. In this case, you 'ligate' the presumed adapter sequence to the start of each read in a pair, and try sliding them over each other.

      So, with F being bases from the forward read, R being bases from the reverse read, and A being either adapter (technically the two adapters are different, but let's ignore that for now):

      AAAAAAFFFFFFF ->
      <- RRRRRRRAAAAAA

      In this case, the aligning region is much longer, since it consists of the entire read length plus part of the adapter. This gives very high confidence that an apparent 'read-through' is a true positive.
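
The simple clipping described above (sliding a contaminant along a read) can be sketched in a few lines of Python. This is only an illustration of the sliding idea, not Trimmomatic's actual implementation: the function names, the mismatch threshold and the `min_overlap` parameter are all made up for the example, and a plain mismatch count stands in for whatever scoring the tool really uses.

```python
def find_contaminant(read, contaminant, max_mismatches=1, min_overlap=4):
    """Slide `contaminant` along `read`; return the first offset where it matches.

    The contaminant may hang off the 3' end of the read, so a partial
    adapter at the end of a read can still be found.
    Returns None when no offset is acceptable.
    """
    for offset in range(len(read) - min_overlap + 1):
        # Only compare the bases where read and contaminant overlap.
        overlap = min(len(contaminant), len(read) - offset)
        mismatches = sum(
            1 for i in range(overlap) if read[offset + i] != contaminant[i]
        )
        if mismatches <= max_mismatches:
            return offset
    return None


def clip(read, contaminant, **kwargs):
    """Cut the read at the contaminant start, or return it unchanged."""
    offset = find_contaminant(read, contaminant, **kwargs)
    return read if offset is None else read[:offset]
```

Because the contaminant is allowed to run past the end of the read, a read that ends in only the first few adapter bases is still clipped, as long as at least `min_overlap` bases overlap.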

      • kmcarr
        Senior Member
        • May 2008
        • 1181

        #33
        Originally posted by alisrpp View Post
        Here is the answer, which I found useful:

        Simple clipping is just finding a contaminant sequence somewhere within a read. Conceptually, you get contaminant and read, and you slide them across each other, until you get a perfect or close enough match. So, with R being read bases, and C being contaminant, you check

        1)
        RRRRRRRRRRR
        CCCC

        2)
        RRRRRRRRRRR
         CCCC ->

        etc.

        Palindrome clipping is a bit more complex - and related to actual palindromes only in a twisted mind like mine. In this case, you 'ligate' the presumed adapter sequence to the start of each read in a pair, and try sliding them over each other.

        So, with F being bases from the forward read, R being bases from the reverse read, and A being either adapter (technically the two adapters are different, but let's ignore that for now):

        AAAAAAFFFFFFF ->
        <- RRRRRRRAAAAAA

        In this case, the aligning region is much longer, since it consists of the entire read length plus part of the adapter. This gives very high confidence that an apparent 'read-through' is a true positive.
        Yeah, still clear as mud.

        • claire.anderson1
          Junior Member
          • Mar 2013
          • 1

          #34
          How does adapter trimming in Trimmomatic work?

          I have two adapter sequences of 58 bp and 66 bp that I would like to remove from my Illumina data set (if present). Can Trimmomatic recognise partial matches to these adapter sequences? For example, if I am using 100 bp reads and a particular sequence contains 90 bp of DNA from the source organism, the remaining 10 bp at the end of the read might be from the adapter. Would Trimmomatic be able to pick this up? Or must it find a match to the whole adapter sequence?

          I'm new at playing with NGS data, so any advice would be gratefully received!

          • cllorens
            Member
            • Nov 2011
            • 44

            #35
            You could also check out cutadapt, which is useful for Illumina data too.

            • tonybolger
              Senior Member
              • Feb 2010
              • 156

              #36
              Originally posted by claire.anderson1 View Post
              I have two adapter sequences of 58 bp and 66 bp that I would like to remove from my Illumina data set (if present). Can Trimmomatic recognise partial matches to these adapter sequences? For example, if I am using 100 bp reads and a particular sequence contains 90 bp of DNA from the source organism, the remaining 10 bp at the end of the read might be from the adapter. Would Trimmomatic be able to pick this up? Or must it find a match to the whole adapter sequence?
              In the case of paired-end data with adapter 'read-through' (where the DNA fragment is shorter than the read length, and the ends of the reads come from the 'opposite' adapter), trimmomatic can remove even a single adapter base (if you use sufficiently aggressive settings). Older versions of trimmomatic required at least 8 bp of adapter in this case, but that was probably too conservative, so I reduced it. The latest versions also include the recommended adapter sequences, which have been a common stumbling point.

              For other, less common scenarios, where the adapter location/orientation isn't known in advance, or where you're using single-end data, you'd typically want to be a bit more cautious, but 10 bp or more can usually be removed at a reasonable false positive rate.
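
For a rough feel for why a short adapter tail is risky to trim on its own, here is a back-of-the-envelope calculation. It assumes all four bases are equally likely and independent, which real reads are not, so treat the numbers as order-of-magnitude only; the function name is invented for the example.

```python
def chance_match_probability(k, base_probability=0.25):
    """Probability that k specific bases all match by chance.

    Assumes independent, equally likely bases (a simplification).
    """
    return base_probability ** k


# Print the chance of a spurious match for a few adapter tail lengths.
for k in (1, 4, 8, 10):
    print(f"{k:2d} adapter bases: {chance_match_probability(k):.2e}")
```

Under that assumption a single base matches a quarter of the time, while a 10 bp tail matches by chance roughly once in a million tries, which fits the 10 bp rule of thumb above.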

              Hope this helps.

              • tonybolger
                Senior Member
                • Feb 2010
                • 156

                #37
                Originally posted by kmcarr View Post
                Yeah, still clear as mud.
                Sorry that my explanation for this obviously sucks, and now that the adapter sequences are included directly in trimmomatic, there's probably not such a major need for everyone to understand it, but here goes anyway.

                During adapter read-through with paired-end data (and assuming the same length of forward and reverse reads), we get pairs with:
                • The forward read consisting of X useful bases, followed by Y bases from the end of the reverse read adapter.
                • The reverse read consisting of X useful bases, followed by Y bases from the end of the forward read adapter.

                The beauty is that those X bases in the forward and reverse reads are the same bases, though in reverse complement, and those Y bases are always specific known sequences starting immediately afterwards. So rather than fish for those Y bases in isolation (which is risky / difficult if Y is small), we can check simultaneously for 3 things:
                • The first X bases of both reads being reverse complements of each other.
                • The additional bases from the forward read match the reverse adapter.
                • The additional bases from the reverse read match the forward adapter.

                Since all three must be found to support the 'read-through' hypothesis at a given read pair/position, the false positive rate is very low. Naturally we don't know what X is, but we can check every possible X from zero to the read length.
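
The three simultaneous checks can be written down as a small Python sketch. It uses exact matching only and invented names (`find_read_through`, `adapter_after_fwd`, `adapter_after_rev`), so it shows the logic of the test rather than how trimmomatic scores or implements it.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]


def matches_prefix(tail, adapter):
    """True if `tail` agrees with the start of `adapter` over their overlap."""
    n = min(len(tail), len(adapter))
    return tail[:n] == adapter[:n]


def find_read_through(fwd, rev, adapter_after_fwd, adapter_after_rev):
    """Return X, the number of useful bases, or None if nothing fits.

    adapter_after_fwd / adapter_after_rev are the known sequences expected
    to follow the useful bases in the forward / reverse read.
    """
    read_len = min(len(fwd), len(rev))
    # X == read_len would mean no adapter bases at all, so stop before it.
    for x in range(read_len):
        if (
            fwd[:x] == reverse_complement(rev[:x])           # check 1
            and matches_prefix(fwd[x:], adapter_after_fwd)   # check 2
            and matches_prefix(rev[x:], adapter_after_rev)   # check 3
        ):
            return x
    return None
```

If a value of X is returned, both reads would be cut back to their first X bases. A real tool would tolerate sequencing errors rather than demand exact equality, which this sketch deliberately leaves out.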

                • leda
                  Junior Member
                  • Feb 2013
                  • 7

                  #38
                  What do the four columns following the read identifier in the trimlog represent? I can't find this in the documentation.

                  thanks!

                  • mastal
                    Senior Member
                    • Mar 2009
                    • 666

                    #39
                    Introducing the Trimmomatic

                    This is an extract from the trimmomatic web page:

                    Specifying a trimlog file creates a log of all read trimmings, indicating the following details:

                    * the read name
                    * the surviving sequence length
                    * the location of the first surviving base, aka. the amount trimmed from the start
                    * the location of the last surviving base in the original read
                    * the amount trimmed from the end
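
As a sketch of how those five fields could be read back in Python: the code below assumes the fields are separated by single spaces and that the four numbers come last on each line, which is worth checking against your own trimlog. The `TrimlogEntry` type and `parse_trimlog_line` function are names invented here.

```python
from typing import NamedTuple


class TrimlogEntry(NamedTuple):
    read_name: str
    surviving_length: int
    first_surviving_base: int  # also the amount trimmed from the start
    last_surviving_base: int   # position in the original read
    trimmed_from_end: int


def parse_trimlog_line(line):
    """Split one trimlog line into its five fields.

    Read names can themselves contain spaces, so the four numeric
    columns are split off from the right-hand side.
    """
    name, length, first, last, end_trim = line.rstrip("\r\n").rsplit(" ", 4)
    return TrimlogEntry(name, int(length), int(first), int(last), int(end_trim))
```

For a made-up line such as `read1 1:N:0 85 3 88 2`, this gives a read named `read1 1:N:0` with 85 surviving bases, 3 bases trimmed from the start, the last surviving base at position 88 and 2 bases trimmed from the end.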

                    • helios
                      Junior Member
                      • Nov 2010
                      • 1

                      #40
                      Make trimmomatic a binary/executable

                      Hi Guys,

                      If you prefer to run trimmomatic as a binary (./trimmomatic), you can follow these steps:

                      1) download stub.sh.gz (attached) and gunzip it in the directory where trimmomatic-0.X.jar is located
                      2) cat stub.sh trimmomatic-0.30.jar >> trimmomatic
                      3) chmod +x trimmomatic
                      4) add the directory containing trimmomatic to your PATH

                      ref: https://coderwall.com/p/ssuaxa

                      If you need to change Java's parameters, edit stub.sh accordingly.

                      Ciao.

                      • rmdoyle
                        Junior Member
                        • May 2013
                        • 6

                        #41
                        Hi everyone,

                        I've recently used Trimmomatic on some Illumina HiSeq PE fastq files. I then attempted to run the post-Trimmomatic fastq files through fastqc. My original illumina files run through fastqc just fine, but the post-trimmomatic files get stuck, which makes me think I've corrupted the files somehow while using Trimmomatic.

                        When I run fastqc on my post-trimmomatic fastq files, I get the following output after inputting my sequences:

                        Exception in thread "Thread-4" java.lang.NullPointerException
                        at uk.ac.babraham.FastQC.Sequence.FastQFile.readNext(FastQFile.java:141)
                        at uk.ac.babraham.FastQC.Sequence.FastQFile.next(FastQFile.java:105)
                        at uk.ac.babraham.FastQC.Analysis.AnalysisRunner.run(AnalysisRunner.java:76)
                        at java.lang.Thread.run(Unknown Source)

                        I also did get one error message after running trimmomatic. This error was:

                        Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: 'GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC' vs 'ggggggggggggfeggggggggcgggeggggggggeggg

                        My original trimmomatic code was:

                        TrimmomaticPE: -phred64 -trimlog trimlog SRR522907_1.fastq SRR522907_2.fastq paired_output1.fastq unpaired_output1.fastq paired_output2.fastq unpaired_output2.fastq ILLUMINACLIP:TruSeq3_PE.fa:2:30:10 LEADING:20 TRAILING:20 MINLEN:30

                        I'd appreciate any thoughts on where I went wrong...

                        • tonybolger
                          Senior Member
                          • Feb 2010
                          • 156

                          #42
                          Originally posted by rmdoyle View Post
                          I'd appreciate any thoughts on where I went wrong...
                          Very strange indeed, and nothing I've seen before.

                          I would suspect something like a lack of disk space, or something killed the trimmomatic process. It may also be a one-off glitch, so perhaps running it again, and checking if the output is still broken might help.

                          • rmdoyle
                            Junior Member
                            • May 2013
                            • 6

                            #43
                            Hmmm... gave it another shot and still no dice. Any thoughts on the following error/warning, tonybolger?

                            Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: 'GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC' vs 'ggggggggggggfeggggggggcgggeggggggggeggg

                            • tonybolger
                              Senior Member
                              • Feb 2010
                              • 156

                              #44
                              Originally posted by rmdoyle View Post
                              Hmmm... gave it another shot and still no dice. Any thoughts on the following error/warning, tonybolger?

                              Exception in thread "main" java.lang.RuntimeException: Sequence and quality length don't match: 'GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC' vs 'ggggggggggggfeggggggggcgggeggggggggeggg
                              Ah, that was a trimmomatic error. Normally a FASTQ record should have the same number of bases and quality scores, and for some reason, this read appears to have fewer quality scores, which trimmomatic considers invalid (AFAIK this is correct behaviour). At this point, trimmomatic gives up, and probably leaves a partial output file, which may cause other issues.

                              The question is why the record is invalid. Can you find that fastq record within the file?

                              Of course, trimmomatic should really log the name of the record as well, rather than just the data, but I haven't seen this happen before.
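
One way to track down such a record is a short Python scan that reports every record whose sequence and quality lines differ in length. This is a sketch that assumes the plain four-lines-per-record FASTQ layout (no wrapped sequence lines); `find_bad_records` is a name made up for the example.

```python
import io


def find_bad_records(handle):
    """Yield (record_number, header, seq_len, qual_len) for mismatched records.

    Assumes each record is exactly four lines: header, sequence,
    '+' separator, quality string.
    """
    record_number = 0
    while True:
        header = handle.readline()
        if not header:
            break  # end of file
        seq = handle.readline().rstrip("\r\n")
        handle.readline()  # skip the '+' separator line
        qual = handle.readline().rstrip("\r\n")
        record_number += 1
        if len(seq) != len(qual):
            yield record_number, header.rstrip("\r\n"), len(seq), len(qual)


# Small in-memory demo; for a real file use: open("SRR522907_1.fastq")
demo = io.StringIO("@r1\nACGT\n+\nIIII\n@r2\nACGTAC\n+\nIII\n")
for hit in find_bad_records(demo):
    print(hit)
```

Only the first hit is really trustworthy: once one record is damaged, the four-line framing of everything after it may be off as well.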

                              • rmdoyle
                                Junior Member
                                • May 2013
                                • 6

                                #45
                                Yup, the complete record is:

                                @FCB01CWABXX:1:2205:1823:145892
                                GAGGTTCTTTGCTTCCTTCGGGAACCTCTCCAGCCCCACTGCCATCCTTGGCAACCCCATGGTCCGTGCCCATGGCAAGAAAGTGCTCAC
                                +FCB01CWABXX:1:2205:1823:145892
                                ggggggggggggfeggggggggcgggeggggggggeggg18207:146312

                                I suppose I could just cut this record out?

                                Interestingly, if I leave out the ILLUMINACLIP:TruSeqForTrimmomatic.fna:2:30:10 option, and leave my code as:

                                trimmomatic paired-end -phred64 -trimlog trimlog SRR522907_1.fastq SRR522907_2.fastq paired_output1b.fastq unpaired_output1b.fastq paired_output2b.fastq unpaired_output2b.fastq LEADING:20 TRAILING:20 MINLEN:30

                                I get files that I CAN run through fastqc without any problems (the results don't look great, but I can run the files through). Does that set off any red flags?
