  • Trimmomatic dropping 100% sRNA reads

    Hi All,

    This is my first post on SEQanswers, and I am hoping for some help from senior members. I am analyzing sRNA single-end sequencing data and using Trimmomatic to trim adapters. The problem is that after trimming, all the reads are dropped. Here is the summary for one file:

    TrimmomaticSE: Started with arguments: -threads 52 -phred64 -trimlog C2.log.txt C2.fastq.gz C2.Processed ILLUMINACLIP:./Trimmomatic-0.32/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 CROP:24 MINLEN:21
    Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
    Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
    ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
    Input Reads: 39733090 Surviving: 0 (0.00%) Dropped: 39733090 (100.00%)
    TrimmomaticSE: Completed successfully
    As you can see, all the trimmed reads have been dropped. I believe this is because of the threshold values used: palindrome clip threshold, LEADING, TRAILING and SLIDINGWINDOW. Should I be using a palindrome clip threshold? I would really appreciate any help with this problem.

    Thanks

    BADE

  • #2
    I guess the problem is your CROP parameter. Your adapter sequence is 34 nucleotides long, but the CROP parameter is 24.

    • #3
      The palindrome clip parameter will not be a problem, as you are not doing any palindrome trimming (you have SE reads, not PE, and your trimmomatic output says 'Using 0 prefix pairs').

      How long are your reads? Your CROP:24 and MINLEN:21 settings are probably the reason none of your reads are surviving.



      What Illumina version are your reads? The current Illumina versions use the -phred33 quality encoding. See

      • #4
        Hi Mastal

        Originally posted by mastal
        How long are your reads? Your CROP:24 and MINLEN:21 settings are probably the reason none of your reads are surviving.
        The reads are 50 nt long. I set MINLEN so that reads of length >= 21 after trimming are retained, because a proportion of miRNAs are 21 nt long. I crop at 24 after trimming because sRNAs longer than 24 nt are generally not miRNAs and are not of particular interest to me. For both MINLEN and CROP, I am assuming the steps are performed on the trimmed read. Am I wrong?

        What Illumina version are your reads? The current Illumina versions use the -phred33 quality encoding.
        FastQC reports Illumina version 1.9, which uses phred33 according to the wiki link you sent.

        These are the results after modifying the quality encoding to phred33 and with different MinLength and Crop options:

        $ ./trimmomatic.sh
        TrimmomaticSE: Started with arguments: -threads 52 -phred33 C1.fastq.gz C1.Processed ILLUMINACLIP:./Trimmomatic-0.32/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 CROP:24 MINLEN:21
        Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
        Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
        ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
        Input Reads: 37243929 Surviving: 36503574 (98.01%) Dropped: 740355 (1.99%)
        TrimmomaticSE: Completed successfully

        TrimmomaticSE: Started with arguments: -threads 52 -phred33 C1fastq.gz C1.ProcessedDefault ILLUMINACLIP:./Trimmomatic-0.32/TruSeq3-SE.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
        Using Long Clipping Sequence: 'AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA'
        Using Long Clipping Sequence: 'AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC'
        ILLUMINACLIP: Using 0 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
        Input Reads: 37243929 Surviving: 24107314 (64.73%) Dropped: 13136615 (35.27%)
        TrimmomaticSE: Completed successfully
        Now reads are surviving, and the number of surviving reads is high with CROP:24 and MINLEN:21. Again, I am assuming that both of these parameters apply to the trimmed read and not to the adapter?
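When comparing runs like these, it can help to pull the counts out of the summary line instead of reading them by eye. Below is a minimal Python sketch; it assumes the summary has exactly the single-end format shown in the logs above, and `parse_summary` is a hypothetical helper, not part of Trimmomatic.

```python
import re

# Matches the single-end summary format shown in the logs above (assumption:
# paired-end runs print a different line and will not match).
SUMMARY = re.compile(
    r"Input Reads: (\d+) Surviving: (\d+) \(([\d.]+)%\) Dropped: (\d+) \(([\d.]+)%\)"
)


def parse_summary(line: str) -> dict:
    """Extract read counts from a TrimmomaticSE summary line."""
    m = SUMMARY.search(line)
    if m is None:
        raise ValueError("not a single-end Trimmomatic summary line")
    total, kept, _, dropped, _ = m.groups()
    return {"input": int(total), "surviving": int(kept), "dropped": int(dropped)}


runs = {
    "CROP:24 MINLEN:21": "Input Reads: 37243929 Surviving: 36503574 (98.01%) Dropped: 740355 (1.99%)",
    "MINLEN:36": "Input Reads: 37243929 Surviving: 24107314 (64.73%) Dropped: 13136615 (35.27%)",
}
for name, line in runs.items():
    c = parse_summary(line)
    assert c["surviving"] + c["dropped"] == c["input"]  # sanity check on the log
    print(f"{name}: {100 * c['surviving'] / c['input']:.2f}% surviving")
```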

        Best

        Bade

        • #5
          Originally posted by BADE
          FastQC reports Illumina version 1.9, which uses phred33 according to the wiki link you sent.
          Using phred64 on phred33 data will usually result in little or no output, since it lowers every quality score by 31, dropping them below any reasonable threshold. Since this is a common problem, the most recent version of Trimmomatic auto-detects the quality encoding.
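To make the offset problem concrete, here is a minimal Python sketch (not Trimmomatic's code) that decodes the same quality string with both offsets. The quality string is made up for illustration, and `guess_offset` is a hypothetical heuristic, not the auto-detection logic Trimmomatic actually uses.

```python
from typing import Optional


def decode_quals(qual: str, offset: int) -> list[int]:
    """Convert each ASCII quality character to a Phred score."""
    return [ord(c) - offset for c in qual]


def guess_offset(quals: list[str]) -> Optional[int]:
    """Hypothetical heuristic: return 33, 64, or None if ambiguous."""
    lo = min(ord(c) for q in quals for c in q)
    hi = max(ord(c) for q in quals for c in q)
    if lo < 59:  # ';' (59) is the lowest character used by the +64 schemes
        return 33
    if hi > 74:  # 'J' (74, Q41) is the usual ceiling for Illumina 1.8+ phred33
        return 64
    return None


qual = "IIIIIHHHGG#"  # made-up phred33 quality string
q33 = decode_quals(qual, 33)  # correct offset: 40, 40, ..., 2
q64 = decode_quals(qual, 64)  # wrong offset: every score is 31 lower
print(q33)
print(q64)
print(guess_offset([qual]))
```

With the wrong offset, no base in this string reaches Q15, so a step such as SLIDINGWINDOW:4:15 would trim the whole read away, which is consistent with the 100% drop in the original post.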

          Originally posted by BADE
          Now reads are surviving, and the number of surviving reads is high with CROP:24 and MINLEN:21. Again, I am assuming that both of these parameters apply to the trimmed read and not to the adapter?
          CROP:24 cuts the read after the 24th base, but will not cause a read to be dropped.

          MINLEN:21 will drop all reads shorter than 21 bases, but will not shorten or otherwise modify the reads.

          To answer the question in the original post, trimmomatic steps are applied in the order specified to the read (or pair if you were using pairs). The first step gets the whole read or pair, and subsequent steps get to work on the part which survived previous steps.
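The ordering rule can be illustrated with a small Python model. This is a simplified sketch, not Trimmomatic's implementation: `crop`, `minlen` and `run_pipeline` are hypothetical names, and a read is modelled as a (sequence, quality list) pair.

```python
from typing import Callable, Optional

# A read is modelled as (sequence, list of Phred scores).
Read = tuple[str, list[int]]
Step = Callable[[Read], Optional[Read]]


def crop(n: int) -> Step:
    """Like CROP:n - keep only the first n bases; never drops a read."""
    def step(read: Read) -> Optional[Read]:
        seq, quals = read
        return seq[:n], quals[:n]
    return step


def minlen(n: int) -> Step:
    """Like MINLEN:n - drop reads shorter than n; never modifies a read."""
    def step(read: Read) -> Optional[Read]:
        return read if len(read[0]) >= n else None
    return step


def run_pipeline(read: Read, steps: list[Step]) -> Optional[Read]:
    """Apply steps in order; each step only sees what earlier steps kept."""
    for step in steps:
        result = step(read)
        if result is None:
            return None  # dropped: later steps never run
        read = result
    return read


read = ("A" * 30, [30] * 30)  # hypothetical 30-nt read left after adapter clipping
kept = run_pipeline(read, [crop(24), minlen(21)])
print(len(kept[0]))  # 24
print(run_pipeline(read, [crop(24), minlen(36)]))  # None
```

Because CROP runs first here, any MINLEN value above 24 placed after CROP:24 would drop every read, since no read can be longer than 24 bases by that point.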

          Hope this helps,

          Tony.

          • #6
            Hi Tony (and All),

            Thanks for confirming. I think I am on the right track then. Many thanks.

            BADE

            • #7
              Trimmomatic is widely accepted because it is written in Java and runs easily on various platforms. But if you want simplicity and efficiency, you could try skewer, which also performs well at small RNA adapter trimming.

              For your case, you could run the following command:
              $ skewer --min 21 --max 24 -t 8 -x TruSeq3-SE.fa C2.fastq.gz

              where content of TruSeq3-SE.fa is:
              >TruSeq3_IndexedAdapter
              AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC
              >TruSeq3_UniversalAdapter
              AGATCGGAAGAGCGTCGTGTAGGGAAAGAGTGTA
              Last edited by relipmoc; 01-14-2014, 08:39 AM.
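For small RNA data the key steps are removing the 3' adapter and then filtering by insert length. The Python sketch below shows the idea with exact matching only; it is not how Trimmomatic's ILLUMINACLIP or skewer actually score alignments (both tolerate mismatches), and `clip_3prime_adapter`, `keep_srna` and the `min_overlap` default of 8 are hypothetical choices for illustration. Note that `keep_srna` discards inserts outside 21 to 24 nt rather than truncating them as CROP does.

```python
# TruSeq3_IndexedAdapter, copied from the FASTA file in the post above.
ADAPTER = "AGATCGGAAGAGCACACGTCTGAACTCCAGTCAC"


def clip_3prime_adapter(seq: str, adapter: str = ADAPTER, min_overlap: int = 8) -> str:
    """Cut seq where an exact adapter match starts (simplified, no mismatches).

    The match may run off the 3' end of the read, as long as at least
    min_overlap adapter bases are present.
    """
    for start in range(len(seq)):
        tail = seq[start:]
        overlap = min(len(tail), len(adapter))
        if overlap < min_overlap:
            break  # remaining tails are too short to call an adapter
        if tail[:overlap] == adapter[:overlap]:
            return seq[:start]
    return seq


def keep_srna(seq: str, min_len: int = 21, max_len: int = 24) -> bool:
    """Keep inserts in the miRNA size range discussed in this thread."""
    return min_len <= len(seq) <= max_len


insert = "TGAGGTAGTAGGTTGTATAGTT"  # hypothetical 22-nt insert
read = insert + ADAPTER[:28]       # 50-nt read running into the adapter
clipped = clip_3prime_adapter(read)
print(clipped == insert, keep_srna(clipped))  # True True
```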

              • #8
                Hi all,

                This is my first post on SEQanswers. I am also working with small RNA, and I found this thread very useful for understanding the different parameters used in Trimmomatic. I have only just started learning, so I have a few questions about the parameters used by BADE (the user who opened this thread). BADE used LEADING:3 TRAILING:3.

                Does it mean that the program cuts 3 bases off the start and end of the read if they fall below the quality threshold?

                If so, why does it need to be 3? Is it an optimum value? Is there a rationale for choosing 3?

                When I look at my quality control report, I can see that the per-base sequence quality drops towards the end of the read. Does that mean I should focus only on the end of the read and not the start?

                Also, what is the difference between SLIDINGWINDOW and LEADING? They seem the same to me.

                Sorry if I am asking too many questions at once, and sorry if my questions are stupid. I am just learning.

                Thanks in advance for answering

                • #9
                  Hi,

                  I think you should probably read the trimmomatic manual.



                  The parameter used with LEADING: and TRAILING: refers to the base quality score, not the number of bases. It was designed to remove Ns from the 3' or 5' ends of the reads.
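The difference between LEADING/TRAILING and SLIDINGWINDOW asked about above can be sketched in Python. LEADING and TRAILING look at one base at a time from each end and stop at the first base at or above the threshold, while SLIDINGWINDOW averages quality over a window. This is a simplified model based on the descriptions in the Trimmomatic manual, not its actual code; the exact cut point in Trimmomatic's SLIDINGWINDOW may differ from this sketch. The example read is hypothetical and assumes the N bases carry quality 2, as is common in Illumina 1.8+ output.

```python
def leading(seq, quals, threshold):
    """Like LEADING: remove bases from the 5' end while quality < threshold."""
    i = 0
    while i < len(quals) and quals[i] < threshold:
        i += 1
    return seq[i:], quals[i:]


def trailing(seq, quals, threshold):
    """Like TRAILING: remove bases from the 3' end while quality < threshold."""
    j = len(quals)
    while j > 0 and quals[j - 1] < threshold:
        j -= 1
    return seq[:j], quals[:j]


def sliding_window(seq, quals, window, required):
    """Simplified SLIDINGWINDOW: scan 5' -> 3' and cut at the start of the
    first window whose mean quality is below `required`."""
    for start in range(len(quals) - window + 1):
        if sum(quals[start:start + window]) / window < required:
            return seq[:start], quals[:start]
    return seq, quals


# Hypothetical read: N (Q2), 8 good bases (Q35), 4 mediocre bases (Q10), N (Q2).
seq = "NACGTACGTACGTN"
quals = [2] + [35] * 8 + [10] * 4 + [2]

seq, quals = leading(seq, quals, 3)    # LEADING:3 removes only the first N
seq, quals = trailing(seq, quals, 3)   # TRAILING:3 removes only the last N
print(seq)                             # ACGTACGTACGT (the Q10 bases survive)
seq, quals = sliding_window(seq, quals, 4, 15)  # SLIDINGWINDOW:4:15
print(seq)                             # ACGTACGT (the Q10 stretch is cut)
```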
