  • Asymmetric trimmomatic output with paired-end RNA seq. data

    I have data from two experiments of 6 samples each. Both used 2x100 b paired-end Illumina HiSeq 2000 RNA sequencing, one with unstranded libraries and the other with stranded libraries. Average insert/fragment lengths in both experiments were ~200 b.

    I used trimmomatic (0.32; on 64-bit Linux) to remove contaminating adapter sequence as well as poor-quality subsequences from the reads.

    Code:
    java -jar trimmomatic-0.32.jar PE -threads 16 -phred33 sample_1.fastq sample_2.fastq sample_trimmed_paired_1.fastq.gz sample_trimmed_unpaired_1.fastq.gz sample_trimmed_paired_2.fastq.gz sample_trimmed_unpaired_2.fastq.gz ILLUMINACLIP:adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
    For both experiments and for all samples, I find that the trimmed_unpaired_1 files are more than 5-10 times the size of the trimmed_unpaired_2 files. The trimmed_paired_1 and _2 files are similar in size, as expected. See example file-size listings below.

    What could be the reason for this?
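
The size of a gzip-compressed FASTQ file depends on how well the content compresses and on the trimmed read lengths, not only on the number of reads, so counting records is a more direct check of the asymmetry. A minimal sketch, assuming standard four-line FASTQ records; `count_fastq_reads` is a hypothetical helper name, not part of Trimmomatic:

```shell
#!/bin/sh
# count_fastq_reads: hypothetical helper, not part of Trimmomatic.
# Prints the number of records in a FASTQ file (plain or gzip-compressed),
# assuming the standard four lines per record.
count_fastq_reads() {
    # gzip -cdf decompresses gzip input and passes plain text through unchanged.
    gzip -cdf "$1" | awk 'END { print int(NR / 4) }'
}
```

It could be run on the unpaired outputs named in the command above, for example `count_fastq_reads sample_trimmed_unpaired_1.fastq.gz` and `count_fastq_reads sample_trimmed_unpaired_2.fastq.gz`, and the two counts compared.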


  • #2
    It could mean that the quality of read 2 was lower than that of read 1. That would explain why you retain so many unpaired read 1 sequences: the corresponding read 2 was dropped because it was trimmed too short or had very low quality.



    • #3
      Originally posted by lorendarith
      It could mean that the quality of read 2 was lower than that of read 1. That would explain why you retain so many unpaired read 1 sequences: the corresponding read 2 was dropped because it was trimmed too short or had very low quality.
      This is a possibility. However, it seems unlikely in my case (I examined read qualities using FastQC).
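
As a complement to the FastQC reports, the overall quality of the read 1 and read 2 files can be compared with a single number per file. A rough sketch, assuming Phred+33 encoding (as in the `-phred33` option above) and four-line FASTQ records; `mean_phred33` is a hypothetical helper name:

```shell
#!/bin/sh
# mean_phred33: hypothetical helper, not part of Trimmomatic or FastQC.
# Prints the mean per-base quality of a FASTQ file (plain or gzip-compressed),
# assuming Phred+33 encoding and four lines per record.
mean_phred33() {
    gzip -cdf "$1" | awk '
        # Build a character -> ASCII code lookup for printable characters.
        BEGIN { for (i = 33; i < 127; i++) ord[sprintf("%c", i)] = i }
        # Every fourth line of a record is the quality string.
        NR % 4 == 0 {
            n = length($0)
            for (j = 1; j <= n; j++) sum += ord[substr($0, j, 1)] - 33
            bases += n
        }
        END { if (bases > 0) printf "%.2f\n", sum / bases; else print "NA" }
    '
}
```

Running it on `sample_1.fastq` and `sample_2.fastq` gives one mean per file. A single mean hides position-dependent effects, so it is only a coarse cross-check of what FastQC shows.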

      I wonder if the asymmetry that I am seeing is because I am not using the keepBothReads option of trimmomatic. From the manual:

      After read-through has been detected by palindrome mode, and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read. By specifying 'true' for this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads.



      • #4
        Originally posted by alpha2zee
        I wonder if the asymmetry that I am seeing is because I am not using the keepBothReads option of trimmomatic.
        It seems this is not the reason. I tested this with 'ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:TRUE' (see my first post). As per the manual, the usage of keepBothReads is ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>:<minAdapterLength>:<keepBothReads>.
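
Because the ILLUMINACLIP fields are positional, it is easy to put a value in the wrong slot. The sketch below assembles the step string from its fields in the order quoted from the manual. `illuminaclip_step` is a hypothetical helper name, and treating the last two fields as an optional pair is an assumption based on the four-field and six-field forms used in this thread:

```shell
#!/bin/sh
# illuminaclip_step: hypothetical helper that assembles the ILLUMINACLIP step
# string from its fields, in the order given in the manual excerpt above.
# Arguments: adapter FASTA, seed mismatches, palindrome clip threshold,
# simple clip threshold, then optionally minAdapterLength and keepBothReads.
illuminaclip_step() {
    step="ILLUMINACLIP:$1:$2:$3:$4"
    # Assumption: the last two fields are appended only when both are given.
    if [ -n "${5:-}" ] && [ -n "${6:-}" ]; then
        step="$step:$5:$6"
    fi
    printf '%s\n' "$step"
}

illuminaclip_step TruSeq3-PE-2.fa 2 30 10 8 TRUE
# prints: ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:TRUE
```

The output can be pasted into the trimmomatic command line in place of the hand-written ILLUMINACLIP step.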



        • #5
          I tweaked the trimmomatic run parameters a bit and it seems to have a significant effect: there are fewer unpaired reads, and the asymmetry between the left and right unpaired reads is smaller as well.

          Code:
          java -jar trimmomatic-0.32.jar PE -threads 16 -phred33 sample_1.fastq sample_2.fastq sample_trimmed_paired_1.fastq.gz sample_trimmed_unpaired_1.fastq.gz sample_trimmed_paired_2.fastq.gz sample_trimmed_unpaired_2.fastq.gz ILLUMINACLIP:adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:26



          • #6
            Yes, it looks like the difference in the size of your R1 unpaired file is due to changing the parameters so that Trimmomatic keeps both reads after trimming adapter sequences in palindrome mode, rather than following the default behaviour of discarding R2 and leaving R1 unpaired.



            • #7
              Hi,
              I just wanted to say that this thread really helped me sort out my trimmomatic call.

              I have smallRNA libraries and, due to their nature, a big proportion of each read is adapter. I was running the default mode and was quite unhappy with the results.

              Code:
              java -Xmx1000m -jar ./Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads 4 1.fastq.gz 2.fastq.gz out_1.fq out.unpaired_1.fq out_2.fq out.unpaired_2.fq ILLUMINACLIP:miRNA.neb.solexa.adapters.fasta:2:10:7 MINLEN:15
              The file sizes for the output files were quite discouraging:

              102M Nov 7 10:53 1.out.unpaired_2.fq
              56K Nov 7 10:53 1.out.unpaired_1.fq
              56M Nov 7 10:53 1.out_2.fq
              52M Nov 7 10:53 1.out_1.fq

              After considering the suggested changes in this thread, my new call is:

              Code:
              java -Xmx1000m -jar ./Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads 4 1.fastq.gz 2.fastq.gz out_1.fq out.unpaired_1.fq out_2.fq out.unpaired_2.fq ILLUMINACLIP:miRNA.neb.solexa.adapters.fasta:2:30:10:8:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:30 MINLEN:15
              And the files' sizes look much better:

              13M Nov 19 13:41 2.out.unpaired_2.fq
              32M Nov 19 13:41 2.out.unpaired_1.fq
              484M Nov 19 13:41 2.out_2.fq
              517M Nov 19 13:41 2.out_1.fq

              PS: I am aware that PE is overkill for miRNAs, but SE was not available to us at the time.
