  • Asymmetric Trimmomatic output with paired-end RNA-seq data

    I have data from two experiments, each with 6 samples, sequenced as 2x100 b paired-end reads on an Illumina HiSeq 2000; one experiment used unstranded libraries and the other stranded libraries. Average insert/fragment lengths in both experiments were ~200 b.

    I used Trimmomatic (0.32, on 64-bit Linux) to remove adapter contamination as well as low-quality sub-sequences from the reads.

    Code:
    java -jar trimmomatic-0.32.jar PE -threads 16 -phred33 sample_1.fastq sample_2.fastq sample_trimmed_paired_1.fastq.gz sample_trimmed_unpaired_1.fastq.gz sample_trimmed_paired_2.fastq.gz sample_trimmed_unpaired_2.fastq.gz ILLUMINACLIP:adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
    For both experiments and all samples, I find that the trimmed_unpaired_1 files are at least 5-10 times larger than the trimmed_unpaired_2 files, while the trimmed_paired_1 and _2 files are similar in size, as expected. See example file-size listings below.

    What could be the reason for this?
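File size depends on compression and on how much of each read survived trimming, so comparing read counts is a more direct check of the asymmetry. A minimal Python sketch, assuming standard 4-line FASTQ records (header, sequence, '+', qualities); the script name count_reads.py is hypothetical:

```python
import gzip
import sys
from typing import Iterable, TextIO


def open_fastq(path: str) -> TextIO:
    """Open a FASTQ file as text, decompressing on the fly if it ends in .gz."""
    if path.endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "rt")


def count_records(lines: Iterable[str]) -> int:
    """Count FASTQ records, assuming 4 lines per record (no wrapped sequences)."""
    n_lines = sum(1 for _ in lines)
    if n_lines % 4 != 0:
        raise ValueError(f"{n_lines} lines is not a multiple of 4; not 4-line FASTQ?")
    return n_lines // 4


def count_reads(path: str) -> int:
    """Number of reads in one (optionally gzipped) FASTQ file."""
    with open_fastq(path) as handle:
        return count_records(handle)


if __name__ == "__main__":
    # Hypothetical usage with the output names from the call above:
    #   python count_reads.py sample_trimmed_unpaired_1.fastq.gz sample_trimmed_unpaired_2.fastq.gz
    for name in sys.argv[1:]:
        print(f"{name}\t{count_reads(name)}")
```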


  • #2
    It could mean that the quality of read 2 was lower than that of read 1. That would explain why you retain many unpaired read 1s: the corresponding read 2 was dropped for being trimmed too short or having very low quality.


    • #3
      Originally posted by lorendarith
      It could mean that the quality of read 2 was lower than that of read 1. That would explain why you retain many unpaired read 1s: the corresponding read 2 was dropped for being trimmed too short or having very low quality.
      This is a possibility. However, it seems unlikely in my case (I examined read qualities using FastQC).

      I wonder if the asymmetry that I am seeing is because I am not using the keepBothReads option of trimmomatic. From the manual:

      After read-through has been detected by palindrome mode, and the adapter sequence removed, the reverse read contains the same sequence information as the forward read, albeit in reverse complement. For this reason, the default behaviour is to entirely drop the reverse read. By specifying 'true' for this parameter, the reverse read will also be retained, which may be useful e.g. if the downstream tools cannot handle a combination of paired and unpaired reads.
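To see why the reverse read is redundant after read-through, here is a toy Python sketch with error-free reads from a fragment shorter than the read length. The adapter strings are hypothetical placeholders, and the exact-match search is a simplified stand-in, not Trimmomatic's scored palindrome alignment:

```python
from typing import Optional, Tuple

# Hypothetical, illustrative adapter strings; Trimmomatic would take the real
# sequences from TruSeq3-PE-2.fa. Keep read_len - len(fragment) <= 20 here.
ADAPTER_R1 = "AGATCGGAAGAGCACACGTC"
ADAPTER_R2 = "AGATCGGAAGAGCGTCGTGT"

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq: str) -> str:
    """Reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def sequence_pair(fragment: str, read_len: int) -> Tuple[str, str]:
    """Simulate error-free R1/R2 reads of length read_len from one fragment.
    If the fragment is shorter than read_len, both reads run on into adapter."""
    r1 = (fragment + ADAPTER_R1)[:read_len]
    r2 = (revcomp(fragment) + ADAPTER_R2)[:read_len]
    return r1, r2


def find_insert_length(r1: str, r2: str, min_len: int = 8) -> Optional[int]:
    """Exact-match stand-in for read-through detection: the longest L shorter
    than the reads at which R1[:L] equals the reverse complement of R2[:L].
    Returns None if no read-through is found."""
    for length in range(min(len(r1), len(r2)) - 1, min_len - 1, -1):
        if r1[:length] == revcomp(r2[:length]):
            return length
    return None
```

For example, sequence_pair("ACGTTGCATGCCATAGGCTA", 30) yields two 30-nt reads that each run 10 nt into adapter; clipping both to the detected insert length of 20 leaves R2 as exactly the reverse complement of R1, which is the redundancy the manual describes.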


      • #4
        Originally posted by alpha2zee
        I wonder if the asymmetry that I am seeing is because I am not using the keepBothReads option of trimmomatic.
        It seems this is not the reason. I tested it with 'ILLUMINACLIP:TruSeq3-PE-2.fa:2:30:10:8:TRUE' (compare with my first post). Per the manual, the full syntax is ILLUMINACLIP:<fastaWithAdaptersEtc>:<seed mismatches>:<palindrome clip threshold>:<simple clip threshold>:<minAdapterLength>:<keepBothReads>.
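The positional syntax quoted above can be made explicit with a small parser. This is a hypothetical Python helper, not part of Trimmomatic; the defaults used when fields are omitted (minAdapterLength 8, keepBothReads false) are as we read them from the manual:

```python
from typing import NamedTuple


class IlluminaClip(NamedTuple):
    adapter_fasta: str
    seed_mismatches: int
    palindrome_clip_threshold: int
    simple_clip_threshold: int
    min_adapter_length: int = 8    # manual's default when omitted
    keep_both_reads: bool = False  # manual's default: drop the reverse read


def parse_illuminaclip(step: str) -> IlluminaClip:
    """Split an ILLUMINACLIP step into its positional fields.
    Assumes the FASTA path itself contains no ':'."""
    name, _, args = step.partition(":")
    if name != "ILLUMINACLIP":
        raise ValueError(f"not an ILLUMINACLIP step: {step!r}")
    fields = args.split(":")
    if not 4 <= len(fields) <= 6:
        raise ValueError(f"expected 4-6 ILLUMINACLIP fields, got {len(fields)}")
    fasta, seed, palindrome, simple = fields[:4]
    clip = IlluminaClip(fasta, int(seed), int(palindrome), int(simple))
    if len(fields) >= 5:
        clip = clip._replace(min_adapter_length=int(fields[4]))
    if len(fields) == 6:
        # Case-insensitive, like "TRUE" in the call above.
        clip = clip._replace(keep_both_reads=fields[5].lower() == "true")
    return clip
```

Read this way, the :8 in the tested call restates the default minAdapterLength, so keepBothReads=TRUE is the only setting that differs from the ILLUMINACLIP step in the first post.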


        • #5
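          I tweaked the Trimmomatic run parameters a bit (adding :8:TRUE, i.e. minAdapterLength and keepBothReads, to ILLUMINACLIP and lowering MINLEN from 36 to 26), and it seems to have a significant effect: there are fewer unpaired reads, and the asymmetry between the left and right unpaired reads is smaller as well.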

          Code:
          java -jar trimmomatic-0.32.jar PE -threads 16 -phred33 sample_1.fastq sample_2.fastq sample_trimmed_paired_1.fastq.gz sample_trimmed_unpaired_1.fastq.gz sample_trimmed_paired_2.fastq.gz sample_trimmed_unpaired_2.fastq.gz ILLUMINACLIP:adapters/TruSeq3-PE-2.fa:2:30:10:8:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:26


          • #6
            Yes, it looks like the change in size of your R1_unpaired file is due to changing the parameters so that Trimmomatic keeps both reads after trimming adapter sequences in palindrome mode, rather than following the default behaviour, which is to discard R2 and thus leave R1 unpaired.
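The routing described here can be summarised with a simplified Python model. It is a sketch of the behaviour described in the manual and in this thread, not Trimmomatic's source, and the function and output names are hypothetical:

```python
from collections import Counter
from typing import Iterable, Tuple


def route_pair(r1_kept: bool, r2_kept: bool,
               readthrough: bool = False, keep_both_reads: bool = False) -> str:
    """Simplified model of which PE output a read pair lands in.

    r1_kept / r2_kept: whether each read survived trimming and MINLEN on its own.
    readthrough: whether palindrome mode detected adapter read-through.
    """
    if readthrough and not keep_both_reads:
        # Default palindrome behaviour per the manual: the reverse read is
        # dropped because it repeats the forward read in reverse complement.
        r2_kept = False
    if r1_kept and r2_kept:
        return "paired"
    if r1_kept:
        return "unpaired_1"
    if r2_kept:
        return "unpaired_2"
    return "dropped"


def tally(pairs: Iterable[Tuple[bool, bool, bool]], keep_both_reads: bool) -> Counter:
    """Count outputs for (r1_kept, r2_kept, readthrough) triples."""
    return Counter(route_pair(r1, r2, rt, keep_both_reads) for r1, r2, rt in pairs)
```

With keep_both_reads=False, every pair with detected read-through and a surviving R1 is counted as unpaired_1, which inflates only the _1 unpaired file; with keep_both_reads=True the same pairs stay paired.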


            • #7
              Hi,
              I just wanted to say that this thread really helped me sort out my Trimmomatic call.

              I have small RNA libraries, and due to their nature a large proportion of each read is adapter. I was running the default mode and was quite unhappy with the results.

              Code:
              java -Xmx1000m -jar ./Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads 4 1.fastq.gz 2.fastq.gz out_1.fq out.unpaired_1.fq out_2.fq out.unpaired_2.fq ILLUMINACLIP:miRNA.neb.solexa.adapters.fasta:2:10:7 MINLEN:15
              The file sizes for the output files were quite discouraging:

              102M Nov 7 10:53 1.out.unpaired_2.fq
              56K Nov 7 10:53 1.out.unpaired_1.fq
              56M Nov 7 10:53 1.out_2.fq
              52M Nov 7 10:53 1.out_1.fq

              After considering the suggested changes in this thread, my new call is:

              Code:
              java -Xmx1000m -jar ./Trimmomatic-0.32/trimmomatic-0.32.jar PE -threads 4 1.fastq.gz 2.fastq.gz out_1.fq out.unpaired_1.fq out_2.fq out.unpaired_2.fq ILLUMINACLIP:miRNA.neb.solexa.adapters.fasta:2:30:10:8:TRUE LEADING:3 TRAILING:3 SLIDINGWINDOW:4:30 MINLEN:15
              And the files' sizes look much better:

              13M Nov 19 13:41 2.out.unpaired_2.fq
              32M Nov 19 13:41 2.out.unpaired_1.fq
              484M Nov 19 13:41 2.out_2.fq
              517M Nov 19 13:41 2.out_1.fq

              P.S. I am aware that PE is overkill for miRNAs, but SE was not available to us at the time.
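The revised call adds SLIDINGWINDOW:4:30, a stricter threshold than the 4:15 used earlier in the thread. A rough Python sketch of sliding-window quality trimming on Phred+33 qualities; it cuts at the start of the first failing window, which is a simplifying assumption, and Trimmomatic's exact cut point may differ:

```python
from typing import List, Tuple


def phred33_scores(quality: str) -> List[int]:
    """Decode a Phred+33 quality string (the -phred33 encoding) into integer scores."""
    return [ord(ch) - 33 for ch in quality]


def sliding_window_trim(seq: str, quality: str, window: int = 4,
                        required: float = 15) -> Tuple[str, str]:
    """Scan windows from the 5' end and cut at the start of the first window
    whose mean quality is below `required` (simplified cut point)."""
    scores = phred33_scores(quality)
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < required:
            return seq[:start], quality[:start]
    # No failing window (or read shorter than the window): keep the read.
    return seq, quality
```

On a read whose last four bases have quality 2, raising the threshold from 15 to 30 cuts one base earlier in this model, which is the kind of extra trimming a stricter SLIDINGWINDOW setting produces.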
