I have data from two experiments of 6 samples each, both sequenced as 2x100 b paired-end RNA-seq on an Illumina HiSeq 2000. One experiment used unstranded libraries and the other used stranded libraries. Average insert/fragment lengths in both experiments were ~200 b.
I used Trimmomatic (0.32; on 64-bit Linux) to remove adapter contamination as well as poor-quality subsequences from the reads.
For both experiments and for all samples, I find that the trimmed_unpaired_1 files are 5-10 times (or more) the size of the trimmed_unpaired_2 files. The trimmed_paired_1 and _2 files are similar in size, as expected. See example file-size listings below.
Code:
java -jar trimmomatic-0.32.jar PE -threads 16 -phred33 sample_1.fastq sample_2.fastq sample_trimmed_paired_1.fastq.gz sample_trimmed_unpaired_1.fastq.gz sample_trimmed_paired_2.fastq.gz sample_trimmed_unpaired_2.fastq.gz ILLUMINACLIP:adapters/TruSeq3-PE-2.fa:2:30:10 LEADING:3 TRAILING:3 SLIDINGWINDOW:4:15 MINLEN:36
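Since compressed file size is only a rough proxy for read count, the comparison could also be made on the number of reads in each output file. This is a minimal sketch, assuming the output file names from the command above and standard four-line FASTQ records; count_reads is a hypothetical helper name, not part of Trimmomatic.

```shell
# Count reads in a gzipped FASTQ file.
# Assumption: every record is exactly 4 lines (no wrapped sequence lines).
count_reads() {
    echo $(( $(gzip -dc "$1" | wc -l) / 4 ))
}

# Report read counts for the four Trimmomatic PE outputs, skipping any missing file.
for f in sample_trimmed_paired_1.fastq.gz sample_trimmed_unpaired_1.fastq.gz \
         sample_trimmed_paired_2.fastq.gz sample_trimmed_unpaired_2.fastq.gz; do
    if [ -f "$f" ]; then
        printf '%s\t%s\n' "$f" "$(count_reads "$f")"
    fi
done
```

The two paired files should report the same count, while the two unpaired counts show directly how many forward-only and reverse-only survivors there are.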
What could be the reason for this?