Hi,
I am working on a re-sequencing project and have sequenced some whole genomes using an Illumina HiSeq 2000 (150 bp paired-end reads), which I hope to later align to an existing reference genome. I would like to remove any possible adapter contamination with Trimmomatic, but have run into a problem: for 70-80% of my read pairs the reverse read is dropped and only the forward read survives. When I use the "keep both reads" parameter, both reads survive for about 97% of pairs. So my question is: does this mean that more than 70% of my reads have "adapter read-through", or have I done something wrong in my adapter file?
The adapters used were:
P5 adapter: AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
P7 adapter: CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
The adapter file I created looks as follows:
>PrefixPE/1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>PrefixPE/2
CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
>P5
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>P7
CAAGCAGAAGACGGCATACGAGATNNNNNNGTGACTGGAGTTCAGACGTGTGCTCTTCCGATCT
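For what it's worth, this is how I understand the naming convention in the Trimmomatic manual: names that start with "Prefix" and end in /1 or /2 are treated as a palindrome pair, other names ending in /1 or /2 are forward-only or reverse-only, and anything else is checked against both reads. The Python sketch below is only my own check of the headers in my file under that assumption; the helper name classify_adapter_name is made up for illustration and this is not Trimmomatic's code.

```python
def classify_adapter_name(name):
    """Classify a FASTA header (without '>') by my reading of the naming rules."""
    if name.startswith("Prefix") and name.endswith("/1"):
        return "palindrome forward"
    if name.startswith("Prefix") and name.endswith("/2"):
        return "palindrome reverse"
    if name.endswith("/1"):
        return "simple forward only"
    if name.endswith("/2"):
        return "simple reverse only"
    return "simple both"


# The four headers from my adapter file.
headers = ["PrefixPE/1", "PrefixPE/2", "P5", "P7"]
classes = {h: classify_adapter_name(h) for h in headers}

for header, kind in classes.items():
    print(header, "->", kind)
```

By this reading the file contains one palindrome pair and no forward-only or reverse-only entries. The sketch only classifies names; it does not try to reproduce the sequence counts Trimmomatic reports.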
An example of my script (using purposefully lenient quality control)...
java -jar ~/bin/trimmomatic.jar PE -phred33 -trimlog ten_trimLog 10_R1.gz 10_R2.gz Ten_out_1P.fq.gz Ten_out_1U.fq.gz Ten_out_2P.fq.gz Ten_out_2U.fq.gz ILLUMINACLIP:P5_P7.fa:2:30:10 LEADING:2 TRAILING:2 MAXINFO:40:0.2 MINLEN:36
... and the resulting output:
ILLUMINACLIP: Using 1 prefix pairs, 4 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Read Pairs: 41173498 Both Surviving: 9680654 (23.51%) Forward Only Surviving: 30355811 (73.73%) Reverse Only Surviving: 26285 (0.06%) Dropped: 1110748 (2.70%)
TrimmomaticPE: Completed successfully
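To make sure I am reading the summary correctly, here is a small Python sketch that pulls the counts out of the summary line above and recomputes the percentages. The variable names are my own, and the regular expression assumes the summary line keeps exactly this "Label: count" layout.

```python
import re

summary = (
    "Input Read Pairs: 41173498 Both Surviving: 9680654 (23.51%) "
    "Forward Only Surviving: 30355811 (73.73%) "
    "Reverse Only Surviving: 26285 (0.06%) Dropped: 1110748 (2.70%)"
)

# Pull each "Label: count" pair out of the summary line.
counts = {
    label.strip(): int(number)
    for label, number in re.findall(r"([A-Za-z ]+): (\d+)", summary)
}

total = counts["Input Read Pairs"]
categories = {k: v for k, v in counts.items() if k != "Input Read Pairs"}

# The four outcome categories should account for every input pair.
accounted_for = sum(categories.values())

# Recompute each category as a percentage of the input pairs.
percentages = {k: 100.0 * v / total for k, v in categories.items()}

for label, pct in percentages.items():
    print(f"{label}: {categories[label]} ({pct:.2f}%)")
print("All pairs accounted for:", accounted_for == total)
```

The four categories add up to the number of input pairs, so the roughly 74% "Forward Only Surviving" figure is the share of pairs in which only the reverse read was discarded.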
I'm new to Trimmomatic, so I apologize in advance if this is something obvious!
Thanks!
Meli