Hi all,
I am having problems using Trimmomatic on bisulfite sequencing data sets.
For example, I downloaded the raw human methylome data from Lister et al. 2009 Nature (SRR019072.sra) and converted it to FASTQ with sratoolkit.
Next, I used Trimmomatic to trim the adaptors and low-quality sequences.
This is the command I used:
java -classpath trimmomatic-0.20.jar org.usadellab.trimmomatic.TrimmomaticSE -threads 10 -phred64 SRR019072.fastq SRR019072.trim.fq ILLUMINACLIP:remove_adaptor_PCR.fa:2:40:15 LEADING:2 TRAILING:2 MINLEN:60
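Two settings in this command depend on properties of the input file: -phred64 assumes a quality offset of 64, and MINLEN:60 drops every read shorter than 60 bases after trimming. A minimal Python sketch for checking both on a FASTQ file is shown here. The function names and the in-memory demo records are hypothetical, the parser assumes plain four-line records, and the offset guess is only a heuristic (a file with uniformly high qualities can look like either encoding).

```python
import io

def summarize_fastq(handle, min_len=60):
    """Return (lowest quality char code, highest code, total reads, reads shorter than min_len)."""
    lo, hi, total, short = 255, 0, 0, 0
    while True:
        header = handle.readline()
        if not header:
            break
        seq = handle.readline().rstrip("\n")
        handle.readline()  # '+' separator line
        qual = handle.readline().rstrip("\n")
        total += 1
        if len(seq) < min_len:
            short += 1
        if qual:
            codes = [ord(c) for c in qual]
            lo = min(lo, min(codes))
            hi = max(hi, max(codes))
    return lo, hi, total, short

def guess_offset(lowest_code):
    # Heuristic: characters below ';' (code 59) do not occur in offset-64 files.
    return 33 if lowest_code < 59 else 64

# Tiny made-up example; replace with open("SRR019072.fastq") for the real file.
demo = "@r1\nACGT\n+\nIIII\n@r2\nACGTACGT\n+\n!!##IIII\n"
lo, hi, total, short = summarize_fastq(io.StringIO(demo), min_len=6)
print(guess_offset(lo), total, short)
```

If the guessed offset is 33 while the command passes -phred64, or if most raw reads are already shorter than the MINLEN value, either setting alone could account for a large share of dropped reads.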
Because the authors used single-end adaptors, my remove_adaptor_PCR.fa file is as follows:
>Prefix_PCR_PRIMER_SEQUENCE/1
AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT
>Prefix_PCR_PRIMER_SEQUENCE/2
CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT
>ADAPTOR_SEQUENCE_B
ACACTCTTTCCCTACACGACGCTCTTCCGATCT
>ADAPTOR_SEQUENCE_A
GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG
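As a quick consistency check on this file, the four entries can be compared with each other. The sketch below uses only the sequences listed above; the helper name reverse_complement is an illustrative choice. It checks whether ADAPTOR_SEQUENCE_B is the 3' end of the /1 primer and whether ADAPTOR_SEQUENCE_A is contained in the reverse complement of the /2 primer.

```python
def reverse_complement(seq):
    """Reverse-complement an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

adapters = {
    "Prefix_PCR_PRIMER_SEQUENCE/1": "AATGATACGGCGACCACCGAGATCTACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "Prefix_PCR_PRIMER_SEQUENCE/2": "CAAGCAGAAGACGGCATACGAGCTCTTCCGATCT",
    "ADAPTOR_SEQUENCE_B": "ACACTCTTTCCCTACACGACGCTCTTCCGATCT",
    "ADAPTOR_SEQUENCE_A": "GATCGGAAGAGCTCGTATGCCGTCTTCTGCTTG",
}

# Is ADAPTOR_SEQUENCE_B the tail of the /1 primer?
b_in_primer1 = adapters["Prefix_PCR_PRIMER_SEQUENCE/1"].endswith(adapters["ADAPTOR_SEQUENCE_B"])

# Is ADAPTOR_SEQUENCE_A the tail of the reverse-complemented /2 primer?
a_in_rc_primer2 = reverse_complement(adapters["Prefix_PCR_PRIMER_SEQUENCE/2"]).endswith(
    adapters["ADAPTOR_SEQUENCE_A"]
)

print(b_in_primer1, a_in_rc_primer2)
```

Both checks hold for the sequences as listed, so the two ADAPTOR entries overlap with the two primer entries rather than adding independent sequences.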
These are the results of the run:
ILLUMINACLIP: Using 1 prefix pairs, 2 forward/reverse sequences, 0 forward only sequences, 0 reverse only sequences
Input Reads: 13074151 Surviving: 4569673 (34.95%) Dropped: 8504478 (65.05%)
TrimmomaticSE: Completed successfully
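The first log line reflects how the FASTA entries are grouped by name. As described in the Trimmomatic manual, entries whose names start with "Prefix" and end in /1 and /2 form a palindrome pair, other names ending in /1 or /2 are checked against forward or reverse reads only, and all remaining names are checked against both. The sketch below mimics that naming rule to reproduce the counts; it is an approximation for illustration, not Trimmomatic's own code, and how it treats an unpaired "Prefix" entry is an assumption.

```python
def classify_adapters(names):
    """Count adapter entries per category using the name-based convention."""
    fwd = {n[:-2] for n in names if n.startswith("Prefix") and n.endswith("/1")}
    rev = {n[:-2] for n in names if n.startswith("Prefix") and n.endswith("/2")}
    paired = fwd & rev  # prefix names that have both a /1 and a /2 entry
    counts = {"prefix_pairs": len(paired), "both": 0, "forward_only": 0, "reverse_only": 0}
    for n in names:
        if n.startswith("Prefix") and n[-2:] in ("/1", "/2") and n[:-2] in paired:
            continue  # already counted as part of a prefix pair
        if n.endswith("/1"):
            counts["forward_only"] += 1
        elif n.endswith("/2"):
            counts["reverse_only"] += 1
        else:
            counts["both"] += 1
    return counts

names = [
    "Prefix_PCR_PRIMER_SEQUENCE/1",
    "Prefix_PCR_PRIMER_SEQUENCE/2",
    "ADAPTOR_SEQUENCE_B",
    "ADAPTOR_SEQUENCE_A",
]
result = classify_adapters(names)
print(result)
```

For the four names in remove_adaptor_PCR.fa this gives 1 prefix pair and 2 forward/reverse sequences, matching the log. Palindrome mode is designed for paired-end data, so in a single-end run the two simple-mode entries are the ones most relevant to clipping.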
So 65% of the reads were dropped.
Moreover, even with 65% of the reads dropped, the trimmed output still fails FastQC, especially the "Per base sequence content" and "Kmer Content" modules.
Could you help me figure out which of my parameters is inappropriate, or whether remove_adaptor_PCR.fa is incorrect?
Do we need specific parameters for bisulfite sequencing data?
Many thanks and best regards,
Jerry