Hi all,
I am having trouble removing overrepresented kmers from my sequencing data sets. We have re-sequenced several African buffalo genomes on an Illumina HiSeq X machine (150 bp paired-end reads). I ran FastQC on the raw reads and all samples pass the adapter and overrepresented sequences tests, but some fail the kmer test (kmer length = 7 nucleotides). The failing samples have kmers overrepresented at either the 5' or the 3' end. I can see that at least the 3' kmers are part of the Illumina adapters used (although FastQC is not picking these up in the adapter test), but the 5' kmers do not seem to be part of the adapters, as far as I can tell.
Trimmomatic successfully removes the 3' kmers of one sample, but not the 5' kmers of another sample. The same goes for Trim Galore (cutadapt), which can only remove adapter sequences from the 3' end of the reads (as far as I can tell from the cutadapt documentation). The per-base quality is almost always above Phred 20. These are the relevant parts of the commands I used for Trimmomatic and Trim Galore:
Trimmomatic: ILLUMINACLIP:/apps/chpc/bio/trimmomatic/0.36/bin/adapters/TruSeq3-PE-2.fa:2:30:10:1:true SLIDINGWINDOW:4:20 MINLEN:36
Trim Galore: trim_galore -q 20 --phred33 --stringency 1 -e 0.1 --length 36 -o /mnt/lustre/users/djager/trim_galore_out --paired --retain_unpaired -r1 37 -r2 37
Is there any way I can remove the 5' kmers without having to trim the 5' nucleotides of all the reads? Are these 5' kmers perhaps biological sequences?
The FastQC results are attached. The examples are all for the forward reads.
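In case it helps anyone reproduce the adapter check described above, here is a minimal Python sketch of one way to test whether a reported 7-mer (or its reverse complement) occurs in an adapter FASTA file. The function names and the toy adapter record are illustrative assumptions, not the contents of TruSeq3-PE-2.fa; in practice the text of the real adapter file would be passed to parse_fasta.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence (A, C, G, T, N)."""
    table = str.maketrans("ACGTN", "TGCAN")
    return seq.upper().translate(table)[::-1]


def parse_fasta(text):
    """Parse FASTA-formatted text into a dict of {record name: sequence}."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            name = line[1:]
            records[name] = ""
        elif name is not None:
            records[name] += line.upper()
    return records


def find_kmer_in_adapters(kmer, adapters):
    """List (adapter name, orientation) pairs in which the kmer occurs."""
    kmer = kmer.upper()
    rc = reverse_complement(kmer)
    hits = []
    for name, seq in adapters.items():
        if kmer in seq:
            hits.append((name, "forward"))
        if rc != kmer and rc in seq:
            hits.append((name, "reverse_complement"))
    return hits


if __name__ == "__main__":
    # Toy adapter record for illustration only (hypothetical name and content).
    toy_fasta = ">toy_adapter\nAGATCGGAAGAGC\n"
    adapters = parse_fasta(toy_fasta)
    # Hypothetical kmers; replace with the 7-mers FastQC reports.
    for kmer in ["AGATCGG", "CCGATCT", "TTTTTTT"]:
        print(kmer, find_kmer_in_adapters(kmer, adapters))
```

A kmer with no hits in either orientation is not a substring of the listed adapters, which is consistent with (but does not prove) a non-adapter origin.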