Hi
I am working on Illumina HiSeq paired-end whole-genome sequencing data.
According to the FastQC report, the data contain overrepresented adapter sequences, for example TruSeq Adapter, Index 14 (97% over 40bp).
So I trimmed the reads with Trimmomatic using the TruSeq-PE adapter sequences provided with it, and then ran FastQC again on the trimmed reads.
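For anyone who wants to cross-check the adapter content independently of FastQC, the following is a minimal Python sketch, not the pipeline used above. It counts reads that contain an exact copy of an adapter sequence. The function names, the tiny in-memory example, and the choice of the 13-base prefix AGATCGGAAGAGC (common to Illumina TruSeq adapters) are assumptions made for illustration. Exact matching will miss adapters that contain sequencing errors or are cut off at the read end, so the count is only a lower bound.

```python
# Assumption: the 13-base prefix shared by Illumina TruSeq adapters.
ADAPTER_PREFIX = "AGATCGGAAGAGC"


def iter_sequences(lines):
    """Yield the sequence line (2nd of every 4 lines) of a FASTQ stream."""
    for i, line in enumerate(lines):
        if i % 4 == 1:
            yield line.strip()


def adapter_fraction(lines, adapter=ADAPTER_PREFIX):
    """Return (reads containing the adapter, total reads), exact match only."""
    hits = 0
    total = 0
    for seq in iter_sequences(lines):
        total += 1
        if adapter in seq:
            hits += 1
    return hits, total


# Hypothetical two-read example: read1 carries the adapter, read2 does not.
example = [
    "@read1", "ACGTACGT" + ADAPTER_PREFIX + "ACACG", "+", "I" * 26,
    "@read2", "ACGTACGTACGTACGTACGT", "+", "I" * 20,
]

hits, total = adapter_fraction(example)
print(hits, total)
```

The same function can be given an open file handle for an uncompressed FASTQ file, once for the raw reads and once for the trimmed reads, to compare the two counts.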
After trimming, new overrepresented k-mers appear at various positions within 0-50 bp. The BBMap percentage-match analysis still shows no notable mismatch problems, which suggests that these k-mers could be genomic.
I have attached the k-mer overrepresentation plots from before and after trimming.
This observation makes me doubt whether trimming is needed for paired-end data.
My question is:
Do we really need to perform trimming in such cases?
Any suggestions on whether the trimming should be changed would be welcome, for example using only the overrepresented adapter sequences rather than the other adapters from the same library.
Thanks