Hi there,
Could somebody help me better understand how overrepresented k-mers and overrepresented sequences can be used to identify primer/adapter sequences?
The reads are paired-end, from Illumina.
I did a few preprocessing steps:
1) Removed reads that failed the quality filter, as indicated by ":Y" in the header.
2) Used fastq_quality_filter to remove low-quality reads:
fastq_quality_filter -i R1_QC.fastq -o R1_QC_Filter.fastq -q 20 -p 80 -Q 33 -v
fastq_quality_filter -i R2_QC.fastq -o R2_QC_Filter.fastq -q 20 -p 80 -Q 33 -v
3) Now I want to trim the first/last bases and remove the adapter sequences, which I haven't done yet.
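For reference, step 1 can be sketched roughly as below. This is only a minimal illustration, assuming CASAVA 1.8-style headers in which the comment after the space looks like "1:Y:0:ATCACG" and the second field is the filter flag ("Y" = read was filtered out). The function names are my own, not from any tool. For paired-end data the same reads would have to be dropped from both files to keep the pairs in sync.

```python
from typing import Iterable, Iterator, List, Tuple

Record = Tuple[str, str, str, str]  # (header, sequence, plus line, quality)


def parse_fastq(lines: Iterable[str]) -> Iterator[Record]:
    """Group an iterable of lines into 4-line FASTQ records."""
    stripped = (line.rstrip("\n") for line in lines)
    # zip over the same iterator four times consumes four lines per record
    return zip(stripped, stripped, stripped, stripped)


def passed_filter(header: str) -> bool:
    """Return False only when the header comment carries the 'Y' filter flag."""
    parts = header.split(" ", 1)
    if len(parts) < 2:
        return True  # no comment field, so there is no flag to check
    fields = parts[1].split(":")
    return not (len(fields) >= 2 and fields[1] == "Y")


def drop_flagged(records: Iterable[Record]) -> List[Record]:
    """Keep only the records whose header is not flagged ':Y'."""
    return [rec for rec in records if passed_filter(rec[0])]
```

With a real file this would be called as drop_flagged(parse_fastq(open("R1.fastq"))), where "R1.fastq" is a placeholder name.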
I made these plots with FastQC (on the quality-filtered reads) and have posted the figures for both reads of the pair; this is what the overrepresented k-mers/sequences look like.
How can I get information about the adapter/primer from these overrepresented k-mers and sequences?
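One check I can think of is to compare each overrepresented sequence against the start of the common Illumina adapters. Below is a minimal sketch of that idea. The three prefixes are, to the best of my knowledge, the usual TruSeq-style, Nextera and small RNA adapter starts, but they should be verified against the kit documentation. The dictionary and function names are made up for illustration, and only exact substring matches are considered.

```python
from typing import List

# Assumed adapter prefixes; verify against the library prep kit documentation.
ADAPTER_PREFIXES = {
    "TruSeq-style": "AGATCGGAAGAGC",
    "Nextera": "CTGTCTCTTATA",
    "small RNA": "TGGAATTCTCGG",
}

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def matching_adapters(sequence: str) -> List[str]:
    """Names of adapter prefixes found in the sequence, on either strand."""
    seq = sequence.upper()
    hits = []
    for name, adapter in ADAPTER_PREFIXES.items():
        if adapter in seq or reverse_complement(adapter) in seq:
            hits.append(name)
    return hits
```

An overrepresented sequence that returns an empty list here is not necessarily genomic; it could be a primer, a barcode, or an adapter that is not in this short table.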
When I map these overrepresented sequences to the genome, none of them map, so they are possibly adapter sequences. Can more than two adapter sequences be used in the same experiment?
If the overrepresented k-mers are at the edges of the reads (start/end), you can always trim 5-6 nt from that end and get rid of them. But how should internal overrepresented k-mers be treated?
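My current understanding is that an internal k-mer could come from adapter read-through, in which case everything from the adapter start to the 3' end of the read would have to go. Here is a minimal sketch of that idea, assuming an exact match to a known 3' adapter and a hypothetical minimum overlap of 5 nt for a partial adapter at the read end. Real trimmers also tolerate mismatches, which this does not.

```python
from typing import Tuple


def trim_3prime_adapter(
    seq: str, qual: str, adapter: str, min_overlap: int = 5
) -> Tuple[str, str]:
    """Cut the read at the first adapter occurrence (exact matches only)."""
    # Case 1: the whole adapter occurs somewhere inside the read.
    pos = seq.find(adapter)
    if pos == -1:
        # Case 2: the read ends with only the first k bases of the adapter.
        longest = min(len(adapter), len(seq)) - 1
        for k in range(longest, min_overlap - 1, -1):
            if seq.endswith(adapter[:k]):
                pos = len(seq) - k
                break
    if pos == -1:
        return seq, qual  # no adapter found, read is left unchanged
    # Trim sequence and quality string at the same position.
    return seq[:pos], qual[:pos]
```

Reads that become very short after this step would probably need to be discarded, together with their mates.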
Thank you in advance for your help!
Regards,
CN