Hello all,
I'm currently analyzing single-end 50 bp RNA-seq data that was sequenced at an outside facility. I've got a very naive question, since I'm relatively new to all this.
The facility provided me with what they call raw reads, which still contain sequencing adaptors etc. In addition, I also have the pre-processed "clean" reads. The details of the "cleaning", as they described it, are as follows:
1. Remove reads that contain adaptor sequences.
2. Remove reads in which the percentage of unknown bases (N) is greater than 10%.
3. Remove low-quality reads: if more than 50% of the bases in a read have a quality value ≤ 5, the read is defined as low quality.
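For my own understanding, here is roughly how I picture those three filters in code. This is only a sketch under my assumptions (uncompressed Phred+33 FASTQ, a standard Illumina adaptor prefix, placeholder file names), not the facility's actual pipeline:

```python
# Sketch of the three filters described above (assumptions: plain FASTQ,
# Phred+33 qualities, standard Illumina adaptor; not the facility's tool).

ADAPTOR = "AGATCGGAAGAGC"  # common Illumina adaptor prefix (assumption)

def passes_filters(seq: str, qual: str) -> bool:
    # 1. Drop reads containing the adaptor sequence.
    if ADAPTOR in seq:
        return False
    # 2. Drop reads with more than 10% unknown bases (N).
    if seq.upper().count("N") / len(seq) > 0.10:
        return False
    # 3. Drop reads where more than 50% of bases have quality <= 5
    #    (Phred+33: quality q is encoded as chr(q + 33)).
    low = sum(1 for c in qual if ord(c) - 33 <= 5)
    if low / len(qual) > 0.50:
        return False
    return True

def filter_fastq(path_in: str, path_out: str) -> None:
    with open(path_in) as fin, open(path_out, "w") as fout:
        while True:
            record = [fin.readline() for _ in range(4)]
            if not record[0]:
                break  # end of file
            header, seq, plus, qual = (line.rstrip("\n") for line in record)
            if passes_filters(seq, qual):
                fout.write("\n".join((header, seq, plus, qual)) + "\n")

filter_fastq("raw_reads.fastq", "clean_reads.fastq")  # placeholder names
```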
I've already used the clean reads for alignment and other downstream analyses, but I wanted to be sure, so I went ahead and ran FastQC on the "clean" FASTQ files. It flags the sequence duplication levels as high (roughly >66% on average for each of my samples).
I think this is because the "cleaning" process enriches the FASTQ files for higher-quality reads, but could it instead be due to an error during library preparation, or something else? Does it even make sense to QC these processed FASTQ files?
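For reference, a crude way to re-count exact duplicates outside FastQC would be something like the following. FastQC's own estimator subsamples and extrapolates, so the numbers won't match exactly, and the file name is again a placeholder:

```python
# Crude sanity check: fraction of reads whose exact sequence occurs more
# than once. FastQC's duplication estimate is more sophisticated, so treat
# this only as a rough cross-check.

from collections import Counter

def duplicate_fraction(path: str) -> float:
    counts = Counter()
    with open(path) as handle:
        for i, line in enumerate(handle):
            if i % 4 == 1:  # sequence lines in a 4-line FASTQ record
                counts[line.rstrip("\n")] += 1
    total = sum(counts.values())
    duplicated = sum(n for n in counts.values() if n > 1)
    return duplicated / total if total else 0.0

print(f"duplicated reads: {duplicate_fraction('clean_reads.fastq'):.1%}")
```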
Ege