I have Illumina RNA-seq data, and I keep finding many overrepresented sequences in the library (each >0.1% of the total, some as high as 0.3%) in every sample I have.
These overrepresented sequences are 50 bp long and have no hits to known adapter or primer sequences.
A few of them are simply strings of poly-A (which can be trimmed easily), but the majority have hits to the human mitochondrial genome (based on a BLAST nt similarity search), for example, "CTTTGTGTTTGAGGGGGTGATCTAAAACACTCTTTACGCCGGCTTCTATT".
I am struggling with how to process or clean up this raw data before doing any mapping or downstream analyses. My questions are:
1) Is this a result of contamination? If so, how should one proceed: should I trim these sequences from the reads, remove the reads containing them completely, or leave them in? What is the best approach?
2) Could these be biologically significant sequences that just happen to be similar to mitochondrial DNA? How would I know? Is there a way to measure this?
I would really appreciate your help or any suggestions you might have.
Thank you!
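
P.S. To make the "is there a way to measure this" part concrete, here is a minimal sketch of how the share of reads carrying one of these sequences could be counted. It assumes plain four-line FASTQ records; the function names and the 0.9 poly-A cutoff are made up for illustration and are not from any particular tool.

```python
import io

# Translation table for complementing DNA bases (N stays N).
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def fraction_with_sequence(fastq_handle, query):
    """Fraction of reads that contain `query` or its reverse complement.

    Assumes each record is exactly four lines (header, sequence, '+', quality).
    """
    query = query.upper()
    rc = reverse_complement(query)
    total = 0
    hits = 0
    for i, line in enumerate(fastq_handle):
        if i % 4 != 1:  # only the second line of each record is sequence
            continue
        read = line.strip().upper()
        total += 1
        if query in read or rc in read:
            hits += 1
    return hits / total if total else 0.0


def is_poly_a(read, min_fraction=0.9):
    """True if at least `min_fraction` of the read is 'A' (arbitrary cutoff)."""
    read = read.upper()
    return bool(read) and read.count("A") / len(read) >= min_fraction


# Tiny in-memory example with two made-up reads.
demo = "@r1\nACGTACGT\n+\nIIIIIIII\n@r2\nTTTTTTTT\n+\nIIIIIIII\n"
print(fraction_with_sequence(io.StringIO(demo), "GTAC"))
```

For a real file, the same function could be given a handle from `gzip.open(path, "rt")`, with the 50 bp sequence above as the query. Comparing that fraction across samples would at least show whether the overrepresentation is consistent.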