Hello.
I am trying to trim the duplicated sequences flagged by FastQC in an RNA-seq project, and it is reporting a lot of them.
I have already trimmed the adapters, and I expect duplication levels to be higher in an RNA-seq experiment.
Now that FastQC has listed the overrepresented sequences, how do I use BLAST to decide whether to trim these duplicated sequences out or keep them?
I can use Trimmomatic to remove them by pasting the duplicated sequences into the adapter.fa file.
However, some of the duplicated sequences are mitochondrial RNA, which is interesting to look at.
Other sequences come from RNA on chromosomes 16, 2, X, etc. Should I remove these?
I also found a repeated sequence annotated as "Homo sapiens unplaced genomic contig, GRCh37.p13 Primary Assembly", and I am not sure whether I should cut it out.
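For the Trimmomatic step described above, here is a minimal sketch of how the overrepresented sequences could be pulled out of FastQC's report and turned into FASTA records for the adapter.fa file, instead of pasting them by hand. It assumes the usual layout of FastQC's `fastqc_data.txt` (a `>>Overrepresented sequences` module with tab-separated Sequence, Count, Percentage and Possible Source columns, closed by `>>END_MODULE`). The function names, the `overrep_` record names and the `keep_in_reads` parameter are hypothetical; `keep_in_reads` is just a place to list sequences you decide, after BLAST, to leave in the data (for example the mitochondrial ones). It does not decide for you which sequences should go.

```python
def overrepresented_sequences(fastqc_data):
    """Return (sequence, count, possible_source) tuples from the
    'Overrepresented sequences' module of FastQC's fastqc_data.txt text."""
    rows = []
    in_module = False
    for line in fastqc_data.splitlines():
        if line.startswith(">>Overrepresented sequences"):
            in_module = True
            continue
        if not in_module:
            continue
        if line.startswith(">>END_MODULE"):
            break  # end of the module we care about
        if line.startswith("#") or not line.strip():
            continue  # skip the column header and blank lines
        fields = line.split("\t")
        source = fields[3] if len(fields) > 3 else ""
        rows.append((fields[0], int(fields[1]), source))
    return rows


def to_fasta(rows, keep_in_reads=frozenset()):
    """Format overrepresented sequences as FASTA records, skipping any
    sequence listed in keep_in_reads (hypothetical: ones you chose to keep)."""
    lines = []
    for i, (seq, count, _source) in enumerate(rows, start=1):
        if seq in keep_in_reads:
            continue
        lines.append(f">overrep_{i}_count{count}")
        lines.append(seq)
    return "\n".join(lines) + "\n" if lines else ""
```

Usage would look something like `to_fasta(overrepresented_sequences(open(path).read()), keep_in_reads=mito_hits)`, where `path` and `mito_hits` are placeholders, and the result appended to the adapter.fa file given to Trimmomatic's ILLUMINACLIP step. Keep in mind that FastQC only reports the first 50 bp of each read for this module by default, so the records are prefixes rather than full reads.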