Hi All,
I've been lurking here for a while, trying to get a feel for some Illumina transcriptomic data I recently acquired (1 lane, 2 x 76 bp).
After running FastQC, the following checks pass with flying colors:
Per base sequence quality
Per sequence quality scores
Per base N content
but I get red flags for the following, with what appear to me to be strange staggered graphs:
Per base sequence content --- Strange staggered graph; the peaks repeat every 3 bases
Per base GC content --- Same behavior as the previous graph
Per sequence GC content --- GC distribution appears to indicate contamination, or possibly DNA sequenced from an organelle with a GC bias different from the host's
Sequence duplication levels --- Uh-oh, 63% duplication? Could this just indicate that we have more than full coverage of the organism's transcriptome?
Kmer content --- No idea what this indicates, but it must be related somehow to the strange staggered peaks in the Per base sequence content graph?
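To put a number on the every-3-bases pattern outside of FastQC, here is a rough Python sketch that tabulates base composition per read position and then averages GC by position modulo 3. The function names (per_base_content, gc_by_phase) and the toy reads are made up for illustration; this is not FastQC's own code, and it assumes the reads have already been pulled out of the FASTQ file as plain strings.

```python
from collections import Counter


def per_base_content(reads):
    """Return one dict per read position with the fraction of A/C/G/T/N."""
    if not reads:
        return []
    max_len = max(len(r) for r in reads)
    counts = [Counter() for _ in range(max_len)]
    for read in reads:
        for pos, base in enumerate(read.upper()):
            counts[pos][base] += 1
    fractions = []
    for c in counts:
        total = sum(c.values())
        fractions.append({b: c.get(b, 0) / total for b in "ACGTN"})
    return fractions


def gc_by_phase(fractions, period=3):
    """Mean GC fraction of positions grouped by (position mod period)."""
    sums = [0.0] * period
    n = [0] * period
    for pos, f in enumerate(fractions):
        sums[pos % period] += f["G"] + f["C"]
        n[pos % period] += 1
    return [s / k if k else 0.0 for s, k in zip(sums, n)]


# Toy reads (hypothetical, not the real data)
reads = ["GCAGCAGCA", "GCTGCTGCT", "GAAGAAGAA"]
print(gc_by_phase(per_base_content(reads)))
```

If the three phase averages differ clearly from one another, the staggering is a genuine period-3 signal in the reads rather than an artifact of how the graph is drawn.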
Anyway, if anyone could shed some light on what the staggered graphs mean in terms of my data quality, I would appreciate any insight.
BTW: the graphs look similar before AND after primer/adapter clipping on both sides.
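On the 63% duplication figure mentioned above, a minimal sketch of a naive cross-check: count exact-duplicate reads and report the fraction that are extra copies. The function name duplication_fraction and the toy reads are hypothetical, and as far as I understand FastQC estimates its figure from a subset of the file, so this will not necessarily match its number.

```python
from collections import Counter


def duplication_fraction(reads):
    """Fraction of reads that are extra copies of an already-seen sequence."""
    if not reads:
        return 0.0
    counts = Counter(reads)
    # Distinct sequences count once each; everything beyond that is a duplicate.
    return 1.0 - len(counts) / len(reads)


# Toy reads (hypothetical): 5 reads, 3 distinct sequences
reads = ["ACGT", "ACGT", "ACGT", "TTGA", "CCAT"]
print(duplication_fraction(reads))
```

Running this on the reads before and after clipping would show whether the clipping step changes the duplication level at all.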
thanks & aloha