Dear all,
We've run an Illumina TruSeq mRNA non-stranded library (mouse brain, lower end of the input recommendation) on a MiSeq for quality control with the v3 Reagent Kit, 2x75 bp PE. The run has very good QC metrics, good cluster density and some 22 million reads.
After pairing the reads, Picard MarkDuplicates reports more than 3 million optical duplicates (i.e. duplicates less than 100 pixels apart on the flow cell; results are similar for 10 pixels), alongside roughly the same number of "real" PCR duplicates (50% of QC20 reads after discarding the optical duplicates).
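For reference, here is a minimal sketch of the proximity test as I understand it, not Picard's actual code. It assumes read names follow the common Illumina layout `instrument:run:flowcell:lane:tile:x:y`, that two reads count as optically close when they share lane and tile and both their x and y offsets are within the threshold, and that the coordinates are in the same units as that threshold. The function names and the read names in the usage note are made up for illustration.

```python
from typing import NamedTuple


class ClusterPos(NamedTuple):
    lane: str
    tile: int
    x: int
    y: int


def parse_read_name(name: str) -> ClusterPos:
    """Extract lane, tile, x, y from an 'instrument:run:flowcell:lane:tile:x:y' name.

    Assumption: colon-separated Illumina-style names; other layouts will fail.
    """
    # Drop a leading '@' and anything after the first whitespace (FASTQ comment).
    fields = name.lstrip("@").split()[0].split(":")
    if len(fields) < 7:
        raise ValueError(f"unexpected read name layout: {name!r}")
    lane, tile, x, y = fields[3:7]
    # Strip a trailing '/1' or '/2' mate suffix from the y field, if present.
    y = y.split("/")[0]
    return ClusterPos(lane, int(tile), int(x), int(y))


def is_optically_close(name_a: str, name_b: str, pixel_distance: int = 100) -> bool:
    """True if both reads sit on the same lane and tile within the pixel threshold."""
    a, b = parse_read_name(name_a), parse_read_name(name_b)
    return (
        a.lane == b.lane
        and a.tile == b.tile
        and abs(a.x - b.x) <= pixel_distance
        and abs(a.y - b.y) <= pixel_distance
    )
```

With the hypothetical names `M01:1:FC:1:1101:1000:2000` and `M01:1:FC:1:1101:1050:2090`, the two reads are 50 and 90 apart on the same tile, so the check passes at a threshold of 100 and fails at 10.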
When the data are analyzed as SE reads without pairing, the number of optical duplicates drops to 0, so it does not seem to be a cluster-read failure.
To me, this indicates that about 3 million clusters on the flowcell sit next to a cluster carrying their reverse-complement strand, which seems far above chance level, even for an RNA-Seq library with a high PCR duplicate rate.
Potential explanations put forward by the representative we contacted are incomplete library denaturation prior to loading (so that partially double-stranded library molecules hybridize to the flowcell and form two reverse-complement clusters) or low complexity of the library (however, 3 million seems well above chance level to me, and the complexity of the library doesn't seem to be that bad).
Has anybody ever seen this? Any ideas what is causing this high number of optical duplicates?
Or is this number of duplicates simply expected for a standard RNA-Seq library with low input (~200 ng)?
Any ideas or recommendations are very much appreciated!
Many thanks!
Regards,
Mareen
Edit1:
Ps.: Not sure which heading this thread should go to, please move if you have a better idea. Thanks!
Edit2:
library PCR was 12 cycles