Could you please expand on what you mean by "Indexes with specific DNA composition patterns"? We have been tearing our hair out recently because suddenly the quality of our index reads is horrible, leading to massive loss of sequence data because we can't determine the index sequence. This is happening on both our HiSeq2k and GAIIx. We have considered, and tentatively ruled out, cluster density and degree of barcode diversity as the source of the problem. Any findings you could share would be greatly appreciated.
What we noticed during some of our multiplexed runs (single-end) was that the sequencing of the actual read was great: high quality scores, several million reads, etc. The indexes, on the other hand, were saturated with Ns. Some indexes had only the occasional N, but as you know, CASAVA has a flag to handle such situations.
Nonetheless, we ran CASAVA demultiplexing on this dataset. The resulting CASAVA build had very few reads in it, simply because the indexes could not be matched to samples (too many Ns). What we saw is that the samples that failed had indexes with a high T and/or G content. Whether this was index-sequencing error, or whether these bases themselves played a role in the failed samples, is tough to say.
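For anyone wanting to check their own runs for the same pattern, here is a rough Python sketch of the kind of tally described above. It assumes the index reads have already been pulled out as plain sequence strings; the function name `summarize_index_reads` and the example sequences are made up for illustration and are not taken from CASAVA or from our data.

```python
from collections import Counter


def summarize_index_reads(index_reads):
    """Tally N calls and T/G content over a collection of index reads.

    index_reads: iterable of index sequences as strings (hypothetical input;
    how you extract them depends on your pipeline).
    """
    total_reads = 0
    reads_with_n = 0
    base_counts = Counter()

    for seq in index_reads:
        seq = seq.strip().upper()
        if not seq:
            continue  # skip blank lines
        total_reads += 1
        if "N" in seq:
            reads_with_n += 1
        base_counts.update(seq)  # count every base call, including N

    total_bases = sum(base_counts.values())
    return {
        "reads": total_reads,
        # fraction of index reads containing at least one N
        "frac_reads_with_n": reads_with_n / total_reads if total_reads else 0.0,
        # fraction of all index base calls that are N
        "frac_n_bases": base_counts["N"] / total_bases if total_bases else 0.0,
        # fraction of all index base calls that are T or G
        "frac_tg_bases": (base_counts["T"] + base_counts["G"]) / total_bases
        if total_bases
        else 0.0,
    }


# Made-up index reads, purely to show the call
example = ["ATCACG", "TTGNNN", "NNNNNN", "TGGTTG"]
summary = summarize_index_reads(example)
print(summary)
```

Running this per sample (or per expected index) and comparing the failed samples against the good ones would at least show whether the Ns and the T/G-rich indexes go together in your data, though it cannot tell you which one is the cause.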
Have you looked at the thumbnails and found any anomalies?