Hi folks,
I'm new here, and I'm in the process of preparing a PCR-generated amplicon sequencing library (composed of many samples) that will be run on a NovaSeq 6000 (2x250bp). I've sequenced exclusively on the MiSeq previously, and some of the major implementation differences I've encountered involve:
- NovaSeq use of patterned flow cells. Obviously this is what helps achieve some of the enormous read output, but apparently patterned flow cells are associated with greater index hopping relative to non-patterned flow cells. UDIs are the safest, but other mitigation strategies are possible, etc.
- NovaSeq use of two-channel SBS. This is probably what's giving me greatest pause right now.
"Index Reads must begin with at least one base other than G in either of the first two cycles. If an Index Read begins with two base calls of G, no signal intensity is generated and cluster registration will fail. Signal must be present in either of the first two cycles to ensure successful demultiplexing."
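Read literally, the rule quoted above is a per-index check on the first two cycles. A minimal Python sketch of that check follows; the function name and the example sequences are made up for illustration and are not taken from any Illumina kit.

```python
def starts_with_gg(index_seq: str) -> bool:
    """Return True if both of the first two bases of the index read are G."""
    first_two = index_seq.strip().upper()[:2]
    return first_two == "GG"


# Hypothetical 8 bp index sequences, written in the orientation the instrument reads them.
for seq in ["GGACTTCA", "GCTAAGGT", "TGGACCAT"]:
    status = "no signal in cycles 1-2" if starts_with_gg(seq) else "ok"
    print(seq, status)
```

This only flags individual indices; it says nothing about whether the rest of the pool can compensate.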
This sounds pretty serious, but easy to abide by (i.e., don't use indices beginning with two base calls of G). An intuitive example of an "ideal" index combination is given:
This is then followed by a less intuitive example of an "acceptable" combination of indices:
It's at least not intuitive to me, mainly because it seems that the index read associated with sample N718 would fail cluster registration (two Gs for the i7 index). I'm assuming that in order for this combination to be "acceptable", reads from N718 are still properly sequenced/indexed/etc, otherwise I don't see how the loss of N718 is in any way acceptable. If that's the assumption I need to make, then I think I have a fundamental misunderstanding of cluster registration--that successful cluster registration doesn't occur on an index-by-index basis, but is actually an outcome for the entire combination of indices.
Because N709 and N712 have a non-G base in position 2, cluster registration proceeds normally--no index is ignored, and all samples should receive more-or-less equal numbers of reads (assuming perfect normalization, sequencing, etc.), right? The sense in which this combination is less ideal than the first comes down to the fact that some index positions don't receive two-channel support for i7 and i5. Position 1 for i7 (riskily?) has no signal in either channel, but is saved by the two-channel signal from position 2.
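The per-position reasoning above can be sketched roughly in Python by tallying which channels have signal at each index cycle across a pool. This assumes the usual two-channel encoding (A in both channels, C in red only, T in green only, G dark). The pool uses made-up sequences shaped like the "acceptable" case (every index starts with G, but only one starts with GG), and the function names are hypothetical. Whether a pool-level check like `pool_has_early_signal` is really what governs cluster registration is the open question here, not something this sketch settles.

```python
# Two-channel SBS encoding assumed here: A = red + green, C = red, T = green, G = dark.
CHANNELS = {"A": {"red", "green"}, "C": {"red"}, "T": {"green"}, "G": set()}


def channels_per_cycle(pool):
    """For each cycle, return the set of channels with signal from at least one index."""
    length = min(len(seq) for seq in pool)
    result = []
    for cycle in range(length):
        lit = set()
        for seq in pool:
            lit |= CHANNELS[seq[cycle].upper()]
        result.append(lit)
    return result


def pool_has_early_signal(pool):
    """True if any index in the pool gives signal in cycle 1 or cycle 2."""
    per_cycle = channels_per_cycle(pool)
    return any(per_cycle[:2])  # an empty set (dark cycle) counts as False


# Hypothetical i7 pool: all start with G, only the last one starts with GG.
pool = ["GCTACGAT", "GTAGACCA", "GGACTTCA"]
for cycle, lit in enumerate(channels_per_cycle(pool), start=1):
    print(cycle, sorted(lit) or "dark")
print("signal in first two cycles:", pool_has_early_signal(pool))
```

With this pool, cycle 1 is dark across all indices while cycle 2 has both channels represented, which matches the situation described for position 1 and position 2 of i7.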
This line of thinking seems to be supported in the last "unacceptable combination" table:
As a combination of indices, this fails because both i7 indices have Gs in the critical positions 1 and 2 and no other index with a non-G base in these positions is available. This makes sense, but I still have questions:
- If I have relatively even base representation/color diversity at each index position, does the GG problem even matter? That is, can I still use indices leading with GG if I know my other indices are non-GG in the first two positions?
- Do index combination concerns only apply to i7, or is i5 also affected? If i5 is affected, do I need to look at the original index sequence (i.e., as ordered from the primer supplier) or the reverse complement of the i5?
- Even if including indices with leading GG is technically "ok" against a background of even base diversity for the first two positions, is there any benefit to omitting the use of these indices in the library prep effort? Does the use of these indices still incur some risk? I have some control over this, and it'd be relatively easy (but not nothing) for me to avoid certain index combinations.
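For the i5 orientation question, one cautious approach is to check both the sequence as ordered and its reverse complement, since the orientation read for i5 can differ between instruments and reagent workflows and is worth confirming for the actual run setup. A minimal sketch; the helper names and the example sequence are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def gg_start_in_either_orientation(i5_seq: str) -> dict:
    """Report whether the index starts with GG as ordered and as its reverse complement."""
    as_ordered = i5_seq.upper()
    rc = reverse_complement(as_ordered)
    return {
        "as_ordered": as_ordered.startswith("GG"),
        "reverse_complement": rc.startswith("GG"),
    }


# Hypothetical i5 sequence: fine as ordered, but its reverse complement starts with GG.
print(reverse_complement("TAGATCCC"))
print(gg_start_in_either_orientation("TAGATCCC"))
```

An index that is flagged in either orientation is a candidate to avoid if there is flexibility in which indices to use.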