Rules for making your own index

PabloMarin-Garcia replied

04-05-2012, 04:42 PM
Illumina recommendation for index selection

Originally posted by ETHANol View Post

It's funny, I found this on the internet some time ago and follow it, but Illumina hasn't followed up on it so I just assumed it wasn't a problem. Apparently, it can be. Which leads one to ask: why is this not mentioned in any of the library preparation manuals?

Illumina explains it here for Nextera:


http://www.illumina.com/documents/products/technotes/technote_nextera_low_plex_pooling_guidelines.pdf
TonyBrooks replied

03-15-2012, 07:00 AM
On a related note: I have a bunch of indexes (the Sanger 96-plex ones) that I'd like to use for low plexing too (4-plex). These indexes are 8 bases long. Is there anything stopping me from reading just the first 6 bases (as per standard Illumina indexing) on the GAIIx/HiSeq, as long as there is A/C vs. G/T balance at all 6 positions? I only want to do this on one lane, so I don't need to read 8 cycles on the other 7 lanes.
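One thing worth checking before truncating is whether the indexes are still distinct after the cut. Here is a minimal Python sketch of that check. The function name `collisions_after_truncation` and the four 8-base sequences are made up for illustration; they are not the real Sanger 96-plex indexes.

```python
def collisions_after_truncation(indexes, cycles=6):
    """Group full-length indexes that become identical once only the
    first `cycles` bases are read."""
    groups = {}
    for idx in indexes:
        groups.setdefault(idx[:cycles], []).append(idx)
    # Keep only prefixes shared by more than one full-length index.
    return {prefix: full for prefix, full in groups.items() if len(full) > 1}


# Hypothetical 8-base indexes (assumption, not a published set).
example = ["ACGTACGT", "ACGTACCA", "TGCATGCA", "GTACGTAC"]
print(collisions_after_truncation(example))
```

An empty result means the truncated indexes are all distinct. It says nothing about A/C vs. G/T balance or the remaining Hamming distance, which would need separate checks.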
HESmith replied

02-16-2012, 07:42 AM
Originally posted by ETHANol View Post

http://www.plosone.org/article/info%...l.pone.0016607

With a 5' index you have the invariant T required for adapter ligation in all libraries. I guess it doesn't cause too much of a problem because people use this strategy, but it is something to think about nonetheless. Has this caused problems for anyone?

I assume you mean barcodes that are part of the adapter. Unless your index is only three bases long, you should be okay (but I haven't done the experiment). You could also resolve the problem by using indices of different length so the T is phase-shifted, and balance the other nucleotides for that cycle.
kentk replied

02-16-2012, 07:41 AM
Originally posted by ETHANol View Post

http://www.plosone.org/article/info%...l.pone.0016607
With a 5' index you have the invariant T required for adapter ligation in all libraries. I guess it doesn't cause too much of a problem because people use this strategy, but it is something to think about nonetheless. Has this caused problems for anyone?

The article looks interesting. I'll have to read it through first. Thanks again, ETHANol.

You mean the T for the T-A ligation, right? No, I don't think it's a problem, because that T (or actually its complement, A) anneals to the last base of the sequencing primer, so essentially it's not part of the read.
HESmith replied

02-16-2012, 07:37 AM
Originally posted by kentk View Post

Thanks guys. Yes I was planning to introduce a 5' index. Being able to multiplex only at multiples of 4 isn't a problem. Just need to multiplex into the hundreds.

I think I've read the same post by pmiguel mentioning that index reads should always contain both an A/C and a G/T at each position, which is why I was curious why all bases should be in equal proportion.

There's a distinction between the Illumina index read (which is separate) and barcodes that are incorporated at the start of your insert. In addition to cluster calling, I believe that the measured signal intensities for the first four cycles are used to calibrate values that are utilized for the remainder of the run (e.g., signal-to-noise), which would obviously affect the data if the bases are not equally represented in those cycles.

For the index read, A/C vs. G/T is usually sufficient to discriminate between a small number of barcodes.

Harold

ETHANol replied

02-16-2012, 07:27 AM
Thanks Harold!
ETHANol replied

02-16-2012, 07:25 AM
Large Scale Loss of Data in Low-Diversity Illumina Sequencing Libraries Can Be Recovered by Deferred Cluster Calling

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016607

Massively parallel DNA sequencing is capable of sequencing tens of millions of DNA fragments at the same time. However, sequence bias in the initial cycles, which are used to determine the coordinates of individual clusters, causes a loss of fidelity in cluster identification on Illumina Genome Analysers. This can result in a significant reduction in the numbers of clusters that can be analysed. Such low sample diversity is an intrinsic problem of sequencing libraries that are generated by restriction enzyme digestion, such as e4C-seq or reduced-representation libraries. Similarly, this problem can also arise through the combined sequencing of barcoded, multiplexed libraries. We describe a procedure to defer the mapping of cluster coordinates until low-diversity sequences have been passed. This simple procedure can recover substantial amounts of next generation sequencing data that would otherwise be lost.

With a 5' index you have the invariant T required for adapter ligation in all libraries. I guess it doesn't cause too much of a problem because people use this strategy, but it is something to think about nonetheless. Has this caused problems for anyone?
HESmith replied

02-16-2012, 07:24 AM
Originally posted by ETHANol View Post

HESmith, Thanks for the correction. I'm curious here. How do you determine that the indices are called incorrectly? How do I go about performing QC on the index read?

I examined the frequencies of different indices in the Undetermined directory. The most common were one-base mismatches with the correct indices (we required perfect matches for demultiplexing), but there were nearly as many with two or more mismatches. Also, there were some pseudotiles with all Ns in the index despite high quality insert reads. Note that we observed this problem only at very high cluster densities.

You can use SAV or HCS to visualize the Q-scores for the index cycles. They are usually a bit lower than read one; if they're a lot lower, be concerned. A high fraction (>3-4%) of reads in the Undetermined directory is another indication of poor index reads.

Harold
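For anyone who wants to repeat this kind of tally, here is a rough Python sketch. It assumes you have already pulled the index sequences of the undetermined reads into a list (how you extract them depends on your demultiplexing output, which is not covered here). The function names and the sample data are made up for illustration.

```python
from collections import Counter


def mismatches(a, b):
    """Count positions at which two equal-length index sequences differ."""
    return sum(x != y for x, y in zip(a, b))


def classify_undetermined(observed, expected):
    """Tally observed index reads by distance to the nearest expected index."""
    tally = Counter()
    for idx in observed:
        if set(idx) == {"N"}:
            tally["all_N"] += 1  # no base call at any index cycle
            continue
        nearest = min(mismatches(idx, e) for e in expected)
        if nearest == 0:
            tally["perfect"] += 1
        elif nearest == 1:
            tally["1_mismatch"] += 1
        else:
            tally["2+_mismatches"] += 1
    return tally


# Illustrative data only: two expected indexes and five made-up index reads.
expected = ["GCCAAT", "CTTGTA"]
observed = ["GCCAAA", "NNNNNN", "CTTGTT", "AAAAAA", "GCNAAT"]
print(classify_undetermined(observed, expected))
```

A large "2+_mismatches" or "all_N" share relative to "1_mismatch" would match the pattern described above for very high cluster densities.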
kentk replied

02-16-2012, 07:18 AM
Thanks guys. Yes I was planning to introduce a 5' index. Being able to multiplex only at multiples of 4 isn't a problem. Just need to multiplex into the hundreds.

I think I've read the same post by pmiguel mentioning that index reads should always contain both an A/C and a G/T at each position, which is why I was curious why all bases should be in equal proportion.
ETHANol replied

02-16-2012, 07:09 AM
It's funny, I found this on the internet some time ago and follow it, but Illumina hasn't followed up on it so I just assumed it wasn't a problem. Apparently, it can be. Which leads one to ask: why is this not mentioned in any of the library preparation manuals?

I think pmiguel has said that a balanced base composition for the index read is important on the HiScan.

Some sequencing experiments require the use of fewer than 12 index sequences in a lane with a high cluster density. In such cases, select indexes carefully to ensure optimum base calling and demultiplexing by having different bases at each cycle of the index read. Illumina recommends the following sets of indexes for low-level pooling experiments.

Pool of 2 samples:
• Index #6 GCCAAT
• Index #12 CTTGTA

Pool of 3 samples:
• Index #4 TGACCA
• Index #6 GCCAAT
• Index #12 CTTGTA

Pool of 6 samples:
• Index #2 CGATGT
• Index #4 TGACCA
• Index #5 ACAGTG
• Index #6 GCCAAT
• Index #7 CAGATC
• Index #12 CTTGTA
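
A quick way to see why these pools work is to check, cycle by cycle, that each pool contains at least one base from each laser group (A/C and G/T, the grouping HESmith describes in this thread). The Python sketch below does that. The names `LASER_GROUP` and `unbalanced_cycles` are made up for illustration, and this is only my reading of the rule, not Illumina's own tool.

```python
# Bases grouped by the laser that excites them (A/C vs. G/T).
LASER_GROUP = {"A": "AC", "C": "AC", "G": "GT", "T": "GT"}


def unbalanced_cycles(indexes):
    """Return 1-based index cycles where every index in the pool falls
    into the same laser group."""
    bad = []
    for cycle, column in enumerate(zip(*indexes), start=1):
        groups = {LASER_GROUP[base] for base in column}
        if len(groups) < 2:
            bad.append(cycle)
    return bad


# The three pools quoted above, keyed by pool size.
pools = {
    2: ["GCCAAT", "CTTGTA"],
    3: ["TGACCA", "GCCAAT", "CTTGTA"],
    6: ["CGATGT", "TGACCA", "ACAGTG", "GCCAAT", "CAGATC", "CTTGTA"],
}
for size, pool in pools.items():
    print(size, unbalanced_cycles(pool))

# A pair from the same list that is NOT balanced: indexes #5 and #7.
print(unbalanced_cycles(["ACAGTG", "CAGATC"]))
```

All three recommended pools come back with no unbalanced cycles, while pairing index #5 with index #7 leaves cycles 1, 2 and 5 with a single laser group.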
ETHANol replied

02-16-2012, 07:01 AM
HESmith, Thanks for the correction. I'm curious here. How do you determine that the indices are called incorrectly? How do I go about performing QC on the index read?
HESmith replied

02-16-2012, 06:52 AM
Actually, ETHANol's statement is not 100% accurate. On the HiSeq, high cluster densities (900-1000K) have a more deleterious effect on index reads than inserts. We've had several flow cells with good cluster calling (80-90% PF) and high quality scores (mean ~38), yet fewer than 50% of the indices were called accurately. In some cases, pseudotiles at the inflow side (which contain higher cluster densities) have completely dropped out (i.e., no basecalling) during the index read after producing high-quality insert reads. The problem can be mitigated by balancing the ratio of index bases that are excited by the same laser (A/C or G/T).

If your second index is at the start of read one, then you absolutely have to use all four bases in roughly equal proportions for the first four cycles (which is when cluster calling occurs).
ETHANol replied

02-16-2012, 06:29 AM
On the HiSeq, you need balanced nucleotide composition at the beginning of the sequencing read but not in the barcode read. Otherwise you could only multiplex in multiples of four, which was one of the drawbacks of putting the barcode at the beginning of the sequencing read. Maybe the MiSeq is more picky about the barcode read; I don't know.
kentk replied

02-16-2012, 06:23 AM
Yes, I should maximize Hamming distance.
But for example, I have 4 indexes...

5' ATGCAT
5' TGAACG
5' GCTGTC
5' AGCTGC

An Illumina representative mentioned that I can't use that index set because the first, second and last positions do not include all four bases. So on the flowcell at base 1, I'll have signals for A, T and G clusters but not for C, so our machine (MiSeq) will trash that cycle. Well, this is what I understood from our conversation.

Any thoughts?
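
To make the representative's point concrete, here is a small Python sketch that lists which bases are absent at each index cycle for the four indexes above. The function name `missing_bases_per_cycle` is made up for illustration, and the check assumes all indexes have the same length.

```python
def missing_bases_per_cycle(indexes):
    """Return {cycle: sorted list of absent bases} for every 1-based
    cycle that lacks at least one of A, C, G, T."""
    missing = {}
    for cycle, column in enumerate(zip(*indexes), start=1):
        absent = sorted(set("ACGT") - set(column))
        if absent:
            missing[cycle] = absent
    return missing


indexes = ["ATGCAT", "TGAACG", "GCTGTC", "AGCTGC"]
print(missing_bases_per_cycle(indexes))
# {1: ['C'], 2: ['A'], 6: ['A']}
```

Cycles 3 to 5 contain all four bases; cycles 1, 2 and 6 (first, second and last) do not.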
ETHANol replied

02-16-2012, 06:10 AM
Not really sure if I am getting what you are saying, but it's best to keep the Hamming distance >1 between all the barcodes.
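
For reference, the Hamming distance is just the number of positions at which two equal-length barcodes differ. A minimal Python sketch for checking a whole set follows. The function names are made up for illustration, and the four barcodes are the ones kentk posted in this thread.

```python
from itertools import combinations


def hamming(a, b):
    """Number of positions at which two equal-length barcodes differ."""
    if len(a) != len(b):
        raise ValueError("barcodes must have equal length")
    return sum(x != y for x, y in zip(a, b))


def min_pairwise_distance(barcodes):
    """Smallest Hamming distance over all pairs in the set."""
    return min(hamming(a, b) for a, b in combinations(barcodes, 2))


barcodes = ["ATGCAT", "TGAACG", "GCTGTC", "AGCTGC"]
print(min_pairwise_distance(barcodes))  # 5
```

A minimum distance of 1 means a single miscalled base can turn one barcode into another, which is why >1 is the floor.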
