**ECO** · 02-06-2012, 11:19 PM

How many samples and what size are your amplicons?

**nancysch** · 02-06-2012, 11:23 PM

To start, just 4 amplicons of 130-200 bp (without adaptors), and 32 samples.

**NextGenSeq** · 02-07-2012, 07:54 AM

That's a very low complexity library. You'll probably have to spike a lot of PhiX in to get good results.

**TonyBrooks** · 02-07-2012, 08:26 AM

I've often wondered if you couldn't include some Ns in the primer for low complexity libraries which you could then remove post sequencing along with your target primer. So your fwd primer would look like this

[P5][Seq Primer]NNNNN[Target Primer]

In theory the first 5 bases would be random which would give decent cluster identification.

There's probably some obvious reason why this wouldn't work that I haven't thought of yet.

Or maybe designing PCR's to both strands would increase the complexity enough?
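A minimal sketch of the "remove post sequencing" step, assuming the random bases sit at the very start of read 1 and are followed directly by the target primer. The function name, the example primer, and the mismatch tolerance are all hypothetical, chosen only for illustration:

```python
def trim_random_prefix(read, n_random, target_primer, max_mismatches=1):
    """Strip the random prefix and the target primer from the 5' end of a read.

    Returns the remaining insert sequence, or None if the read is too short
    or the primer is not found (within max_mismatches) where it is expected.
    """
    start = n_random
    end = start + len(target_primer)
    if len(read) < end:
        return None
    observed = read[start:end]
    mismatches = sum(1 for a, b in zip(observed, target_primer) if a != b)
    if mismatches > max_mismatches:
        return None
    return read[end:]


# Hypothetical reads: 5 random bases + 10-base target primer + insert.
primer = "GGATCCTTGA"
reads = [
    "ACGTA" + "GGATCCTTGA" + "TTTTCCCCAAAA",
    "TTGCA" + "GGATCCTTGA" + "TTTTCCCCAAAA",
    "CAGTT" + "GGATGCTTGA" + "TTTTCCCCAAAA",  # one mismatch in the primer
    "CAGTT" + "AAAAAAAAAA" + "TTTTCCCCAAAA",  # primer absent, read is dropped
]
trimmed = [trim_random_prefix(r, 5, primer) for r in reads]
```

Reads where the primer cannot be located come back as `None`, so they can be filtered out rather than silently mis-trimmed.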

**krobison** · 02-07-2012, 11:35 AM

Originally posted by TonyBrooks

> I've often wondered if you couldn't include some Ns in the primer for low complexity libraries which you could then remove post sequencing along with your target primer. So your fwd primer would look like this
>
> [P5][Seq Primer]NNNNN[Target Primer]
>
> In theory the first 5 bases would be random which would give decent cluster identification.
>
> There's probably some obvious reason why this wouldn't work that I haven't thought of yet.
>
> Or maybe designing PCR's to both strands would increase the complexity enough?

A related trick I saw published was to use a variable length sequence between the Illumina primers & targeting sequence; this way the targeting sequences aren't all in phase & the complexity is significantly increased in the eye of the cluster caller.
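To illustrate why putting the targeting sequences out of phase helps, here is a small sketch that counts how many distinct bases appear at each cycle, with and without variable-length spacers. The targeting sequence and the 0-3 base spacers are made up for the example, not taken from the paper:

```python
from collections import Counter


def distinct_bases_per_cycle(reads, n_cycles):
    """Number of distinct bases observed at each of the first n_cycles cycles."""
    return [
        len(Counter(r[i] for r in reads if i < len(r)))
        for i in range(n_cycles)
    ]


target = "GGATCCTTGACA"          # hypothetical targeting sequence
spacers = ["", "A", "CT", "GAC"]  # hypothetical spacers of 0 to 3 bases

# Without spacers, every read starts with the same bases in the same phase.
in_phase = [target for _ in spacers]
# With spacers, the same target is shifted by 0 to 3 cycles between reads.
staggered = [s + target for s in spacers]

in_phase_diversity = distinct_bases_per_cycle(in_phase, 8)
staggered_diversity = distinct_bases_per_cycle(staggered, 8)
```

In this toy case the in-phase library shows a single base at every cycle, while the staggered one shows at least two at each of the first eight cycles. How much staggering a real library needs would have to be checked empirically.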

**CoastsideOldRider** · 02-09-2012, 09:31 PM

Do you have the reference for the paper mentioned in your post?

Thanks!
Marc

**nancysch** · 02-14-2012, 12:14 PM

Is it this one?

Kinde et al., Detection and Quantification of Rare Mutations with Massively Parallel Sequencing. PNAS doi:10.1073 (April 2011)

**ShiveringFire** · 03-28-2012, 11:38 AM

Cluster caller?

Originally posted by krobison

> A related trick I saw published was to use a variable length sequence between the Illumina primers & targeting sequence; this way the targeting sequences aren't all in phase & the complexity is significantly increased in the eye of the cluster caller.

Can you educate me more on "cluster caller"? I recently got results from MiSeq Paired-End run (150bp) and I suspect there's a double read problem. Qscores are rather wavy at the beginning.

My samples are multiplexed targeted re-sequencing of exomes from 16 genotypes. Could this be due to a low complexity issue discussed on this thread?

I also heard rumors that if the first 4bp of a read is identical (which is very likely in targeted re-sequencing) it will be assigned to the same cluster. Is this true?

**pmiguel** · 03-29-2012, 08:55 AM

Originally posted by ShiveringFire

> Can you educate me more on "cluster caller"? I recently got results from MiSeq Paired-End run (150bp) and I suspect there's a double read problem. Qscores are rather wavy at the beginning.
>
> My samples are multiplexed targeted re-sequencing of exomes from 16 genotypes. Could this be due to a low complexity issue discussed on this thread?
>
> I also heard rumors that if the first 4bp of a read is identical (which is very likely in targeted re-sequencing) it will be assigned to the same cluster. Is this true?

I am thinking about this a lot. Especially with respect to possibly streamlining our operations by dumping our GS-FLX.

Our MiSeq is apparently 6 weeks from being delivered, so I can only extrapolate from the performance of our HiScanSQ (which is similar to a HiSeq). There are two issues masquerading as a single "problem".

(1) The first is that the actual instrument focusing software goes nuts and misfocuses if you have an empty tile (or set of tiles?) early in the run. Probably mainly during the first cycle. For our HiScanSQ this means if you don't have a fair number of clusters in the G&T channel and the A&C channel, you have a good chance that the instrument will pick the wrong focal plane to image.
Doesn't anyone have connections inside Illumina they could point out this blunder to? If focus is good for one channel and bad for the other, why not just use the good focal point?
Anyway, I think this is only an issue during the first few cycles (maybe only the first) of a read. Then the focal points seem to be carried along with little (or no) adjustments.
The effect? Once you are out of focus, you are hosed for that tile for the run, I think. (Or "pseudo-tile" -- but I'll just presume that everyone reading this realizes that no masonry is involved in this process and leave it at "tile".) This applies no matter how perfect your cluster spacing is.

(2) If the instrument did not hose up its focal plane during the first cycle, you can still have diminished results if you have a low complexity of base calls. This is the one everyone thinks about -- two adjacent clusters give the same base calls for 4 bases. If they are very close together, then neither of them may "register" (if I use the terminology correctly) well, if at all. This is a little more understandable. Except that, for the HiSeq, Illumina foreclosed the use of a methodology that could circumvent this issue on the GA-IIx, by making the early layers of data processing possible only on the instrument console.

So if issue 1 doesn't kill you, issue 2 probably will. Work-arounds are required.
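As a rough way to check a planned library against issue 1 before loading it, one could tabulate for each early cycle what fraction of expected reads falls in each channel pair. This is only a sketch: the G/T and A/C grouping follows the description above, and the 10% cutoff is an arbitrary placeholder, not an Illumina specification.

```python
def channel_fractions(reads, cycle):
    """Fraction of reads whose base at `cycle` is in the G/T or A/C group."""
    bases = [r[cycle] for r in reads if len(r) > cycle]
    if not bases:
        return {"GT": 0.0, "AC": 0.0}
    gt = sum(b in "GT" for b in bases) / len(bases)
    ac = sum(b in "AC" for b in bases) / len(bases)
    return {"GT": gt, "AC": ac}


def unbalanced_cycles(reads, n_cycles, min_fraction=0.1):
    """Cycles where either channel group falls below min_fraction (assumed cutoff)."""
    return [
        c for c in range(n_cycles)
        if min(channel_fractions(reads, c).values()) < min_fraction
    ]


# Hypothetical expected read starts for a small amplicon pool.
expected_reads = ["GGAT", "GTCA", "TGAC", "GTCA"]
flagged = unbalanced_cycles(expected_reads, 4)
```

Here the first three cycles are flagged because every read falls in a single channel group, while the fourth cycle has signal in both.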

--
Phillip

**ShiveringFire** · 04-05-2012, 09:16 AM

no more deferred cluster calling

Thanks Phillip,

I came across this paper formally defining the problem and a possible solution:

Large Scale Loss of Data in Low-Diversity Illumina Sequencing Libraries Can Be Recovered by Deferred Cluster Calling

http://www.plosone.org/article/info%3Adoi%2F10.1371%2Fjournal.pone.0016607

Massively parallel DNA sequencing is capable of sequencing tens of millions of DNA fragments at the same time. However, sequence bias in the initial cycles, which are used to determine the coordinates of individual clusters, causes a loss of fidelity in cluster identification on Illumina Genome Analysers. This can result in a significant reduction in the numbers of clusters that can be analysed. Such low sample diversity is an intrinsic problem of sequencing libraries that are generated by restriction enzyme digestion, such as e4C-seq or reduced-representation libraries. Similarly, this problem can also arise through the combined sequencing of barcoded, multiplexed libraries. We describe a procedure to defer the mapping of cluster coordinates until low-diversity sequences have been passed. This simple procedure can recover substantial amounts of next generation sequencing data that would otherwise be lost.

Sadly I was told that the MiSeq doesn't save raw image files, so "deferred cluster calling" is no longer possible. This sounds like getting rid of the evidence for a dirty job...

So spiking in a lot of PhiX seems like the only solution for low complexity. Can one spike in PhiX even with custom-made libraries?

**NextGenSeq** · 04-06-2012, 06:40 AM

Yes, you just buy the PhiX control from Illumina and spike it in.
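A back-of-the-envelope sketch of what a given spike-in buys you at a single cycle, assuming the PhiX fraction is measured in clusters and that PhiX contributes roughly 25% of each base per cycle (an approximation; the real genome is not perfectly balanced). The function name and inputs are hypothetical:

```python
def mixed_base_fractions(amplicon_fractions, phix_fraction, phix_fractions=None):
    """Per-base fractions at one cycle for an amplicon library mixed with PhiX.

    amplicon_fractions: dict of base -> fraction within the amplicon library.
    phix_fraction: fraction of clusters that come from PhiX (0 to 1).
    phix_fractions: per-base PhiX fractions; defaults to an assumed 25% each.
    """
    if phix_fractions is None:
        phix_fractions = {b: 0.25 for b in "ACGT"}
    return {
        b: (1 - phix_fraction) * amplicon_fractions.get(b, 0.0)
        + phix_fraction * phix_fractions[b]
        for b in "ACGT"
    }


# Worst case: every amplicon read has G at this cycle, with a 50% PhiX spike.
mix = mixed_base_fractions({"G": 1.0}, 0.5)
```

Under these assumptions a 50% spike still leaves G at 62.5% of clusters and each other base at 12.5%, which gives a feel for why large spike-ins get recommended for single-amplicon libraries.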

**gnomers** · 07-24-2012, 06:48 AM

I am looking for an alternative to spiking in 50% or more PhiX while sequencing low diversity amplicon libraries.

I am wondering if using longer barcodes with balanced representation of bases (up to 24 bases) would be sufficient.

Additionally, does anyone know if I can specify barcodes of different lengths on a MiSeq sample sheet?

**bbeitzel** · 07-24-2012, 10:25 AM

I don't think balanced barcodes will work. The index read occurs after read 1 so the instrument will still have issues calling clusters.

I am doing some tag sequencing that requires a custom library prep protocol, and I have the same issue with homogeneous nucleotides for the first 10 bases of read 1. Illumina tech support recommended adding a stretch of 12 Ns (not 6) right after the read 1 primer binding site. It seems like that would be fairly straightforward to add to primers for amplicon sequencing.

I am pretty sure (but don't quote me on this) that you can specify indices of different lengths on the sample sheet. I think that the 2 index reads use a 6 base and a 7 base index.

**gnomers** · 07-24-2012, 10:34 AM

Originally posted by bbeitzel

> I don't think balanced barcodes will work. The index read occurs after read 1 so the instrument will still have issues calling clusters.
>
> I am doing some tag sequencing that requires a custom library prep protocol, and I have the same issue with homogeneous nucleotides for the first 10 bases of read 1. Illumina tech support recommended adding a stretch of 12 Ns (not 6) right after the read 1 primer binding site. It seems like that would be fairly straightforward to add to primers for amplicon sequencing.

Thanks for your answer. I should have clarified that I am not using truseq/nextera indexing reads; my barcodes compose the very first bases of read1. So, it sounds like I could use balanced barcodes, in lieu of the stretch of 12 Ns they recommended to you.
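For in-line barcodes like these, one quick sanity check is whether every barcode position sees all four bases across the set. The sketch below assumes samples are pooled in roughly equal amounts, compares only the positions shared by all barcodes, and uses made-up barcode sets; it says nothing about whether a balanced set is actually sufficient on the instrument.

```python
from collections import Counter


def position_base_counts(barcodes):
    """Base counts at each position, over the length shared by all barcodes."""
    length = min(len(b) for b in barcodes)
    return [Counter(b[i] for b in barcodes) for i in range(length)]


def is_balanced(barcodes, min_distinct=4):
    """True if every shared position shows at least min_distinct different bases."""
    return all(len(c) >= min_distinct for c in position_base_counts(barcodes))


balanced_set = ["ACGT", "CGTA", "GTAC", "TACG"]  # hypothetical barcodes
skewed_set = ["ACGT", "ACGA", "ACTT", "AGGT"]    # every barcode starts with A

balanced_ok = is_balanced(balanced_set)
skewed_ok = is_balanced(skewed_set)
```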

amplicon sequencing on MiSeq