**BIG_SNP** · 11-17-2011, 04:04 PM

We have had great success using the NuGen library prep. Their adapters have inline barcodes, which add diversity in the first cycles and allow sequences to pass filter. After passing filter, the HiSeq can sequence the no- or low-diversity samples without any problems.

**fkrueger** · 11-18-2011, 02:32 AM

Just as another thought, if you could afford to spike in 90-95% gDNA, couldn't you also find an external sequencing facility who still run GAIIx's and use the methods which work well on these?

**pmiguel** · 11-18-2011, 04:59 AM

I have seen bad batches of phiX that had fairly high (a few percent, I think) adapter dimer levels in them. Maybe you should make your own genomic DNA library to make sure your "diluent" is of high quality.

Also you could obtain your "diluent" by sub-contracting a sequencing job. Send an email out to a prospective department (maybe one with a high level of plant or fungal sciences being done) and offer a one time only discount genome sequence. Our diluent was a sorghum genomic DNA library.

--
Phillip

**HMorrison** · 11-18-2011, 06:55 AM

Originally posted by HESmith:

There's an alternative approach, assuming that you have not yet constructed the libraries. Design them so the junction is at the opposite end of the insert, and perform paired-end sequencing. Cluster calling is based only on the first five cycles of read one, so you'll avoid the low-complexity issue.

I have a sample of 96-plex low diversity amplicon libraries running now and clusters were found just fine--but the low diversity is causing a tremendous discrepancy between the blue and the green box-and-whiskers plot--raw clusters and clusters passing filter. I hope those data are recoverable at the end. Nothing in my primer design, barcoding, indexing scheme can change the fact that it's "low complexity". First four bases were completely random and followed by eight different in-line bar codes.

This is PE sequencing.

Yet I know labs are making this work.

**HESmith** · 11-18-2011, 07:24 AM

If the first four bases are random, then subsequent low complexity should not adversely affect cluster calling or data quality. Excessive cluster density is a possible culprit: what are your raw and PF values?

**csquared** · 11-18-2011, 10:05 AM

We do similar things to what you are describing all the time. A 20-30% PhiX spike (or any other library) should do the trick. PhiX is easy since it can be easily removed without an index and you can monitor the percent alignment as the run is going.

Our most common condition is a HiC or 5C library where we need to get through some T3 and T7 sequences that are common to all of the samples. We have used both ChIPseq libraries as well as PhiX spikes with very good results. The use of the ChIPseq libraries just allows those reads to be used for something useful where PhiX is just data thrown away.

Add in a spike and lower cluster density on the HiSeq to the 500k to 600k range and you should be fine. If you want to avoid the spike altogether, lowering clusters to about 200k also works but with more variable results.
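As a rough illustration of why a spike helps, the sketch below computes the per-cycle base fractions the instrument would see when a low-diversity library is mixed with a spike-in. It is only arithmetic under stated assumptions: the function name `mixed_base_fractions` is hypothetical, the spike is treated as perfectly balanced (25% of each base, which real PhiX only approximates), and the worst case of a cycle where every library read has the same base is assumed.

```python
from typing import Dict

BASES = "ACGT"


def mixed_base_fractions(
    library: Dict[str, float],
    spike: Dict[str, float],
    spike_fraction: float,
) -> Dict[str, float]:
    """Weighted average of base fractions at one cycle.

    spike_fraction is the share of clusters that come from the spike
    (e.g. 0.30 for a 30% spike). Hypothetical helper, for illustration only.
    """
    if not 0.0 <= spike_fraction <= 1.0:
        raise ValueError("spike_fraction must be between 0 and 1")
    return {
        base: (1.0 - spike_fraction) * library.get(base, 0.0)
        + spike_fraction * spike.get(base, 0.0)
        for base in BASES
    }


# Assumed worst case: every library read has T at this cycle.
fixed_t = {"A": 0.0, "C": 0.0, "G": 0.0, "T": 1.0}
# Assumed idealised spike with equal base composition.
balanced_spike = {base: 0.25 for base in BASES}

for spike_fraction in (0.05, 0.20, 0.30):
    mixed = mixed_base_fractions(fixed_t, balanced_spike, spike_fraction)
    summary = ", ".join(f"{base}={mixed[base]:.3f}" for base in BASES)
    print(f"spike {spike_fraction:.0%}: {summary}")
```

With a 30% spike the three minority bases each rise from 0 to 7.5% of clusters at that cycle, so every channel carries some signal. Whether that is enough in practice depends on the instrument and software version, as the posts in this thread suggest.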

**greigite** · 11-18-2011, 02:04 PM

For what it's worth, we sequenced two lanes on a HiSeq (v2 flow cell) containing 11 7 bp inline barcodes with a spike-in of 5% phiX. Despite the low diversity visible in FastQC plots for the first 7 bp, our cluster density and % of clusters passing filter were comparable to other lanes on the same run that did not have any low-complexity issues.

**pmiguel** · 11-20-2011, 03:47 PM

Originally posted by HESmith:

If the first four bases are random, then subsequent low complexity should not adversely affect cluster calling or data quality. Excessive cluster density is a possible culprit: what are your raw and PF values?

Semantically "random" does not mean "high complexity". See.

Point being we presume "random" here means all four bases were a random mix of ACGT and therefore an even mixture of all 256 possible sequences. But we would need to know what the method of generating these was to know.

--
Phillip

**HESmith** · 11-21-2011, 09:36 AM

random (adj.) - lacking a definite plan, purpose, or pattern (emphasis added).

The poster stated that the "first four bases were completely random". I assumed (not "presumed") he meant what he wrote :-).

Semantically, "high complexity" does not mean base composition diversity, which is the relevant issue for cluster calling. A library that consists solely of AAAAA, CCCCC, GGGGG, or TTTTT starts (in roughly equal amounts) would suffice, yet (almost) no one would argue that this constitutes high complexity.

Apologies if this message comes across as cranky, Phillip. I was just trying my best to help the poster, and don't see how your comments contribute to the solution.

-Harold
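The distinction Harold draws above, between sequence complexity and per-cycle base composition, can be checked on a set of reads with a few lines of Python. This is a minimal sketch, not anything the instrument software does: the function name `per_cycle_base_fractions` and both toy read sets are made up for illustration.

```python
from collections import Counter
from typing import Dict, List


def per_cycle_base_fractions(reads: List[str], n_cycles: int) -> List[Dict[str, float]]:
    """Fraction of A, C, G and T observed at each of the first n_cycles positions."""
    fractions = []
    for cycle in range(n_cycles):
        # Count the base each read shows at this cycle, skipping reads that are too short.
        counts = Counter(read[cycle] for read in reads if len(read) > cycle)
        total = sum(counts.values())
        fractions.append(
            {base: (counts[base] / total if total else 0.0) for base in "ACGT"}
        )
    return fractions


# Toy library from the post: only four distinct starts, in equal amounts.
homopolymer_reads = ["AAAAA", "CCCCC", "GGGGG", "TTTTT"] * 25
# Toy library with one fixed prefix on every read (the failure case).
fixed_prefix_reads = ["GATCA"] * 100

for name, reads in (("homopolymer starts", homopolymer_reads),
                    ("fixed prefix", fixed_prefix_reads)):
    print(name, "- distinct sequences:", len(set(reads)))
    for cycle, frac in enumerate(per_cycle_base_fractions(reads, 5), start=1):
        print(f"  cycle {cycle}:", frac)
```

The first toy library has only four distinct sequences yet shows an even 25% of each base at every cycle, while the fixed-prefix library shows a single base per cycle. Only the second is a problem from the point of view of base composition.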

**pmiguel** · 11-21-2011, 11:46 AM

Hi Harold,
Despite the simplicity of what we are discussing here, I think there are ambiguities. I agree your interpretation is likely the correct one. But randomly choosing 5 bases once and prefixing all the reads in a lane with that random sequence would lead to failure of the cluster calling software. That is all I meant. That might seem ludicrous, but I have seen experiments fail for misunderstandings just as ludicrous.

But, yeah, that might be sufficiently unlikely that my bringing it up was just distracting, not illuminating. (Also, it could be the malign influence of xkcd forcing my hand to create that link back to it...)

--
Phillip

**HMorrison** · 11-22-2011, 11:43 AM

Originally posted by pmiguel:

Semantically "random" does not mean "high complexity". See.

Point being we presume "random" here means all four bases were a random mix of ACGT and therefore an even mixture of all 256 possible sequences. But we would need to know what the method of generating these was to know.

--
Phillip

Harold and Phillip,
I am the poster ("she", not "he", btw) who used the phrase "First four bases were completely random". The 4 random bases were generated by ordering my oligos with "NNNN" where the read is supposed to begin. I did not generate a single "random" sequence to use. Back when I used to synthesize oligos myself, we achieved randomness by mixing reagents into a single bottle that went on the instrument along with A, C, G, T. Don't know what Invitrogen or IDT do these days. (Anybody else go back to Maxam & Gilbert sequencing days, pre-PCR?)

To update, it appears that I got a reasonable number of reads surviving up until the HiSeq lost focus partway through read 3. I'll try the lower cluster density and phiX or shotgun library spike in next time.

Thanks, all.

Hilary

**HESmith** · 11-22-2011, 12:15 PM

Hi Hilary,

Apologies for using the incorrect gender, and sorry to hear about the focusing error. Better luck next time.

Harold

**TonyBrooks** · 12-01-2011, 06:24 AM

Would running low diversity libraries at a low concentration not help solve the problem?
If you are not looking for a large number of reads, then running at a low concentration should mean less chance of overlapping clusters and more reads passing filter.

**pmiguel** · 12-01-2011, 07:32 AM

Yes it helps. But from my limited experience, the catastrophic failures come from focusing issues -- where the instrument sees a blank flow cell surface and de-focuses as it attempts to "find" the clusters it expects.

Again, recent firmware upgrades may have mitigated this particular issue. I am particularly paranoid about it because we only recently got an Illumina sequencer and our particular model is an outlier. So problems probably get solved for the HiSeqs first -- those particular to a HiScanSQ would be noticed and fixed later in most cases.

--
Phillip

**csquared** · 12-01-2011, 11:55 AM

Originally posted by HMorrison:

Harold and Phillip,
To update, it appears that I got a reasonable number of reads surviving up until the HiSeq lost focus partway through read 3. I'll try the lower cluster density and phiX or shotgun library spike in next time.

Thanks, all.

Hilary

The loss of focus during read 3 is more likely a bubble from the fluidics than a diversity problem. If you got that far with good PF clusters, good base quality and a good, flat FWHM metric, it isn't the diversity of the library that is the problem. It is very likely a fluidics issue and one you should raise with your FAS, as it would potentially be eligible for a warranty replacement of the affected lane.

It is also worth checking if your HiSeq (assuming it is a HiSeq instrument; if not, this may not apply) has had the new solenoid valves installed. They help prevent, but do not eliminate, the bubble issues. I don't know what Illumina is calling the new valves, but your FSE or FAS will know.
