We have had great success using the NuGen library prep. Their adapters have inline barcodes, which add diversity in the first cycles and allow sequences to pass filter. After passing filter, the HiSeq can sequence the no- or low-diversity samples without any problems.
-
I have seen bad batches of phiX that had fairly high levels of adapter dimers (a few percent, I think) in them. Maybe you should make your own genomic DNA library to make sure your "diluent" is of high quality.
Also, you could obtain your "diluent" by sub-contracting a sequencing job. Send an email out to a prospective department (maybe one doing a lot of plant or fungal science) and offer a one-time-only discounted genome sequence. Our diluent was a sorghum genomic DNA library.
--
Phillip
-
Originally posted by HESmith:
There's an alternative approach, assuming that you have not yet constructed the libraries. Design them so the junction is at the opposite end of the insert, and perform paired-end sequencing. Cluster calling is based only on the first five cycles of read one, so you'll avoid the low-complexity issue.
I have a sample of 96-plex low-diversity amplicon libraries running now and clusters were found just fine, but the low diversity is causing a tremendous discrepancy between the blue and the green box-and-whisker plots (raw clusters and clusters passing filter). I hope those data are recoverable at the end. Nothing in my primer design, barcoding, or indexing scheme can change the fact that it's "low complexity". The first four bases were completely random, followed by eight different inline barcodes.
This is PE sequencing.
Yet I know labs are making this work.
-
We do similar things to what you are describing all the time. A 20-30% PhiX spike (or any other library) should do the trick. PhiX is convenient since it can be removed without an index and you can monitor the percent alignment as the run is going.
Our most common condition is a HiC or 5C library where we need to get through some T3 and T7 sequences that are common to all of the samples. We have used both ChIPseq libraries and PhiX spikes with very good results. Using ChIPseq libraries just allows those reads to be used for something useful, whereas PhiX reads are data thrown away.
Add in a spike and lower cluster density on the HiSeq to the 500k to 600k range and you should be fine. If you want to avoid the spike altogether, lowering clusters to about 200k also works but with more variable results.
HudsonAlpha Institute for Biotechnology
http://www.hudsonalpha.org/gsl
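As a rough illustration of why a spike helps: at a cycle where every read of the low-diversity library carries the same base, the spike is the only source of the other three bases. The sketch below assumes the spike (PhiX or any shotgun library) contributes each base at about 25% per cycle, which is a simplification, and the function name is made up for the example.

```python
def base_fractions_with_spike(spike_fraction, fixed_base="T"):
    """Per-cycle base fractions when the library is 100% `fixed_base`
    at this cycle and the spike is ASSUMED to be 25% each of A, C, G, T."""
    if not 0.0 <= spike_fraction <= 1.0:
        raise ValueError("spike_fraction must be between 0 and 1")
    # The spike supplies all four bases equally...
    fractions = {base: 0.25 * spike_fraction for base in "ACGT"}
    # ...and the library supplies only the fixed base.
    fractions[fixed_base] += 1.0 - spike_fraction
    return fractions


for spike in (0.05, 0.20, 0.30):
    fr = base_fractions_with_spike(spike)
    summary = ", ".join(f"{base}={value:.3f}" for base, value in fr.items())
    print(f"{spike:.0%} spike: {summary}")
```

Under that assumption, even a 30% spike leaves the fixed base dominant at that cycle, which may be why the lower cluster density is recommended alongside the spike.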
-
For what it's worth, we sequenced two lanes on a HiSeq (v2 flow cell) containing eleven 7 bp inline barcodes with a spike-in of 5% phiX. Despite the low diversity visible in FastQC plots for the first 7 bp, our cluster density and % of clusters passing filter were comparable to other lanes on the same run that did not have any low-complexity issues.
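For anyone who wants to check the first few cycles without opening FastQC, here is a minimal sketch of the same per-cycle base composition calculation. It assumes plain 4-line FASTQ records. The function name and the toy reads are invented for illustration and are not data from this run.

```python
import io
from collections import Counter


def per_cycle_composition(fastq_handle, n_cycles=7):
    """Return a list of {base: fraction} dicts for the first n_cycles,
    reading a 4-line-per-record FASTQ stream."""
    counts = [Counter() for _ in range(n_cycles)]
    for line_no, line in enumerate(fastq_handle):
        if line_no % 4 != 1:  # the sequence is the 2nd line of each record
            continue
        seq = line.strip().upper()
        for cycle, base in enumerate(seq[:n_cycles]):
            counts[cycle][base] += 1
    result = []
    for cycle_counts in counts:
        total = sum(cycle_counts.values())
        if total:
            result.append({b: cycle_counts[b] / total for b in "ACGTN"})
        else:
            result.append({})
    return result


# Made-up toy records, only to show the call.
toy_fastq = io.StringIO(
    "@r1\nACGTACGTTT\n+\nIIIIIIIIII\n"
    "@r2\nACGAACGTTT\n+\nIIIIIIIIII\n"
    "@r3\nTCGTACGTTT\n+\nIIIIIIIIII\n"
    "@r4\nGCGTACGTTT\n+\nIIIIIIIIII\n"
)
for cycle, fractions in enumerate(per_cycle_composition(toy_fastq, n_cycles=4), start=1):
    print(cycle, {b: round(v, 2) for b, v in fractions.items() if v})
```

A cycle where one base is close to 100% is the kind of position the thread is worried about.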
-
Originally posted by HESmith:
If the first four bases are random, then subsequent low complexity should not adversely affect cluster calling or data quality. Excessive cluster density is a possible culprit: what are your raw and PF values?
Point being we presume "random" here means all four bases were a random mix of ACGT and therefore an even mixture of all 256 possible sequences. But we would need to know what the method of generating these was to know.
--
Phillip
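To make the two readings of "random" concrete, here is a small sketch. It enumerates the 4^4 = 256 sequences an equimolar four-base mix can give, then contrasts that with a single randomly chosen prefix placed on every read. The prefix "GATC" and the helper name are arbitrary choices for the example.

```python
from collections import Counter
from itertools import product

BASES = "ACGT"

# Every sequence an equimolar 4-base random position mix can produce: 4**4 = 256.
all_prefixes = ["".join(p) for p in product(BASES, repeat=4)]
print(len(all_prefixes))


def bases_seen_per_cycle(prefixes):
    """Count how often each base occurs at each position across the prefixes."""
    return [Counter(column) for column in zip(*prefixes)]


# Even mixture: each base appears in 64 of the 256 sequences at every cycle.
print(bases_seen_per_cycle(all_prefixes)[0])

# One randomly chosen prefix used for every read: a single base per cycle.
single_prefix = ["GATC"] * 256  # "GATC" is an arbitrary example sequence
print(bases_seen_per_cycle(single_prefix)[0])
```

Both prefixes are "random" in the everyday sense, but only the first gives the imaging software all four bases at every cycle.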
-
random (adj.) - lacking a definite plan, purpose, or pattern (emphasis added).
The poster stated that the "first four bases were completely random". I assumed (not "presumed") he meant what he wrote :-).
Semantically, "high complexity" does not mean base composition diversity, which is the relevant issue for cluster calling. A library that consists solely of AAAAA, CCCCC, GGGGG, or TTTTT starts (in roughly equal amounts) would suffice, yet (almost) no one would argue that this constitutes high complexity.
Apologies if this message comes across as cranky, Phillip. I was just trying my best to help the poster, and don't see how your comments contribute to the solution.
-Harold
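The homopolymer example above can be checked in a few lines. This is only a sketch on a hypothetical library of 1000 reads: it counts distinct starting sequences (complexity) separately from per-cycle base balance (the property that matters for cluster calling).

```python
from collections import Counter

# Hypothetical library: equal numbers of reads starting AAAAA, CCCCC, GGGGG, TTTTT.
starts = ["AAAAA", "CCCCC", "GGGGG", "TTTTT"] * 250

distinct = len(set(starts))  # sequence complexity of the read starts
possible = 4 ** 5            # number of possible 5-mers

# Base counts at each of the five cycles.
per_cycle = [Counter(column) for column in zip(*starts)]
balanced = all(
    all(counts[base] == len(starts) // 4 for base in "ACGT")
    for counts in per_cycle
)

print(f"{distinct} distinct starts out of {possible} possible 5-mers")
print(f"every cycle has 25% of each base: {balanced}")
```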
-
Hi Harold,
Despite the simplicity of what we are discussing here, I think there are ambiguities. I agree your interpretation is likely the correct one. But randomly choosing 5 bases once and prefixing all the reads in a lane with that random sequence would lead to failure of the cluster calling software. That is all I meant. That might seem ludicrous, but I have seen experiments fail for misunderstandings just as ludicrous.
But, yeah, that might be sufficiently unlikely that my bringing it up was just distracting, not illuminating. (Also, it could be the malign influence of xkcd forcing my hand to create that link back to it...)
--
Phillip
-
Originally posted by pmiguel:
Semantically "random" does not mean "high complexity". See.
Point being we presume "random" here means all four bases were a random mix of ACGT and therefore an even mixture of all 256 possible sequences. But we would need to know what the method of generating these was to know.
--
Phillip
I am the poster ("she", not "he", btw) who used the phrase "First four bases were completely random". The 4 random bases were generated by ordering my oligos with "NNNN" where the read is supposed to begin. I did not generate a single "random" sequence to use. Back when I used to synthesize oligos myself, we achieved randomness by mixing reagents into a single bottle that went on the instrument along with A, C, G, T. I don't know what Invitrogen or IDT do these days. (Anybody else go back to Maxam-Gilbert sequencing days, pre-PCR?)
To update, it appears that I got a reasonable number of reads surviving up until the HiSeq lost focus partway through read 3. I'll try the lower cluster density and phiX or shotgun library spike in next time.
Thanks, all.
Hilary
-
Yes, it helps. But in my limited experience, the catastrophic failures come from focusing issues, where the instrument sees a blank flow cell surface and de-focuses as it attempts to "find" the clusters it expects.
Again, recent firmware upgrades may have mitigated this particular issue. I am particularly paranoid about it because we only recently got an Illumina sequencer and our particular model is an outlier. So problems probably get solved for the HiSeqs first -- those particular to a HiScanSQ would be noticed and fixed later in most cases.
--
Phillip
-
Originally posted by HMorrison:
Harold and Phillip,
To update, it appears that I got a reasonable number of reads surviving up until the HiSeq lost focus partway through read 3. I'll try the lower cluster density and phiX or shotgun library spike in next time.
Thanks, all.
Hilary
It is also worth checking whether your HiSeq (assuming it is a HiSeq instrument; if not, this may not apply) has had the new solenoid valves installed. They help reduce, but do not eliminate, the bubble issues. I don't know what Illumina is calling the new valves, but your FSE or FAS will know.
HudsonAlpha Institute for Biotechnology
http://www.hudsonalpha.org/gsl