NovaSeq from Illumina

GenoMax replied

07-14-2017, 07:14 AM
In my latest test, NovaSeq only had a 4-5% duplication rate.

The important point is that JGI probably made VERY GOOD quality libraries. With patterned flow cells, clean libraries (with just the right sized inserts, zero primers and dimers) are critical to minimizing these issues. Since we are talking about "B"illions of reads, losing some during dedupe should not cause a major loss. 2D barcoding seems essential (perhaps it should be made mandatory).
Brian Bushnell replied

07-14-2017, 06:47 AM
I calculated 8000 PPM of index swapping (cross-contamination) for our NovaSeq run with single indexes, and 120 PPM for dual indexes, when allowing zero barcode mismatches.
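To put those PPM figures on a more familiar scale, here is a minimal Python sketch of the unit conversion. It only converts the rates quoted above; it does not reproduce how the swapping rate was measured, and the function names and the 1-billion-read example are hypothetical.

```python
def ppm_to_percent(ppm: float) -> float:
    """Convert parts per million to a percentage."""
    return ppm / 1_000_000 * 100


def expected_misassigned_reads(total_reads: int, ppm: float) -> float:
    """Expected number of swapped reads for a given total and swap rate in PPM."""
    return total_reads * ppm / 1_000_000


# Rates quoted in the post: single vs. dual indexes, zero barcode mismatches.
single_index_ppm = 8000
dual_index_ppm = 120

print(f"single index: {ppm_to_percent(single_index_ppm):.3f}% of reads")
print(f"dual index:   {ppm_to_percent(dual_index_ppm):.3f}% of reads")

# Hypothetical run size, chosen only to illustrate the scale.
example_reads = 1_000_000_000
print(expected_misassigned_reads(example_reads, single_index_ppm))
print(expected_misassigned_reads(example_reads, dual_index_ppm))
```

So 8000 PPM is 0.8% of reads and 120 PPM is 0.012%, roughly a 67-fold reduction from dual indexing in this run.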
cement_head replied

07-14-2017, 05:45 AM
Slightly off-topic, but related: INDEX swapping on patterned flow cells...

QC Fail Sequencing » The latest Illumina sequencers muddle samples

https://sequencing.qcfail.com/articles/the-latest-illumina-sequencers-muddle-samples/

The new Illumina patterned flow cell technology uses chemistry that is prone to
Brian Bushnell replied

07-14-2017, 05:09 AM
Originally posted by GenoMax:

@cement_head: See if this blog post helps.

As usual, GenoMax has the perfectly appropriate link...

In my latest test, NovaSeq only had a 4-5% duplication rate. That's using our own NovaSeq data rather than external data. Overall not a huge problem, though it's certainly worth removing. I'm not sure why the number is lower than in my previous tests on external data, which indicated >12%; possibly the chemistry got better. (Edit - I should note that this run used lots of libraries from different organisms multiplexed together, which reduces the apparent duplication rate, but makes it more accurate. That should not be relevant to such a huge discrepancy, though.)

This run was extremely high quality (average 99.6% identity to the reference, or ~Q24) so duplicates were easy to detect. I'm really quite impressed with NovaSeq quality. It's unfortunate that there are only 4 quality scores, but CalcTrueQuality seems to do a good job of recalibrating them to the full range of 0-41, yielding a 0.04 average deviation from the correct quality, down from 1.1 on the raw data. 1.1 is still really good (better than the HiSeq 2500 I compared it to), but having only 4 quality scores makes many operations like trimming and merging less accurate. It's actually very impressive that NovaSeq managed, with 4 quality scores, to get better quality score accuracy than HiSeq 2500. I've drawn a couple of conclusions from this: 1) The HiSeq quality score algorithm is terrible. And 2) NovaSeq is calibrated for successful runs only and cannot produce correct quality scores if there are any anomalies (e.g., if there is a lighting failure producing no signal, it will still output really high quality scores even though all the data is wrong). With our previous unsuccessful run (there was a lighting failure), the average deviation from the correct quality was ~20 (2 orders of magnitude).
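For anyone wondering where the ~Q24 comes from: it follows from the standard Phred definition, Q = -10 * log10(error rate), with 99.6% identity treated as a 0.4% error rate. A minimal Python sketch, with hypothetical function names:

```python
import math


def identity_to_phred(identity: float) -> float:
    """Convert a fractional identity (e.g. 0.996) to an equivalent Phred score."""
    if not 0.0 <= identity < 1.0:
        raise ValueError("identity must be in [0, 1); 1.0 has no finite Phred score")
    error_rate = 1.0 - identity
    return -10.0 * math.log10(error_rate)


def phred_to_error_rate(q: float) -> float:
    """Convert a Phred score back to an error probability."""
    return 10.0 ** (-q / 10.0)


print(f"99.6% identity is about Q{identity_to_phred(0.996):.1f}")
print(f"Q24 corresponds to an error rate of {phred_to_error_rate(24):.4f}")
```

This treats every mismatch to the reference as a sequencing error, so true variants in the sample would make the estimate slightly pessimistic.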

Last edited by Brian Bushnell; 07-14-2017, 05:25 AM.
cement_head replied

07-14-2017, 04:43 AM
Originally posted by GenoMax:

@cement_head: See if this blog post helps.

Okay. Thanks - that was really helpful. We're tilting towards ALWAYS doing PE RNA-Seq and using UMIs. Doesn't solve every problem, but I think it reduces a lot of issues.
GenoMax replied

07-14-2017, 04:30 AM
@cement_head: See if this blog post helps.
cement_head replied

07-14-2017, 04:01 AM
Forgive this really basic question, but what is the cause of the duplicates on patterned flow cells as opposed to the older HiSeq2500 approach? Is this due to the density of the clusters and the likelihood of a library molecule detaching and then re-attaching a short distance away? Also, how is this different from a PCR duplicate? Is there any way to tell them apart other than spatial relatedness (prediction based on XY location)?
austinso replied

04-01-2017, 11:49 AM
Originally posted by pmiguel:

Okay I take your point, but an S2 should produce 3 billion clusters per flowcell, whereas a HiSeq 2500 produces about 1.6 billion with v3 chemistry. So the NovaSeq is about 4x less efficient than the HiSeq 2500 in this regard.

A NextSeq produces about 0.4 billion clusters per flowcell. So, the relative efficiencies would be:

(I'm using PF clusters per flowcell / ~number of input amplicon molecules)
HiSeq2500v3 = 1.6/7 = 23%
NextSeq = 0.4/1.4 = 29%
NovaSeqS2 = 3/90 = 3.3%

So, it absolutely looks like a much lower efficiency of clustering on the NovaSeq. (Anyone know if this is also the case for the HiSeq3000/4000?)

Re: 3000/4000
From what I could glean, based on the published specs (which are really vague, perhaps on purpose), the amount of library loaded ranges between 3-9 billion.

The yield is 0.75 billion to ??? billion (I think those who use these should chime in; it is not clear whether the total yields stated are per flow cell or for both flow cells).

Mind you, the % efficiencies (as you've defined them) are way better than the MiSeq (0.3-0.4%) and the MiniSeq (1-5%).

That said, how much difference will this make for most runs? If you use the standard HiSeq2500 method, you start with 10ul of a 2nM library pool for denaturation. Since it gets diluted down to 20 pM (at least) you end up with 1 ml for each denaturation you do. One denaturation could be used to cluster all 8 lanes of the flowcell. But how often does that happen?

For us, I can't think of a single case where we have clustered more than 2-3 lanes per denatured sample pool. Usually it is 8 sample pools for 8 lanes.

There are cases where the amount of library produced is limiting. And the NovaSeq would not be a good choice where this is your critical parameter.

So in most cases I would say that being forced from 8 lanes to 1 lane, along with losing the flexibility to run a much smaller flowcell (the 2-lane rapid chemistry flow cells), are the major limitations of the NovaSeq.

Illumina expects you to just buy a NextSeq to deal with the 2nd issue above. That would be okay (for some definitions of "okay") if they hadn't just decided all the NextSeqs should now have the ability to scan their microarrays. But the option is there.

Then there are the data issues considered in this thread. But I'm pretty sure that is something Illumina can fix (as they had for a period of time with the NextSeq, just after they introduced the v2 version of its chemistry/software) if they focus their attention on it.

I'm not sure that they can improve the % efficiency...it seems like ~30% is about the best you can recover in reads. This would explain why you need more library to get more reads in the NovaSeq.

Mind you 30% is not bad...it is an interesting threshold when you think about occupancy in space.

Cheers, A.
austinso replied

04-01-2017, 11:13 AM
Originally posted by misterc:

Is 150ul of a 1nM library what Illumina recommends for a single S2 flow cell?!?

Apparently for all of them. And that is the lower end (see pg. 16 of the attached guide).
Attached Files

novaseq-6000-system-guide-1000000019358-01.pdf
GW_OK replied

03-28-2017, 05:55 AM
I don't know if you can truly compare efficiencies of the ExAmp chemistry with the other instruments.

On the HiSeq and NextSeq instruments you are randomly clustering across the flowcell with a good correlation between how much DNA you load and how many clusters are produced.

On the ExAmp instruments there are only a fixed number of wells in which clusters can be formed. Additionally, you have to deal with the duplicates coming out of those wells and those duplicates that are formed in solution prior to the library going onto the flowcell.

I think what Illumina is trying to do with ExAmp is saturate the array as fully as is practical.

No argument, though, about the loss of flexibility with the NovaSeq. In its current iteration it's not something useful for an all-comers core lab.
pmiguel replied

03-28-2017, 04:43 AM
Originally posted by austinso:

On another note:

150 uL of a 1 nM library (~90 billion molecules) minimum for loading is a lot of library when you consider you can get by with 1.4 billion for the NextSeq and 7 billion for the HiSeq.

FWIW...

Okay I take your point, but an S2 should produce 3 billion clusters per flowcell, whereas a HiSeq 2500 produces about 1.6 billion with v3 chemistry. So the NovaSeq is about 4x less efficient than the HiSeq 2500 in this regard.

A NextSeq produces about 0.4 billion clusters per flowcell. So, the relative efficiencies would be:

(I'm using PF clusters per flowcell / ~number of input amplicon molecules)
HiSeq2500v3 = 1.6/7 = 23%
NextSeq = 0.4/1.4 = 29%
NovaSeqS2 = 3/90 = 3.3%

So, it absolutely looks like a much lower efficiency of clustering on the NovaSeq. (Anyone know if this is also the case for the HiSeq3000/4000?)
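The efficiency figures above are easy to reproduce. A minimal Python sketch of the arithmetic as defined in the post (PF clusters per flowcell divided by the approximate number of input molecules, both in billions); the dictionary and function names are made up for illustration, and the inputs are the approximate values quoted in this thread, not measured data:

```python
# (PF clusters per flowcell, input molecules), both in billions, as quoted above.
RUNS = {
    "HiSeq2500v3": (1.6, 7.0),
    "NextSeq": (0.4, 1.4),
    "NovaSeqS2": (3.0, 90.0),
}


def clustering_efficiency(pf_clusters: float, input_molecules: float) -> float:
    """Percentage of loaded molecules that end up as PF clusters."""
    return pf_clusters / input_molecules * 100.0


efficiencies = {
    name: clustering_efficiency(clusters, molecules)
    for name, (clusters, molecules) in RUNS.items()
}

for name, pct in efficiencies.items():
    print(f"{name}: {pct:.1f}%")
```

Swapping in your own loading amounts and PF yields gives a like-for-like comparison for other instruments or flow cells.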

That said, how much difference will this make for most runs? If you use the standard HiSeq2500 method, you start with 10ul of a 2nM library pool for denaturation. Since it gets diluted down to 20 pM (at least) you end up with 1 ml for each denaturation you do. One denaturation could be used to cluster all 8 lanes of the flowcell. But how often does that happen?
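The 1 ml figure is just C1 x V1 = C2 x V2. A minimal Python sketch of that dilution arithmetic, with a hypothetical function name; it ignores the volumes of NaOH and neutralization buffer added during a real denaturation and only checks the totals quoted above:

```python
def final_volume_ul(stock_volume_ul: float, stock_conc_pm: float,
                    target_conc_pm: float) -> float:
    """Total volume after diluting a stock to a target concentration (C1*V1 = C2*V2)."""
    if target_conc_pm <= 0 or target_conc_pm > stock_conc_pm:
        raise ValueError("target concentration must be positive and not exceed the stock")
    return stock_volume_ul * stock_conc_pm / target_conc_pm


# 10 ul of a 2 nM (2000 pM) pool diluted to 20 pM.
volume = final_volume_ul(stock_volume_ul=10, stock_conc_pm=2000, target_conc_pm=20)
print(f"{volume:.0f} ul")
```

Diluting further than 20 pM (the "at least" in the post) only increases the final volume.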

For us, I can't think of a single case where we have clustered more than 2-3 lanes per denatured sample pool. Usually it is 8 sample pools for 8 lanes.

There are cases where the amount of library produced is limiting. And the NovaSeq would not be a good choice where this is your critical parameter.

So in most cases I would say that being forced from 8 lanes to 1 lane, along with losing the flexibility to run a much smaller flowcell (the 2-lane rapid chemistry flow cells), are the major limitations of the NovaSeq.

Illumina expects you to just buy a NextSeq to deal with the 2nd issue above. That would be okay (for some definitions of "okay") if they hadn't just decided all the NextSeqs should now have the ability to scan their microarrays. But the option is there.

Then there are the data issues considered in this thread. But I'm pretty sure that is something Illumina can fix (as they had for a period of time with the NextSeq, just after they introduced the v2 version of its chemistry/software) if they focus their attention on it.

--
Phillip
misterc replied

03-27-2017, 10:25 PM
Is 150ul of a 1nM library what Illumina recommends for a single S2 flow cell?!?
austinso replied

03-22-2017, 04:11 PM
On another note:

150 uL of a 1 nM library (~90 billion molecules) minimum for loading is a lot of library when you consider you can get by with 1.4 billion for the NextSeq and 7 billion for the HiSeq.

FWIW...
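For reference, the ~90 billion figure follows directly from volume x concentration x Avogadro's number. A minimal Python sketch, with a hypothetical function name:

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_loaded(volume_ul: float, conc_nm: float) -> float:
    """Number of library molecules in a given volume (ul) at a given concentration (nM)."""
    litres = volume_ul * 1e-6
    molar = conc_nm * 1e-9
    return litres * molar * AVOGADRO


n = molecules_loaded(volume_ul=150, conc_nm=1)
print(f"{n / 1e9:.1f} billion molecules")
```

The same function can be used to check loading amounts for other instruments, provided the volume and concentration refer to the library as actually loaded.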
GenoMax replied

03-22-2017, 11:59 AM
Originally posted by misterc:

Does anyone have even a lane's worth of these new .cbcl files from a NovaSeq? I'd like to test our bioinformatics pipeline with the new bcl2fastq converter v.2.19 that supports NovaSeq.

Illumina does not appear to have made the input files for NovaSeq data available on BaseSpace. Just the outputs.
misterc replied

03-22-2017, 11:55 AM
Does anyone have even a lane's worth of these new .cbcl files from a NovaSeq? I'd like to test our bioinformatics pipeline with the new bcl2fastq converter v.2.19 that supports NovaSeq.
