**GenoMax** · 01-12-2017, 09:33 AM

Originally posted by massspecgeek

Sorry, should have said that support will continue. Only sales of new instruments affected.

Taking out HiSeq 2500 would leave a gap in the continuum for "Illumina"verse between NextSeq 550 and HiSeq 4K/NovaSeq 5000.

Perhaps we will see a new sequencer (or two) slot in between there, in future.

**ymc** · 01-14-2017, 06:46 AM

Reagent cost is $6375 per flowcell for HiSeq X. If the price of the new reagent is 80% of HiSeq X, then it is $5100 per flowcell for NovaSeq 6000.

This means that the new reagent cost is $1.7/Gbp which is a huge drop from the previous $7/Gbp. Correct?

**AllSeq** · 01-14-2017, 09:44 AM

I'm pretty sure they meant 80% of the running cost (per Gb), not 80% of the specific kit cost. However, we've still only seen hints at specific pricing, so we can't say for sure.
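The two readings of "80%" discussed above can be compared with a few lines of arithmetic. This is only a sketch: the kit price and the 80% ratio come from the posts, but the per-flowcell outputs (900 Gb for HiSeq X, 3000 Gb for a NovaSeq S4 flowcell) are assumptions picked because they reproduce the $7/Gb and $1.7/Gb figures quoted, not confirmed specifications.

```python
# Sketch of the two interpretations of "80% of HiSeq X".
# Output-per-flowcell values are ASSUMPTIONS chosen to match the
# $/Gb figures quoted in the thread, not official specifications.

HISEQ_X_KIT_USD = 6375.0             # per flowcell, from the post
PRICE_RATIO = 0.80                   # "80% of HiSeq X", from the post
HISEQ_X_GB_PER_FLOWCELL = 900.0      # assumed
NOVASEQ_S4_GB_PER_FLOWCELL = 3000.0  # assumed


def cost_per_gb(kit_usd: float, output_gb: float) -> float:
    """Reagent cost per gigabase for one flowcell."""
    return kit_usd / output_gb


hiseq_x_per_gb = cost_per_gb(HISEQ_X_KIT_USD, HISEQ_X_GB_PER_FLOWCELL)

# Reading 1 (ymc): the kit itself costs 80% of the HiSeq X kit.
novaseq_kit_usd = HISEQ_X_KIT_USD * PRICE_RATIO
per_gb_kit_reading = cost_per_gb(novaseq_kit_usd, NOVASEQ_S4_GB_PER_FLOWCELL)

# Reading 2 (AllSeq): the running cost per Gb is 80% of HiSeq X.
per_gb_running_reading = hiseq_x_per_gb * PRICE_RATIO

print(f"HiSeq X:                 ${hiseq_x_per_gb:.2f}/Gb")
print(f"NovaSeq, kit reading:    ${per_gb_kit_reading:.2f}/Gb")
print(f"NovaSeq, per-Gb reading: ${per_gb_running_reading:.2f}/Gb")
```

Under these assumptions the two readings differ by more than a factor of three per Gb, which is why the distinction matters.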

**ymc** · 01-15-2017, 07:42 PM

Originally posted by AllSeq

I'm pretty sure they meant 80% of the running cost (per Gb), not 80% of the specific kit cost. However, we've still only seen hints at specific pricing, so we can't say for sure.

Thanks for your reply.

Then from the cost perspective, it is not that impressive.

A big jump in throughput is always welcomed by the big genome centers. However, if base accuracy is down due to the new chemistry, then that won't even be a plus.

Anyway, I think we need to wait a little bit more to assess this new toy.

**pmiguel** · 01-17-2017, 12:38 PM

Yeah, if you already have a HiSeq X then the only major advantage is that there are no library type limitations on the NovaSeq.
What NovaSeq does is offer the average core a shot at a price per base previously only available to those with the throughput to need 5+ HiSeq X.
That said, you would need to run S4 reagents to get that price per base and:
(1) S4 won't be ready until late 2017
(2) It will generate 3 Tb of data in a single run == a single lane (logically, if not physically).

--
Phillip

**GenoMax** · 01-24-2017, 10:24 AM

Added some information from webinar to the original post.

**misterc** · 03-01-2017, 03:21 PM

Couple things that have changed on this lately.

1 - S4 flow cells now slated to ship in Q3 this year.
2 - S4 reagent kits will only be 20% cheaper than HiSeq X if you buy 5 NovaSeq instruments. Bleh. Still about half the cost per Gb versus HiSeq 4000.

**Brian Bushnell** · 03-01-2017, 06:50 PM

I did a comparison of duplicate rates on HiSeq2500 and NovaSeq, using Illumina's public data on BaseSpace:

NovaSeq seems to have a problem, but it's not clear why. These are not normal optical/well duplicates; they are extremely remote. It looks like during colony formation, some reads break off and reattach to an empty well somewhere else. The farthest-right point (at 25000) is not for distance 25000 but for distance infinity, including inter-tile duplicates.

These libraries are PCR-free WGS and thus should not really have more than a tiny fraction of duplicates, as seen on the HiSeq. Does anyone have any idea what's causing this? Does my hypothesis sound reasonable? Previous Illumina platforms had a very obvious distance cutoff where the number of duplicates increases rapidly up to a point, then plateaus (which is true for this HiSeq data, at around dist=45, but you can't see it in this graph). That is not the case for NovaSeq - it just keeps ascending, and there is no clear cutoff. It gradually bends, so there is no clear inflection point like there is on other platforms.

For reference, the libraries are both human NA12878 runs. NovaSeq is 2x150 and HiSeq 2500 is 2x100. Pairs are considered duplicates when the distance between colony centers is at most the stated distance, and both R1 and R2 match with some number of substitutions allowed, to account for sequencing error (8 for 150bp reads and 5 for 100bp reads). The insert sizes are quite large on average (>500bp) which reduces the rate of coincidental duplicates. HS2500 is ~10x and NovaSeq is ~30x coverage so the coincidental duplicate rate should be extremely low in both cases.
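The duplicate criterion described above can be sketched roughly as follows. This is an illustration of the rule as stated in the post, not the code of any actual tool: the `ReadPair` type and function names are hypothetical, the substitution limit is assumed to apply to each read separately, both pairs are assumed to come from the same lane, and `max_dist=None` stands in for the "distance infinity" point that includes inter-tile duplicates.

```python
import math
from typing import NamedTuple, Optional, Tuple


class ReadPair(NamedTuple):
    """Hypothetical container for one read pair and its cluster position."""
    tile: int
    x: int
    y: int
    r1: str
    r2: str


def parse_coords(header: str) -> Tuple[int, int, int]:
    """Extract (tile, x, y) from an Illumina-style header of the form
    '@instrument:run:flowcell:lane:tile:x:y ...'."""
    fields = header.split()[0].split(":")
    return int(fields[4]), int(fields[5]), int(fields[6])


def within_subs(a: str, b: str, max_subs: int) -> bool:
    """True if equal-length sequences differ at no more than max_subs positions."""
    if len(a) != len(b):
        return False
    return sum(c1 != c2 for c1, c2 in zip(a, b)) <= max_subs


def is_duplicate(p: ReadPair, q: ReadPair,
                 max_dist: Optional[float], max_subs: int) -> bool:
    """Both mates must match, and the clusters must lie within max_dist.
    max_dist=None disables the distance and tile checks entirely."""
    if max_dist is not None:
        if p.tile != q.tile:
            return False
        if math.hypot(p.x - q.x, p.y - q.y) > max_dist:
            return False
    return (within_subs(p.r1, q.r1, max_subs)
            and within_subs(p.r2, q.r2, max_subs))
```

Sweeping `max_dist` over a range of values and counting the pairs flagged at each step would give a curve of the kind plotted, although a real tool needs an index rather than all-against-all comparison.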

P.S. This is an underestimate of the duplicate rate for both platforms, as it was generated in a way that is not robust to sequencing error. I will regenerate the data, but it won't change the discrepancy, just the magnitude.

Attached Files

NovaSeq_Duplicates.png (34.0 KB, 900 views)

**SNPsaurus** · 03-01-2017, 10:44 PM

Was there a higher phiX concentration in the NovaSeq run? Wouldn't phiX produce pseudo-duplicates given the small genome, especially if library prep had a biased fragmentation?

I agree with your "fragment break-off" possibility. We were just chatting about that idea recently over here regarding the HiSeq4000.

**Brian Bushnell** · 03-01-2017, 11:05 PM

There was zero PhiX in the NovaSeq data. I was wondering a bit about mitochondrial content, but still, the source DNA is the same for both platforms. Anyway, coincidental duplicates won't follow the pattern in the graph, of a rising curve with a negative second derivative (it flattens as distance grows). They would cause a positive second derivative because the number of potential matches increases with the square of the radius, so random matches would yield a curve that looks like Y=X^2, whereas the curve I plotted looks like... nothing with which I am familiar.

Edit:

Or, maybe, I should say it looks a bit like a step function plus a linear, or square-root, or X^Y function where Y is between 0.5 and 1. The step function has a steep increase until a point (say, 2500 for NovaSeq), which models "traditional" optical- or well-duplicates. The other function models "drifters" that break off and land in remote wells.
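To make the two shapes concrete, here is a minimal sketch of both curves. Everything numeric in it is a placeholder: `density`, `plateau`, `scale` and `exponent` are hypothetical parameters, not values fitted to the data, and the "step" is modeled as a simple linear ramp that saturates at `cutoff`.

```python
import math


def coincidental(r: float, density: float) -> float:
    """Random matches within radius r: proportional to area, so ~ r**2."""
    return density * math.pi * r * r


def step_plus_drift(r: float, cutoff: float, plateau: float,
                    scale: float, exponent: float) -> float:
    """Saturating ramp (local optical/well duplicates) plus a power-law
    term with exponent between 0.5 and 1 (remote 'drifters')."""
    local = plateau * min(r / cutoff, 1.0)
    drift = scale * r ** exponent
    return local + drift


def second_difference(f, r: float, h: float = 100.0) -> float:
    """Discrete curvature: positive if f bends upward at r, negative if downward."""
    return f(r + h) - 2.0 * f(r) + f(r - h)


def random_curve(r: float) -> float:
    return coincidental(r, density=1e-9)


def drifter_curve(r: float) -> float:
    return step_plus_drift(r, cutoff=2500.0, plateau=0.02,
                           scale=1e-4, exponent=0.7)


# Past the cutoff, the drifter model bends downward while the
# random-match model bends upward.
print(second_difference(random_curve, 5000.0) > 0)
print(second_difference(drifter_curve, 5000.0) < 0)
```

The sign of the curvature is the distinguishing feature: an area-driven curve keeps accelerating, while the step-plus-power-law shape keeps rising but flattens.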

**Brian Bushnell** · 03-02-2017, 12:21 AM

Here is a zoomed-in image of HiSeq 2500 duplicates for the same genome (it's an immortal human cell line that does not need amplification, or so I'm told).

This is not the same as the other image, as the x-axis is logarithmic rather than linear. But the important point in my opinion is that there is a rapid increase in duplicates detected up to a point (~45) and subsequently it is completely flat for a long time. That is what I expect from a platform that occasionally identifies oddly-shaped clusters as two clusters, or in which a well occasionally migrates to an adjacent well.

At ~1000, it starts going up again. I'm not sure about that - I would expect it to be sub-linear on the log scale, but then, I'm not sure what's happening in that region. The salient point is that there is a sharp increase over roughly the width of a cluster, and then a plateau, and finally another increase due to the increasing range. After dist=1000, I can't explain the slope. But, the graph only shows duplicates of less than 0.02% of reads, so it's not very important in practice. Still, it would be great if there was one less unsolved mystery.

Attached Files

HighSeq_Duplicates.png (32.8 KB, 772 views)

**pmiguel** · 03-02-2017, 04:10 AM

Hi Brian,
Are you scoring the same number of reads with HiSeq/NovaSeq? If the number of reads for the NovaSeq were an order of magnitude higher, then with repetitive or mitochondrial DNA you might be able to use up all of the possible start sites.

Are you scoring clusters as a duplicate only if both forward and reverse reads are the same? Or are you only checking one side?

BTW, yes, a typical DNA prep from cell culture would yield enough DNA to make it unnecessary to amplify the library.

--
Phillip

**GenoMax** · 03-02-2017, 04:35 AM

Originally posted by pmiguel

Hi Brian,
Are you scoring the same number of reads with HiSeq/NovaSeq? If the number of reads for the NovaSeq were an order of magnitude higher, then with repetitive or mitochondrial DNA you might be able to use up all of the possible start sites.

Probably not, since the best NovaSeq sample posted on BaseSpace has 1.6 billion reads. (The individual R1 and R2 files are 300G each when uncompressed! We could end up with uncompressed read files of 1TB each when S4 flowcells roll around later this year.)

Are you scoring clusters as a duplicate only if both forward and reverse reads are the same? Or are you only checking one side?

That should be a yes since @Brian is probably using clumpify which takes both reads into account.

I am wondering if we are sampling the libraries so thoroughly on a NovaSeq that we have duplicates showing up due to oversampling.
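One rough way to gauge the oversampling question is to treat every (start position, insert length, orientation) combination as a slot and ask how often uniformly sampled pairs would land in an already occupied slot. The sketch below does that under loud assumptions: a 3.1 Gb genome, 400 distinct insert lengths in play, two orientations, 800 million pairs (taking "1.6 billion reads" to count both mates), and perfectly uniform sampling, which ignores the repeats and mitochondrial DNA raised earlier in the thread.

```python
import math


def coincidental_duplicate_rate(n_pairs: float, genome_bp: float,
                                insert_lengths: float,
                                orientations: int = 2) -> float:
    """Expected fraction of pairs that are coincidental duplicates when
    n_pairs are drawn uniformly from all possible fragment 'slots'."""
    slots = genome_bp * insert_lengths * orientations
    lam = n_pairs / slots                     # mean pairs per slot
    # Expected number of occupied slots: slots * (1 - exp(-lam)).
    distinct = -slots * math.expm1(-lam)
    return 1.0 - distinct / n_pairs


# All inputs below are assumptions for illustration only.
rate = coincidental_duplicate_rate(n_pairs=800e6,
                                   genome_bp=3.1e9,
                                   insert_lengths=400)
print(f"expected coincidental duplicate rate: {rate:.4%}")
```

Under the uniform model the rate comes out as a small fraction of a percent, so pure oversampling of unique sequence would only matter if far fewer slots were effectively available, for example in repeats or a very narrow insert-size distribution.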

**misterc** · 03-02-2017, 08:12 AM

Brian, your hypothesis is reasonable as there is no other possibility to explain the duplicate rate. Not surprisingly, we see similar duplicates on HiSeq 4000, as this 'characteristic' of ExAmp isn't limited to NovaSeq.

**Brian Bushnell** · 03-02-2017, 09:12 AM

Originally posted by pmiguel

Hi Brian,
Are you scoring the same number of reads with HiSeq/NovaSeq? If the number of reads for the NovaSeq were an order of magnitude higher, then with repetitive or mitochondrial DNA you might be able to use up all of the possible start sites.

I might try running again after removing the mito, but it's not like mito accounts for >12% of the reads anyway. The number of reads was different, but this NovaSeq library only has twice the reads of the HiSeq library, so that doesn't explain the result.

Are you scoring clusters as a duplicate only if both forward and reverse reads are the same? Or are you only checking one side?

As Genomax indicated, yes, with this methodology both reads in a pair are required to match for the pair to be considered a duplicate. Due to the large insert size and variance this is unlikely to occur by chance.

Originally posted by misterc

Brian, your hypothesis is reasonable as there is no other possibility to explain the duplicate rate. Not surprisingly, we see similar duplicates on HiSeq 4000, as this 'characteristic' of ExAmp isn't limited to NovaSeq.

I wonder if this is a fundamental limitation of patterned flowcells, and made more pronounced as the dots shrink. When the colony is growing, once a dot is filled, the amplification continues but there is nowhere for the clones on the edges to attach, so some of them break off and drift around. In that case, presumably increasing the loading concentration would reduce the duplicate rate...

But, it makes me wonder what the duplicate rates of the high-throughput flowcells will look like.
