**gringer** · 05-09-2017, 10:49 PM

Originally posted by Brian Bushnell View Post

for SNPs and short indels alone, without phasing, you can currently get better results for less money with Illumina HiSeq 2500.

This depends a lot on how you define cost. Nanopore is currently getting about 5-10 gigabases per 2-day run for experienced labs with careful sample prep, with some groups getting 10-15 Gb. Internal testing at ONT shows higher yields, but that is less useful for people who want to know what they can achieve themselves. Assuming 10 Gb per run, that's $100 per gigabase using the most expensive flow cell option ($900 USD + $100 reagents), which I think is in the realm of HiSeq / MiSeq. With the cheapest bulk flow cell option ($500 USD + $100 reagents), it's $60 per gigabase, which is nipping at the toes of HiSeq and NextSeq using currently-available flow cell costs and yields. That ignores the advantage conferred by long reads, which is substantial when considering things like isoform detection for cDNASeq.
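As a quick check of the per-gigabase arithmetic, here is a minimal Python sketch. The function and variable names are illustrative only, and the prices and yields are just the figures quoted in this post, not official pricing.

```python
def cost_per_gigabase(flow_cell_usd, reagents_usd, yield_gb):
    """Run cost (flow cell + reagents) divided by the yield in gigabases."""
    return (flow_cell_usd + reagents_usd) / yield_gb

# Figures from the post: 10 Gb per run, $100 of reagents per run.
expensive = cost_per_gigabase(900, 100, 10)  # most expensive flow cell option
bulk = cost_per_gigabase(500, 100, 10)       # cheapest bulk flow cell option

# How the cost moves across the 5-15 Gb yield range mentioned above.
by_yield = {
    gb: (cost_per_gigabase(900, 100, gb), cost_per_gigabase(500, 100, gb))
    for gb in (5, 10, 15)
}

if __name__ == "__main__":
    print(f"single flow cell: ${expensive:.0f}/Gb, bulk: ${bulk:.0f}/Gb")
    for gb, (single, cheap) in by_yield.items():
        print(f"{gb:>2} Gb/run: ${single:.2f}/Gb (single), ${cheap:.2f}/Gb (bulk)")
```

The yield matters as much as the flow cell price: at 5 Gb per run the same flow cells cost twice as much per gigabase as at 10 Gb.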

However, what I have found that most labs care about (certainly small labs) is the minimum cost of sequencing. A MinION purchase gives you a couple of flow cells to play around with, which means that the sequencing cost is effectively capital-free. Factoring in additional reagents and training time, an initial pilot study can be done with the MinION starting from nothing in a basic lab (with pipettes and centrifuges) for about $2000 USD, with delivery of MinION and flow cells happening within a couple of weeks. After that, it's no more than $1000 per run, with results that can be analysed within a few minutes of the run starting.

**Brian Bushnell** · 05-09-2017, 11:21 PM

I won't contest any of that, since you're better-informed with Nanopore sequencing costs than I am (so far, I think we got them all for free). However, I would still prefer short Illumina reads for SNP, deletion, or short insertion calling. But the OP's question is which platform would be ideal for all variant calling for minimal cost. I can't answer that directly, because I don't think that it is currently possible to do accurate variant-calling covering SNPs, indels, CNVs, and SVs, using a single platform. But I would suggest that Illumina should be part of the equation, for now.

**WhatsOEver** · 05-09-2017, 11:37 PM

Originally posted by gringer View Post

Studies exist. Whether or not they're "convincing enough" is entirely up to the readers.

mhh, the discussion about ONT vs others seems to me a little like apple vs samsung: Either you hate it or you love it

Originally posted by gringer View Post

Accuracy is a thin straw because it is almost entirely a software problem.

Can't be, because this would imply that you already know that the signal differences between all bases with all modifications are large enough to distinguish them from the noise within the system, which you don't, as you already mentioned. There might be a point where a new technical innovation is required to improve accuracy. A simple example would be the new pore ONT is now using. Besides that, I fully agree with your statements on software development and accuracy.

Originally posted by gringer View Post

The current ONT basecalling is trained mostly on bacteriophage lambda and E. coli, which have much simpler unmodelled DNA context. For ONT to be able to correctly call human genomic sequence, they need to add all the possible DNA base modifications into their calling model, and that's going to take quite a long time. Until then, single-base consensus accuracy will be lower than expected even at infinite coverage. It may be that the majority of the systematic [non-homopolymer] base-calling error is associated with modified bases, but we're not going to know that until a sufficiently complete model exists.

That is an extremely valuable piece of information for me - Thanks!
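The quoted point about consensus accuracy at infinite coverage can be illustrated with a rough sketch. It assumes a deliberately simplified model that is not from the thread: reads are independent, each read is either right or wrong at a site, and all wrong reads agree on the same wrong base. The error rates (10% for random error, 60% for a systematically miscalled site) are made-up illustrative numbers.

```python
from math import comb

def majority_wrong_prob(per_read_error, coverage):
    """Probability that more than half of `coverage` independent reads
    are wrong at a single site (use odd coverage to avoid ties)."""
    need = coverage // 2 + 1
    return sum(
        comb(coverage, k)
        * per_read_error ** k
        * (1 - per_read_error) ** (coverage - k)
        for k in range(need, coverage + 1)
    )

coverages = [1, 11, 51, 201]

# Random error: each read is wrong 10% of the time at this site.
random_site = [majority_wrong_prob(0.10, n) for n in coverages]

# Systematic error: the model miscalls this context 60% of the time.
systematic_site = [majority_wrong_prob(0.60, n) for n in coverages]

if __name__ == "__main__":
    for n, r, s in zip(coverages, random_site, systematic_site):
        print(f"coverage {n:>3}: random {r:.2e}, systematic {s:.3f}")
```

Under these assumptions, more coverage drives the consensus error at the random site towards zero, while at the systematic site it makes the wrong call more certain, which is why a better model rather than more depth would be needed there.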

Originally posted by Brian Bushnell View Post

For human sequencing (in which, unlike bacterial sequencing, the reagent costs outweigh the library-prep costs) it seems like it might be prudent to pursue a dual-library approach, with short and long reads on different platforms. In that case you don't need to pick a single platform that's optimal for everything.

Why do you think so for human (or more generally multiploid organisms) WGS? For human data, we are currently unable to do whole genome reconstruction with short reads alone using the existing reference. If we create a scaffold of our genome of interest with long reads, we would still be unable to map the short reads accurately. As an example: How would a dual platform approach help me to resolve highly repetitive regions in the genome like MHC or mucins?

Originally posted by seq_bio View Post

But as of now, as a standalone system, it's not really ready for human WGS, correct? I think that's what WhatsOEver was interested in, if I understood that correctly.

True, and that is my conclusion from this discussion as well. We would probably be able to do CNV calling and RNA-Seq, but for identifying SNPs at the whole-genome level it is not ready yet. I think our next step must be a test run on the MinION, as suggested, to see how well our libraries are represented in the data.

**WhatsOEver** · 05-09-2017, 11:44 PM

Originally posted by Brian Bushnell View Post

I won't contest any of that, since you're better-informed with Nanopore sequencing costs than I am (so far, I think we got them all for free). However, I would still prefer short Illumina reads for SNP, deletion, or short insertion calling. But the OP's question is which platform would be ideal for all variant calling for minimal cost. I can't answer that directly, because I don't think that it is currently possible to do accurate variant-calling covering SNPs, indels, CNVs, and SVs, using a single platform. But I would suggest that Illumina should be part of the equation, for now.

It's actually the variant calling for minimal cost at the whole-genome level which is the critical part for me.

I'm totally fine with our existing Illumina-Agilent-WES variant calling pipeline.

**gringer** · 05-09-2017, 11:53 PM

Originally posted by WhatsOEver View Post

Can't be, because this would imply that you already know that the signal differences between all bases with all modifications are large enough to distinguish them from the noise within the system, which you don't as you already mentioned.

Sensor noise is negligible in comparison to the shift from one base to another. All existing known base modifications produce a large current shift in the signal. Distinguishing between two different pyrimidines (i.e. C/T) is probably one of the most difficult things at the moment, because their chemical structure is so similar.

But there's a whole lot of context that isn't included in the current models. The current basecallers typically only look at the absolute signal level, and pay limited (if any) attention to the change in signal from the previous value(s), and also don't account for base transition time (except for calling homopolymers). I had a look at event information a couple of years ago for a single read, and in spite of being overwhelmed with the amount of information there was, I found a lot of suggestions that base calling could be improved by looking beyond the single base that was found in the middle of the pore at the time the signal was read.

**seq_bio** · 05-10-2017, 12:00 AM

Do let us know which way you decide in the end - Sequel or Promethion or neither. This was a very interesting discussion. In general, it looks like it's going to be tough to wean people away from Illumina as of now.
For PacBio, it looks like they need to get their costs down (throughput up) to be one platform for doing the SNPs, indels, CNVs and SVs that currently matter to a good chunk of researchers. They claim that they will increase throughput by 32x by end of 2018 fwiw - http://www.pacb.com/videos/agbt-pacb...t-lower-costs/
For ONT, accuracy appears to be a bugbear.

**WhatsOEver** · 05-10-2017, 12:15 AM

Originally posted by gringer View Post

Sensor noise is negligible in comparison to the shift from one base to another. All existing known base modifications produce a large current shift in the signal. Distinguishing between two different pyrimidines (i.e. C/T) is probably one of the most difficult things at the moment, because their chemical structure is so similar.

Although this becomes a little off-topic now, what do you think in this context of the genia sequencer which (more or less) specifically addresses the difficulties in distinguishing similar signals by adding tags to the bases? It seems like it might become the biggest competitor to ONT (especially with Roche behind it).

**WhatsOEver** · 05-10-2017, 12:29 AM

Originally posted by seq_bio View Post

Do let us know which way you decide in the end - Sequel or Promethion or neither. This was a very interesting discussion. In general, it looks like it's going to be tough to wean people away from Illumina as of now.

I will. And yes, it was indeed a very interesting and helpful discussion for me - thanks a lot to all participants!

Originally posted by seq_bio View Post

For PacBio, it looks like they need to get their costs down (throughput up) to be one platform for doing the SNPs, indels, CNVs and SVs that currently matter to a good chunk of researchers. They claim that they will increase throughput by 32x by end of 2018 fwiw - http://www.pacb.com/videos/agbt-pacb...t-lower-costs/
For ONT, accuracy appears to be a bugbear.

Yep, in the end it's "accuracy vs cost": PacBio with higher accuracy but higher cost, ONT with lower accuracy but lower cost.

**nucacidhunter** · 05-10-2017, 01:11 AM

For the purpose of this thread it might be worth considering 10x Genomics linked-reads as well. It has the advantage of the Illumina platforms' high accuracy, requires very low input DNA, is currently very cost-effective in comparison to both Nanopore and PacBio, and can phase indels, SNVs and SVs over 10 Mb haplotype blocks.

**Ola** · 05-10-2017, 11:58 AM

Originally posted by WhatsOEver View Post

Although this becomes a little off-topic now, what do you think in this context of the genia sequencer which (more or less) specifically addresses the difficulties in distinguishing similar signals by adding tags to the bases? It seems like it might become the biggest competitor to ONT (especially with Roche behind it).

From what I have seen they can generate a ton of raw data and some sequence (bacterial genomes), but the quality per read is not impressive (yet). It relies on a polymerase which is much slower than the helicase ONT uses so yield will be lower unless they can have a much higher number of pores, and while polymerases can be highly accurate they also have a much larger variation in incorporation time so signal processing seems very challenging.

**gringer** · 05-10-2017, 12:08 PM

Originally posted by nucacidhunter View Post

For the purpose of this thread it might be worth considering 10x Genomics linked-reads as well. It has the advantage of the Illumina platforms' high accuracy, requires very low input DNA, is currently very cost-effective in comparison to both Nanopore and PacBio, and can phase indels, SNVs and SVs over 10 Mb haplotype blocks.

10X won't be able to resolve large tandem repeats with long repeat unit lengths, because it relies on short reads for assembly. A structure like this one, for example, won't be properly resolved:

[image]

**gringer** · 05-10-2017, 12:21 PM

Originally posted by WhatsOEver View Post

Although this becomes a little off-topic now, what do you think in this context of the genia sequencer which (more or less) specifically addresses the difficulties in distinguishing similar signals by adding tags to the bases? It seems like it might become the biggest competitor to ONT (especially with Roche behind it).

That would probably work. It would make the computational side of things a bit easier, but we've by no means hit the limit of what can be done with just an electrical signal.

ONT is only the first commercial offering in what I expect will eventually be a crowded market of sequencing-by-observation devices.

**nucacidhunter** · 05-10-2017, 05:03 PM

Originally posted by gringer View Post

10X won't be able to resolve large tandem repeats with long repeat unit lengths, because it relies on short reads for assembly.

Amplifying single-cell DNA results in fragments of 10 kb average size, so the input fragment length will be the limiting factor, not the platform's ability to sequence larger fragments. 10x linked-reads essentially breaks large fragments into short ones for sequencing and can then rebuild the starting long DNA fragments.

**gringer** · 05-10-2017, 09:49 PM

If the repeat unit size is longer than the sequenced length (e.g. repeat unit size of 171bp, repeated 150 times, with a sequenced length of 125bp), and the repeat units are similar enough, then it's not possible to see from the sequence overlap alone if there is any tandemly-repeated structure at all.

A careful assembly might discover an odd increase of read coverage within a particular region, but extending that observation to a fully-resolved tandem repeat structure would be difficult.
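To make the repeat argument concrete, here is a small Python sketch using the numbers above (171 bp unit, 125 bp reads). It assumes idealised conditions that are not from the thread: error-free reads, perfectly identical repeat units, and random made-up flanking sequence. It compares an array with 2 copies against one with 150 copies.

```python
import random

READ_LEN = 125
UNIT_LEN = 171

def random_dna(rng, length):
    """Random DNA string used as made-up unit and flank sequence."""
    return "".join(rng.choice("ACGT") for _ in range(length))

def all_reads(sequence, read_len):
    """Set of distinct error-free reads (substrings) of the given length."""
    return {sequence[i:i + read_len] for i in range(len(sequence) - read_len + 1)}

def depth(sequence, read):
    """Number of positions (overlapping allowed) where `read` occurs."""
    return sum(
        1 for i in range(len(sequence) - len(read) + 1)
        if sequence.startswith(read, i)
    )

rng = random.Random(0)
unit = random_dna(rng, UNIT_LEN)
left, right = random_dna(rng, 300), random_dna(rng, 300)

array_2 = left + unit * 2 + right
array_150 = left + unit * 150 + right

# Every 125 bp window of the 150-copy array also fits inside the 2-copy
# array, so the distinct read content is identical.
same_reads = all_reads(array_2, READ_LEN) == all_reads(array_150, READ_LEN)

# Only the number of times a unit-internal read is seen differs.
probe = unit[:READ_LEN]
depth_2, depth_150 = depth(array_2, probe), depth(array_150, probe)

if __name__ == "__main__":
    print("identical distinct reads:", same_reads)
    print("copies of probe read:", depth_2, "vs", depth_150)
```

In this idealised setting the sequence content of the reads cannot separate 2 copies from 150; only the read depth hints at the difference, which is the coverage signal mentioned above.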

**WhatsOEver** · 05-10-2017, 11:24 PM

Originally posted by gringer View Post

10X won't be able to resolve large tandem repeats with long repeat unit lengths, because it relies on short reads for assembly. A structure like this one, for example, won't be properly resolved:

[image]

Is this human data? Could you share the raw data behind this? This might be an interesting region to evaluate our aims but I would need to check whether we are able to target this region with our approach.
