SOLiD for Genomes - SEQanswers

Brian Bushnell replied

07-05-2016, 12:20 PM
Originally posted by cement_head

So I took another look at this and it strikes me that the whole problem is the use of only four fluors for 16 combinations. (Seems odd that this wasn't the primary issue attempted to be solved; i.e., generating 16 distinct fluors.) Once I got that part, it became obvious why there's an issue with colourspace. Curiously, I just found out that MiniSeq and NextSeq from Illumina use only two fluors - seems like a huge potential issue if one isn't resequencing a human genome...

Yes, if SOLiD had used 16 colors it might have been substantially better, though that would have added its own unique issues (like potentially taking 4x as long to sequence).

Illumina's 2-color chemistry is not like SOLiD colorspace, though. It's just a binary encoding of bases -> colors; no information is lost (since no two bases share the same pair of color polarities), except that you can no longer distinguish between no signal and one of the bases. It works fairly well in practice (for de novo sequencing) and you don't need to align sequences to determine what they are. The 2-color platforms have weaknesses, but it is not clear that the weaknesses are linked to the number of dyes.
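As a rough illustration of the "binary encoding" point, here is a minimal Python sketch. The specific channel assignment (A in both channels, C red only, T green only, G dark) is an assumption made for the example and may differ between chemistries; the names `TWO_CHANNEL`, `encode` and `decode` are made up here.

```python
# Assumed (red, green) signal per base; the dye assignment is illustrative only.
TWO_CHANNEL = {
    "A": (1, 1),  # signal in both channels
    "C": (1, 0),  # red only
    "T": (0, 1),  # green only
    "G": (0, 0),  # no signal in either channel
}
DECODE = {signal: base for base, signal in TWO_CHANNEL.items()}


def encode(seq):
    """Turn a base sequence into one (red, green) pair per cycle."""
    return [TWO_CHANNEL[b] for b in seq]


def decode(signals):
    """Each cycle decodes on its own; no neighbouring cycle or reference is needed."""
    return "".join(DECODE[s] for s in signals)


read = "GATTACA"
assert decode(encode(read)) == read  # lossless round trip

# The one ambiguity: a cycle with no signal at all is read as the dark base.
dark_cycle = (0, 0)
print(decode([dark_cycle]))  # G
```

Every base has its own signal pair, so nothing has to be inferred from neighbouring positions. The only case that cannot be told apart is "no signal" versus the dark base.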

Last edited by Brian Bushnell; 07-05-2016, 12:22 PM.
cement_head replied

07-05-2016, 12:09 PM
Originally posted by Chipper

The quoted error rate (<0.1%) must be after reference-based correction. The problem with SOLiD was the high raw error rate of the ligation-based chemistry (compared to Illumina) and the short read lengths, which make it essentially useless for de novo assembly.

I think the best option today for a large genome and a low budget would be to use the 10x Chromium with HiSeq X (~$2000 for one lane of PE150 linked reads from long fragments).

So I took another look at this and it strikes me that the whole problem is the use of only four fluors for 16 combinations. (Seems odd that this wasn't the primary issue attempted to be solved; i.e., generating 16 distinct fluors.) Once I got that part, it became obvious why there's an issue with colourspace. Curiously, I just found out that MiniSeq and NextSeq from Illumina use only two fluors - seems like a huge potential issue if one isn't resequencing a human genome...
Chipper replied

07-05-2016, 06:55 AM
Originally posted by cement_head

I guess I still don't understand the "issues" with deconvoluting colour-space. It seems as though it would be much more accurate than sequencing in base-space (e.g. Illumina). That's if I'm reading this paper correctly (attached).

The quoted error rate (<0.1%) must be after reference-based correction. The problem with SOLiD was the high raw error rate of the ligation-based chemistry (compared to Illumina) and the short read lengths, which make it essentially useless for de novo assembly.

I think the best option today for a large genome and a low budget would be to use the 10x Chromium with HiSeq X (~$2000 for one lane of PE150 linked reads from long fragments).
gringer replied

07-02-2016, 01:14 PM
Originally posted by cement_head

It seems as though it would be much more accurate than sequencing in base-space (e.g. Illumina). That's if I'm reading this paper correctly.

If our preferred model of DNA were colour-space, then it might have been more accurate with sufficient technology development. As it is, Illumina has had plenty of opportunity to improve the accuracy of their technology, and benefits from their chemical model being almost a direct representation of the DNA model that we use for sequencing.
cement_head replied

07-02-2016, 01:00 PM
Originally posted by gringer

I suspect I've discussed this with you previously, but I might as well say things I haven't said before:

Homopolymers look identical in colour-space, which causes havoc for transcriptome assemblies (e.g. distinguishing between poly-T and poly-A sequences). Other simple repeats would also cause issues for genomic assembly (e.g. ACACACACAC and GTGTGTGTGT are identical, despite having both a base shift and a complementation). The assemblies are only likely to be useful in colour-space, because colour-space errors propagate through as very different sequences in base-space. Also, every contig has four possible base-space representations, which among other things makes it quite difficult to use other genome assemblies as scaffolds for a colour-space assembly.

I guess I still don't understand the "issues" with deconvoluting colour-space. It seems as though it would be much more accurate than sequencing in base-space (e.g. Illumina). That's if I'm reading this paper correctly (attached).
Attached Files

nrg.2016.49.pdf
gringer replied

07-02-2016, 06:07 AM
Originally posted by westerman

Going off the topic here (which is that the SOLiD is not good for de novo work) I wonder where you get that statement. It seems to me that 60 quality bases would be enough to place accurately except for long repeat regions (e.g., LTRs).

I suspect I've discussed this with you previously, but I might as well say things I haven't said before:

Homopolymers look identical in colour-space, which causes havoc for transcriptome assemblies (e.g. distinguishing between poly-T and poly-A sequences). Other simple repeats would also cause issues for genomic assembly (e.g. ACACACACAC and GTGTGTGTGT are identical, despite having both a base shift and a complementation). The assemblies are only likely to be useful in colour-space, because colour-space errors propagate through as very different sequences in base-space. Also, every contig has four possible base-space representations, which among other things makes it quite difficult to use other genome assemblies as scaffolds for a colour-space assembly.
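The ambiguities described above can be reproduced with a small Python sketch of 2-base (colour-space) encoding. It assumes the usual 2-bit trick (A=0, C=1, G=2, T=3, with the colour of a dinucleotide being the XOR of the two codes); the helper names `to_colours` and `to_bases` are invented for this example.

```python
# With A=0, C=1, G=2, T=3, the colour of each adjacent base pair is the XOR
# of the two base codes (an assumed but commonly used formulation).
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASE = "ACGT"


def to_colours(seq):
    """Colour-space calls for a base sequence (one colour per adjacent pair)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def to_bases(first_base, colours):
    """Decode colours back to bases, given an assumed first base."""
    codes = [CODE[first_base]]
    for c in colours:
        codes.append(codes[-1] ^ c)
    return "".join(BASE[x] for x in codes)


# Homopolymers all give the same colour string, so poly-A and poly-T collide.
assert to_colours("AAAAAAAA") == to_colours("TTTTTTTT") == [0] * 7

# The repeat example: shifted and complemented, yet identical in colour-space.
assert to_colours("ACACACACAC") == to_colours("GTGTGTGTGT")

# Without a known first base, one colour string has four base-space readings.
colours = to_colours("ACGGTCA")
for first in BASE:
    print(first, to_bases(first, colours))
```

The loop at the end prints four different base sequences for the same colour string, which is the "four possible base-space representations" problem for a contig assembled in colour-space.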
cement_head replied

07-01-2016, 05:38 AM
Ok, thanks
RickC7 replied

06-30-2016, 11:40 AM
Reagent support for SOLiD until May 2017 or sooner per demand.

We use/used SOLiD for SAGE, great for short reads but more expensive than Illumina runs. Converting everything over to Illumina adapters now...

The couple of times we did targeted reseq or whole transcriptome, reverse read quality was bad.
colindaven replied

05-25-2016, 12:00 AM
@westerman

It wasn't clear from the start whether the topic was de novo or reference based assembly.

Have a look at the genome mappability score which came out of Mike Schatz's lab as one example (http://bioinformatics.oxfordjournals...8/16/2097.full).

Even with 100bp perfect simulated single reads there are regions which cannot be mapped to reliably. Therefore, 60 bp reads containing errors won't be so nice to deal with. I remember working on human twin genomes and getting ~40-50,000 differences in VCF despite various SNP callers and stringent mapping quality filters.

http://bioinformatics.oxfordjournals.org/content/28/16/2097/T1.expansion.html
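To make the mappability argument concrete, here is a toy Python sketch. It is not the genome mappability score from the paper linked above, only a much cruder proxy: the fraction of positions whose k-mer occurs exactly once in a reference. It ignores reverse complements and sequencing errors, and the function name and toy reference are invented for the example.

```python
from collections import Counter


def unique_fraction(reference, k):
    """Fraction of start positions whose k-mer occurs exactly once in the reference."""
    kmers = [reference[i:i + k] for i in range(len(reference) - k + 1)]
    counts = Counter(kmers)
    return sum(1 for km in kmers if counts[km] == 1) / len(kmers)


# Toy reference: two copies of a 12 bp repeat separated by unrelated sequence.
repeat = "ACGTTGCAAGCT"
reference = "GGATCC" + repeat + "TTAGGCATCGA" + repeat + "CCTAGG"

# Reads shorter than the repeat cannot be placed uniquely inside it.
for k in (4, 8, 16):
    print(k, round(unique_fraction(reference, k), 2))
```

Any read that falls entirely inside a repeat longer than the read itself has more than one equally good placement, however accurate the read is. With repeat content above 80%, as mentioned for plant genomes, a large share of short reads ends up in that situation.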

By the way, I work on plant genomes, and repetitive regions can be > 80%, so I thought the original poster might have similar issues.
westerman replied

05-24-2016, 11:43 AM
Originally posted by colindaven

A 60bp SE read is too short to place accurately in many/most genomes.

Going off the topic here (which is that the SOLiD is not good for de novo work) I wonder where you get that statement. It seems to me that 60 quality bases would be enough to place accurately except for long repeat regions (e.g., LTRs).
Brian Bushnell replied

05-24-2016, 09:12 AM
My experience with SOLiD 4 was that it had terrible accuracy... on both read 1 and read 2.
colindaven replied

05-24-2016, 02:42 AM
There are still quite a few SOLiDs out there; see, for example, this data that has just gone into the SRA:

http://www.ncbi.nlm.nih.gov/sra/ERX1488475[accn]

Raw read accuracy is excellent, but keep in mind that paired-end reads do not really work at all (R1 was ~75 bp, 60 bp after trimming, and R2 was just pure rubbish).

A 60bp SE read is too short to place accurately in many/most genomes. Also, de novo assembly simply does not work, which rules out everything other than resequencing applications (and you need a very good reference genome too).
cmbetts replied

05-23-2016, 02:34 PM
They may both use sequencing by ligation, but SOLiD and Complete Genomics are different technologies. As far as I can tell, SOLiD has been discontinued, having been beaten by Illumina and replaced by Ion Torrent long ago.
Either would still be inappropriate for de novo genome sequencing. Complete has always been exclusively for human genome resequencing, and the colorspace reads of SOLiD were best when a reference was available because sequencing errors introduced frameshifts in the base encoding.
cement_head replied

05-23-2016, 09:17 AM
Hello,

It is not obsolete, is it? Complete Genomics (BGI) uses sequencing-by-ligation.

URL: http://bgi-international.com/service...her-platforms/

-Andor
Chipper replied

05-23-2016, 08:21 AM
No. Besides being obsolete, it gave reads that were far too short.