SOLiD for Genomes

  • SOLiD for Genomes

    Does SOLiD work well for genomes with a lot of repeats? Theoretically it should, but in practice?

    Thanks,

  • #2
    No. Besides being obsolete, it gave reads that were far too short.



    • #3
      Hello,

      It is not obsolete - Complete Genomics (BGI) use sequencing-by-ligation?

      URL: http://bgi-international.com/service...her-platforms/

      -Andor



      • #4
        They may both use sequencing by ligation, but SOLiD and Complete Genomics are different technologies. As far as I can tell, SOLiD has been discontinued, having been beaten by Illumina and replaced by Ion Torrent long ago.
        Either would still be inappropriate for de novo genome sequencing. Complete has always been exclusively for human genome resequencing, and the colorspace reads of SOLiD were best when a reference was available because sequencing errors introduced frameshifts in the base encoding.



        • #5
          There are still quite a few SOLiDs out there, see for example this data just into the SRA:

          http://www.ncbi.nlm.nih.gov/sra/ERX1488475[accn]

          Raw read accuracy is excellent, but keep in mind that paired-end reads do not really work at all (R1 was ~75 bp, 60 bp after trimming, and R2 was pure rubbish).

          A 60bp SE read is too short to place accurately in many/most genomes. Also, de novo assembly simply does not work, which rules out everything except resequencing applications (and you need a very good reference genome too).



          • #6
            My experience with SOLiD 4 was that it had terrible accuracy... on both read 1 and read 2.



            • #7
              Originally posted by colindaven
              A 60bp SE read is too short to place accurately in many/most genomes.
              Going off topic here (the topic being that SOLiD is not good for de novo work), I wonder where you get that statement from. It seems to me that 60 quality bases would be enough to place a read accurately except in long repeat regions (e.g., LTRs).



              • #8
                @westerman

                It wasn't clear from the start whether the topic was de novo or reference based assembly.

                Have a look at the genome mappability score which came out of Mike Schatz's lab as one example (http://bioinformatics.oxfordjournals...8/16/2097.full).

                Even with 100bp perfect simulated single reads there are regions which cannot be mapped to reliably. Therefore, 60 bp reads containing errors won't be so nice to deal with. I remember working on human twin genomes and getting ~40-50,000 differences in VCF despite various SNP callers and stringent mapping quality filters.

                http://bioinformatics.oxfordjournals...expansion.html

                By the way, I work on plant genomes, and repetitive regions can be > 80%, so I thought the original poster might have similar issues.
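The effect of read length on placement can be illustrated with a toy calculation. The sketch below is not the genome mappability score from the paper linked above; it is a much cruder stand-in that only asks what fraction of error-free, forward-strand k-mers occur exactly once. The genome is random sequence with one 80 bp repeat inserted twice, and `unique_fraction`, `rand_seq`, the seed and all lengths are assumptions chosen for illustration.

```python
import random
from collections import Counter

def unique_fraction(genome, k):
    """Fraction of start positions whose k-mer occurs exactly once in the genome."""
    kmers = [genome[i:i + k] for i in range(len(genome) - k + 1)]
    counts = Counter(kmers)
    return sum(counts[km] == 1 for km in kmers) / len(kmers)

rng = random.Random(0)  # fixed seed so the toy genome is reproducible

def rand_seq(n):
    """Random base-space sequence of length n (toy data only)."""
    return "".join(rng.choice("ACGT") for _ in range(n))

repeat = rand_seq(80)   # one repeat element, present in two copies
genome = rand_seq(200) + repeat + rand_seq(200) + repeat + rand_seq(200)

for k in (20, 60, 100):
    print(k, round(unique_fraction(genome, k), 3))
```

Any read shorter than the repeat can fall entirely inside it and then has two equally good placements. Real genomes add sequencing errors, the reverse strand and far more repeat copies, so this toy understates the problem.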



                • #9
                  Reagent support for SOLiD until May 2017 or sooner per demand.

                  We use/used SOLiD for SAGE, great for short reads but more expensive than Illumina runs. Converting everything over to Illumina adapters now...

                  The couple of times we did targeted reseq or whole transcriptome, reverse read quality was bad.



                  • #10
                    Ok, thanks



                    • #11
                      Originally posted by westerman
                      Going off topic here (the topic being that SOLiD is not good for de novo work), I wonder where you get that statement from. It seems to me that 60 quality bases would be enough to place a read accurately except in long repeat regions (e.g., LTRs).
                      I suspect I've discussed this with you previously, but I might as well say things I haven't said before:

                      Homopolymers look identical in colour-space, which causes havoc for transcriptome assemblies (e.g. distinguishing between poly-T and poly-A sequences). Other simple repeats would also cause issues for genomic assembly (e.g. ACACACACAC and GTGTGTGTGT are identical, despite having both a base shift and a complementation). The assemblies are only likely to be useful in colour-space, because colour-space errors propagate through as very different sequences in base-space. Also, every contig has four possible base-space representations, which among other things makes it quite difficult to use other genome assemblies as scaffolds for a colour-space assembly.



                      • #12
                        Originally posted by gringer
                        I suspect I've discussed this with you previously, but I might as well say things I haven't said before:

                        Homopolymers look identical in colour-space, which causes havoc for transcriptome assemblies (e.g. distinguishing between poly-T and poly-A sequences). Other simple repeats would also cause issues for genomic assembly (e.g. ACACACACAC and GTGTGTGTGT are identical, despite having both a base shift and a complementation). The assemblies are only likely to be useful in colour-space, because colour-space errors propagate through as very different sequences in base-space. Also, every contig has four possible base-space representations, which among other things makes it quite difficult to use other genome assemblies as scaffolds for a colour-space assembly.
                        I guess I still don't understand the "issues" with deconvoluting colour-space. It seems as though it would be much more accurate than sequencing in basespace (e.g. Illumina). That's if I'm reading this paper correctly (attached).



                        • #13
                          Originally posted by cement_head
                          It seems as though it would be much more accurate than sequencing in basespace (e.g. Illumina). That's if I'm reading this paper correctly.
                          If our preferred model of DNA were colour-space, then it might have been more accurate with sufficient technology development. As it is, Illumina has had plenty of opportunity to improve the accuracy of their technology, and benefits from their chemical model being almost a direct representation of the DNA model that we use for sequencing.



                          • #14
                            Originally posted by cement_head
                            I guess I still don't understand the "issues" with deconvoluting colour-space. It seems as though it would be much more accurate than sequencing in basespace (e.g. Illumina). That's if I'm reading this paper correctly (attached).
                            The quoted error rate (<0.1%) must be after reference-based correction. The problem with SOLiD was the high raw error rate of the ligation-based chemistry (compared to Illumina) and the short read lengths, which made it essentially useless for de novo assembly.

                            I think the best option today for a large genome and a low budget would be to use the 10x Chromium with HiSeq X (~$2000 for one lane of PE150 linked reads from long fragments).



                            • #15
                              Originally posted by Chipper
                              The quoted error rate (<0.1%) must be after reference-based correction. The problem with SOLiD was the high raw error rate of the ligation-based chemistry (compared to Illumina) and the short read lengths, which made it essentially useless for de novo assembly.

                              I think the best option today for a large genome and a low budget would be to use the 10x Chromium with HiSeq X (~$2000 for one lane of PE150 linked reads from long fragments).
                              So I took another look at this, and it strikes me that the whole problem is the use of only four fluors for 16 combinations. (It seems odd that generating 16 distinct fluors wasn't the primary issue they tried to solve.) Once I got that part, it became obvious why there's an issue with colourspace. Curiously, I just found out that MiniSeq and NextSeq from Illumina use only two fluors, which seems like a huge potential issue if one isn't resequencing a human genome...
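For reference, the four-fluors-for-16-combinations mapping can be written out in a few lines. This is a sketch assuming the usual two-base encoding (A=0, C=1, G=2, T=3, colour = XOR of the two bases); the variable names are illustrative only.

```python
from collections import defaultdict
from itertools import product

# Assumed two-base encoding: each colour is the XOR of the two 2-bit base codes.
BASES = "ACGT"

# Group all 16 dinucleotides by the colour they produce.
by_colour = defaultdict(list)
for a, b in product(BASES, repeat=2):
    by_colour[BASES.index(a) ^ BASES.index(b)].append(a + b)

for colour in sorted(by_colour):
    print(colour, " ".join(by_colour[colour]))

# If the previous base is known, each colour points to exactly one next base.
unambiguous = all(
    len({d[1] for d in dinucs if d[0] == prev}) == 1
    for dinucs in by_colour.values()
    for prev in BASES
)
print(unambiguous)
```

Each colour stands for four dinucleotides, so a colour on its own is ambiguous. Under this encoding the ambiguity goes away once the preceding base is known, which is also why one wrong colour or an unknown starting base affects everything decoded after it.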

