  • #16
    Errors and SNPs easier to detect in CS.

    Let's assume I have a SNP. Then the color-space reads would look like:

    (CS 1) T1113113
    (CS 2) T1112013

    Note that *two* color-space numbers are different for a single SNP. In base-space these reads are:

    (BS 1) GTGCACG
    (BS 2) GTGAACG

    Note the SNP in the middle which is a C to A.

    Double encoded, primer trimmed, these sequences look like:

    (DET 1) CCTCCT
    (DET 2) CCGACT

    Put these into a traditional alignment program and you will get an alignment but now it looks like a "double-SNP" instead of the true single SNP.
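The conversions above can be reproduced with a minimal Python sketch. It assumes the usual SOLiD two-base code, in which a color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3), and the double-encoding map 0/1/2/3 to A/C/G/T. The function names `decode` and `double_encode_trimmed` are made up for this illustration and do not come from any particular tool.

```python
BASES = "ACGT"  # index = 2-bit code: A=0, C=1, G=2, T=3
DOUBLE = {"0": "A", "1": "C", "2": "G", "3": "T"}


def decode(cs_read):
    """Translate a color-space read (primer base + colors) to base-space."""
    prev = cs_read[0]  # the primer base, e.g. "T"
    out = []
    for c in cs_read[1:]:
        # each color tells us how to get from the previous base to the next
        prev = BASES[BASES.index(prev) ^ int(c)]
        out.append(prev)
    return "".join(out)


def double_encode_trimmed(cs_read):
    """Drop the primer base and the first color, then map colors to letters."""
    return "".join(DOUBLE[c] for c in cs_read[2:])


for name, read in [("1", "T1113113"), ("2", "T1112013")]:
    print(name, decode(read), double_encode_trimmed(read))
```

Running this on (CS 1) and (CS 2) gives the (BS) and (DET) strings listed above: one base differs in base-space, but two letters differ in the double-encoded form.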

    Likewise, sequencing errors show up easily in color-space but not when traditional programs are run on double-encoded reads. Let's say that our reads differ by a single number:

    (CS 3) T1113113
    (CS 4) T1112113

    As double-encoded trimmed these look like:

    (DET 3) CCTCCT
    (DET 4) CCGCCT

    Which a *traditional* program will happily work with and give you back a SNP. But what is the actual base-space alignment?

    (BS 3) GTGCACG
    (BS 4) GTGACAT

    Ooops!

    Note that this is one of the great strengths of color-space: sequencing errors stand out as a single number change and can be discarded or corrected. In the above case a color-space aware program would throw out the read that does not match the reference. Or in a de-novo assembly project throw out the read(s) that do not match other reads.

    In fact I think that the above is so important I will repeat it. In color-space, sequencing errors look different from SNPs (one changed number instead of two) and thus are easily detected as errors. This is immense power over traditional sequencing representations.
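The distinction can be sketched in a few lines of Python. This is only an illustration of the idea, not how any particular aligner does it: the function `classify` and its labels are hypothetical, and it only handles ungapped reads of equal length. It relies on one property of the two-base code: a true SNP changes two adjacent colors but leaves their XOR unchanged, because the flanking bases stay the same.

```python
def classify(ref_colors, read_colors):
    """Classify the color differences between a reference and a read.

    Both arguments are color strings of equal length, without the primer base.
    Returns "match", "error", "snp" or "other".
    """
    diffs = [i for i, (x, y) in enumerate(zip(ref_colors, read_colors)) if x != y]
    if not diffs:
        return "match"
    if len(diffs) == 1:
        # a lone color change cannot come from a single base substitution
        return "error"
    if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
        i, j = diffs
        ref_xor = int(ref_colors[i]) ^ int(ref_colors[j])
        read_xor = int(read_colors[i]) ^ int(read_colors[j])
        if ref_xor == read_xor:
            # two adjacent changes with the same XOR are consistent with a SNP
            return "snp"
    return "other"


print(classify("1113113", "1112013"))  # (CS 1) vs (CS 2)
print(classify("1113113", "1112113"))  # (CS 3) vs (CS 4)
```

With the reads from this post, the first comparison is reported as SNP-consistent and the second as an isolated color change, i.e. a likely sequencing error.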
    Last edited by westerman; 02-19-2009, 10:42 AM.



    • #17
      Summary

      I hope that the above three messages help show why *traditional* color-space-unaware programs cannot cope with color-space, even in the guise of double-encoded sequences. As I mentioned in my first post today, it is not a matter of the alphabet (0, 1, 2, 3 vs. A, C, G, T) but rather of how the traditional programs work if they are unaware of the strengths and weaknesses of color-space.

      Personally I wish ABI had never come up with 'double-encoding'. IMHO it is an abomination that lets people think they can use programs that they should not. In computer-geek talk, double-encoding causes a massive case of GIGO. Only sometimes in computational biology it is hard to recognize the "GO".



      • #18
        ah-ha!

        Thanks westerman for taking the time to post such detailed (and gentle!) replies



        • #19
          There is some talk within de novo assembly circles about modifying programs to take advantage of color space when assembling SOLiD reads. There is a simple change in the preprocessing step in EULER to do this, and I imagine the VELVET tourbus method may be modified to align with dual-base encoding rather than base encoding.

          -mark



          • #20
            ISAS (color space version) works natively in color space, and even supports "Valid Adjacent" rules. See the thread about ISAS for an explanation.



            • #21
              Roald

              I am using CLC Genomics Workbench 3 ... is this already outdated considering your posting here:

              "We have just included native color space assembly in our NGS Cell software"

              ... I have some "issues" with the lack of connectivity of denovo contigs with WB3 ... i mean, the reads are there, but from what i am seeing WB3 is too finicky ... have been in touch with your colleagues in Denmark, but my "issues" pile up faster than their responses

              would the new algorithm in the NGS Cell software improve my results? should people be buying that instead of the WB3?

              RudyS



              • #22
                Hi RudyS,

                Thank you for your interest.

                We are now at version 3.2 of the Genomics Workbench, and I recommend that you get the updated version if you have not done so already.
                At the moment we are working on a number of improvements, amongst which is de novo assembly in color space, so stay tuned for that.
                If you write me a personal message with the details, I will be happy to look into your support issues.

                Regarding the NGS Cell, the algorithms and data structures used are the same as those within the Workbench. The Cell just offers these in a command line environment, suited for integration into a pipeline in a scripting environment.

                Hope this helps.

                Cheers

                Roald



                • #23
                  We have used BFAST for our own alignment purposes (admittedly I am the author). We don't use "valid adjacent" rules (i.e. a heuristic), but a full dynamic programming algorithm (equivalent to solving a shortest path on a DAG, or an HMM) to identify errors, SNPs and indels. I don't believe that other aligners (beyond SHRiMP) support alignment with indels, but I may be mistaken.

                  Finally, if there is a sequencing error (color error) and it is not properly identified (some would call this correction), all decoded bases after that color will be mismatches compared to the reference (or the true underlying sequence). This is why you cannot use other alignment tools: you are really dealing with encoded bases, which rely on a reference to identify errors. Now consider more difficult situations, where there is truly a variant (SNP) and a color error occurs in the first color encoding the variant. Similarly, there can be the pattern error, match, error, which is equally likely to be a variant (SNP) and an error, but which do we prefer? It gets complicated, so I would instead use an aligner that supports color space (MAQ, Bowtie, BFAST, SHRiMP) so that you do not have to modify alignment algorithms.
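To make the shortest-path idea concrete, here is a minimal Python sketch of reference-guided decoding. It is not BFAST's implementation: it is ungapped (no indels), the function name `decode_against_reference` is made up, and the two penalties (`color_err_cost`, `snp_cost`) are arbitrary values chosen for illustration. Each position has four states (the candidate base); moving between states costs a color-error penalty when the implied color disagrees with the observed one, and sitting in a state costs a SNP penalty when the base disagrees with the reference.

```python
BASES = "ACGT"  # index = 2-bit code: A=0, C=1, G=2, T=3


def color(a, b):
    """Color produced by two adjacent bases (XOR of their 2-bit codes)."""
    return BASES.index(a) ^ BASES.index(b)


def decode_against_reference(primer, colors, reference,
                             color_err_cost=2, snp_cost=3):
    """Min-cost base sequence explaining `colors`, given `reference`.

    colors: list of ints (0..3), reference: base string of the same length.
    Returns (decoded_bases, total_cost).
    """
    assert len(colors) == len(reference)
    prev_cost = {primer: 0}  # cheapest path cost ending in each base
    back = []                # back[i][b] = best predecessor of base b at i
    for obs, ref_base in zip(colors, reference):
        cur_cost, cur_back = {}, {}
        for b in BASES:
            emit = 0 if b == ref_base else snp_cost
            best = None
            for p, c in prev_cost.items():
                step = 0 if color(p, b) == obs else color_err_cost
                total = c + step + emit
                if best is None or total < best[0]:
                    best = (total, p)
            cur_cost[b], cur_back[b] = best
        back.append(cur_back)
        prev_cost = cur_cost
    # trace the cheapest path back from the last position
    last = min(BASES, key=lambda b: prev_cost[b])
    decoded = [last]
    for cur_back in reversed(back[1:]):
        decoded.append(cur_back[decoded[-1]])
    decoded.reverse()
    return "".join(decoded), prev_cost[last]


ref = "GTGCACG"
print(decode_against_reference("T", [1, 1, 1, 2, 1, 1, 3], ref))  # (CS 4)
print(decode_against_reference("T", [1, 1, 1, 2, 0, 1, 3], ref))  # (CS 2)
```

With these penalties, the read with a single color change decodes back to the reference sequence (one color error), while the read with two adjacent color changes decodes to GTGAACG (one SNP). Changing the relative penalties changes which explanation wins, which is exactly the "what do we prefer?" question raised above.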
