  • westerman
    Rick Westerman
    • Jun 2008
    • 1104

    #16
    Errors and SNPs easier to detect in CS.

    Let's assume I have a SNP. Then the color-space reads would look like:

    (CS 1) T1113113
    (CS 2) T1112013

    Note that *two* color-space numbers are different for a single SNP. In base-space these reads are:

    (BS 1) GTGCACG
    (BS 2) GTGAACG

    Note the SNP in the middle which is a C to A.

    Double encoded, primer trimmed, these sequences look like:

    (DET 1) CCTCCT
    (DET 2) CCGACT

    Put these into a traditional alignment program and you will get an alignment but now it looks like a "double-SNP" instead of the true single SNP.
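The conversions in the example above can be reproduced with a short Python sketch. It assumes the standard SOLiD two-base code, in which a color is the XOR of the 2-bit codes of two adjacent bases (A=0, C=1, G=2, T=3). It also assumes that "primer trimmed" means dropping both the primer base and the first color. The function names are made up for illustration and do not come from any particular tool.

```python
# Assumption: standard SOLiD two-base code, color = XOR of adjacent base codes
# with A=0, C=1, G=2, T=3. Function names are hypothetical.
BASES = "ACGT"


def decode(cs_read: str) -> str:
    """Decode a color-space read (primer base followed by colors) to bases."""
    prev = BASES.index(cs_read[0])  # start from the known primer base
    out = []
    for color in cs_read[1:]:
        prev ^= int(color)          # each color encodes the transition to the next base
        out.append(BASES[prev])
    return "".join(out)


def double_encode_trimmed(cs_read: str) -> str:
    """Map colors 0,1,2,3 to A,C,G,T after dropping the primer base and first color."""
    return "".join(BASES[int(c)] for c in cs_read[2:])


for read in ("T1113113", "T1112013"):
    print(read, decode(read), double_encode_trimmed(read))
# T1113113 GTGCACG CCTCCT
# T1112013 GTGAACG CCGACT
```

The double-encoded strings differ at two positions even though the decoded bases differ at only one, which is the "double-SNP" effect described above.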

    Likewise, sequencing errors show up easily in color-space but not when traditional programs are run on double-encoded reads. Let's say that our reads differ by a single number:

    (CS 3) T1113113
    (CS 4) T1112113

    As double-encoded trimmed these look like:

    (DET 3) CCTCCT
    (DET 4) CCGCCT

    Which a *traditional* program will happily work with and give you back a SNP. But what is the actual base-space alignment?

    (BS 3) GTGCACG
    (BS 4) GTGACAT

    Ooops!

    Note that this is one of the great strengths of color-space: sequencing errors stand out as a single number change and can be discarded or corrected. In the above case a color-space aware program would throw out the read that does not match the reference. Or, in a de novo assembly project, it would throw out the read(s) that do not match the other reads.

    In fact I think that the above is so important I will repeat it. In color space, sequencing errors look different from SNPs and thus are easily detected as errors. This is an immense advantage over traditional sequencing representations.
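To make the contrast concrete, here is a small Python sketch that compares the reads from this post in both color space and base space. It assumes the standard SOLiD two-base code (color = XOR of adjacent base codes, A=0, C=1, G=2, T=3). The helper names are hypothetical.

```python
# Assumption: standard SOLiD two-base code (color = XOR of adjacent base codes).
BASES = "ACGT"


def decode(cs_read):
    """Decode a color-space read (primer base followed by colors) to bases."""
    prev = BASES.index(cs_read[0])
    out = []
    for color in cs_read[1:]:
        prev ^= int(color)
        out.append(BASES[prev])
    return "".join(out)


def mismatch_positions(a, b):
    """Return the 0-based positions at which two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


reference = "T1113113"
reads = {"SNP": "T1112013", "error": "T1112113"}

for label, read in reads.items():
    color_diff = mismatch_positions(reference[1:], read[1:])
    base_diff = mismatch_positions(decode(reference), decode(read))
    print(label, color_diff, base_diff)
# SNP [3, 4] [3]
# error [3] [3, 4, 5, 6]
```

The SNP changes two adjacent colors but only one base. The single color error changes one color, and after naive decoding every base from that point onward is wrong.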
    Last edited by westerman; 02-19-2009, 10:42 AM.

    Comment

    • westerman
      Rick Westerman
      • Jun 2008
      • 1104

      #17
      Summary

      I hope that the above three messages help show why *traditional* color-space-unaware programs cannot cope with color-space, even in the guise of double-encoded sequences. As I mentioned in my first post today, it is not a matter of the alphabet (0, 1, 2, 3 vs. A, C, G, T) but rather of how the traditional programs work if they are unaware of color-space's strengths and weaknesses.

      Personally I wish ABI had never come up with 'double-encoding'. IMHO it is an abomination that lets people think that they can use programs that they should not. In computer-geek talk, double-encoding causes a massive case of GIGO. Only, in computational biology it is sometimes hard to recognize the "GO".

      Comment

      • Mr Mutundes
        Member
        • Jan 2009
        • 17

        #18
        ah-ha!

        Thanks westerman for taking the time to post such detailed (and gentle!) replies

        Comment

        • mchaisso
          Member
          • Apr 2008
          • 84

          #19
          There is some talk within de novo assembly circles of modifying programs to take advantage of color space when assembling SOLiD reads. There is a simple change in the preprocessing step in EULER to do this, and I imagine the VELVET tourbus method could be modified to align with dual-base encoding rather than base encoding.

          -mark

          Comment

          • BioWizard
            Member
            • Mar 2009
            • 27

            #20
            ISAS (color space version) works natively in color space, and even supports "Valid Adjacent". See the thread about ISAS for an explanation.

            Comment

            • RudyS
              Member
              • May 2008
              • 20

              #21
              Roald

              I am using CLC Genomics Workbench 3 ... is this already outdated considering your posting here:

              "We have just included native color space assembly in our NGS Cell software"

              ... I have some "issues" with the lack of connectivity of de novo contigs with WB3 ... I mean, the reads are there, but from what I am seeing WB3 is too finicky ... I have been in touch with your colleagues in Denmark, but my "issues" pile up faster than their responses

              Would the new algorithm in the NGS Cell software improve my results? Should people be buying that instead of the WB3?

              RudyS

              Comment

              • Roald
                Director at CLC bio
                • Aug 2008
                • 26

                #22
                Hi RudyS,

                Thank you for your interest.

                We are now at version 3.2 of the Genomics Workbench, and I recommend that you get the updated version if you have not done so already.
                At the moment we are working on a number of improvements, amongst which is de novo assembly in color space, so stay tuned for that.
                If you write me a personal message with the details, I will be happy to look into your support issues.

                Regarding the NGS Cell, the algorithms and data structures used are the same as those within the Workbench. The Cell just offers these in a command line environment, suited for integration into a pipeline in a scripting environment.

                Hope this helps.

                Cheers

                Roald

                Comment

                • nilshomer
                  Nils Homer
                  • Nov 2008
                  • 1283

                  #23
                  We have used BFAST for our own alignment purposes (admittedly I am the author). We don't use "valid adjacent" rules (i.e. a heuristic), but a full dynamic programming algorithm (equivalent to solving a shortest-path problem on a DAG, or an HMM) to identify errors, SNPs and indels. I don't believe that other aligners (beyond SHRiMP) support alignment with indels, but I may be mistaken.

                  Finally, if there is a sequencing error (color error), then without proper identification of the error (some would call this correction), all decoded bases after that color will be mismatches compared to the reference (or true underlying sequence). This is why you cannot use other alignment tools: you are really dealing with encoded bases, which rely on a reference to identify errors. Now consider more difficult situations, where there is truly a variant (SNP) and a color error occurs in the first color encoding the variant. Similarly, there can be the pattern error, match, error, which is equally likely to be a variant (SNP) or an error, but which do we prefer? It gets complicated, so I would instead use an aligner that supports color space (MAQ, Bowtie, BFAST, SHRiMP) so that you do not have to modify alignment algorithms.
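For readers unfamiliar with the "valid adjacent" heuristic mentioned above, here is a minimal Python sketch of the idea. It is not BFAST's dynamic programming algorithm, nor the implementation of any aligner named in this thread. It assumes the standard SOLiD code, in which a single-base change alters two adjacent colors while leaving their XOR unchanged. The function name and labels are made up for illustration.

```python
# Illustrative heuristic only; assumes the standard SOLiD two-base code, in
# which an isolated SNP changes two adjacent colors but preserves their XOR.
def classify_color_mismatches(ref_colors: str, read_colors: str):
    """Label mismatching color positions as isolated changes or adjacent pairs."""
    diffs = [i for i, (r, q) in enumerate(zip(ref_colors, read_colors)) if r != q]
    calls = []
    i = 0
    while i < len(diffs):
        pos = diffs[i]
        has_neighbor = i + 1 < len(diffs) and diffs[i + 1] == pos + 1
        if has_neighbor:
            ref_xor = int(ref_colors[pos]) ^ int(ref_colors[pos + 1])
            read_xor = int(read_colors[pos]) ^ int(read_colors[pos + 1])
            label = "candidate SNP" if ref_xor == read_xor else "invalid adjacent pair"
            calls.append((pos, label))
            i += 2  # this sketch only considers pairs of adjacent mismatches
        else:
            calls.append((pos, "isolated color change"))
            i += 1
    return calls


print(classify_color_mismatches("1113113", "1112013"))  # [(3, 'candidate SNP')]
print(classify_color_mismatches("1113113", "1112113"))  # [(3, 'isolated color change')]
print(classify_color_mismatches("1113113", "1112213"))  # [(3, 'invalid adjacent pair')]
```

A "candidate SNP" call is only a hint. Two color errors can also produce a valid adjacent pair, and patterns such as error, match, error are ignored here. That ambiguity is part of why a full alignment model is preferred in the post above.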

                  Comment
