**westerman** · 02-19-2009, 10:30 AM

Errors and SNPs easier to detect in CS.

Let's assume I have a SNP. Then the color-space reads would look like:

(CS 1) T1113113
(CS 2) T1112013

Note that *two* color-space numbers are different for a single SNP. In base-space these reads are:

(BS 1) GTGCACG
(BS 2) GTGAACG

Note the SNP in the middle which is a C to A.
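The color-to-base translation above can be reproduced with a short sketch. It assumes the standard SOLiD two-base encoding, in which each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3); the function name `decode_colorspace` is only illustrative and is not taken from any particular tool.

```python
# Assumption: standard SOLiD encoding, color = XOR of adjacent 2-bit base codes.
BASES = "ACGT"  # A=0, C=1, G=2, T=3


def decode_colorspace(read: str) -> str:
    """Decode a read such as 'T1113113' (primer base + colors) to base space.

    The primer base itself is not part of the returned sequence.
    """
    prev = BASES.index(read[0])      # start from the known primer base
    decoded = []
    for color in read[1:]:
        prev ^= int(color)           # next base = previous base XOR color
        decoded.append(BASES[prev])
    return "".join(decoded)


print(decode_colorspace("T1113113"))  # GTGCACG  (BS 1)
print(decode_colorspace("T1112013"))  # GTGAACG  (BS 2)
```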

Double encoded, primer trimmed, these sequences look like:

(DET 1) CCTCCT
(DET 2) CCGACT

Put these into a traditional alignment program and you will get an alignment, but it now looks like a "double-SNP" (two adjacent mismatches) instead of the true single SNP.
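To make the "double-SNP" concrete, here is a rough sketch of double encoding. It assumes the mapping implied by the examples above (0→A, 1→C, 2→G, 3→T) and that "primer trimmed" means dropping the primer base and the first color; the helper names are made up for illustration.

```python
# Assumption: double encoding maps colors 0,1,2,3 to the letters A,C,G,T.
DOUBLE_ENCODING = str.maketrans("0123", "ACGT")


def double_encode_trimmed(read: str) -> str:
    """Drop the primer base and the first color, then map colors to letters."""
    return read[2:].translate(DOUBLE_ENCODING)


def mismatch_positions(a: str, b: str) -> list[int]:
    """0-based positions where two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


det1 = double_encode_trimmed("T1113113")   # CCTCCT
det2 = double_encode_trimmed("T1112013")   # CCGACT

# A color-space-unaware aligner only sees two adjacent letter mismatches.
print(det1, det2, mismatch_positions(det1, det2))  # CCTCCT CCGACT [2, 3]
```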

Likewise, sequencing errors show up easily in color-space but not when double-encoded reads are run through traditional programs. Let's say that our reads differ by a single number:

(CS 3) T1113113
(CS 4) T1112113

As double-encoded trimmed these look like:

(DET 3) CCTCCT
(DET 4) CCGCCT

Which a *traditional* program will happily work with and give you back a SNP. But what is the actual base-space alignment?

(BS 3) GTGCACG
(BS 4) GTGACAT

Ooops!

Note that this is one of the great strengths of color-space: sequencing errors stand out as a single number change and can be discarded or corrected. In the above case a color-space aware program would throw out the read that does not match the reference. Or in a de-novo assembly project throw out the read(s) that do not match other reads.
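As a rough illustration of how a color-space aware program could tell the two cases apart, the sketch below compares two color strings of equal length with the same primer base. It treats two adjacent color differences whose XOR is unchanged as a candidate SNP and an isolated color difference as a likely sequencing error. This is a simplified heuristic under the standard XOR encoding, not the method of any specific aligner, and `classify_color_differences` is a hypothetical name.

```python
def classify_color_differences(ref_cs: str, read_cs: str) -> list[tuple[int, str]]:
    """Label color differences between two reads (primer base + colors).

    Positions are 0-based indices into the color string after the primer base.
    Assumes both inputs have the same length and the same primer base.
    """
    ref = [int(c) for c in ref_cs[1:]]
    read = [int(c) for c in read_cs[1:]]
    diffs = [i for i, (a, b) in enumerate(zip(ref, read)) if a != b]

    calls = []
    k = 0
    while k < len(diffs):
        i = diffs[k]
        adjacent = k + 1 < len(diffs) and diffs[k + 1] == i + 1
        # A single-base change alters two adjacent colors but leaves
        # their XOR (which depends only on the flanking bases) intact.
        if adjacent and (ref[i] ^ ref[i + 1]) == (read[i] ^ read[i + 1]):
            calls.append((i, "candidate SNP"))
            k += 2
        else:
            calls.append((i, "likely sequencing error"))
            k += 1
    return calls


print(classify_color_differences("T1113113", "T1112013"))  # [(3, 'candidate SNP')]
print(classify_color_differences("T1113113", "T1112113"))  # [(3, 'likely sequencing error')]
```

A real tool would also have to weigh quality values and coverage from other reads before discarding or correcting anything; this sketch only looks at the pattern of color changes.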

In fact, I think that the above is so important that I will repeat it. In color space, sequencing errors look different from SNPs and thus are easily detected as errors. This is an immense advantage over traditional sequencing representations.

**westerman** · 02-19-2009, 10:38 AM

Summary

I hope that the above three messages help show why *traditional* color-space-unaware programs cannot cope with color-space even in the guise of double-encoded sequences. As I mentioned in my first post today, it is not a matter of the alphabet (0, 1, 2, 3 vs. A, C, G, T) but rather of how the traditional programs work if they are unaware of color-space's strengths and weaknesses.

Personally I wish ABI had never come up with 'double-encoding'. IMHO it is an abomination that lets people think that they can use programs that they should not. In computer-geek talk, double-encoding causes a massive case of GIGO. Only sometimes in computational biology it is hard to recognize the "GO".

**Mr Mutundes** · 02-19-2009, 12:53 PM

ah-ha!

Thanks westerman for taking the time to post such detailed (and gentle!) replies

**mchaisso** · 02-19-2009, 01:15 PM

There is some talk within de novo assembly circles about modifying programs to take advantage of color space when assembling SOLiD reads. There is a simple change in the preprocessing step in EULER to do this, and I imagine the VELVET tourbus method may be modified to align with dual-base encoding rather than base encoding.

-mark

**BioWizard** · 03-10-2009, 07:41 PM

ISAS (color space version) works natively in color space, and even supports "Valid Adjacent". See the thread about ISAS for an explanation.

**RudyS** · 03-11-2009, 10:53 AM

Roald

I am using CLC Genomics Workbench 3 ... is this already outdated considering your posting here:

"We have just included native color space assembly in our NGS Cell software"

... I have some "issues" with the lack of connectivity of de novo contigs with WB3 ... I mean, the reads are there, but from what I am seeing WB3 is too finicky ... have been in touch with your colleagues in Denmark, but my "issues" pile up faster than their responses

would the new algorithm in the NGS Cell software improve my results? should people be buying that instead of the WB3?

RudyS

**Roald** · 03-19-2009, 02:53 AM

Hi RudyS,

Thank you for your interest.

We are now at version 3.2 of the Genomics Workbench, and I recommend that you get the updated version if you have not done so already.
At the moment we are working on a number of improvements, amongst which is de novo assembly in color space, so stay tuned for that.
If you write me a personal message with the details, I will be happy to look into your support issues.

Regarding the NGS Cell, the algorithms and data structures used are the same as those within the Workbench. The Cell just offers these in a command line environment, suited for integration into a pipeline in a scripting environment.

Hope this helps.

Cheers

Roald

**nilshomer** · 04-21-2009, 12:14 AM

We have used BFAST for our own alignment purposes (admittedly I am the author). We don't use "valid adjacent" rules (i.e. a heuristic), but a full dynamic programming algorithm (equivalent to solving a shortest path problem on a DAG, or an HMM) to identify errors, SNPs and indels. I don't believe that other aligners (beyond SHRiMP) support alignment with indels, but I may be mistaken.

Finally, if there is a sequencing error (color error), without proper identification of the error (some would call this correction), all decoded bases after that color will be mismatches compared to the reference (or true underlying sequence). This is why you cannot use other alignment tools, since you are truly dealing with encoded bases, which rely on a reference to identify errors. Now consider more difficult situations, where there is truly a variant (SNP) and a color error occurs in the first color encoding the variant. Similarly, there can be the pattern error, match, error, which is equally likely to be a variant (SNP) or an error, but which do we prefer? It gets complicated, so I would instead use an aligner that supports color space (MAQ, Bowtie, BFAST, SHRiMP) so that you do not have to modify alignment algorithms.
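The propagation of a single color error through naive decoding can be checked with a small sketch, reusing the reads from earlier in the thread (T1113113 and T1112113). It assumes the standard XOR encoding (A=0, C=1, G=2, T=3); `downstream_mismatches` is a hypothetical helper and has nothing to do with how BFAST works internally.

```python
BASES = "ACGT"  # assumption: A=0, C=1, G=2, T=3, color = XOR of adjacent codes


def decode_colorspace(read: str) -> str:
    """Naively decode primer base + colors to bases, with no error handling."""
    prev = BASES.index(read[0])
    decoded = []
    for color in read[1:]:
        prev ^= int(color)
        decoded.append(BASES[prev])
    return "".join(decoded)


def downstream_mismatches(ref_cs: str, read_cs: str) -> list[int]:
    """0-based base positions where the two naively decoded reads disagree."""
    ref_bases = decode_colorspace(ref_cs)
    read_bases = decode_colorspace(read_cs)
    return [i for i, (a, b) in enumerate(zip(ref_bases, read_bases)) if a != b]


# One wrong color at index 3 shifts every decoded base from there to the end.
print(downstream_mismatches("T1113113", "T1112113"))  # [3, 4, 5, 6]
```

Because each base is decoded from the previous one, the wrong color XORs a constant non-zero offset into every later base, which is why the mismatches run to the end of the read.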
