**nilshomer** · 04-12-2010, 02:46 PM

Originally posted by throwaway:

> I would like to find some information on the distribution of errors in SOLiD data. I'm planning to use it to simulate a pooling sequencing strategy like the "DNA Sudoku" approach, and assess how badly SOLiD's error rate hurts the capacity to uniquely resolve variants in this scheme.
>
> If I just assume errors are uniformly distributed along reads with a frequency of 0.03%, I am pretty sure the answer will be "Not much, go for it!" But I suspect that error model is too optimistic, and there are errors which correlate with sequence position and context. Ideally, I'd like to find a paper like "Substantial biases in ultra-short read data sets from high-throughput DNA sequencing", but for SOLiD rather than Illumina. Is there such a paper?
>
> Another possibility would be a large corpus of public SOLiD data from loci which have been sequenced by other methods, so I could compare and look for and characterize errors myself.

You can take a look at the BAMs found here: http://genome.ucla.edu/U87
They store both the decoded bases and the original color sequences (in the CS tag).

When you say that the error will be uniform, do you mean that the error in the two-base encoded (color) sequence will be uniform? The per-color sequencing error can range from 1% to 15% going from the 5' to the 3' end of the read. After alignment, the base error rate is 0.5-1%.
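If you want to simulate this rather than a uniform model, here is a minimal sketch of a position-dependent color error model. The only numbers taken from above are the 1% and 15% endpoints; the linear ramp between them, the 50-color read length, and the function names `color_error_rates` and `add_color_errors` are my own assumptions for illustration, not measured SOLiD behavior.

```python
import random

def color_error_rates(read_length=50, start=0.01, end=0.15):
    """Per-position color error rates from the 5' to the 3' end.

    ASSUMPTION: a linear ramp between the two endpoints. Real error
    profiles should be estimated from data, e.g. the BAMs linked above.
    """
    if read_length == 1:
        return [start]
    step = (end - start) / (read_length - 1)
    return [start + i * step for i in range(read_length)]

def add_color_errors(colors, rates, rng):
    """Replace each color (0-3) by a different color with probability rates[i]."""
    noisy = []
    for color, rate in zip(colors, rates):
        if rng.random() < rate:
            color = rng.choice([c for c in "0123" if c != color])
        noisy.append(color)
    return "".join(noisy)

rng = random.Random(0)  # fixed seed so the run is repeatable
rates = color_error_rates()
read = "".join(rng.choice("0123") for _ in range(50))
noisy = add_color_errors(read, rates, rng)
mismatches = sum(a != b for a, b in zip(read, noisy))
print(f"mean per-color error rate: {sum(rates) / len(rates):.3f}")
print(f"mismatching colors in this read: {mismatches}")
```

Under this assumed ramp the mean per-color error rate is 8%, far above a uniform 0.03%, so the shape of the curve matters for the pooling simulation.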

Note that in SOLiD data, unlike in Illumina data, sequencing errors and base differences are not the same thing. Sequencing errors may be identified during alignment, and when the two-base encoded sequence is decoded into bases, those identified errors may be corrected (usually this is done simultaneously). Alternatively, sequencing errors may occur in such a way that a false SNP is decoded, which would lead to a base difference. Such is the beauty and power of color space.
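To make the distinction concrete, here is a small sketch of two-base encoding, where each color is the XOR of the 2-bit codes of adjacent bases (A=0, C=1, G=2, T=3). As a simplification I anchor the read on its own first base rather than on the last primer base, and the helper names (`encode`, `decode`, `diff_positions`) are made up for this example. It is not how any particular aligner implements correction.

```python
BASES = "ACGT"  # 2-bit codes: A=0, C=1, G=2, T=3

def encode(bases):
    """Return the anchor base followed by one color per adjacent base pair."""
    codes = [BASES.index(b) for b in bases]
    colors = "".join(str(a ^ b) for a, b in zip(codes, codes[1:]))
    return bases[0] + colors

def decode(color_read):
    """Naively decode: walk the colors starting from the anchor base."""
    code = BASES.index(color_read[0])
    out = [color_read[0]]
    for color in color_read[1:]:
        code ^= int(color)
        out.append(BASES[code])
    return "".join(out)

def diff_positions(a, b):
    """0-based positions at which two equal-length strings differ."""
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]

ref = "ACGTACGT"
ref_cs = encode(ref)              # "A1313131"

# A true SNP (A -> C at base 4) changes two ADJACENT colors.
snp_cs = encode("ACGTCCGT")
print(diff_positions(ref_cs[1:], snp_cs[1:]))   # [3, 4]

# A single color error (color 3 read as '0') corrupts every downstream
# base if decoded naively, so it shows up as an isolated color mismatch
# against the reference and can be corrected during alignment.
err_cs = "A1310131"
print(decode(err_cs))                            # ACGTTGCA
print(diff_positions(ref, decode(err_cs)))       # [4, 5, 6, 7]
```

A false SNP arises when two sequencing errors land on adjacent colors in a combination that is consistent with a real base change, which is much rarer than a single color error.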

**drio** · 04-12-2010, 04:35 PM

Here is a 50 bp fragment run with the mismatch rate per cycle, before and after color-space correction.

Any data on SOLiD error characteristics?
