
**ShaunMahony** · 05-29-2014, 02:51 PM

Hi Alessandro1976,

You can't convert colorspace reads (e.g. from SOLiD) into sequence space with any degree of accuracy. Since each colour is defined relative to the previous base, a single sequencing error is propagated through the rest of the decoded read. It's explained well here:

SOLiD seq process: Covert colorspace to basespace - SEQanswers

http://seqanswers.com/forums/showthread.php?t=16115
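Here is a minimal Python sketch of two-base (colour-space) encoding and naive decoding that shows the error propagation. It assumes the standard SOLiD colour code, in which A=0, C=1, G=2, T=3 and each colour is the XOR of the codes of two adjacent bases. It also assumes the first base of the read is known. `encode` and `decode` are illustrative names, not functions from any SOLiD tool.

```python
# Sketch of SOLiD-style two-base ("colour-space") encoding and naive decoding.
# Assumption: A=0, C=1, G=2, T=3, and each colour is the XOR of the codes of
# two adjacent bases; the first (primer-adjacent) base is taken as known.

BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def encode(seq: str) -> tuple[str, list[int]]:
    """Return the leading base and the colours (0-3) for a base-space sequence."""
    colours = [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colours


def decode(first_base: str, colours: list[int]) -> str:
    """Naively translate colours back to bases, one transition at a time."""
    bases = [first_base]
    for c in colours:
        # Each base is derived from the previously decoded base, so a wrong
        # colour here corrupts every base that follows it.
        bases.append(BASES[CODE[bases[-1]] ^ c])
    return "".join(bases)


if __name__ == "__main__":
    read = "TACGGTCAAT"
    first, colours = encode(read)
    print(decode(first, colours))  # round-trips to TACGGTCAAT

    # Simulate one miscalled colour (the transition between bases 3 and 4).
    bad = list(colours)
    bad[3] = (bad[3] + 1) % 4
    print(decode(first, bad))  # every base from index 4 onward now differs
```

Because each decoded base is computed from the one before it, flipping a single colour changes every base downstream of it. This is why a naive translation of raw colour-space reads is unreliable.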

**gringer** · 05-31-2014, 05:05 AM

Programs that map colour-space reads are likely to also correct the reads when reporting the sequences in the BAM files (I know bowtie does this, and it makes sense for others to do this as well). The indexes that are mapped against must be in colour-space, and there are a few nice error-correction features in colour-space that mean it can be easier to distinguish between sequence changes and instrument error (e.g. a SNP needs two adjacent colour changes).
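The "two adjacent colour changes" rule can be illustrated with a small Python sketch. It assumes the SOLiD colour code A=0, C=1, G=2, T=3, where each colour is the XOR of two adjacent base codes. Under that code, a true substitution at an interior base changes both colours touching that base by the same XOR amount, while a single miscalled colour changes only one. `to_colours` and `classify` are hypothetical helpers for illustration. Real colour-space aligners such as bowtie use their own, more elaborate scoring.

```python
# Sketch: telling a SNP apart from a single colour miscall in colour space.
# Assumption: A=0, C=1, G=2, T=3; colour = XOR of the two adjacent base codes.
# Simplified rule of thumb: ignores SNPs in the first or last base of a read.

BASES = "ACGT"
CODE = {b: i for i, b in enumerate(BASES)}


def to_colours(seq: str) -> list[int]:
    """Encode a base-space sequence as colours (one per adjacent base pair)."""
    return [CODE[a] ^ CODE[b] for a, b in zip(seq, seq[1:])]


def classify(ref_colours: list[int], read_colours: list[int]) -> str:
    """Label the colour mismatches between an aligned read and its reference."""
    # Record each mismatch position and how the colour changed (XOR difference).
    diffs = [(i, r ^ q) for i, (r, q) in enumerate(zip(ref_colours, read_colours)) if r != q]
    if not diffs:
        return "match"
    if len(diffs) == 1:
        return "likely colour error"
    if len(diffs) == 2:
        (i, d1), (j, d2) = diffs
        # A substituted base shifts both neighbouring colours by the same XOR amount.
        if j == i + 1 and d1 == d2:
            return "candidate SNP"
    return "ambiguous"


if __name__ == "__main__":
    ref = "TACGGTCAAT"
    snp_read = "TACGATCAAT"  # G -> A at index 4
    print(classify(to_colours(ref), to_colours(snp_read)))  # prints: candidate SNP

    err = to_colours(ref)
    err[6] = (err[6] + 1) % 4  # one miscalled colour
    print(classify(to_colours(ref), err))  # prints: likely colour error
```

The consistency check (`d1 == d2`) matters. Two adjacent mismatches that shift the colours by different amounts cannot come from a single base change, so the sketch reports them as ambiguous rather than as a SNP.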

However, you will always run into issues when trying to interpret or compare the results of a colour-space run (e.g. in a genome browser) because colour-space is a completely different beast to base-space and doesn't make sense to humans. See the post ShaunMahony linked to for more details. Here's my recommended approach for carrying out such a comparison:

1. Transfer all the colour-space files onto an external hard disk
2. Delete all other copies of the colour-space files
3. Remove the hard drive from the computer
4. Use a sledgehammer or similar to squash the disk platters closer together
5. Withdraw $500 from the bank
6. Place the $500 on top of the hard drive
7. Return the hard drive (with the money) to the client
8. Report to the client that there was insufficient data for a suitable analysis, and recommend that the experiment is repeated using a base-space sequencer

**Brian Bushnell** · 05-31-2014, 09:14 AM

Agreed.

Colorspace was a terrible design decision, and the fact that colorspace data persists wastes a lot of people's time and energy. It will always give inferior results for anything other than purely quantitative analyses like ChIP-seq. And because of SOLiD's high error rate, it will give inferior results there, too.

Problem with SOLiD data

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News