
**Bukowski** · 02-11-2013, 12:32 PM

It's a fastq file from a SOLiD sequencer, so the bases are not encoded in 'base space' but in 'color space'. It's a two-base encoding: each color describes a pair of adjacent bases.

Take a look at:


http://marketing.appliedbiosystems.com/images/Product_Microsites/Solid_Knowledge_MS/pdf/CSHL_Fu.pdf

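To make "two-base encoding" concrete, here is a minimal Python sketch of how a base sequence maps to colors. It assumes the usual SOLiD color code, which can be written as an XOR when bases are numbered A=0, C=1, G=2, T=3, and it assumes the layout colorspace reads typically use: a leading primer base (often T) followed by one color digit per adjacent base pair. `encode_colors` is a hypothetical helper, not part of any tool.

```python
# Minimal sketch of SOLiD two-base ("color space") encoding.
# Assumptions: bases are numbered A=0, C=1, G=2, T=3, so the color of an
# adjacent base pair is the XOR of their numbers (AA=0, AC=1, AG=2, AT=3, ...);
# the read is written as a primer base followed by color digits.
BASE_TO_NUM = {"A": 0, "C": 1, "G": 2, "T": 3}


def encode_colors(seq: str, primer: str = "T") -> str:
    """Hypothetical helper: return primer + one color digit per adjacent base pair."""
    bases = (primer + seq).upper()
    # The first color covers the primer -> first-base transition.
    colors = [str(BASE_TO_NUM[a] ^ BASE_TO_NUM[b]) for a, b in zip(bases, bases[1:])]
    return primer.upper() + "".join(colors)


if __name__ == "__main__":
    print(encode_colors("AGGGTT"))  # T320010
```

Note that a single color does not identify a base on its own: AC, CA, GT and TG all have color 1.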

**wlangdon** · 02-12-2013, 11:21 AM

Dear Bukowski,
Many thanks for your rapid and helpful reply :-)
(I must admit I did not follow CSHL_Fu.pdf, but I guess that's not necessary to use the data.)

I am now using bowtie with --color. However, I guess it will take 4 or 5 hours for bowtie-build to create a colorspace index for me.

BTW, is there any reason why bowtie does not convert colorspace reads itself and then use them with its usual indexes?
[I guess a more helpful error message would not go amiss either.]

Alternatively, does anyone have a colorspace-to-FASTA conversion tool?

Many thanks
Bill
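As a rough answer to the conversion question, a naive converter only needs the color rule in reverse: each base is the previous base XOR the next color (assuming the A=0, C=1, G=2, T=3 numbering). The Python sketch below assumes plain four-line colorspace FASTQ records whose sequence line is a primer base followed by color digits, with '.' for a missing call; files that store colors differently (for example "double-encoded" as letters) would need a different parser. All function names are hypothetical. Be aware that naive decoding lets one miscalled color corrupt every base after it.

```python
import gzip
import sys
from typing import Iterator, TextIO, Tuple

NUM_TO_BASE = "ACGT"
BASE_TO_NUM = {b: i for i, b in enumerate(NUM_TO_BASE)}


def decode_colors(cs_read: str) -> str:
    """Naively translate primer base + color digits into bases.

    The primer base is only the starting point and is not returned.
    A non-digit color ('.' = missing call) makes every later base unknown,
    so the remainder is filled with 'N'.
    """
    prev = BASE_TO_NUM[cs_read[0].upper()]
    colors = cs_read[1:]
    bases = []
    for i, c in enumerate(colors):
        if c not in "0123":
            bases.append("N" * (len(colors) - i))
            break
        prev ^= int(c)  # next base = previous base XOR color
        bases.append(NUM_TO_BASE[prev])
    return "".join(bases)


def read_fastq(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (name, sequence) from a simple 4-line-per-record FASTQ stream."""
    while True:
        header = handle.readline()
        if not header:
            return
        seq = handle.readline().strip()
        handle.readline()  # '+' separator line
        handle.readline()  # quality line (dropped: FASTA has no qualities)
        yield header[1:].strip(), seq


def csfastq_to_fasta(path: str, out: TextIO) -> None:
    """Write naively decoded reads from a (optionally gzipped) csfastq file as FASTA."""
    opener = gzip.open if path.endswith(".gz") else open
    with opener(path, "rt") as handle:
        for name, cs_read in read_fastq(handle):
            out.write(f">{name}\n{decode_colors(cs_read)}\n")


if __name__ == "__main__" and len(sys.argv) > 1:
    csfastq_to_fasta(sys.argv[1], sys.stdout)
```

Usage (hypothetical file names): `python cs_to_fasta.py reads.csfastq.gz > reads.fa`.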

**Bukowski** · 02-13-2013, 12:36 AM


I haven't done any work with SOLiD data for a couple of years, but the di-base encoding means that if a single color is miscalled, then when you do your color space > base space conversion (there are tools for this, but I never used them) every subsequent base in the read is wrong, because each color encodes the transition between two bases rather than a base itself.
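The propagation effect is easy to reproduce. In this Python sketch (assuming the XOR color code with A=0, C=1, G=2, T=3 and a "T" primer base; helper names are hypothetical) one color in an encoded read is deliberately miscalled, and every decoded base from that position onward comes out wrong.

```python
# Sketch: one miscalled color corrupts every later base when decoding naively.
# Assumptions: A=0, C=1, G=2, T=3 XOR color code, primer base "T".
NUM_TO_BASE = "ACGT"
BASE_TO_NUM = {b: i for i, b in enumerate(NUM_TO_BASE)}


def encode(seq, primer="T"):
    codes = [BASE_TO_NUM[b] for b in primer + seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def decode(colors, primer="T"):
    prev = BASE_TO_NUM[primer]
    bases = []
    for c in colors:
        prev ^= c  # a color only says how this base relates to the previous one
        bases.append(NUM_TO_BASE[prev])
    return "".join(bases)


read = "ACGTTGCAAC"
colors = encode(read)
bad = list(colors)
bad[3] ^= 2  # miscall a single color at position 3

decoded = decode(bad)
wrong = [i for i, (a, b) in enumerate(zip(read, decoded)) if a != b]
print(decoded)  # ACGCCATGGT
print(wrong)    # [3, 4, 5, 6, 7, 8, 9]
```

The wrong color shifts the base it covers, and because each later base is computed from its predecessor, the same shift is carried to the end of the read.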

There's a good (brief) overview of some of the considerations in the SHRiMP paper:

SHRiMP: Accurate Mapping of Short Color-space Reads

http://www.ploscompbiol.org/article/info:doi/10.1371/journal.pcbi.1000386


Consequently, it is better to do the alignment with a color-space-aware tool, and if I still worked with SOLiD data that is what I would do. However, some tools (such as BWA) have already dropped color-space support.
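A toy Python sketch of why the comparison is better done in color space: once the reference is also converted to colors, a real single-base difference (SNP) shows up as two adjacent color changes with the same XOR difference, while an isolated miscall changes only one color and can be flagged as a sequencing error instead of being decoded into a run of wrong bases. It assumes the XOR color code (A=0, C=1, G=2, T=3), a "T" primer base, and an ungapped read already placed on the reference; the helpers are hypothetical and this is an illustration only, not how bowtie, SHRiMP or any other aligner actually scores reads.

```python
# Toy illustration: compare a read to the reference in color space.
# Assumptions: A=0, C=1, G=2, T=3 XOR color code, primer base "T",
# ungapped read already aligned to the start of ref_seq.
BASE_TO_NUM = {"A": 0, "C": 1, "G": 2, "T": 3}


def to_colors(seq, primer="T"):
    """Colors for primer+seq; color i covers bases i-1 and i (base -1 is the primer)."""
    codes = [BASE_TO_NUM[b] for b in primer + seq]
    return [a ^ b for a, b in zip(codes, codes[1:])]


def classify(ref_seq, read_colors, primer="T"):
    """Label color differences as SNPs (two adjacent, equal changes) or single color errors.

    Heuristic only: a change at the last base touches a single color and is
    reported as an error, and two neighboring errors that happen to match
    would be mistaken for a SNP.
    """
    ref_colors = to_colors(ref_seq, primer)
    diffs = [r ^ q for r, q in zip(ref_colors, read_colors)]
    events = []
    i = 0
    while i < len(diffs):
        if diffs[i] == 0:
            i += 1
        elif i + 1 < len(diffs) and diffs[i + 1] == diffs[i]:
            events.append(("snp", i))  # base i differs from the reference
            i += 2
        else:
            events.append(("color_error", i))
            i += 1
    return events


ref = "ACGTTGCAAC"
snp_read = to_colors("ACGATGCAAC")  # real T->A change at base 3
err_read = to_colors(ref)
err_read[3] ^= 2                    # one miscalled color at position 3

print(classify(ref, snp_read))  # [('snp', 3)]
print(classify(ref, err_read))  # [('color_error', 3)]
```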

**wlangdon** · 02-14-2013, 12:07 PM

Dear Bukowski,
Once again, thank you for your very helpful reply.

It took just under 3 hours for bowtie-build to create a colorspace index for the human genome (NCBI 37.5 ASM). It seems to be working well.

Thanks again
Bill

http://www.cs.ucl.ac.uk/staff/W.Langdon



1000genomes SRR107002.filt.fastq.gz bad format?
