  • doxologist
    Member
    • Jan 2009
    • 96

    csfasta --> fasta conversion

    I have a quick, trivial question:
    What's the fastest/easiest way to "decode" or convert csfasta to fasta? I'm just doing this for a handful at a time for code-checking.

    Thanks in advance.
  • lgoff
    Member
    • Feb 2008
    • 82

    #2
    Comparing?

    Are you looking to benchmark methods or just need to decode a set of sequences? Contact me directly if you would like a python module for SOLiD sequence manipulation(s) including csfasta --> fasta.


    • Rao
      Member
      • Oct 2008
      • 36

      #3
      You mean converting color-space sequences to base-space sequences?


      • westerman
        Rick Westerman
        • Jun 2008
        • 1104

        #4
        The ABI 'corona lite' programs (which are free) include 'encodeFasta.py' which will encode and decode to/from color-space, base-space and that abomination 'double-encoded'-space.


        • doxologist
          Member
          • Jan 2009
          • 96

          #5
          Originally posted by lgoff
          Are you looking to benchmark methods or just need to decode a set of sequences? Contact me directly if you would like a python module for SOLiD sequence manipulation(s) including csfasta --> fasta.
          just for trivial conversion... decode


          • jsun529
            Member
            • Apr 2009
            • 52

            #6
            Originally posted by westerman
            The ABI 'corona lite' programs (which are free) include 'encodeFasta.py' which will encode and decode to/from color-space, base-space and that abomination 'double-encoded'-space.
            I get an error message when I run that code:
            ImportError: No module named agapython.util.Dibase

            Where do I get the module? I ran the code both on Linux (Ubuntu) and in a Mac terminal; neither works.


            • westerman
              Rick Westerman
              • Jun 2008
              • 1104

              #7
              The module should come with Corona Lite. I suspect that your Corona Lite environment is not set up properly. From the README:

              3) Configure your environment *

              For csh/tcsh:
              % setenv CORONAROOT <INSTALL_DIR>/corona_lite
              % source $CORONAROOT/etc/profile.d/corona.csh

              For sh/ksh/bash:
              % export CORONAROOT=<INSTALL_DIR>/corona_lite
              % source $CORONAROOT/etc/profile.d/corona.sh

              * Remember to update your shell's init script (.cshrc, .bashrc,
              etc.) for future sessions with Corona Lite.


              • roedel
                Junior Member
                • Jun 2009
                • 2

                #8
                csfasta -> fasta

                When I tried to register at ABI to download the CORONA-lite program, I did not receive a confirmation. Then I used the colour scheme given in



                to write a perl script that does the conversion. As far as I understood, the first base in the csfasta is part of the adaptor sequence and should therefore be omitted in the fasta. This can be triggered by setting the shift parameter to 1 (0 would repeat the first base).

                ./csfasta2fasta.pl seqence.csfasta 1 > output.fasta

                If anyone could tell me if this does approximately the same as the CORONA-lite conversion script, I would be happy.
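
For readers without the attachment, here is a minimal Python sketch of the kind of conversion described above. It is not roedel's Perl script and not the Corona Lite code. The names `TRANSITIONS`, `decode_read` and `csfasta_to_fasta` are made up for illustration, and `shift` is interpreted here as "number of leading characters to drop from the decoded read", so 1 omits the leading primer base and 0 keeps it. Lines starting with `#` are assumed to be comments.

```python
# Transition table: TRANSITIONS[previous_base][color] -> next base.
TRANSITIONS = {
    "A": "ACGT",
    "C": "CATG",
    "G": "GTAC",
    "T": "TGCA",
}


def decode_read(cs_read, shift=1):
    """Decode one color-space read such as 'G1131103113113'.

    shift is an assumed parameter: the number of leading decoded
    characters to drop (1 removes the primer base, 0 keeps it).
    """
    prev = cs_read[0].upper()
    bases = [prev]
    for color in cs_read[1:]:
        if prev in TRANSITIONS and color in "0123":
            prev = TRANSITIONS[prev][int(color)]
        else:
            # Unknown color or unknown previous base: everything from
            # here on is undetermined.
            prev = "N"
        bases.append(prev)
    return "".join(bases)[shift:]


def csfasta_to_fasta(lines, shift=1):
    """Yield FASTA lines from an iterable of csfasta lines."""
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        if line.startswith(">"):
            yield line  # header lines pass through unchanged
        else:
            yield decode_read(line, shift)


example = [
    "# hypothetical csfasta input",
    ">read_1",
    "G1131103113113",
]
print("\n".join(csfasta_to_fasta(example, shift=1)))
# >read_1
# TGCACCGTGCACG
```

With `shift=0` the same read decodes to `GTGCACCGTGCACG`, i.e. the leading base is kept.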
                Attached Files


                • chiuchengliu
                  Junior Member
                  • Apr 2009
                  • 1

                  #9
                  Originally posted by roedel
                  When I tried to register at ABI to download the CORONA-lite program, I did not receive a confirmation. Then I used the colour scheme given in



                  to write a perl script that does the conversion. As far as I understood, the first base in the csfasta is part of the adaptor sequence and should therefore be omitted in the fasta. This can be triggered by setting the shift parameter to 1 (0 would repeat the first base).

                  ./csfasta2fasta.pl seqence.csfasta 1 > output.fasta

                  If anyone could tell me if this does approximately the same as the CORONA-lite conversion script, I would be happy.
                  Your script works well except for an extra ">\n" in the output file.

                  PS: Translating color space to base space loses the independence of adjacent color calls. A single miscalled color in the middle of a read corrupts every base after it.
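
To make the error-propagation point concrete, here is a small Python sketch. The decoder is an illustrative stand-in (not the attached script), and the "miscall" is simulated by hand on a made-up read: the 7th color is changed from 3 to 0.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
BASES = "ACGT"


def decode(cs_read):
    """Decode primer base + colors into bases (primer base dropped)."""
    prev = CODE[cs_read[0]]
    out = []
    for color in cs_read[1:]:
        prev ^= int(color)  # next base code = previous code XOR color
        out.append(BASES[prev])
    return "".join(out)


good = "G1131103113113"
bad = "G1131100113113"  # 7th color changed from 3 to 0 (simulated miscall)

good_bases = decode(good)
bad_bases = decode(bad)
print(good_bases)  # TGCACCGTGCACG
print(bad_bases)   # TGCACCCACGTGC

# Positions (0-based) where the two decoded reads disagree.
diff = [i for i, (a, b) in enumerate(zip(good_bases, bad_bases)) if a != b]
print(diff)  # [6, 7, 8, 9, 10, 11, 12]
```

One wrong color out of 13 changes all 7 bases from that position to the end of the read, whereas in color space the two reads differ at a single position.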


                  • yoyoq
                    Junior Member
                    • Jul 2009
                    • 9

                    #10
                    thank you for that tool,

                    what the hell is double encoded fasta?


                    • westerman
                      Rick Westerman
                      • Jun 2008
                      • 1104

                      #11
                      'Double-encoded' is where a color-space file is encoded as ACGT. Said ACGT is not base space but a way to encode the 0123 of color-space into something that non color-space aware programs can use.

                      As an example, given the base-space sequence:

                      GTGCACCGTGCACG

                      This encodes into color-space:

                      G1131103113113

                      And can be double-encoded into:

                      GCCTCCATCCTCCT

                      Double-encoding is simple: 0 goes to 'A', 1 to 'C', etc. As I mentioned, it is simply a way to make color-space into ACGT. I call it an abomination since it means nothing biologically useful yet looks like a biological sequence. It can lead to all sorts of false results if one does not realize what one is dealing with.
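
The example above can be reproduced with a few lines of Python. This is only a sketch: the function names are invented, and the 2 -> 'G' and 3 -> 'T' mappings are assumed from the "etc." (the 3 -> 'T' case is confirmed by the example itself). As in the example, the leading base is left untouched.

```python
CODE = {"A": 0, "C": 1, "G": 2, "T": 3}
DOUBLE = "ACGT"  # color 0 -> 'A', 1 -> 'C', 2 -> 'G', 3 -> 'T'


def to_colorspace(bases):
    """Encode a base-space string: first base, then one color per base pair."""
    colors = [str(CODE[a] ^ CODE[b]) for a, b in zip(bases, bases[1:])]
    return bases[0] + "".join(colors)


def double_encode(cs_read):
    """Replace each color digit with a letter; keep the leading base."""
    return cs_read[0] + "".join(DOUBLE[int(c)] for c in cs_read[1:])


cs = to_colorspace("GTGCACCGTGCACG")
print(cs)                 # G1131103113113
print(double_encode(cs))  # GCCTCCATCCTCCT
```

Note that the double-encoded string shares its alphabet with the original sequence but not its content, which is exactly why it is easy to mistake for real base space.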


                      • yoyoq
                        Junior Member
                        • Jul 2009
                        • 9

                        #12
                        thanks

                        thanks,
                        yes i can confirm that it leads to biological confusion.


                        • yoyoq
                          Junior Member
                          • Jul 2009
                          • 9

                          #13
                          slight mod to conversion perl script

                          I modified the conversion to avoid building that huge hash.
                          I was hitting memory limits the old way.
                          Attached Files


                          • mbreese
                            Junior Member
                            • Sep 2009
                            • 5

                            #14
                            The included colorspace -> basespace mapping is missing a few entries. Basically anything that includes a '4' or '.' is an N.

                            (Python format)
                            __colorspace = {
                            'A0': 'A',
                            'A1': 'C',
                            'A2': 'G',
                            'A3': 'T',
                            'A4': 'N',
                            'A.': 'N',
                            'C0': 'C',
                            'C1': 'A',
                            'C2': 'T',
                            'C3': 'G',
                            'C4': 'N',
                            'C.': 'N',
                            'G0': 'G',
                            'G1': 'T',
                            'G2': 'A',
                            'G3': 'C',
                            'G4': 'N',
                            'G.': 'N',
                            'T0': 'T',
                            'T1': 'G',
                            'T2': 'C',
                            'T3': 'A',
                            'T4': 'N',
                            'T.': 'N',
                            'N0': 'N',
                            'N1': 'N',
                            'N2': 'N',
                            'N3': 'N',
                            'N4': 'N',
                            'N.': 'N',
                            }


                            • westerman
                              Rick Westerman
                              • Jun 2008
                              • 1104

                              #15
                              Actually, you are also missing '5' and '6'. And what about base-space characters other than N (e.g., R, Y, etc.)? Using a table like the above -- which is what the ABI-provided encodeFasta.py program uses -- is a poor way of handling the conversion IMHO, unless you want to force every non-0,1,2,3 color to a 4 and every non-A,C,G,T base to an N.
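
One table-free way to handle this, offered as a sketch and not necessarily what anyone in this thread had in mind: number the bases A=0, C=1, G=2, T=3, so that the next base is the previous base XOR the color. Anything that is not a recognized base or a color in 0-3 (so '.', '4', '5', '6', R, Y, ...) is treated as unknown and yields 'N' from that point on. The function name `decode_colors` is hypothetical.

```python
BASES = "ACGT"  # index in this string is the 2-bit code of each base


def decode_colors(cs_read):
    """Decode primer base + colors; anything unrecognized becomes 'N'.

    The leading primer base is not included in the result.
    """
    prev = BASES.find(cs_read[0].upper())  # -1 for N, R, Y, ...
    out = []
    for color in cs_read[1:]:
        if prev >= 0 and color in "0123":
            prev ^= int(color)  # next code = previous code XOR color
            out.append(BASES[prev])
        else:
            # Unknown color or unknown previous base: the chain is
            # broken, so this and every later base is undetermined.
            prev = -1
            out.append("N")
    return "".join(out)


print(decode_colors("G1131103113113"))  # TGCACCGTGCACG
print(decode_colors("T01.23"))          # TGNNN
print(decode_colors("A5612"))           # NNNN
print(decode_colors("R012"))            # NNN
```

This reproduces the sixteen valid entries of the table above without enumerating the invalid ones, at the cost of collapsing every kind of unknown into a plain 'N'.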
