  • doxologist
    Member
    • Jan 2009
    • 96

    csfasta --> fasta conversion

    I have a fast trivial question:
    What's the fastest/easiest way to "decode" or convert csfasta to fasta? I'm only doing this for a handful of sequences at a time, for code-checking.

    thanks in advance.
  • lgoff
    Member
    • Feb 2008
    • 82

    #2
    Comparing?

    Are you looking to benchmark methods or just need to decode a set of sequences? Contact me directly if you would like a python module for SOLiD sequence manipulation(s) including csfasta --> fasta.

    Comment

    • Rao
      Member
      • Oct 2008
      • 36

      #3
      You mean converting color-space sequences to base-space sequences?

      Comment

      • westerman
        Rick Westerman
        • Jun 2008
        • 1104

        #4
        The ABI 'corona lite' programs (which are free) include 'encodeFasta.py' which will encode and decode to/from color-space, base-space and that abomination 'double-encoded'-space.

        Comment

        • doxologist
          Member
          • Jan 2009
          • 96

          #5
          Originally posted by lgoff View Post
          Are you looking to benchmark methods or just need to decode a set of sequences? Contact me directly if you would like a python module for SOLiD sequence manipulation(s) including csfasta --> fasta.
          just for trivial conversion... decode

          Comment

          • jsun529
            Member
            • Apr 2009
            • 52

            #6
            Originally posted by westerman View Post
            The ABI 'corona lite' programs (which are free) include 'encodeFasta.py' which will encode and decode to/from color-space, base-space and that abomination 'double-encoded'-space.
            I get this error message when I run that code:
            ImportError: No module named agapython.util.Dibase

            Where do I get the module? I ran the code on both Linux (Ubuntu) and a Mac terminal; neither works.

            Comment

            • westerman
              Rick Westerman
              • Jun 2008
              • 1104

              #7
              The module should come with Corona Lite. I suspect that your Corona Lite environment is not set up properly. From the README:

              3) Configure your environment *

              For csh/tcsh:
              % setenv CORONAROOT <INSTALL_DIR>/corona_lite
              % source $CORONAROOT/etc/profile.d/corona.csh

              For sh/ksh/bash:
              % export CORONAROOT=<INSTALL_DIR>/corona_lite
              % source $CORONAROOT/etc/profile.d/corona.sh

              * Remember to update your shell's init script (.cshrc, .bashrc,
              etc.) for future sessions with Corona Lite.

              Comment

              • roedel
                Junior Member
                • Jun 2009
                • 2

                #8
                csfasta -> fasta

                When I tried to register at ABI to download the CORONA-lite program, I did not receive a confirmation. Then I used the colour scheme given in



                to write a perl script that does the conversion. As far as I understood, the first base in the csfasta is part of the adaptor sequence and should therefore be omitted in the fasta. This can be triggered by setting the shift parameter to 1 (0 would repeat the first base).

                ./csfasta2fasta.pl seqence.csfasta 1 > output.fasta

                If anyone could tell me if this does approximately the same as the CORONA-lite conversion script, I would be happy.
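For readers without the attachment, the conversion described above can be sketched in Python. This is not roedel's Perl script or the Corona Lite code. It is a minimal illustration that assumes the standard SOLiD two-base colour scheme, a leading primer/adaptor base on each read, and `#` comment lines in the csfasta file. The names `decode_read`, `csfasta_to_fasta`, and the `shift` argument are hypothetical and only mirror the behaviour described in the post.

```python
# Next base for each (previous base, colour) pair, indexed by colour 0..3.
# Assumes the standard SOLiD two-base encoding.
NEXT_BASE = {
    "A": "ACGT",
    "C": "CATG",
    "G": "GTAC",
    "T": "TGCA",
}


def decode_read(cs_read, shift=1):
    """Decode one colour-space read such as 'T0123' into base space.

    shift=1 drops the leading primer base; shift=0 keeps it.
    Any unknown colour (for example '.') turns the rest of the read into N.
    """
    prev = cs_read[0].upper()
    bases = [prev]
    for color in cs_read[1:]:
        if prev in NEXT_BASE and color in "0123":
            prev = NEXT_BASE[prev][int(color)]
        else:
            prev = "N"
        bases.append(prev)
    return "".join(bases)[shift:]


def csfasta_to_fasta(lines, shift=1):
    """Yield FASTA lines for an iterable of csfasta lines."""
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and csfasta comment lines
        if line.startswith(">"):
            yield line  # header lines pass through unchanged
        else:
            yield decode_read(line, shift)


if __name__ == "__main__":
    example = ["# example comment", ">read1", "T0123"]
    for out in csfasta_to_fasta(example, shift=1):
        print(out)
```

With `shift=1` the read `T0123` decodes to `TGAT`; with `shift=0` the primer base is kept and the result is `TTGAT`.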
                Attached Files

                Comment

                • chiuchengliu
                  Junior Member
                  • Apr 2009
                  • 1

                  #9
                  Originally posted by roedel View Post
                  When I tried to register at ABI to download the CORONA-lite program, I did not receive a confirmation. Then I used the colour scheme given in



                  to write a perl script that does the conversion. As far as I understood, the first base in the csfasta is part of the adaptor sequence and should therefore be omitted in the fasta. This can be triggered by setting the shift parameter to 1 (0 would repeat the first base).

                  ./csfasta2fasta.pl seqence.csfasta 1 > output.fasta

                  If anyone could tell me if this does approximately the same as the CORONA-lite conversion script, I would be happy.
                  Your script works well except for an extra ">\n" in the output file.

                  PS: translating color space to base space loses the independence of adjacent colors. For example, a single miscalled color in the middle of a read corrupts every base after it.
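The error propagation mentioned here can be seen in a small Python sketch. It assumes the standard SOLiD colour scheme; the `decode` helper and the example read are illustrative only and are not taken from any of the attached scripts.

```python
# Next base for each (previous base, colour) pair, indexed by colour 0..3.
NEXT_BASE = {"A": "ACGT", "C": "CATG", "G": "GTAC", "T": "TGCA"}


def decode(cs_read):
    """Decode a colour-space read, dropping the leading primer base."""
    prev = cs_read[0]
    out = []
    for color in cs_read[1:]:
        prev = NEXT_BASE[prev][int(color)]
        out.append(prev)
    return "".join(out)


good = "G1131103113113"
# Simulate one miscalled colour in the middle of the read ('3' -> '2').
bad = good[:7] + "2" + good[8:]

good_bases = decode(good)
bad_bases = decode(bad)
mismatches = sum(a != b for a, b in zip(good_bases, bad_bases))

print(good_bases)  # TGCACCGTGCACG
print(bad_bases)   # TGCACCTGTACAT
print(mismatches)  # 7
```

The two decoded reads agree on the first six bases and differ at every base from the miscalled colour onward. This is why colour-space-aware aligners generally work on the colours directly instead of decoding first.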

                  Comment

                  • yoyoq
                    Junior Member
                    • Jul 2009
                    • 9

                    #10
                    thank you for that tool,

                    what the hell is double encoded fasta?

                    Comment

                    • westerman
                      Rick Westerman
                      • Jun 2008
                      • 1104

                      #11
                      'Double-encoded' is where a color-space file is encoded as ACGT. Said ACGT is not base space but a way to encode the 0123 of color-space into something that non color-space aware programs can use.

                      As an example, given the base-space sequence:

                      GTGCACCGTGCACG

                      This encodes into color-space:

                      G1131103113113

                      And can be double-encoded into:

                      GCCTCCATCCTCCT

                      Double-encoding is simple: 0 goes to 'A', 1 to 'C', 2 to 'G' and 3 to 'T'. As I mentioned, it is simply a way to turn color-space into ACGT. I call it an abomination because it means nothing biologically yet looks like a biological sequence. It can lead to all sorts of false results if one does not realize what one is dealing with.
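The example above can be reproduced with a short Python sketch. The function names are hypothetical, and the sketch follows the convention used in this post of keeping the leading base unchanged in the double-encoded string; other tools may drop it.

```python
# Two-bit codes for the four bases; the colour of a dinucleotide is the
# XOR of the two codes under the standard SOLiD scheme.
BASE_CODE = {"A": 0, "C": 1, "G": 2, "T": 3}

# Translation table for double-encoding: colours 0123 become letters ACGT.
DOUBLE = str.maketrans("0123", "ACGT")


def encode_colorspace(seq):
    """Encode a base-space sequence: first base, then one colour per adjacent pair."""
    colors = [str(BASE_CODE[a] ^ BASE_CODE[b]) for a, b in zip(seq, seq[1:])]
    return seq[0] + "".join(colors)


def double_encode(cs_read):
    """Rewrite the colours of a colour-space read as letters, keeping the leading base."""
    return cs_read[0] + cs_read[1:].translate(DOUBLE)


cs = encode_colorspace("GTGCACCGTGCACG")
print(cs)                 # G1131103113113
print(double_encode(cs))  # GCCTCCATCCTCCT
```

Note that the double-encoded output is a valid-looking DNA string, which is exactly why it is easy to mistake for base space.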

                      Comment

                      • yoyoq
                        Junior Member
                        • Jul 2009
                        • 9

                        #12
                        thanks

                        thanks,
                        yes i can confirm that it leads to biological confusion.

                        Comment

                        • yoyoq
                          Junior Member
                          • Jul 2009
                          • 9

                          #13
                          slight mod to conversion perl script

                          modified the conversion to avoid making that huge hash.
                          i was hitting memory limits the old way.
                          Attached Files

                          Comment

                          • mbreese
                            Junior Member
                            • Sep 2009
                            • 5

                            #14
                            The included colorspace -> basespace mapping is missing a few entries. Basically anything that includes a '4' or '.' is an N.

                            (Python format)
                            __colorspace = {
                                'A0': 'A',
                                'A1': 'C',
                                'A2': 'G',
                                'A3': 'T',
                                'A4': 'N',
                                'A.': 'N',
                                'C0': 'C',
                                'C1': 'A',
                                'C2': 'T',
                                'C3': 'G',
                                'C4': 'N',
                                'C.': 'N',
                                'G0': 'G',
                                'G1': 'T',
                                'G2': 'A',
                                'G3': 'C',
                                'G4': 'N',
                                'G.': 'N',
                                'T0': 'T',
                                'T1': 'G',
                                'T2': 'C',
                                'T3': 'A',
                                'T4': 'N',
                                'T.': 'N',
                                'N0': 'N',
                                'N1': 'N',
                                'N2': 'N',
                                'N3': 'N',
                                'N4': 'N',
                                'N.': 'N',
                            }

                            Comment

                            • westerman
                              Rick Westerman
                              • Jun 2008
                              • 1104

                              #15
                              Actually you are also missing '5' and '6'. And what about base-space characters that are neither A, C, G, T nor N (e.g., R, Y, etc.)? Using a table like the above -- which is what the ABI-provided encodeFasta.py program uses -- is a poor way of handling the conversion IMHO, unless you want to force every color other than 0, 1, 2, 3 to a 4 and every base other than A, C, G, T to an N.
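One possible table-free alternative is sketched below in Python. It is not what encodeFasta.py does and not necessarily what westerman has in mind. It relies on the fact that, with A=0, C=1, G=2, T=3, the standard SOLiD colour of a dinucleotide is the XOR of the two base codes, so decoding is a running XOR. The policy of turning everything after an unrecognised symbol into N is an assumption; `decode_xor` is a hypothetical name.

```python
BASES = "ACGT"  # index in this string is the two-bit code of the base


def decode_xor(cs_read):
    """Decode a colour-space read (leading base kept) without a lookup table.

    Any leading character other than A, C, G, T and any colour other than
    0, 1, 2, 3 (for example '.', '4', '5', '6') yields N from that point on.
    """
    first = cs_read[0].upper()
    code = BASES.find(first)  # -1 when the leading base is not A, C, G or T
    out = [first if code >= 0 else "N"]
    for color in cs_read[1:]:
        if code >= 0 and color in "0123":
            code ^= int(color)  # next base code = previous code XOR colour
            out.append(BASES[code])
        else:
            code = -1  # state is unknown from here to the end of the read
            out.append("N")
    return "".join(out)


print(decode_xor("G1131103113113"))  # GTGCACCGTGCACG
print(decode_xor("T01.23"))          # TTGNNN
print(decode_xor("R012"))            # NNNN
```

Because unknown symbols are handled by a single `else` branch, there is no need to enumerate '4', '5', '6', '.' or the IUPAC ambiguity codes individually.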

                              Comment
