  • csfasta --> fasta conversion

    I have a quick, trivial question:
    what's the fastest/easiest way to "decode" or convert csfasta to fasta? I'm only doing this for a handful of sequences at a time, for code-checking.

    thanks in advance.

  • #2
    Comparing?

    Are you looking to benchmark methods or just need to decode a set of sequences? Contact me directly if you would like a python module for SOLiD sequence manipulation(s) including csfasta --> fasta.



    • #3
      You mean converting color-space sequences to base-space sequences?



      • #4
        The ABI 'corona lite' programs (which are free) include 'encodeFasta.py' which will encode and decode to/from color-space, base-space and that abomination 'double-encoded'-space.



        • #5
          Originally posted by lgoff View Post
          Are you looking to benchmark methods or just need to decode a set of sequences? Contact me directly if you would like a python module for SOLiD sequence manipulation(s) including csfasta --> fasta.
          Just a trivial conversion: decoding.



          • #6
            Originally posted by westerman View Post
            The ABI 'corona lite' programs (which are free) include 'encodeFasta.py' which will encode and decode to/from color-space, base-space and that abomination 'double-encoded'-space.
            I get an error message when I run that code:
            ImportError: No module named agapython.util.Dibase

            Where do I get the module? I ran the code on both Linux (Ubuntu) and the Mac terminal, and it works on neither.



            • #7
              The module should come with corona lite. I suspect that your corona lite environment is not set up properly. From the README:

              3) Configure your environment *

              For csh/tcsh:
              % setenv CORONAROOT <INSTALL_DIR>/corona_lite
              % source $CORONAROOT/etc/profile.d/corona.csh

              For sh/ksh/bash:
              % export CORONAROOT=<INSTALL_DIR>/corona_lite
              % source $CORONAROOT/etc/profile.d/corona.sh

              * Remember to update your shell's init script (.cshrc, .bashrc,
              etc.) for future sessions with Corona Lite.



              • #8
                csfasta -> fasta

                When I tried to register at ABI to download the CORONA-lite program, I did not receive a confirmation. Then I used the colour scheme given in

                www.iscb.org/uploaded/css/36/12104.pdf

                to write a perl script that does the conversion. As far as I understood, the first base in the csfasta is part of the adaptor sequence and should therefore be omitted in the fasta. This can be triggered by setting the shift parameter to 1 (0 would repeat the first base).

                ./csfasta2fasta.pl seqence.csfasta 1 > output.fasta

                If anyone could tell me if this does approximately the same as the CORONA-lite conversion script, I would be happy.
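
For readers without the attachment, here is a minimal Python sketch of this kind of conversion. It is not the attached Perl script and has not been compared against the CORONA-lite output; the names NEXT_BASE, decode_read and convert are hypothetical, and the shift argument only mirrors the parameter described above. It assumes the usual 16-entry SOLiD color code and treats any other symbol as an unknown base.

```python
import io

# Next base for each (previous base, color 0-3) pair.
NEXT_BASE = {
    "A": "ACGT",
    "C": "CATG",
    "G": "GTAC",
    "T": "TGCA",
}


def decode_read(cs_read, shift=1):
    """Decode one csfasta read such as 'T0123' into base space.

    shift=1 drops the leading adaptor base, shift=0 keeps it.
    """
    prev = cs_read[0].upper()
    bases = [prev]
    for color in cs_read[1:]:
        if prev in NEXT_BASE and color in "0123":
            prev = NEXT_BASE[prev][int(color)]
        else:
            prev = "N"  # unknown color or base: the rest of the read is unknown
        bases.append(prev)
    return "".join(bases)[shift:]


def convert(csfasta_lines, shift=1):
    """Yield FASTA lines; blank lines and '#' comment lines are skipped."""
    for line in csfasta_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        yield line if line.startswith(">") else decode_read(line, shift)


example = io.StringIO("# comment line\n>read_1\nG1131103113113\n")
print("\n".join(convert(example)))
```

With shift=0 the same read decodes to GTGCACCGTGCACG, with the leading G kept.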
                Attached Files



                • #9
                  Originally posted by roedel View Post
                  When I tried to register at ABI to download the CORONA-lite program, I did not receive a confirmation. Then I used the colour scheme given in

                  www.iscb.org/uploaded/css/36/12104.pdf

                  to write a perl script that does the conversion. As far as I understood, the first base in the csfasta is part of the adaptor sequence and should therefore be omitted in the fasta. This can be triggered by setting the shift parameter to 1 (0 would repeat the first base).

                  ./csfasta2fasta.pl seqence.csfasta 1 > output.fasta

                  If anyone could tell me if this does approximately the same as the CORONA-lite conversion script, I would be happy.
                  Your script works well except for an extra ">\n" in the output file.

                  PS: translating color space to base space loses the independence of adjacent color calls. A single miscalled color in the middle of a read corrupts every base after it.
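
The error propagation can be seen in a small Python sketch. This is only an illustration, not the script discussed above: the decode helper is hypothetical, it keeps the leading base, and it assumes a clean read containing only the colors 0-3.

```python
BASES = "ACGT"


def decode(cs_read):
    """Decode a color-space read, keeping the leading base."""
    idx = BASES.index(cs_read[0])
    out = [cs_read[0]]
    for color in cs_read[1:]:
        idx ^= int(color)  # with A=0, C=1, G=2, T=3: next base = previous XOR color
        out.append(BASES[idx])
    return "".join(out)


good = "G1131103113113"
bad = good[:7] + "2" + good[8:]  # simulate one miscalled color (3 read as 2)

print(decode(good))  # GTGCACCGTGCACG
print(decode(bad))   # GTGCACCTGTACAT
```

The two decoded reads agree on the first seven bases and differ at every position after the miscalled color.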



                  • #10
                    thank you for that tool,

                    what the hell is double encoded fasta?



                    • #11
                      'Double-encoded' is where a color-space file is encoded as ACGT. Said ACGT is not base space but a way to encode the 0123 of color-space into something that non color-space aware programs can use.

                      As an example, given the base-space sequence:

                      GTGCACCGTGCACG

                      This encodes into color-space:

                      G1131103113113

                      And can be double-encoded into:

                      GCCTCCATCCTCCT

                      Double-encoding is simple. 0 goes to 'A', 1 to 'C', etc. As I mentioned, it is simply a way to make color-space into ACGT. I call it an abomination since it means nothing biologically useful yet looks like a biological sequence. It can lead to all sorts of false results if one does not realize what one is dealing with.
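
The example above can be reproduced with a short Python sketch. The function names encode and double_encode are hypothetical (this is not encodeFasta.py), the input is assumed to contain only A, C, G and T, and the mapping of 2 to 'G' is an assumption that continues the "0 goes to 'A', 1 to 'C'" pattern (3 to 'T' is confirmed by the example).

```python
BASES = "ACGT"


def encode(seq):
    """Base space -> color space, keeping the first base as the anchor."""
    colors = [
        str(BASES.index(a) ^ BASES.index(b))  # color = XOR of adjacent 2-bit base codes
        for a, b in zip(seq, seq[1:])
    ]
    return seq[0] + "".join(colors)


def double_encode(cs_read):
    """Rewrite the colors 0123 as ACGT; the leading base is left untouched."""
    return cs_read[0] + cs_read[1:].translate(str.maketrans("0123", "ACGT"))


cs = encode("GTGCACCGTGCACG")
print(cs)                 # G1131103113113
print(double_encode(cs))  # GCCTCCATCCTCCT
```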



                      • #12
                        thanks

                        thanks,
                        yes i can confirm that it leads to biological confusion.



                        • #13
                          slight mod to conversion perl script

                          I modified the conversion to avoid building that huge hash.
                          I was hitting memory limits the old way.
                          Attached Files



                          • #14
                            The included colorspace -> basespace mapping is missing a few entries. Basically anything that includes a '4' or '.' is an N.

                            (Python format)
                            __colorspace = {
                            'A0': 'A',
                            'A1': 'C',
                            'A2': 'G',
                            'A3': 'T',
                            'A4': 'N',
                            'A.': 'N',
                            'C0': 'C',
                            'C1': 'A',
                            'C2': 'T',
                            'C3': 'G',
                            'C4': 'N',
                            'C.': 'N',
                            'G0': 'G',
                            'G1': 'T',
                            'G2': 'A',
                            'G3': 'C',
                            'G4': 'N',
                            'G.': 'N',
                            'T0': 'T',
                            'T1': 'G',
                            'T2': 'C',
                            'T3': 'A',
                            'T4': 'N',
                            'T.': 'N',
                            'N0': 'N',
                            'N1': 'N',
                            'N2': 'N',
                            'N3': 'N',
                            'N4': 'N',
                            'N.': 'N',
                            }



                            • #15
                              Actually, you are also missing '5' and '6'. And what about base-space characters other than N (e.g., R, Y, etc.)? Using a table like the above, which is what the ABI-provided encodeFasta.py program uses, is a poor way of handling the conversion IMHO, unless you want to force every non-0,1,2,3 color to a 4 and every non-A,C,G,T base to an N.
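
One way to sketch the "force everything unexpected to 4 or N" idea in Python is to normalise the read first and build the small table programmatically, so that no key can be forgotten. This is an illustration only, not how encodeFasta.py works; normalise, TABLE and decode are hypothetical names, and treating every symbol outside 0-3 as "no call" is an assumption.

```python
import re

BASES = "ACGT"


def normalise(cs_read):
    """Force a non-ACGT leading base to 'N' and every non-0123 color to '4'."""
    first = cs_read[0].upper()
    if first not in BASES:
        first = "N"
    return first + re.sub(r"[^0-3]", "4", cs_read[1:])


# 25 entries (A, C, G, T, N times 0-4), generated rather than typed by hand.
TABLE = {
    b + c: (BASES[BASES.index(b) ^ int(c)] if b != "N" and c != "4" else "N")
    for b in BASES + "N"
    for c in "01234"
}


def decode(cs_read):
    """Decode a read after normalisation; the leading base is omitted."""
    read = normalise(cs_read)
    prev = read[0]
    out = []
    for color in read[1:]:
        prev = TABLE[prev + color]
        out.append(prev)
    return "".join(out)


print(decode("T0123"))  # TGAT
print(decode("t01.3"))  # TGNN
print(decode("R012"))   # NNN
```

After normalisation only five bases and five colors can occur, so the table stays small regardless of which extra symbols appear in the input.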
