  • Translating nucleotide fasta file to AA

    What can I use to translate a large number of short nucleotides to all AA open reading frames?

    I have a 27 GB Illumina library in FASTA format containing DNA nucleotide sequences. I would like to convert the entire file into an AA FASTA file containing all open reading frames.

    I'm surprised and confused that I haven't found an easy way to do this. There are plenty of online tools, but I can't find a simple script that I can run on a server. I wrote a Python script, but it's too buggy. Does anyone know of a Python script or something on GitHub that I can download and run?


    Server specs: Ubuntu 12.04, 24 cores, 50 GB of RAM

  • #2
    Hmmm, I'm actually writing something to do that. Should be done later today.



    • #3
      Well that's convenient.

      I'm very grateful, Brian.



      • #4
        Transeq or Sixpack from EMBOSS should work (http://emboss.open-bio.org/) while you wait for Brian to make his program available.



        • #5
          I recommend EMBOSS, for one reason because all of the major translation tables are available. Also, you can access the methods through the web interface, with EBI's web services (programmatic access), from BioPerl, or through the command line. These programs are well tested, so you are unlikely to run into issues. If you do, there is a large user base and getting help is easy.
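
For the command-line route, here is a minimal Python sketch of how a transeq run could be scripted. It assumes EMBOSS is installed and `transeq` is on the PATH; the helper names `transeq_command` and `run_transeq` and the file names are hypothetical. `-frame 6` requests all six frames and `-table` selects the genetic code (0 is the standard table).

```python
import subprocess


def transeq_command(in_fasta, out_fasta, frame="6", table=0):
    """Build the argument list for an EMBOSS transeq run (hypothetical helper)."""
    return [
        "transeq",
        "-sequence", in_fasta,   # nucleotide FASTA input
        "-outseq", out_fasta,    # protein FASTA output
        "-frame", str(frame),    # "6" = three forward + three reverse frames
        "-table", str(table),    # genetic code; 0 = standard
    ]


def run_transeq(in_fasta, out_fasta, frame="6", table=0):
    """Run transeq and raise CalledProcessError if it exits non-zero."""
    subprocess.run(transeq_command(in_fasta, out_fasta, frame, table), check=True)
```

Calling `run_transeq("nucleotides.fa", "acids.fa")` would then write the translated file; I have not benchmarked this on a file of the size mentioned in the question.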



          • #6
            Ok, my tool's done and uploaded.

            translate6frames.sh in=nucleotides.fa out=acids.fa

            It handles fastq I/O also, translating the qualities. And it's very, very fast.
            Last edited by Brian Bushnell; 04-03-2014, 10:29 AM. Reason: translated6frames -> translate6frames
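
For anyone curious what six-frame translation involves, here is a minimal pure-Python sketch. It is not how translate6frames.sh is implemented and will be far slower. Assumptions: the standard genetic code, plain FASTA input, and whole-frame translation with stop codons written as `*` rather than split into separate ORFs. All function names are hypothetical. It streams one record at a time, so memory use stays small even on a very large file.

```python
from itertools import product

# Standard genetic code (NCBI table 1), listed in TCAG codon order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def translate(seq):
    """Translate one frame; codons with ambiguous bases become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frames(seq):
    """Yield (frame_label, protein) for frames +1..+3 and -1..-3."""
    reverse_complement = seq.translate(COMPLEMENT)[::-1]
    for strand, strand_seq in (("+", seq), ("-", reverse_complement)):
        for offset in range(3):
            yield f"{strand}{offset + 1}", translate(strand_seq[offset:])


def read_fasta(handle):
    """Stream (name, sequence) pairs from an open FASTA file."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def translate_file(in_path, out_path):
    """Write all six frames of every record in in_path to out_path."""
    with open(in_path) as fin, open(out_path, "w") as fout:
        for name, seq in read_fasta(fin):
            for frame, protein in six_frames(seq):
                fout.write(f">{name} frame={frame}\n{protein}\n")
```

Usage would be `translate_file("nucleotides.fa", "acids.fa")`. Splitting each frame on `*` would give candidate ORFs if that is what you need.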



            • #7
              Hi there. I am Vicki, a grad student learning the nuances of bioinformatics. I am looking forward to being an active member of this community.

              Cheers,
              Vicki



              • #8
                BBmap

                Originally posted by Brian Bushnell
                Ok, my tool's done and uploaded.

                translate6frames.sh in=nucleotides.fa out=acids.fa

                It handles fastq I/O also, translating the qualities. And it's very, very fast.
                Ok, this is a great program and it screams! I have no issues running it on FASTA or FASTQ files. However, I received a file from a colleague, and I don't quite understand how to convert it to FASTA. In fact, I don't even know what the format of this file is. Can someone help me please? The file came from a Sanger ABI 3730.

                Here is what a representative header looks like:

                TYPE: EST
                STATUS: New
                Clone#: Treelapse_B_1
                CLONE: Passaged_embryonic_stem_cell_FL_01_01_A01-C10B_A01
                SEQUENCE:

                Any help would be most appreciated.

                Cheers,
                Vicki



                • #9
                  Hmmm... I don't recognize it, but if it is a standard output format for that machine, then Life Technologies should have a conversion tool. If there are only a few dozen sequences, you could even convert them to fasta manually.

                  This program claims to "convert anything to fasta", including a couple of ABI's normal output formats, so perhaps it would work?

                  FASTA to multi-FASTA format converter: merges the FASTA files in a folder into a single multi-FASTA file and converts between sample file formats (SCF, ABI, SEQ, FASTA).


                  I've never tried it, though.



                  • #10
                    translation

                    Thanks a bunch, Brian. Well, I have 1000s of sequences and, therefore, manual editing is out of the question. I will try the program you have suggested and report back.



                    • #11
                      If the actual sequence starts after the SEQUENCE tag in your example, then a simple script may be all you need to pull the Clone#/CLONE and the SEQUENCE out of the file and into a new one.
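
A minimal Python sketch of such a script is below. Since only the header was posted, the record layout is guessed: I assume each record begins with a `TYPE:` line (or is separated by a `||` line), that the `CLONE:` value should become the FASTA name, and that every non-empty line after `SEQUENCE:` is sequence. The function name is hypothetical, and the logic would need adjusting if the real file differs.

```python
import re


def est_records_to_fasta(lines):
    """Yield (name, sequence) pairs from 'TAG: value' style EST records.

    Assumed layout: a record starts at 'TYPE:' or after a '||' separator,
    and sequence lines follow the 'SEQUENCE:' tag.
    """
    name, seq, in_seq = None, [], False
    for raw in lines:
        line = raw.strip()
        if line.startswith("TYPE:") or line == "||":
            # Record boundary: emit the previous record if it was complete.
            if name and seq:
                yield name, "".join(seq)
            name, seq, in_seq = None, [], False
        elif line.startswith("CLONE:"):
            name = line.split(":", 1)[1].strip()
        elif line.startswith("SEQUENCE:"):
            in_seq = True
            rest = line.split(":", 1)[1].strip()
            if rest:
                seq.append(re.sub(r"\s+", "", rest))
        elif in_seq and line:
            # Drop any spaces used to group bases into blocks.
            seq.append(re.sub(r"\s+", "", line))
    if name and seq:
        yield name, "".join(seq)
```

Each pair can then be written out as `>name` followed by the sequence, for example by looping over `est_records_to_fasta(open("ests.txt"))`, where `ests.txt` is a placeholder file name.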



                      • #12
                        Well, I tried the trial version of DNA Baser, but it took forever (unlike your translate6frames.sh) and ultimately didn't do a good job of the conversion, so I promptly uninstalled it. In the end, I got my colleague from Germany to send me a FASTA-formatted file. Thanks for the tips, however.



                        • #13
                          Thanks! I was thinking along these lines, but thought that someone may have already solved this problem.
