
  • GenBank to .tbl (Sequin format)

    Hi everyone,

    I'm working on submitting a set of whole genome shotgun sequencing projects to GenBank/NCBI. For this set of genomes, I have annotations which were generated using the RAST system (in GenBank and FFF format). However, in order to submit to GenBank/NCBI, these annotations need to be converted to what NCBI calls a 'feature table' (Sequin format/.tbl file). The file format is detailed here: http://www.ncbi.nlm.nih.gov/Sequin/table.html

    I've searched the web for parsers to create the required table format from either GenBank or FFF formatted files, and have asked the NCBI support staff if they know of such a parser. However, I have not been able to find one. Does anyone know where I can find something to convert between GenBank or FFF and the NCBI feature table format?

    Thanks in advance!

    Sincerely,
    Erin

  • #2
    I thought you could give them GenBank/EMBL files too? Maybe I'm thinking of EMBL not the NCBI...

    P.S. What is this "FFF format"? I thought it was a typo for GFF, but you did it three times.



    • #3
      Originally posted by maubp View Post
      I thought you could give them GenBank/EMBL files too? Maybe I'm thinking of EMBL not the NCBI...
      Unfortunately not. Bit crazy. But it's easy to write a conversion script between the two. I've got one somewhere.
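
To give a rough idea of what such a conversion script has to emit, here is a minimal sketch of the output side only. It is not the script mentioned in this thread. It assumes the features have already been parsed into plain dicts (the dict layout and the function name `format_feature_table` are hypothetical), with 1-based inclusive coordinates, and it writes the tab-delimited layout described on the NCBI feature table page: a `>Feature` header, one `start<TAB>stop<TAB>key` line per feature, and qualifier lines indented by three tabs. Features on the minus strand are written with start and stop swapped.

```python
def format_feature_table(seq_id, features):
    """Return an NCBI-style feature table (.tbl) for one sequence as a string.

    `features` is a list of dicts (hypothetical layout for this sketch):
      key        - feature key, e.g. "gene", "CDS", "tRNA", "rRNA"
      start, end - 1-based inclusive coordinates, start <= end
      strand     - 1 or -1 (defaults to 1)
      qualifiers - list of (name, value) tuples
    """
    lines = [f">Feature {seq_id}"]
    for feat in features:
        start, end = feat["start"], feat["end"]
        # Minus-strand features are expressed by reversing the interval.
        if feat.get("strand", 1) < 0:
            start, end = end, start
        lines.append(f"{start}\t{end}\t{feat['key']}")
        # Qualifier lines start with three tabs.
        for name, value in feat.get("qualifiers", []):
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    example = [
        {"key": "CDS", "start": 10, "end": 99, "strand": -1,
         "qualifiers": [("product", "hypothetical protein")]},
        {"key": "tRNA", "start": 200, "end": 275,
         "qualifiers": [("product", "tRNA-Ala")]},
    ]
    print(format_feature_table("contig_1", example), end="")
```

Because the feature key is passed through unchanged, tRNA and rRNA features are written the same way as CDS features, provided the parsing step hands them over. Partial features, multi-interval locations and NCBI's naming requirements are not handled here.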



      • #4
        I asked and they won't accept GenBank files. It seems a bit crazy, since that's what they're going to make out of the .tbl/Sequin file anyway.

        I'm sure I could write my own conversion script, but I'm a bit new to this whole scripting business, so it may take me a while. I thought it was worth checking with the community to see if someone had one handy before I go through the trouble.

        And yes, FFF was a typo for GFF. Guess my thinking cap was a bit loose at the end of the day. Sorry for the confusion.



        • #5
          Just found one parser that claims to convert between GenBank and Sequin, but it appears to work for only one contig at a time (created table ends after the last gene of the first contig) and ignores tRNAs.



          • #6
            I'll try and dig out my script.

            If it's any help, Torsten Seemann's automated annotation pipeline can output sequin and/or table format:



            • #7
              Thanks nickloman, we've thought about just re-doing the annotations through NCBI's pipeline, but the problem is we already used the annotations we have for all of our analyses and want to have them associated with the genomes when we submit them. I'm working on seeing if I can use the parser I posted above if I pre-split the files into contigs and add the tRNAs/rRNAs by hand, but I'll keep an eye out in case you find your script first!
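
The pre-splitting step could be sketched roughly as follows. This assumes a standard multi-record GenBank flat file in which every record ends with a line containing only `//`. The function name `split_genbank_records` and the file names in the usage lines are hypothetical.

```python
def split_genbank_records(text):
    """Split the text of a multi-record GenBank file into one string per record.

    Assumes each record is terminated by a line that contains only "//".
    """
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":
            records.append("".join(current))
            current = []
    # Keep any trailing content that lacks a terminator rather than dropping it.
    if "".join(current).strip():
        records.append("".join(current))
    return records


if __name__ == "__main__":
    import sys

    # Usage (hypothetical file names): python split_gbk.py genome.gbk
    with open(sys.argv[1]) as handle:
        for i, record in enumerate(split_genbank_records(handle.read()), 1):
            with open(f"contig_{i}.gbk", "w") as out:
                out.write(record)
```

Each output file then holds a single contig, which is what a one-contig-at-a-time converter expects. The resulting tables would still need to be concatenated afterwards.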



              • #8
                Found it! Hope it's vaguely useful:

                genbank_to_tbl.py (GitHub Gist)



                • #9
                  Great! Thanks!

                  Erin



                  • #10
                    Hey nickloman,

                    Just as an fyi and a note for potential future users of your script, the code you linked to broke at the first CDS feature in my GBK. I made a couple of minor changes and it seems to work now, although it doesn't pick up the annotations for the tRNAs/rRNAs. At this point I figure it's relatively trivial to go through and add those in by hand for a small number of genomes. In the future I will be submitting an additional ~70 genomes, and will (hopefully) post an updated script with that feature fixed.

                    I've attached my edits as a plain text file (the forum won't accept a .py file).

                    Thank you again!

                    Erin


                    • #11
                      Ah OK, well it's like most scripts - you get it working for your problem and then you forget about it. But glad you could make it run for you!



                      • #12
                        Have either of you found a gff to the Sequin format/.tbl file converter?



                        • #13
                          Originally posted by oudacontrol View Post
                          Have either of you found a gff to the Sequin format/.tbl file converter?
                          nickloman's script works fine for the format conversion itself, but then there are a myriad of changes that must be made to your original annotations to conform with GenBank naming conventions. For the number of genomes I'm submitting, I found it easier to just submit the FASTA files to be re-annotated through NCBI's pipeline, which spits out Sequin-formatted files.
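
For the GFF side of the question, the coordinate handling is simple enough to sketch. This is not an existing converter, only an illustration of the mapping. It assumes GFF3 input (nine tab-separated columns, 1-based inclusive coordinates, URL-escaped attribute values) and carries over only a `product` attribute, which is an assumption since attribute names vary between annotation tools. The function name `gff3_line_to_tbl` is hypothetical.

```python
from urllib.parse import unquote


def gff3_line_to_tbl(line):
    """Convert one GFF3 feature line to (seqid, list of .tbl lines).

    Returns None for comments, directives and malformed lines.
    Only the `product` attribute is carried over (assumption for this sketch).
    """
    cols = line.rstrip("\n").split("\t")
    if line.startswith("#") or len(cols) != 9:
        return None
    seqid, _source, ftype, start, end, _score, strand, _phase, attrs = cols
    start, end = int(start), int(end)
    # The feature table expresses the minus strand by reversing the interval.
    if strand == "-":
        start, end = end, start
    rows = [f"{start}\t{end}\t{ftype}"]
    for pair in attrs.split(";"):
        if "=" in pair:
            name, value = pair.split("=", 1)
            if name == "product":
                rows.append(f"\t\t\tproduct\t{unquote(value)}")
    return seqid, rows


if __name__ == "__main__":
    # Made-up example line, not taken from a real annotation.
    example = "contig_1\texample\tCDS\t10\t99\t.\t-\t0\tID=cds1;product=hypothetical%20protein"
    seqid, rows = gff3_line_to_tbl(example)
    print(f">Feature {seqid}")
    print("\n".join(rows))
```

This only covers the format conversion. It does nothing about the naming-convention changes described above, and GFF3 feature types are passed through as-is, so they would still need mapping to valid feature table keys.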



                          • #14
                            Hi everyone.

                            For people who have Artemis installed on their computer, you can also open the .gbk with the software and use the 'SAVE AS' menu to save it in the sequin/tbl format.

                            All features are kept, as well as tRNA and rRNA information.

                            hope it may help.

                            seb.



                            • #15
                              Thanks, it helps me, but Artemis can only read and convert the first contig in a multi-GenBank file.

                              Originally posted by seb.lees View Post
                              Hi everyone.

                              For people who have Artemis installed on their computer, you can also open the .gbk with the software and use the 'SAVE AS' menu to save it in the sequin/tbl format.

                              All features are kept, as well as tRNA and rRNA information.

                              hope it may help.

                              seb.
                              Last edited by wanyu; 06-15-2015, 03:26 AM.
