  • ErinL
    Junior Member
    • Jun 2011
    • 7

    GenBank to .tbl (Sequin format)

    Hi everyone,

    I'm working on submitting a set of whole genome shotgun sequencing projects to GenBank/NCBI. For this set of genomes, I have annotations which were generated using the RAST system (in GenBank and FFF format). However, in order to submit to GenBank/NCBI, these annotations need to be converted to what NCBI calls a 'feature table' (Sequin format/.tbl file). The file format is detailed here: http://www.ncbi.nlm.nih.gov/Sequin/table.html

    I've searched the web for parsers to create the required table format from either GenBank or FFF formatted files, and have asked the NCBI support staff if they know of such a parser. However, I have not been able to find one. Does anyone know where I can find something to convert between GenBank or FFF and the NCBI feature table format?

    Thanks in advance!

    Sincerely,
    Erin
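
For readers unfamiliar with the format, here is a minimal sketch of what a feature table looks like when written from Python. It assumes the features are already available as simple dictionaries with 1-based inclusive coordinates. The function name `format_feature_table`, the dictionary layout, and the sequence IDs, locus tag and products are hypothetical, chosen only for illustration. Each sequence gets its own `>Feature` line, each feature is a tab-separated start, stop and feature key, and qualifiers follow on lines indented by three tabs. Minus-strand features are written with start greater than stop.

```python
def format_feature_table(seq_id, features):
    """Return Sequin-style feature table text for one sequence.

    `features` is a list of dicts (a hypothetical layout for this sketch) with:
      start, stop : 1-based inclusive coordinates, start <= stop
      strand      : +1 or -1
      key         : feature key such as "gene", "CDS", "tRNA", "rRNA"
      qualifiers  : list of (name, value) pairs
    """
    lines = [f">Feature {seq_id}"]
    for feat in features:
        start, stop = feat["start"], feat["stop"]
        if feat.get("strand", 1) < 0:
            # Minus-strand features are written with start > stop.
            start, stop = stop, start
        lines.append(f"{start}\t{stop}\t{feat['key']}")
        for name, value in feat.get("qualifiers", []):
            # Qualifier lines begin with three tabs.
            lines.append(f"\t\t\t{name}\t{value}")
    return "\n".join(lines) + "\n"


# Made-up example data: two contigs, including a minus-strand tRNA.
contigs = {
    "contig_1": [
        {"start": 1, "stop": 300, "strand": 1, "key": "gene",
         "qualifiers": [("locus_tag", "ABC_0001")]},
        {"start": 1, "stop": 300, "strand": 1, "key": "CDS",
         "qualifiers": [("product", "hypothetical protein")]},
    ],
    "contig_2": [
        {"start": 500, "stop": 575, "strand": -1, "key": "tRNA",
         "qualifiers": [("product", "tRNA-Ala")]},
    ],
}

table = "".join(format_feature_table(sid, feats) for sid, feats in contigs.items())
print(table, end="")
```

Because every contig gets its own `>Feature` header, a multi-contig table is just the per-contig tables concatenated. Real submissions have further requirements (for example on sequence IDs and qualifiers) that this sketch does not cover.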
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    I thought you could give them GenBank/EMBL files too? Maybe I'm thinking of EMBL not the NCBI...

    P.S. What is this "FFF format"? I thought it was a typo for GFF, but you did it three times.

    • nickloman
      Senior Member
      • Jul 2009
      • 355

      #3
      Originally posted by maubp
      I thought you could give them GenBank/EMBL files too? Maybe I'm thinking of EMBL not the NCBI...
      Unfortunately not. Bit crazy. But it's easy to write a conversion script between the two. I've got one somewhere.

      • ErinL
        Junior Member
        • Jun 2011
        • 7

        #4
        I asked and they won't accept GenBank files. It seems a bit crazy, since that's what they're going to make out of the .tbl/Sequin file anyway.

        I'm sure I could write my own conversion script, but I'm a bit new to this whole scripting business, so it may take me a while. I thought it was worth checking with the community to see if someone had one handy before I go through the trouble.

        And yes, FFF was a typo for GFF. Guess my thinking cap was a bit loose at the end of the day. Sorry for the confusion.

        • ErinL
          Junior Member
          • Jun 2011
          • 7

          #5
          Just found one parser that claims to convert between GenBank and Sequin, but it appears to work for only one contig at a time (created table ends after the last gene of the first contig) and ignores tRNAs.

          • nickloman
            Senior Member
            • Jul 2009
            • 355

            #6
            I'll try and dig out my script.

            If it's any help, Torsten Seemann's automated annotation pipeline can output sequin and/or table format:

            • ErinL
              Junior Member
              • Jun 2011
              • 7

              #7
              Thanks nickloman, we've thought about just re-doing the annotations through NCBI's pipeline, but the problem is we already used the annotations we have for all of our analyses and want to have them associated with the genomes when we submit them. I'm working on seeing if I can use the parser I posted above if I pre-split the files into contigs and add the tRNAs/rRNAs by hand, but I'll keep an eye out in case you find your script first!

              • nickloman
                Senior Member
                • Jul 2009
                • 355

                #8
                Found it! Hope it's vaguely useful:

                genbank_to_tbl.py (GitHub Gist)

                • ErinL
                  Junior Member
                  • Jun 2011
                  • 7

                  #9
                  Great! Thanks!

                  Erin

                  • ErinL
                    Junior Member
                    • Jun 2011
                    • 7

                    #10
                    Hey nickloman,

                    Just as an fyi and a note for potential future users of your script, the code you linked to broke at the first CDS feature in my GBK. I made a couple of minor changes and it seems to work now, although it doesn't pick up the annotations for the tRNAs/rRNAs. At this point I figure it's relatively trivial to go through and add those in by hand for a small number of genomes. In the future I will be submitting an additional ~70 genomes, and will (hopefully) post an updated script with that feature fixed.

                    I've attached my edits as a plain text file (the forum won't accept a .py file).

                    Thank you again!

                    Erin
                    Attached Files

                    • nickloman
                      Senior Member
                      • Jul 2009
                      • 355

                      #11
                      Ah OK, well it's like most scripts - you get it working for your problem and then you forget about it. But glad you could make it run for you!

                      • oudacontrol
                        Junior Member
                        • Jan 2011
                        • 3

                        #12
                        Have either of you found a gff to the Sequin format/.tbl file converter?

                        • ErinL
                          Junior Member
                          • Jun 2011
                          • 7

                          #13
                          Originally posted by oudacontrol
                          Have either of you found a gff to the Sequin format/.tbl file converter?
                          nickloman's script works fine for the format conversion itself, but then there are a myriad of changes that must be made to your original annotations to conform with GenBank naming conventions. For the number of genomes I'm submitting, I found it easier to just submit the fasta files for re-submission through NCBI's pipeline, which spits out Sequin formatted files.

                          • seb.lees
                            Member
                            • Sep 2012
                            • 12

                            #14
                            Hi everyone.

                            For people who have Artemis installed on their computer, you can also open the .gbk with the software and use the 'SAVE AS' menu to save it in the Sequin/.tbl format.

                            All features are kept, as well as tRNA and rRNA information.

                            hope it may help.

                            seb.

                            • wanyu
                              Junior Member
                              • May 2015
                              • 4

                              #15
                              Thanks, that helps, but Artemis can only read and convert the first contig in a multi-record GenBank file.

                              Originally posted by seb.lees
                              Hi everyone.

                              For people who have Artemis installed on their computer, you can also open the .gbk with the software and use the 'SAVE AS' menu to save it in the Sequin/.tbl format.

                              All features are kept, as well as tRNA and rRNA information.

                              hope it may help.

                              seb.
                              Last edited by wanyu; 06-15-2015, 03:26 AM.
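
One way to work around a single-record limitation like this is to split the multi-record GenBank file first and convert each piece separately. The sketch below uses only the standard library and is an illustration, not part of Artemis or of any script posted in this thread. It assumes each record ends with a line containing only `//`; the function names and the sample records are made up for the example.

```python
import re
from pathlib import Path


def split_genbank_records(text):
    """Split multi-record GenBank flat-file text into single-record strings."""
    records, current = [], []
    for line in text.splitlines(keepends=True):
        current.append(line)
        if line.strip() == "//":  # assumed record terminator
            records.append("".join(current))
            current = []
    leftover = "".join(current)
    if leftover.strip():  # keep a final record that lacks the terminator
        records.append(leftover)
    return records


def record_name(record, fallback):
    """Return the second field of the LOCUS line, or `fallback` if absent."""
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            parts = line.split()
            if len(parts) > 1:
                return parts[1]
    return fallback


def write_records(text, out_dir):
    """Write each record to its own .gbk file and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, record in enumerate(split_genbank_records(text), start=1):
        # Replace characters that are awkward in file names.
        name = re.sub(r"[^A-Za-z0-9._-]", "_", record_name(record, "record"))
        # The numeric prefix keeps file names unique and in input order.
        path = out_dir / f"{i:04d}_{name}.gbk"
        path.write_text(record)
        paths.append(path)
    return paths


# Tiny made-up input with two records.
sample = (
    "LOCUS       contig_1       10 bp    DNA\n"
    "ORIGIN\n"
    "        1 acgtacgtac\n"
    "//\n"
    "LOCUS       contig_2        4 bp    DNA\n"
    "ORIGIN\n"
    "        1 acgt\n"
    "//\n"
)
records = split_genbank_records(sample)
print([record_name(r, "unnamed") for r in records])
```

Calling `write_records(Path("genome.gbk").read_text(), "split_records")` (both names are placeholders) would then give one file per contig, each of which can be opened and saved on its own.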
