  • augustus gene finder

    Hello fellows,

    I've been trying to train Augustus on the Anolis carolinensis dataset (available from the Ensembl website). I downloaded the GenBank files and ran etraining, specifying the training file and the name of the species to train for. I get the following error message:


    mRNA contains character c
    GBProcessor::getGeneList(): GBProcessor::getJoin( ): failed!!!
    Encountered error after reading 0 annotations.

    etraining: ERROR
    No genbank sequences found.


    Has anyone else run into something similar?

    This is what the GenBank file looks like: http://s13.postimg.org/nlg8peh6f/Cap...2_20_58_58.png

    I've been looking for a solution for almost 3 hours now, without success.
    Any help will be greatly appreciated!

    Thank you in advance!
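The first line of that error, `mRNA contains character c`, suggests (a guess, not checked against the Augustus source) that etraining's GenBank reader stopped at an mRNA location it could not parse, possibly the `c` of a `complement(` appearing somewhere the reader does not expect it. One quick way to test that guess is to list every CDS and mRNA location that is not a plain range or a `join()` of ranges. Below is a minimal Python sketch; the function names, the regex, and the definition of a "simple" location are my own assumptions, not part of Augustus.

```python
import re
import sys

# What we treat as a "simple" location (an assumption, not the Augustus
# parser's actual rule): one range, a join() of ranges, or either of those
# wrapped in a single complement().
RANGE = r"<?\d+\.\.>?\d+"
RANGES = rf"{RANGE}(?:,{RANGE})*"
INNER = rf"(?:{RANGE}|join\({RANGES}\))"
SIMPLE = re.compile(rf"^(?:{INNER}|complement\({INNER}\))$")


def iter_feature_locations(lines):
    """Yield (feature_key, location) pairs from GenBank FEATURES tables,
    joining locations that wrap onto continuation lines."""
    key, loc = None, ""
    in_features = in_location = False
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line.startswith("FEATURES"):
            in_features = True
            continue
        if not in_features:
            continue
        if line and not line.startswith(" "):
            # Any unindented line (BASE COUNT, ORIGIN, //) ends the table.
            if key:
                yield key, loc
            key, loc = None, ""
            in_features = in_location = False
            continue
        body = line[21:].strip()
        if len(line) > 5 and line[5] != " ":
            # A key in column 6 starts a new feature; flush the previous one.
            if key:
                yield key, loc
            key, loc, in_location = line[5:21].strip(), body, True
        elif in_location and body and not body.startswith("/"):
            loc += body  # the location wraps onto this line
        else:
            in_location = False  # qualifiers follow; location is complete


def flag_locations(lines, keys=("CDS", "mRNA")):
    """Return (key, location) pairs whose location is not 'simple'."""
    return [(key, loc) for key, loc in iter_feature_locations(lines)
            if key in keys and not SIMPLE.match(loc)]


def main(paths):
    for path in paths:
        with open(path) as handle:
            for key, loc in flag_locations(handle):
                print(f"{path}\t{key}\t{loc}")


if __name__ == "__main__":
    main(sys.argv[1:])
```

Run it as `python check_locations.py your_file.gb` (the script and file names are placeholders). If it prints nothing, the cause is more likely something other than the location syntax.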

  • #2
    Based on the example data provided, it appears that Augustus may expect the training sequences to be in this format:

    Code:
    LOCUS       HS04636   9453 bp  DNA
    FEATURES             Location/Qualifiers
         source          1..9453
         CDS             join(966..1017,1818..1934,2055..2198,2852..2995,3426..3607,
                         4340..4423,4543..4789,5072..5358,5860..6007,6494..6903)
    BASE COUNT     2937 a   1716 c  1710 g   3090 t
    ORIGIN
            1 gagctcacat taactattta cagggtaact gcttaggacc agtattatga ggagaattta
           61 cctttcccgc ctctctttcc aagaaacaag gagggggtga aggtacggag aacagtattt
          121 cttctgttga aagcaactta gctacaaaga taaattacag ctatgtacac tgaaggtagc
          181 tatttcattc cacaaaataa gagtttttta aaaagctatg tatgtatgtg ctgcatatag
          241 agcagatata cagcctatta agcgtcgtca ctaaaacata aaacatgtca gcctttctta
          301 accttactcg ccccagtctg tcccgacgtg acttcctcga ccctctaaag acgtacagac
          361 cagacacggc ggcggcggcg ggagagggga ttccctgcgc ccccggacct cagggccgct
          421 cagattcctg gagaggaagc caagtgtcct tctgccctcc cccggtatcc catccaaggc
          481 gatcagtcca gaactggctc tcggaagcgc tcgggcaaag actgcgaaga agaaaagaca
          541 tctggcggaa acctgtgcgc ctggggcggt ggaactcggg gaggagaggg agggatcaga
    
    ... (sequence continues) ...
    
         9241 acactgttca ctgttttttt taaaaaaaaa acttgatttg ttattaacat tgatctgctg
         9301 acaaaacctg ggaatttggg ttgtgtatgc gaatgtttca gtgcctcaga caaatgtgta
         9361 tttaacttat gtaaaagata agtctggaaa taaatgtctg tttatttttg tactatttaa
         9421 aaaaaaaaaa aaaaatcgat gtcgactcga gtc
    //
    LOCUS       HS08198   2344 bp  DNA
    FEATURES             Location/Qualifiers
         source          1..2344
         CDS             join(445..582,758..894,1053..1123,1208..1315,1587..1688,177
                         2..1810,1890..1903)
    BASE COUNT     400 a   730 c  778 g   436 t
    ORIGIN
            1 agcgggcggc ggtcgtgggc ggggttgcag gcgaggctca acgaacgctg gtctgaccgt
           61 cggcgctccc tgttgccggg ccctgagcaa gtggcttcat gaaccccgtg acgttggcca
          121 tggagataag accactgggt gatggtttaa ggaagataac gtgtaaaggg ctaaggactg
          181 tcggtggaaa tcaggggtgc aggagaaatg gataaacagc cagaggtcaa ctcggacttt



    • #3
      GenoMax, do you know of a tool I could use to convert my files into the format shown in the example?

      Thanks.
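The Augustus distribution ships a Perl script, `gff2gbSmallDNA.pl`, intended for this kind of conversion (a GFF annotation plus the genome FASTA in, training GenBank out); check its usage message in your installed version before relying on it. If you would rather roll your own, the core step is grouping CDS rows by transcript and shifting their coordinates into a window of flanking DNA around each gene. Below is a minimal Python sketch of that step, assuming a GFF3 file whose CDS rows carry a `Parent=` attribute (Ensembl GFF3 files generally do). The function names are hypothetical, and whether etraining accepts `complement(...)` for minus-strand genes is an assumption to verify against the Augustus example files.

```python
from collections import defaultdict


def cds_by_transcript(gff_lines):
    """Group GFF3 CDS rows by Parent.
    Returns {parent: (seqid, strand, sorted [(start, end), ...])}."""
    exons = defaultdict(list)
    info = {}
    for line in gff_lines:
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\r\n").split("\t")
        if len(cols) != 9 or cols[2] != "CDS":
            continue
        attrs = dict(kv.split("=", 1) for kv in cols[8].split(";") if "=" in kv)
        parent = attrs.get("Parent")
        if parent is None:
            continue
        exons[parent].append((int(cols[3]), int(cols[4])))
        info[parent] = (cols[0], cols[6])  # seqid, strand
    return {p: (info[p][0], info[p][1], sorted(exons[p])) for p in exons}


def window_location(exons, strand, flank, seq_len):
    """Return (window_start, window_end, location) where the window spans
    the CDS plus `flank` bp on each side (clipped to 1..seq_len) and the
    location uses 1-based coordinates relative to the window start."""
    start = max(1, exons[0][0] - flank)
    end = min(seq_len, exons[-1][1] + flank)
    ranges = ",".join(f"{s - start + 1}..{e - start + 1}" for s, e in exons)
    loc = f"join({ranges})" if len(exons) > 1 else ranges
    if strand == "-":
        loc = f"complement({loc})"
    return start, end, loc
```

For each transcript, cut `chromosome[start - 1:end]` (0-based Python slicing of the 1-based window) and pair that sequence with the returned location string.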
