
  • Fungal Genome Annotation

    I have a fungal genome. It looks like this:
    >contig1
    ATTAAATATACCCCACAAAATAGAGACAGAGACACATATTAA
    >contig2
    ATATCGAGAGAGGGCGCGCGCCGCGCGGCCGCGAGGAGAGTATA
    >contig3
    ATGCGCGATAGAGCTATATCTATCTCTCTATATAGAGA

    The genome is approximately 50 MB.
    I would like to annotate this fungal genome. Is there a simple, freely available server or tool that does that?

    I appreciate it.

    Best !
    Shashank

  • #2
    Unless most of your contigs are much longer and more complex than those, don't bother. Do you happen to know the N50? If not, you can calculate it with my assembly stats tool:

    stats.sh contigs.fasta

    ...just post the results in this thread.
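
For readers without BBTools at hand, the N50 itself is simple to compute: sort the contig lengths from longest to shortest and report the length at which the running total first reaches half of the assembly. A minimal sketch in shell and awk follows. `n50_from_fasta` is a hypothetical helper name, not part of stats.sh, and it reports only this one number rather than the full table that stats.sh prints.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BBTools): print the N50 contig length
# of a FASTA file given as the first argument.
n50_from_fasta() {
    local fasta="$1"
    # Step 1: emit one length per sequence (sequence lines may be wrapped).
    awk '
        /^>/ { if (seen) print len; len = 0; seen = 1; next }
        { gsub(/[ \t\r]/, ""); len += length($0) }
        END { if (seen) print len }
    ' "$fasta" |
    # Step 2: sort the lengths, longest first.
    sort -rn |
    # Step 3: walk down the sorted list until half of the total is covered.
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            half = total / 2
            for (i = 1; i <= NR; i++) {
                running += lens[i]
                if (running >= half) { print lens[i]; exit }
            }
        }
    '
}
```

Usage would be `n50_from_fasta contigs.fasta`, with the result printed in bases.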



    • #3
      Actually, I am new to the command line. Anyway, this is the command I used and what it reported:

      qiime@qiime-VirtualBox:~/Desktop/bbmap$ stats.sh 454AllContigs.fna > new.txt
      A       C       G       T       N       IUPAC   Other   GC      GC_stdev
      0.2355  0.2637  0.2626  0.2382  0.0000  0.0000  0.0000  0.5263  0.0399

      Main genome scaffold total: 10457
      Main genome contig total: 10457
      Main genome scaffold sequence total: 17.071 MB
      Main genome contig sequence total: 17.071 MB 0.003% gap
      Main genome scaffold N/L50: 2961/1.977 KB
      Main genome contig N/L50: 2960/1.977 KB
      Max scaffold length: 11.239 KB
      Max contig length: 11.239 KB
      Number of scaffolds > 50 KB: 0
      % main genome in scaffolds > 50 KB: 0.00%


      Minimum   Number          Number          Total           Total           Scaffold
      Scaffold  of              of              Scaffold        Contig          Contig
      Length    Scaffolds       Contigs         Length          Length          Coverage
      --------  --------------  --------------  --------------  --------------  --------
      All               10,457          10,457      17,071,168      17,070,688   100.00%
      50                10,457          10,457      17,071,168      17,070,688   100.00%
      100               10,457          10,457      17,071,168      17,070,688   100.00%
      250               10,004          10,004      16,994,544      16,994,072   100.00%
      500                9,421           9,421      16,780,800      16,780,339   100.00%
      1 KB               7,751           7,751      15,427,669      15,427,255   100.00%
      2.5 KB             1,668           1,668       5,688,859       5,688,774   100.00%
      5 KB                 113             113         693,645         693,638   100.00%
      10 KB                  4               4          42,301          42,301   100.00%
      Last edited by shashankgupta; 02-03-2015, 11:27 PM.



      • #4
        That's close - I think the problem is the spaces in the path. Try this:

        bash stats.sh in="../Jitender Fungal genome/454AllContigs.fna"

        That should work. If not, you can copy the assembly into the local folder like this:
        cp file destination
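
The underlying issue is that the shell splits an unquoted argument at every space, so a path such as `../Jitender Fungal genome/454AllContigs.fna` reaches the program as three separate words. A small demonstration, assuming a POSIX-style shell such as bash with the default word-splitting settings; `count_args` is a hypothetical helper that only reports how many arguments it received:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report how many arguments the shell passed in.
count_args() {
    echo "$#"
}

path="../Jitender Fungal genome/454AllContigs.fna"

# Unquoted: word splitting breaks the path at each space (3 arguments).
count_args $path

# Quoted: the path arrives intact as a single argument.
count_args "$path"

# Alternative: escape each space with a backslash (also 1 argument).
count_args ../Jitender\ Fungal\ genome/454AllContigs.fna
```

This is why the quoted `in="..."` form works where the bare path does not.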



        • #5
          P.S. I changed the command (as mentioned above) and I believe I got the expected result.



          • #6
            OK, that's not bad - most of the assembly is in fragments over 1900bp, which will give reasonable annotation. Unfortunately, if the genome is expected to be 50Mbp, you only assembled 17Mbp of it, or ~34%. If possible, I recommend trying different assemblers, different parameters, or different preprocessing to obtain the longest possible contigs and highest genome recovery possible before you start annotation.
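
The ~34% figure follows directly from the stats.sh output earlier in the thread: 17,071,168 assembled bases against an expected genome of roughly 50 Mbp. As a quick check (the 50,000,000 figure is the original poster's estimate, and `recovery_percent` is a hypothetical helper name):

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of the expected genome that was assembled.
# $1 = assembled bases, $2 = expected genome size in bases
recovery_percent() {
    awk -v a="$1" -v e="$2" 'BEGIN { printf "%.1f\n", 100 * a / e }'
}

# Assembled total from stats.sh vs. the poster's 50 Mbp estimate.
recovery_percent 17071168 50000000   # prints 34.1
```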

            I don't know of a good, simple, standalone tool. This is the JGI's standard procedure:



            ...but I'm not directly involved in the annotation, and it looks quite complicated, using lots of different programs. They should all be free, though.

            Edit: Looking through that in more depth, it does not really look possible to replicate outside of JGI. Hopefully someone else will have a suggestion. I will recommend to the fungal team that they package their annotation pipeline in a Docker container, but that may take a few years.
            Last edited by Brian Bushnell; 02-03-2015, 11:53 PM.



            • #7
              Sounds great!
              But I can't wait a few years.
              The stats.sh command I ran was on the assembly I have, which contains 17 MB.
              I looked at the JGI procedure, but it seems very complicated to me.



              • #8
                If you do not mind uploading it to a server, you can use NCBI's eukaryotic annotation pipeline:



                If you just want to predict genes, go for Augustus:


                Or use MAKER for both predicting and annotating genes:



                • #9
                  While the JGI SOP is a nice writeup on freely available tools, you'd need to gather some command line experience or team up with a skilled bioinformatician to run and merge all the results from these tools, until they release a complete installation package...

                  As an alternative you may want to try Augustus (http://bioinf.uni-greifswald.de/augustus/) to predict genes - they also offer Web submissions to their servers, if you are not skilled running tools on the command line.

                  For the downstream functional annotation you could run InterProScan on the resulting CDS or peptides (http://www.ebi.ac.uk/interpro/interproscan.html). I'd use the download version, but it is also possible to use their servers (with some limits) via web submission. The command line version is quite straightforward to use and integrates many complementary prediction and comparative tools.



                  • #10
                    Well, I gave it a try.
                    I tried the Augustus server, but I think it has a limit on the maximum upload size. My file is approximately 17 MB, and the upload failed.

                    I downloaded AUGUSTUS, but I am not able to run it. I got stuck at point 3 of the tutorial,
                    i.e.

                    3. set environment variable AUGUSTUS_CONFIG_PATH

                    > export AUGUSTUS_CONFIG_PATH=/my_path_to_AUGUSTUS/augustus/config/

                    The program requires that the environment variable AUGUSTUS_CONFIG_PATH is set to the config directory that contains the
                    configuration and parameter files. This is the directory 'augustus/config'. You probably want to add this line to a startup script (like ~/.bashrc).
                    Alternatively, you can specify this directory on the command line when you run augustus:
                    --AUGUSTUS_CONFIG_PATH=/my_path_to_AUGUSTUS/augustus/config/
                    You may want to add the path of the executable to the PATH environment variable or copy augustus into a common directory (e.g. /usr/bin/).


                    Thanx



                    • #11
                      Where did you install Augustus on your machine (i.e. under which path)? Then execute the "export" command as indicated by the tutorial, replacing "my_path_to_AUGUSTUS/augustus" with the installation path...



                      • #12


                        How do I upload to the server? Does it annotate fungal genomes?



                        • #13
                          Augustus is installed in

                          /root/Desktop/augustus.2.5.5



                          • #14
                            Originally posted by shashankgupta:
                            http://www.ncbi.nlm.nih.gov/genome/a...n_euk/process/

                            How do I upload to the server? Does it annotate fungal genomes?
                            You need to submit your assembly first:
                            Some eukaryotic genome assemblies are annotated using the NCBI Eukaryotic Genome Annotation Pipeline (EGAP) and are included in RefSeq. They are chosen using the following criteria:
                            Taxonomic scope: In scope: Vertebrates, higher plants, arthropods, and some other invertebrates. Out-of-scope: Fungi, nematodes, and protozoans.
                            Assembly quality: Contiguity: Genomes assembled to the level of chromosomes, and genomes with high contig and scaffold N50 values are preferred.



                            • #15
                              Originally posted by shashankgupta:
                              Augustus is installed in

                              /root/Desktop/augustus.2.5.5
                              So did you try

                              Code:
                              export AUGUSTUS_CONFIG_PATH=/root/Desktop/augustus.2.5.5/augustus/config/
                              or

                              Code:
                              export AUGUSTUS_CONFIG_PATH=/root/Desktop/augustus.2.5.5/config/
                              (in case there is no "augustus" subfolder inside "augustus.2.5.5")
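
The two alternatives above can be checked mechanically before exporting anything. A minimal sketch, assuming the Augustus config directory can be recognised by a `species` subfolder inside it (an assumption that should be verified against the 2.5.5 download); `pick_augustus_config` is a hypothetical helper, not something shipped with Augustus:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the first candidate directory that looks like
# an Augustus config directory (assumed here to contain a "species" subfolder).
pick_augustus_config() {
    local dir
    for dir in "$@"; do
        if [ -d "$dir/species" ]; then
            # Normalise to exactly one trailing slash, as in the tutorial.
            printf '%s\n' "${dir%/}/"
            return 0
        fi
    done
    echo "no Augustus config directory found among: $*" >&2
    return 1
}

# Try both layouts suggested in this thread.
if config_dir="$(pick_augustus_config \
        /root/Desktop/augustus.2.5.5/augustus/config \
        /root/Desktop/augustus.2.5.5/config)"; then
    export AUGUSTUS_CONFIG_PATH="$config_dir"
    echo "AUGUSTUS_CONFIG_PATH=$AUGUSTUS_CONFIG_PATH"
fi
```

The export only lasts for the current terminal session. To keep it, the tutorial's advice applies: add the export line to a startup script such as ~/.bashrc.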
