  • Fungal Genome Annotation

    I have a fungal genome; it looks like this:
    >contig1
    ATTAAATATACCCCACAAAATAGAGACAGAGACACATATTAA
    >contig2
    ATATCGAGAGAGGGCGCGCGCCGCGCGGCCGCGAGGAGAGTATA
    >contig3
    ATGCGCGATAGAGCTATATCTATCTCTCTATATAGAGA

    The genome is approximately 50 MB.
    I would like to annotate it. Is there a simple, freely available server or tool that does that?

    I appreciate it.

    Best !
    Shashank

  • #2
    Unless most of your contigs are much longer and more complex than those, don't bother. Do you happen to know the N50? If not, you can calculate it with my assembly stats tool:

    stats.sh contigs.fasta

    ...just post the results in this thread.
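
    A minimal sketch of what that statistic measures, for anyone who wants to check it by hand. It assumes a plain multi-FASTA file and the common definition of N50: sort the contigs from longest to shortest, add up their lengths, and report the length of the contig at which the running total first reaches half of all assembled bases. The function name n50 is made up for illustration, and this is not how stats.sh itself is implemented.

    ```shell
    # Print the N50 length (in bases) of a FASTA file given as $1.
    n50() {
        awk '
            /^>/ { if (len > 0) print len; len = 0; next }   # header: emit previous contig length
            { gsub(/[ \t\r]/, ""); len += length($0) }        # sequence line: count its bases
            END { if (len > 0) print len }                    # emit the last contig
        ' "$1" |
        sort -rn |
        awk '
            { lens[NR] = $1; total += $1 }                    # lengths arrive longest first
            END {
                run = 0
                for (i = 1; i <= NR; i++) {
                    run += lens[i]
                    # stop at the contig where the running sum reaches half the total
                    if (2 * run >= total) { print lens[i]; exit }
                }
            }
        '
    }
    ```

    Usage would look like this, with contigs.fasta standing in for your own file:
    n50 contigs.fasta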


    • #3
      Actually, I am new to the command line. Anyway, this is what I got when I ran the command:

      qiime@qiime-VirtualBox:~/Desktop/bbmap$ stats.sh 454AllContigs.fna > new.txt
      A       C       G       T       N       IUPAC   Other   GC      GC_stdev
      0.2355  0.2637  0.2626  0.2382  0.0000  0.0000  0.0000  0.5263  0.0399

      Main genome scaffold total: 10457
      Main genome contig total: 10457
      Main genome scaffold sequence total: 17.071 MB
      Main genome contig sequence total: 17.071 MB 0.003% gap
      Main genome scaffold N/L50: 2961/1.977 KB
      Main genome contig N/L50: 2960/1.977 KB
      Max scaffold length: 11.239 KB
      Max contig length: 11.239 KB
      Number of scaffolds > 50 KB: 0
      % main genome in scaffolds > 50 KB: 0.00%


      Minimum     Number       Number     Total         Total         Scaffold
      Scaffold    of           of         Scaffold      Contig        Contig
      Length      Scaffolds    Contigs    Length        Length        Coverage
      --------    ---------    -------    ----------    ----------    --------
      All         10,457       10,457     17,071,168    17,070,688    100.00%
      50          10,457       10,457     17,071,168    17,070,688    100.00%
      100         10,457       10,457     17,071,168    17,070,688    100.00%
      250         10,004       10,004     16,994,544    16,994,072    100.00%
      500         9,421        9,421      16,780,800    16,780,339    100.00%
      1 KB        7,751        7,751      15,427,669    15,427,255    100.00%
      2.5 KB      1,668        1,668      5,688,859     5,688,774     100.00%
      5 KB        113          113        693,645       693,638       100.00%
      10 KB       4            4          42,301        42,301        100.00%
      Last edited by shashankgupta; 02-03-2015, 11:27 PM.


      • #4
        That's close - I think the problem is the spaces in the path. Try this:

        bash stats.sh in="../Jitender Fungal genome/454AllContigs.fna"

        That should work. If not, you can copy the assembly into the local folder like this:
        cp file destination
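
        To illustrate why the quotes matter: the shell splits an unquoted value at each space, so a single path turns into several arguments. This is a small sketch assuming bash (or another POSIX shell); count_args is a made-up helper that only reports how many arguments it received.

        ```shell
        # Report how many separate arguments the shell passed in.
        count_args() { echo "$#"; }

        path="../Jitender Fungal genome/454AllContigs.fna"

        count_args in=$path      # unquoted: split at the two spaces, prints 3
        count_args "in=$path"    # quoted: kept as one argument, prints 1
        ```

        The same applies to the copy: quote the source path, for example
        cp "../Jitender Fungal genome/454AllContigs.fna" .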


        • #5
          P.S. I changed the command (as mentioned above), and I believe I got the expected result.


          • #6
            OK, that's not bad - about half of the assembly is in contigs of roughly 1,900 bp or longer, which will give reasonable annotation. Unfortunately, if the genome is expected to be 50 Mbp, you only assembled 17 Mbp of it, or ~34%. If possible, I recommend trying different assemblers, different parameters, or different preprocessing to obtain the longest possible contigs and the highest possible genome recovery before you start annotation.
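
            The ~34% figure is simply the assembled total from the stats.sh output divided by the expected genome size. A minimal sketch of that arithmetic, assuming both sizes are given in the same unit (Mbp here); recovery_pct is a made-up name.

            ```shell
            # Percentage of the expected genome present in the assembly.
            # $1 = assembled size, $2 = expected genome size (same unit).
            recovery_pct() {
                LC_ALL=C awk -v assembled="$1" -v expected="$2" \
                    'BEGIN { printf "%.1f\n", 100 * assembled / expected }'
            }

            recovery_pct 17.071 50   # prints 34.1
            ```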

            I don't know of a good, simple, standalone tool. This is the JGI's standard procedure:



            ...but I'm not directly involved in the annotation, and it looks quite complicated, using lots of different programs. They should all be free, though.

            Edit: Looking through that in more depth, it does not really look possible to replicate outside of JGI. Hopefully someone else will have a suggestion. I will recommend to the fungal team that they package their annotation pipeline in a Docker container, but that may take a few years.
            Last edited by Brian Bushnell; 02-03-2015, 11:53 PM.


            • #7
              Sounds great!
              But I can't wait a few years.
              The stats.sh command I ran was on the assembly I have, which is 17 MB.
              I looked at the JGI procedure, but it looks very complicated to me.


              • #8
                If you do not mind uploading it to a server, you can use NCBI's eukaryotic annotation pipeline:



                If you just want to predict genes, go for Augustus:


                Or use MAKER for both predicting and annotating genes:


                • #9
                  While the JGI SOP is a nice writeup on freely available tools, you'd need to gather some command line experience or team up with a skilled bioinformatician to run and merge all the results from these tools, until they release a complete installation package...

                  As an alternative you may want to try Augustus (http://bioinf.uni-greifswald.de/augustus/) to predict genes - they also offer Web submissions to their servers, if you are not skilled running tools on the command line.

                  For the downstream functional annotation you could run InterProScan on the resultant CDS or peptides (http://www.ebi.ac.uk/interpro/interproscan.html). I'd use the download version, but it is also possible to use their servers (with some limits) via web submission. The command line version is quite straightforward to use and integrates many complementary prediction and comparative tools.


                  • #10
                    Well, I gave it a try.
                    I tried the Augustus server, but I think it has a limit on the upload size; my file is approx. 17 MB, so the upload failed.

                    I downloaded AUGUSTUS, but I am not able to run it. I got stuck at point 3 of the tutorial,
                    i.e.

                    3. set environment variable AUGUSTUS_CONFIG_PATH

                    > export AUGUSTUS_CONFIG_PATH=/my_path_to_AUGUSTUS/augustus/config/

                    The program requires that the environment variable AUGUSTUS_CONFIG_PATH is set to the config directory that contains the
                    configuration and parameter files. This is the directory 'augustus/config'. You probably want to add this line to a startup script (like ~/.bashrc).
                    Alternatively, you can specify this directory on the command line when you run augustus:
                    --AUGUSTUS_CONFIG_PATH=/my_path_to_AUGUSTUS/augustus/config/
                    You may want to add the path of the executable to the PATH environment variable or copy augustus into a common directory (e.g. /usr/bin/).
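
                    The tutorial's suggestion of adding the line to a startup script could be sketched as below. This assumes a bash-like shell; persist_augustus_config is a made-up helper name, and the path passed to it is whatever your own config directory turns out to be. It only appends the export line if the exact same line is not already in the file.

                    ```shell
                    # Append an export line for AUGUSTUS_CONFIG_PATH to a startup file.
                    # $1 = config directory, $2 = startup file (defaults to ~/.bashrc).
                    persist_augustus_config() {
                        config_dir=$1
                        rc_file=${2:-"$HOME/.bashrc"}
                        line="export AUGUSTUS_CONFIG_PATH=\"$config_dir\""
                        # -x: match the whole line, -F: treat it as a fixed string
                        grep -qxF -- "$line" "$rc_file" 2>/dev/null || printf '%s\n' "$line" >> "$rc_file"
                    }
                    ```

                    With the tutorial's placeholder path, the call would be:
                    persist_augustus_config /my_path_to_AUGUSTUS/augustus/config/
                    The setting then takes effect in newly opened terminals.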


                    Thanx


                    • #11
                      Where did you install Augustus on your machine (i.e. under which path)? Then execute the "export" command as indicated by the tutorial, replacing "my_path_to_AUGUSTUS/augustus" with the installation path...


                      • #12


                        How do I upload to the server? Does it annotate fungal genomes?


                        • #13
                          Augustus is installed in

                          /root/Desktop/augustus.2.5.5


                          • #14
                            Originally posted by shashankgupta
                            http://www.ncbi.nlm.nih.gov/genome/a...n_euk/process/

                            How do I upload to the server? Does it annotate fungal genomes?
                            You need to submit your assembly first:
                            RefSeq genome assemblies are selected from public data to represent organisms across the tree of life. In general, only the best quality genomes or genomes of the highest value to the scientific community are included in RefSeq for each species, but selection rules vary based on taxonomic superkingdom. Genes are annotated on all RefSeq genome assemblies, either at NCBI, or by the assembly submitters.
                            Vertebrates, Higher Plants, Arthropods, and some Invertebrates: For these organism groups, the genome assemblies in RefSeq are annotated using the NCBI Eukaryotic Genome Annotation Pipeline (EGAP).


                            • #15
                              Originally posted by shashankgupta
                              Augustus is installed in

                              /root/Desktop/augustus.2.5.5
                              So did you try

                              Code:
                              export AUGUSTUS_CONFIG_PATH=/root/Desktop/augustus.2.5.5/augustus/config/
                              or

                              Code:
                              export AUGUSTUS_CONFIG_PATH=/root/Desktop/augustus.2.5.5/config/
                              (in case there is no "augustus" subfolder inside "augustus.2.5.5")
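
                              Rather than guessing between the two, a small sketch like the one below could check which directory actually exists and export that one. It assumes a bash-like shell; set_augustus_config is a made-up helper name, and the only two layouts it knows about are the two mentioned above.

                              ```shell
                              # Export AUGUSTUS_CONFIG_PATH for the install directory given as $1,
                              # trying "<dir>/augustus/config" first and then "<dir>/config".
                              set_augustus_config() {
                                  base=$1
                                  for dir in "$base/augustus/config" "$base/config"; do
                                      if [ -d "$dir" ]; then
                                          export AUGUSTUS_CONFIG_PATH="$dir/"
                                          return 0
                                      fi
                                  done
                                  echo "no config directory found under $base" >&2
                                  return 1
                              }
                              ```

                              For the install location given above, the call would be:
                              set_augustus_config /root/Desktop/augustus.2.5.5 && echo "$AUGUSTUS_CONFIG_PATH"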
