
**Brian Bushnell** · 02-03-2015, 10:19 PM

Unless most of your contigs are much longer and more complex than those, don't bother. Do you happen to know the N50? If not, you can calculate it with my assembly stats tool:

stats.sh contigs.fasta

...just post the results in this thread.
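If BBTools is not at hand, the same kind of number can be approximated with standard Unix tools. The following is a minimal sketch, not a description of how stats.sh works internally: the function name `n50_from_fasta` is made up for illustration, and it uses the common convention in which N50 is the contig length at which the running total, going from longest to shortest contig, first reaches half of the assembly.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BBTools): print the N50 contig length of a FASTA file.
n50_from_fasta() {
    local fasta="$1"
    # Step 1: emit one sequence length per record (sequences may span several lines).
    awk '
        /^>/ { if (seen) print len; len = 0; seen = 1; next }
             { gsub(/[ \t\r]/, ""); len += length($0) }
        END  { if (seen) print len }
    ' "$fasta" |
    # Step 2: sort the lengths from longest to shortest.
    sort -rn |
    # Step 3: walk down the list until the running sum reaches half of the total.
    awk '
        { lens[NR] = $1; total += $1 }
        END {
            for (i = 1; i <= NR; i++) {
                running += lens[i]
                if (running * 2 >= total) { print lens[i]; exit }
            }
        }
    '
}
```

It would be called as `n50_from_fasta contigs.fasta` and prints a single length in bp.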

**shashankgupta** · 02-03-2015, 11:16 PM

Actually, I am new to the command line. Anyway, this is what I got when I ran the command:

qiime@qiime-VirtualBox:~/Desktop/bbmap$ stats.sh 454AllContigs.fna > new.txt
| A | C | G | T | N | IUPAC | Other | GC | GC_stdev |
|---|---|---|---|---|---|---|---|---|
| 0.2355 | 0.2637 | 0.2626 | 0.2382 | 0.0000 | 0.0000 | 0.0000 | 0.5263 | 0.0399 |

Main genome scaffold total: 10457
Main genome contig total: 10457
Main genome scaffold sequence total: 17.071 MB
Main genome contig sequence total: 17.071 MB 0.003% gap
Main genome scaffold N/L50: 2961/1.977 KB
Main genome contig N/L50: 2960/1.977 KB
Max scaffold length: 11.239 KB
Max contig length: 11.239 KB
Number of scaffolds > 50 KB: 0
% main genome in scaffolds > 50 KB: 0.00%

| Minimum Scaffold Length | Number of Scaffolds | Number of Contigs | Total Scaffold Length | Total Contig Length | Scaffold Contig Coverage |
|---|---|---|---|---|---|
| All | 10,457 | 10,457 | 17,071,168 | 17,070,688 | 100.00% |
| 50 | 10,457 | 10,457 | 17,071,168 | 17,070,688 | 100.00% |
| 100 | 10,457 | 10,457 | 17,071,168 | 17,070,688 | 100.00% |
| 250 | 10,004 | 10,004 | 16,994,544 | 16,994,072 | 100.00% |
| 500 | 9,421 | 9,421 | 16,780,800 | 16,780,339 | 100.00% |
| 1 KB | 7,751 | 7,751 | 15,427,669 | 15,427,255 | 100.00% |
| 2.5 KB | 1,668 | 1,668 | 5,688,859 | 5,688,774 | 100.00% |
| 5 KB | 113 | 113 | 693,645 | 693,638 | 100.00% |
| 10 KB | 4 | 4 | 42,301 | 42,301 | 100.00% |

**Brian Bushnell** · 02-03-2015, 11:23 PM

That's close - I think the problem is the spaces in the path. Try this:

bash stats.sh in="../Jitender Fungal genome/454AllContigs.fna"

That should work. If not, you can copy the assembly into the local folder like this:
cp file destination
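
To spell out the copy option, here is a small sketch. The helper name `copy_assembly_here` is only for illustration; the point is that a path containing spaces has to be quoted so the shell passes it to `cp` as a single argument.

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy an assembly whose path may contain spaces into a
# destination directory (default: the current one) and print the new path.
copy_assembly_here() {
    local src="$1"
    local dest_dir="${2:-.}"
    # The double quotes keep a name like "Jitender Fungal genome" together as one argument.
    cp -- "$src" "$dest_dir"/ || return 1
    printf '%s\n' "$dest_dir/$(basename -- "$src")"
}

# Example use, with the path from this thread:
#   local_copy=$(copy_assembly_here "../Jitender Fungal genome/454AllContigs.fna")
#   stats.sh in="$local_copy"
```

Once the file sits in a folder without spaces in its name, the quoting problem goes away for later commands as well.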

**shashankgupta** · 02-03-2015, 11:29 PM

P.S. I changed the command (as mentioned above) and I believe I got the expected result.

**Brian Bushnell** · 02-03-2015, 11:44 PM

OK, that's not bad - at least half of the assembly is in contigs of roughly 1,900 bp or longer, which will give reasonable annotation. Unfortunately, if the genome is expected to be 50 Mbp, you only assembled 17 Mbp of it, or ~34%. If possible, I recommend trying different assemblers, different parameters, or different preprocessing to obtain the longest contigs and highest genome recovery possible before you start annotation.
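
The recovery figure is simply the assembled size divided by the expected genome size. A minimal sketch, with a made-up helper name (`recovery_pct`) and the 50 Mbp expectation treated as an assumption from this thread, not a measured value:

```shell
#!/usr/bin/env bash
# Hypothetical helper: percentage of the expected genome present in the assembly.
# Both arguments must be in the same unit (here: Mbp).
recovery_pct() {
    awk -v assembled="$1" -v expected="$2" \
        'BEGIN { printf "%.1f\n", 100 * assembled / expected }'
}

# Contig total reported by stats.sh vs. the expected genome size (assumed).
recovery_pct 17.071 50
```

With those two numbers the result is about 34%, matching the estimate above.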

I don't know of a good, simple, standalone tool. This is the JGI's standard procedure:

http://genome.jgi-psf.org/programs/fungi/FungalGenomeAnnotationSOP.pdf

...but I'm not directly involved in the annotation, and it looks quite complicated, using lots of different programs. They should all be free, though.

Edit: Looking through that in more depth, it does not really look possible to replicate outside of JGI. Hopefully someone else will have a suggestion. I will recommend to the fungal team that they package their annotation pipeline in a Docker container, but that may take a few years.

**shashankgupta** · 02-04-2015, 12:33 AM

Sounds great!
But I can't wait a few years.

The stats.sh command I ran was on the assembly, which contains 17 MB in total.
I tried the JGI procedure, but it looks very complicated to me.

**boetsie** · 02-04-2015, 01:46 AM

If you do not mind uploading it to a server, you can use NCBI's eukaryotic annotation pipeline:

The NCBI Eukaryotic Genome Annotation Pipeline

http://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/

If you just want to predict genes, go for Augustus:

Augustus: gene prediction

http://bioinf.uni-greifswald.de/augustus/

Or use MAKER for both predicting and annotating genes:

Yandell Lab - Software - MAKER

http://www.yandell-lab.org/software/maker.html

**sarvidsson** · 02-04-2015, 01:51 AM

While the JGI SOP is a nice writeup on freely available tools, you'd need to gather some command line experience or team up with a skilled bioinformatician to run and merge all the results from these tools, until they release a complete installation package...

As an alternative you may want to try Augustus (http://bioinf.uni-greifswald.de/augustus/) to predict genes - they also offer Web submissions to their servers, if you are not comfortable running tools on the command line.

For the downstream functional annotation you could run InterProScan on the resulting CDS or peptides (http://www.ebi.ac.uk/interpro/interproscan.html). I'd use the download version, but it is also possible to use their servers (with some limits) via web submission. The command line version is quite straightforward to use and integrates many complementary prediction and comparative tools.

**shashankgupta** · 02-04-2015, 01:58 AM

Well, I'll give it a try.
I tried the Augustus server, but I think it has a limit on the maximum upload size; my file is approx. 17 MB, so the upload failed.

I downloaded AUGUSTUS, but I am not able to run it. In the tutorial I got stuck at point 3,
i.e.:

3. set environment variable AUGUSTUS_CONFIG_PATH

> export AUGUSTUS_CONFIG_PATH=/my_path_to_AUGUSTUS/augustus/config/

The program requires that the environment variable AUGUSTUS_CONFIG_PATH is set to the config directory that contains the
configuration and parameter files. This is the directory 'augustus/config'. You probably want to add this line to a startup script (like ~/.bashrc).
Alternatively, you can specify this directory on the command line when you run augustus:
--AUGUSTUS_CONFIG_PATH=/my_path_to_AUGUSTUS/augustus/config/
You may want to add the path of the executable to the PATH environment variable or copy augustus into a common directory (e.g. /usr/bin/).

Thanx

**sarvidsson** · 02-04-2015, 02:01 AM

Where did you install Augustus on your machine (i.e. under which path)? Then execute the "export" command as indicated by the tutorial, replacing "my_path_to_AUGUSTUS/augustus" with the installation path...

**shashankgupta** · 02-04-2015, 02:02 AM

The NCBI Eukaryotic Genome Annotation Pipeline

http://www.ncbi.nlm.nih.gov/genome/annotation_euk/process/

How do I upload to the server? Does it annotate fungal genomes?

**shashankgupta** · 02-04-2015, 02:03 AM

Augustus is installed in

/root/Desktop/augustus.2.5.5

**sarvidsson** · 02-04-2015, 02:05 AM

Originally posted by shashankgupta

http://www.ncbi.nlm.nih.gov/genome/a...n_euk/process/

How do I upload to the server? Does it annotate fungal genomes?

You need to submit your assembly first:

Genomes Selected for RefSeq Annotation

http://www.ncbi.nlm.nih.gov/genome/annotation_euk/policy/

Some eukaryotic genome assemblies are annotated using the NCBI Eukaryotic Genome Annotation Pipeline (EGAP) and are included in RefSeq. They are chosen using the following criteria: Taxonomic scope: In scope: Vertebrates, higher plants, arthropods, and some other invertebrates. Out-of-scope: Fungi, nematodes, and protozoans. Assembly quality: Contiguity: Genomes assembled to the level of chromosomes, and genomes with high contig and scaffold N50 values are preferred.

**sarvidsson** · 02-04-2015, 02:08 AM

Originally posted by shashankgupta

Augustus is installed in

/root/Desktop/augustus.2.5.5

So did you try

Code:

export AUGUSTUS_CONFIG_PATH=/root/Desktop/augustus.2.5.5/augustus/config/

or

Code:

export AUGUSTUS_CONFIG_PATH=/root/Desktop/augustus.2.5.5/config/

(in case there is no "augustus" subfolder inside "augustus.2.5.5")
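
A quick way to decide between the two is to check which directory actually exists before exporting the variable. This is only a sketch: the function name `find_augustus_config` is invented here, and it does nothing more than test the two layouts mentioned above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the config directory found under an AUGUSTUS
# installation root, trying both layouts mentioned above.
find_augustus_config() {
    local root="${1%/}"   # drop a trailing slash, if any
    local candidate
    for candidate in "$root/augustus/config" "$root/config"; do
        if [ -d "$candidate" ]; then
            printf '%s/\n' "$candidate"
            return 0
        fi
    done
    echo "no config directory found under $root" >&2
    return 1
}

# Example use with the installation path from this thread:
#   export AUGUSTUS_CONFIG_PATH="$(find_augustus_config /root/Desktop/augustus.2.5.5)"
#   echo "$AUGUSTUS_CONFIG_PATH"
```

If the function prints an error instead of a path, the installation folder is probably somewhere else, and `ls` on the folder should show where the `config` directory really is.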


Fungal Genome Annotation
