Hi,
For a completely novel genome I used the gene prediction software SNAP (http://korflab.ucdavis.edu/software.html), which gave me a GFF file and a FASTA file (excerpts of both are shown below).
In the next step I ran Blastp with SNAP's FASTA file against UniRef90 and got an XML file (excerpt below).
How could I generate a GFF3 or GTF file with annotations for SnpEff (http://snpeff.sourceforge.net/)?
Is there a good annotation pipeline available?
Thank you in advance.
SNAP's GFF:
XA8 SNAP Einit 6161 7325 -5.800 - . XA8r-snap.4
XA8 SNAP Eterm 5974 6008 5.650 - . XA8r-snap.4
SNAP's FASTA:
>XA8-snap.4
MAAHPPTLLDRAYGVNNIKSHIPIILDNNDHNYDAWRELLLTHCQSFEVA
GHLDGTLLPTDDNDQLWIKRDGLVKLWLYGTISKDLFRSVFKTGGTSREI
WTRIENYFRDNKEARAIRLDHELRNKTIGDLTIHAYRQDLKSISELLANV
ESPVSERTLVTYMINGLSAKFDNIINVIMHRQPFPTFEQARSMLILEEER
LNKGDKSPLVKDSPSSDKVLNVSATSQPPATTQQPQQQQRFYNNRGSKRN
NRGRGRNYNNNQRPMYNQWGVPFWPNAYSFWGNQQQAPWGQQQFNNQGIL
GPRPSQQAHQVQTQGQFPSAAPFVPTTDFASAFNTMTLTDPTDHQWYMDS
GATAHLTNNPGNLKSILNTGTKQTVKVANGDIIPITKTGPSNSTDNSPQ*
Blastp XML output:
<?xml version="1.0"?>
<!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "http://www.ncbi.nlm.nih.gov/dtd/NCBI_BlastOutput.dtd">
<BlastOutput>
  <BlastOutput_program>blastp</BlastOutput_program>
  <BlastOutput_version>BLASTP 2.2.26+</BlastOutput_version>
  <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&auml;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), "Gapped BLAST and PSI-BLAST: a new generation of protein database search programs", Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
  <BlastOutput_db>/sw/db/uniprot/uniref90</BlastOutput_db>
  <BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
  <BlastOutput_query-def>XA8-snap.1</BlastOutput_query-def>
  <BlastOutput_query-len>96</BlastOutput_query-len>
  <BlastOutput_param>
    <Parameters>
      <Parameters_matrix>BLOSUM62</Parameters_matrix>
      <Parameters_expect>1e-05</Parameters_expect>
      <Parameters_gap-open>11</Parameters_gap-open>
      <Parameters_gap-extend>1</Parameters_gap-extend>
      <Parameters_filter>F</Parameters_filter>
    </Parameters>
  </BlastOutput_param>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_iter-num>4</Iteration_iter-num>
      <Iteration_query-ID>Query_4</Iteration_query-ID>
      <Iteration_query-def>XA8-snap.4</Iteration_query-def>
      <Iteration_query-len>400</Iteration_query-len>
      <Iteration_hits>
        <Hit>
          <Hit_num>1</Hit_num>
          <Hit_id>UR090:UniRef90_Q9FX16</Hit_id>
          <Hit_def>F12G12.10 protein n=1 Tax=Arabidopsis thaliana RepID=Q9FX16_ARATH</Hit_def>
          <Hit_accession>UR090:UniRef90_Q9FX16</Hit_accession>
          <Hit_len>308</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_num>1</Hsp_num>
              <Hsp_bit-score>278.1</Hsp_bit-score>
              <Hsp_score>710</Hsp_score>
              <Hsp_evalue>1.87694e-87</Hsp_evalue>
              <Hsp_query-from>10</Hsp_query-from>
              <Hsp_query-to>290</Hsp_query-to>
              <Hsp_hit-from>8</Hsp_hit-from>
              <Hsp_hit-to>286</Hsp_hit-to>
              <Hsp_query-frame>0</Hsp_query-frame>
              <Hsp_hit-frame>0</Hsp_hit-frame>
              <Hsp_identity>146</Hsp_identity>
              <Hsp_positive>192</Hsp_positive>
              <Hsp_gaps>10</Hsp_gaps>
              <Hsp_align-len>285</Hsp_align-len>
              <Hsp_qseq>DRAYGVNNIKSHIPIILDNNDHNYDAWRELLLTHCQSFEVAGHLDGTLLPTDDNDQLWIKRDGLVKLWLYGTISKDLFRSVFKTGGTSREIWTRIENYFRDNKEARAIRLDHELRNKTIGDLTIHAYRQDLKSISELLANVESPVSERTLVTYMINGLSAKFDNIINVIMHRQPFPTFEQARSMLILEEERLNKGDK-SPLVKDSPSSDKVLNVSATSQPPATT-QQPQQQQRFYNNRGSKRN-NRGRGRNYNNNQRPMYNQWGV-PFWPNAYSFWGNQQQAPWG</Hsp_qseq>
              <Hsp_hseq>EQIYGVSNIKSHIPVMLDIEESNYDAWRELFLTHCLSFDVMGHIDGTLLPTNANDVNWQKRDGIVKLSLYGTLTPKQFQGSFVTSSTSRDIWLRIKNQFRNNKDARALRLDSELRTKDIGDMRVADYYRKMKKLADSLRNVDVPVTDRNLVMYVLNGLNPKFDNIINVIKHRQPFPSFDDAATMLQEEEDRLKRAIKPNPTHVDHSSSSTVL--ACSEAPPVTNFQRSGGNQMGYRGRGRGNNIFRGRGGRFSYYNMPTFNSWNRPPFYQNSYQMWNH----PWG</Hsp_hseq>
              <Hsp_midline>++ YGV+NIKSHIP++LD + NYDAWREL LTHC SF+V GH+DGTLLPT+ ND W KRDG+VKL LYGT++ F+ F T TSR+IW RI+N FR+NK+ARA+RLD ELR K IGD+ + Y + +K +++ L NV+ PV++R LV Y++NGL+ KFDNIINVI HRQPFP+F+ A +ML EE+RL + K +P D SS VL + + PP T Q+ Q Y RG N RGRG ++ P +N W PF+ N+Y W + PWG</Hsp_midline>
            </Hsp>
          </Hit_hsps>
        </Hit>
        ...
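To make the question more concrete, here is a rough sketch of what such a conversion could look like, using only the Python standard library. It is not a tested pipeline. The function names (`top_hits`, `snap_to_gff3`, `escape_gff3`, `row`) and the `.t1` transcript suffix are made up for illustration. It assumes every SNAP gene model is complete, so the first coding exon starts in phase 0, and it keeps only the first hit per query. It also assumes the gene names in the GFF match the BLAST query names; in the excerpts above they differ (XA8r-snap.4 versus XA8-snap.4), so they would have to be reconciled first. Whether SnpEff accepts exactly this layout is an open question.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict


def top_hits(blast_xml):
    """Return {query name: (hit description, e-value)} using the first hit only."""
    # The BLAST DTD defines &auml;, but ElementTree does not load DTDs,
    # so the entity is replaced by hand before parsing.
    root = ET.fromstring(blast_xml.replace("&auml;", "ä"))
    hits = {}
    for iteration in root.iter("Iteration"):
        query = iteration.findtext("Iteration_query-def")
        hit = iteration.find("Iteration_hits/Hit")  # first listed hit
        if query is None or hit is None:
            continue
        hits[query] = (hit.findtext("Hit_def"),
                       hit.findtext("Hit_hsps/Hsp/Hsp_evalue"))
    return hits


def escape_gff3(value):
    """Percent-encode the characters GFF3 reserves in column 9."""
    for char, code in (("%", "%25"), (";", "%3B"), ("=", "%3D"),
                       ("&", "%26"), (",", "%2C")):
        value = value.replace(char, code)
    return value


def row(seqid, source, ftype, start, end, strand, phase, attrs):
    """Build one tab-separated GFF3 line (score is left as '.')."""
    return "\t".join([seqid, source, ftype, str(start), str(end),
                      ".", strand, phase, attrs])


def snap_to_gff3(snap_lines, hits):
    """Turn SNAP exon lines into gene/mRNA/exon/CDS lines with a Note."""
    genes = defaultdict(list)
    for line in snap_lines:
        fields = line.split()
        if len(fields) != 9:
            continue  # skip blank or malformed lines
        seqid, source, _, start, end, _, strand, _, gene = fields
        genes[gene].append((seqid, source, int(start), int(end), strand))

    out = ["##gff-version 3"]
    for gene, exons in genes.items():
        seqid, source, _, _, strand = exons[0]
        lo = min(e[2] for e in exons)
        hi = max(e[3] for e in exons)
        attrs = f"ID={gene}"
        if gene in hits:
            desc, evalue = hits[gene]
            attrs += ";Note=" + escape_gff3(f"{desc} (e-value {evalue})")
        mrna = f"{gene}.t1"  # hypothetical transcript ID
        out.append(row(seqid, source, "gene", lo, hi, strand, ".", attrs))
        out.append(row(seqid, source, "mRNA", lo, hi, strand, ".",
                       f"ID={mrna};Parent={gene}"))
        # Walk the exons 5' to 3' so the CDS phase can be accumulated.
        ordered = sorted(exons, key=lambda e: e[2], reverse=(strand == "-"))
        coded = 0
        for _, _, start, end, _ in ordered:
            phase = str((3 - coded % 3) % 3)
            out.append(row(seqid, source, "exon", start, end, strand, ".",
                           f"Parent={mrna}"))
            out.append(row(seqid, source, "CDS", start, end, strand, phase,
                           f"Parent={mrna}"))
            coded += end - start + 1
    return out
```

Joining the returned lines with newlines and writing them to a file would give a GFF3 draft in which each gene carries its best UniRef90 hit as a `Note` attribute.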