I'm using UniRef100, which I downloaded from here:
ftp://ftp.ebi.ac.uk/pub/databases/un...ef100.fasta.gz
I wanted extended annotation, so I got the flatfile here on the same day:
ftp://ftp.ebi.ac.uk/pub/databases/un..._trembl.dat.gz
When I search the FASTA file I get hits to entries like this one:
UniRef100_Q6GZV6 Putative serine/threonine-protein kinase 019R n=1 Tax=Frog virus 3 (isolate Goorha) TaxID=654924 RepID=019R_FRG3G
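For reference, here is a minimal Python sketch of how a header like the one above could be split into its fields. The function name `parse_uniref_header` and the field names are hypothetical, and the regular expression is inferred only from the header shown here, so it may need adjusting for other releases. Note also that the part after `UniRef100_` is not always a UniProtKB accession; for some clusters it is a UniParc ID.

```python
import re

# Pattern inferred from the single header shown above (an assumption,
# not an official grammar). The leading ">" is optional.
HEADER_RE = re.compile(
    r"^>?(?P<cluster>UniRef\d+_(?P<member>\S+)) (?P<name>.*?) n=(?P<n>\d+) "
    r"Tax=(?P<tax>.*?) TaxID=(?P<taxid>\d+) RepID=(?P<rep_id>\S+)\s*$"
)


def parse_uniref_header(line):
    """Return the header fields as a dict, or None if the line does not match."""
    m = HEADER_RE.match(line)
    if m is None:
        return None
    fields = m.groupdict()
    fields["n"] = int(fields["n"])  # number of cluster members
    return fields


header = (
    "UniRef100_Q6GZV6 Putative serine/threonine-protein kinase 019R "
    "n=1 Tax=Frog virus 3 (isolate Goorha) TaxID=654924 RepID=019R_FRG3G"
)
fields = parse_uniref_header(header)
print(fields["member"], fields["rep_id"])
```

The `member` value is what I then search for in the flatfile.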
When I search for 'Q6GZV6' in that uniprot_trembl.dat file, there are no hits to that accession. It's simply not there. I thought maybe I was missing some needed ID mapping, so I found this:
ftp://ftp.uniprot.org/pub/databases/...mapping.dat.gz
Searching that, I only found self-referential identifiers related to UniRef100, or other identifiers which also weren't in the DAT file:
$ grep Q6GZV6 idmapping.dat
Q6GZV6 UniProtKB-ID 019R_FRG3G
Q6GZV6 Gene_ORFName FV3-019R
Q6GZV6 UniRef100 UniRef100_Q6GZV6
Q6GZV6 UniParc UPI00003B0FE5
Q6GZV6 EMBL AY548484
Q6GZV6 EMBL-CDS AAT09678.1
Q6GZV6 GeneID 2947739
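In case it helps anyone reproduce this, the same lookup can be sketched in Python, grouping the rows by identifier type. This assumes each row has three whitespace- or tab-separated columns (accession, ID type, ID), as in the output above; the function name `group_idmapping` is hypothetical.

```python
from collections import defaultdict


def group_idmapping(lines, accession):
    """Collect the idmapping rows for one accession, keyed by ID type."""
    xrefs = defaultdict(list)
    for line in lines:
        # Expect three columns: accession, ID type, ID.
        parts = line.split(None, 2)
        if len(parts) != 3 or parts[0] != accession:
            continue
        xrefs[parts[1]].append(parts[2].strip())
    return dict(xrefs)


rows = [
    "Q6GZV6 UniProtKB-ID 019R_FRG3G",
    "Q6GZV6 Gene_ORFName FV3-019R",
    "Q6GZV6 UniRef100 UniRef100_Q6GZV6",
    "Q6GZV6 UniParc UPI00003B0FE5",
    "Q6GZV6 EMBL AY548484",
    "Q6GZV6 EMBL-CDS AAT09678.1",
    "Q6GZV6 GeneID 2947739",
]
for id_type, ids in group_idmapping(rows, "Q6GZV6").items():
    print(id_type, ids)
```

For the real file, the lines could come from `gzip.open(path, "rt")` instead of a list, where `path` points at the downloaded mapping file.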
Am I missing an authoritative file that maps UniRef100 accessions to their annotations?