Greetings. I apologize if this is a naive question, but I'm having a difficult time figuring out which directory and file contain the most current release of the RefSeq database. I have connected to the NCBI RefSeq FTP site from a Unix terminal, and can see several different directories once inside the release directory:
ftp> ls
227 Entering Passive Mode (130,14,29,30,234,42)
150 Opening ASCII mode data connection for file list
-r--r--r-- 1 ftp anonymous 2103 Jan 11 14:33 NOTICE_OF_FILE_FORMAT_CHANGE
-r--r--r-- 1 ftp anonymous 4444 Jan 11 14:33 README
dr-xr-xr-x 2 ftp anonymous 1417216 Jan 14 18:15 complete
dr-xr-xr-x 2 ftp anonymous 8192 Jan 11 21:04 fungi
dr-xr-xr-x 2 ftp anonymous 12288 Jan 11 16:58 invertebrate
dr-xr-xr-x 2 ftp anonymous 991232 Jan 14 18:10 microbial
dr-xr-xr-x 2 ftp anonymous 4096 Jan 11 14:37 mitochondrion
dr-xr-xr-x 2 ftp anonymous 8192 Jan 11 14:37 plant
dr-xr-xr-x 2 ftp anonymous 4096 Jan 11 20:59 plasmid
dr-xr-xr-x 2 ftp anonymous 4096 Jan 11 16:14 plastid
dr-xr-xr-x 2 ftp anonymous 8192 Jan 11 16:53 protozoa
dr-xr-xr-x 3 ftp anonymous 4096 Jan 14 15:07 release-catalog
dr-xr-xr-x 2 ftp anonymous 4096 Jan 11 14:33 release-error-notice
dr-xr-xr-x 3 ftp anonymous 4096 Jan 14 18:18 release-notes
dr-xr-xr-x 3 ftp anonymous 4096 Jan 25 15:26 release-statistics
dr-xr-xr-x 2 ftp anonymous 61440 Jan 11 16:46 vertebrate_mammalian
dr-xr-xr-x 2 ftp anonymous 8192 Jan 11 14:41 vertebrate_other
dr-xr-xr-x 2 ftp anonymous 4096 Jan 11 16:49 viral
226 Transfer complete
Each of these group/species directories contains a large number of nucleotide and protein files. However, I was under the impression that there would be a single, large .gz file covering all of these groups/species. Does this exist, or does each directory need to be downloaded independently and then concatenated? I am most interested in the microbial data set. I looked inside this directory and found hundreds of files, some with names like this:
microbial.80.protein.gpff.gz
This file is not too large. Would downloading it, in addition to files 1-79, constitute the most recent microbial RefSeq database? Thanks,
-Tony
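P.S. In case it helps to show what I have in mind, here is a rough Python sketch of how I would pick out the numbered protein files from a directory listing and check that none are missing before concatenating them. It assumes, without my knowing this to be true, that the numbered files together make up the full microbial set. The file name pattern is inferred only from the single name quoted above, and the function names and the sample listing are made up for illustration.

```python
import re

# Hypothetical pattern, inferred from the one file name quoted in this post.
PATTERN = re.compile(r"^microbial\.(\d+)\.protein\.gpff\.gz$")


def numbered_protein_files(names):
    """Return (number, name) pairs for matching names, sorted numerically."""
    found = []
    for name in names:
        match = PATTERN.match(name)
        if match:
            found.append((int(match.group(1)), name))
    return sorted(found)


def missing_numbers(pairs):
    """Return the numbers absent from 1..max, to spot an incomplete set."""
    present = {number for number, _ in pairs}
    if not present:
        return []
    return [n for n in range(1, max(present) + 1) if n not in present]


if __name__ == "__main__":
    # Made-up listing for illustration; a real one would come from the FTP "ls".
    listing = [
        "README",
        "microbial.80.protein.gpff.gz",
        "microbial.1.protein.gpff.gz",
        "microbial.3.protein.gpff.gz",
    ]
    pairs = numbered_protein_files(listing)
    for number, name in pairs:
        print(number, name)
    print("missing:", missing_numbers(pairs))
```

Sorting on the parsed integer rather than on the file name keeps file 10 after file 2, which a plain alphabetical sort would not do.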