  • andreanna05
    Junior Member
    • Mar 2012
    • 6

    blast+ and pre-formatted databases

    Hi,

    I have been using blast+ for a little while now to make custom local databases from fasta files, and I'm thinking about downloading and using the GenBank pre-formatted nr database. Before committing hard drive space to the whole thing, I downloaded and unpacked the first few volumes (nr.00 - nr.02) to give it a try, but I'm having a hard time figuring out what exactly to do with them.

    I looked in the BLAST+ manual and the only pertinent section I could find just says this:

    "The NCBI makes databases that are searchable on the NCBI web site (such as nr, refseq_rna, and swissprot) available on its FTP site. It is better to download the preformatted databases rather than starting with FASTA. The databases on the FTP site contain taxonomic information for each sequence, include the identifier indices for lookups, and can be up to four times smaller than the FASTA. The original FASTA can be generated from the BLAST database using blastdbcmd."


    Thinking that each directory already contained a blast database, I tried the command:

    blastn -db nr.00 -query query.fa -out Results.out

    BLAST Database error: No alias or index file found for nucleotide database [nr.00] in search path [/Users/Username/Desktop/Example/NonRedundant_BLAST/nr.00::/usr/bin/ncbi-blast-2.2.28+/db:]
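The error above is consistent with nr being a protein database: its alias file is nr.pal, and its volumes carry protein extensions such as .pin, .phr and .psq. A nucleotide program like blastn looks for .nal/.nin files instead, so it will not find the database. The programs that search nr are blastp (protein queries) and blastx (nucleotide queries). The Python sketch below is a diagnostic, not part of BLAST+. It checks which molecule type a database name resolves to along a colon-separated search path like the one printed in the error. `find_blast_db` is a hypothetical helper, and looking only for alias and index files is a simplification of what BLAST+ does.

```python
from pathlib import Path

# Marker files checked per molecule type: an alias file or a volume index.
# This is a simplification: BLAST+ itself also uses other companion files.
DB_MARKERS = {
    "prot": (".pal", ".pin"),
    "nucl": (".nal", ".nin"),
}


def find_blast_db(name, search_path):
    """Report where `name` resolves along a colon-separated search path.

    Returns a dict mapping "prot"/"nucl" to the first directory holding a
    matching alias or index file. Empty path entries are skipped here.
    """
    found = {}
    for directory in search_path.split(":"):
        if not directory:
            continue
        for dbtype, extensions in DB_MARKERS.items():
            if dbtype in found:
                continue
            if any((Path(directory) / (name + ext)).is_file() for ext in extensions):
                found[dbtype] = directory
    return found
```

Run from inside the unpacked nr.00 directory, `find_blast_db("nr.00", ".")` would be expected to report only a "prot" entry. An empty result for "nucl" is exactly the situation blastn complains about.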


    After this I tried running makeblastdb on various input files:

    makeblastdb -in nr -input_type blastdb -dbtype nucl -parse_seqids -out NonRedundant -title "GenBank Non-redundant"

    makeblastdb -in nr.00 -input_type blastdb -dbtype nucl -parse_seqids -out NonRedundant -title "GenBank Non-redundant"

    makeblastdb -in nr.00.phd -input_type blastdb -dbtype nucl -parse_seqids -out NonRedundant -title "GenBank Non-redundant"

    and so on for each of the files in the directory, and each time I got the same error as above.


    I see that the nr.00 directory has a file called nr.pal that has these contents:

    #
    # Alias file created 12/08/2013 01:27:33
    #
    TITLE All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF excluding environmental samples from WGS projects
    DBLIST nr.00 nr.01 nr.02 nr.03 nr.04 nr.05 nr.06 nr.07 nr.08 nr.09 nr.10 nr.11 nr.12 nr.13 nr.14
    NSEQ 34869290
    LENGTH 12261267790

    Being an optimist, I tried editing this file to list only nr.00 - nr.02, with no luck (I knew the NSEQ and LENGTH values would then be wrong, but figured it was worth a shot).
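Trimming DBLIST is not unreasonable in itself, because an alias file simply names the volumes that together form the database. nr is a protein database (hence the .pal extension), so a trimmed alias file would be searched with blastp or blastx, not blastn. Each listed volume should also sit in the same directory as the alias file or on the BLAST search path. The Python sketch below builds such a file under a new name instead of overwriting nr.pal. `alias_text` is a hypothetical helper, and omitting NSEQ and LENGTH is an assumption. BLAST+ also ships a `blastdb_aliastool` utility intended for creating alias files; check its `-help` output for the options in your version.

```python
def alias_text(title, volumes, nseq=None, length=None):
    """Return the contents of a BLAST alias file aggregating `volumes`.

    NSEQ and LENGTH are optional here: this sketch assumes BLAST+ accepts an
    alias file without them. Pass values only if you know them.
    """
    if not volumes:
        raise ValueError("an alias file needs at least one volume in DBLIST")
    lines = [
        "#",
        "# Alias file for a subset of nr volumes (hypothetical example)",
        "#",
        f"TITLE {title}",
        "DBLIST " + " ".join(volumes),
    ]
    if nseq is not None:
        lines.append(f"NSEQ {nseq}")
    if length is not None:
        lines.append(f"LENGTH {length}")
    return "\n".join(lines) + "\n"
```

For example, writing `alias_text("nr volumes 00-02", ["nr.00", "nr.01", "nr.02"])` to a file named nr_subset.pal in the directory holding the three volumes should make `-db nr_subset` usable with blastp or blastx.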

    So, would I have to download the whole nr database in order to try it? What I really want is just the sequences from one model organism, but I don't see a species-specific pre-formatted blast database for it. And if I download the whole thing, then what? Should I put all of the files from each separately downloaded nr directory into one directory? And try to build a single database using the nr.pal file? I'm probably missing something super-obvious here, but I'm stuck.
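On the directory question: the volumes of a multi-volume database are normally kept together in one directory, next to the alias file (nr.pal) that lists them. The Python sketch below assumes each nr.XX.tar.gz archive was unpacked into its own directory. It moves all files into a single target directory and keeps the first copy of any file that appears in more than one archive. It then reports which DBLIST volumes are still missing. `read_dblist`, `gather_volumes` and `missing_volumes` are hypothetical helpers, and checking for one .pin index per volume is a simplification. BLAST+ also includes an `update_blastdb.pl` script that downloads every volume of a preformatted database, which may be simpler than fetching archives by hand.

```python
import shutil
from pathlib import Path


def read_dblist(alias_path):
    """Return the volume names listed on the DBLIST line of an alias file."""
    for line in Path(alias_path).read_text().splitlines():
        if line.startswith("DBLIST"):
            return line.split()[1:]
    return []


def gather_volumes(extracted_dirs, target_dir):
    """Move every file from each unpacked archive directory into target_dir.

    Files already present in target_dir (such as a repeated nr.pal) are
    left where they are rather than overwritten.
    """
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    for d in extracted_dirs:
        for f in sorted(Path(d).iterdir()):  # sorted() materializes the listing
            if f.is_file() and not (target / f.name).exists():
                shutil.move(str(f), str(target / f.name))
    return target


def missing_volumes(alias_path, ext=".pin"):
    """List DBLIST volumes that have no index file next to the alias file."""
    base = Path(alias_path).parent
    return [v for v in read_dblist(alias_path) if not (base / f"{v}{ext}").is_file()]
```

With only nr.00 - nr.02 downloaded, `missing_volumes` on the original nr.pal would list nr.03 through nr.14. That is one way to see why the full alias file cannot be used until every volume is present.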

    Thanks,
    Andreanna
    Last edited by andreanna05; 12-12-2013, 07:25 AM.
  • mastal
    Senior Member
    • Mar 2009
    • 666

    #2
    Have you read the online documentation about the pre-formatted BLAST databases?

    ftp://ftp.ncbi.nlm.nih.gov/blast/documents/blastdb.html

    The pre-formatted database files are already BLAST databases, so you don't need to run makeblastdb on them.
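Because the downloaded volumes already form a BLAST database, a search only needs the database name and a program that matches its molecule type. nr holds protein sequences, so blastp applies to protein queries and blastx to nucleotide queries. The Python sketch below builds and runs such a command. `blast_command` and `run_blast` are hypothetical helpers, and the 1e-5 E-value cutoff is an arbitrary illustrative choice. The sketch relies on the BLASTDB environment variable, which BLAST+ consults when resolving database names.

```python
import os
import subprocess


def blast_command(query, db, out, program="blastp", evalue=1e-5):
    """Build a BLAST+ search command against a preformatted protein database.

    `db` is the database name without extension (e.g. "nr" for nr.pal);
    no makeblastdb step is involved. Use program="blastx" for nucleotide
    queries against a protein database such as nr.
    """
    if program not in ("blastp", "blastx"):
        raise ValueError("this sketch only covers protein databases")
    return [program, "-query", query, "-db", db, "-out", out,
            "-evalue", str(evalue)]


def run_blast(cmd, db_dir):
    """Run `cmd` with BLASTDB set so BLAST+ can find the database by name."""
    env = dict(os.environ, BLASTDB=str(db_dir))
    return subprocess.run(cmd, check=True, env=env)
```

For a nucleotide query file like the one in the first post, `run_blast(blast_command("query.fa", "nr", "Results.out", program="blastx"), "/path/to/nr")` would be the counterpart of the failing blastn call. The path is a placeholder.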


    • LeightonP
      Member
      • Feb 2011
      • 29

      #3
      Originally posted by andreanna05
      What I really want is just the sequences from one model organism, but I don't see a species-specific pre-formatted blast database for it.
      Your best option then is to download the sequences from that model organism, and use makeblastdb to construct a BLAST database from them.

      You can find the makeblastdb documentation here: http://nebc.nerc.ac.uk/bioinformatic...keblastdb.html
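For the organism-specific route, the makeblastdb flags used earlier in the thread still apply, with one change: `-input_type` can stay at its default (fasta). `-input_type blastdb` is meant for input that is already a BLAST database. Because `-parse_seqids` indexes sequence IDs, it is worth checking for duplicate IDs first, since makeblastdb may reject them. The Python sketch below does both. `duplicate_ids`, `makeblastdb_command` and `build_db` are hypothetical helper names, and the sketch assumes the first word of each FASTA header is the sequence ID.

```python
import subprocess


def duplicate_ids(fasta_lines):
    """Return sequence IDs (first word after '>') that occur more than once."""
    seen, dups = set(), []
    for line in fasta_lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if not parts:
                continue  # skip headers that carry no ID at all
            seq_id = parts[0]
            if seq_id in seen and seq_id not in dups:
                dups.append(seq_id)
            seen.add(seq_id)
    return dups


def makeblastdb_command(fasta, out, title, dbtype="prot"):
    """Build a makeblastdb command for a FASTA file (the default input type)."""
    if dbtype not in ("prot", "nucl"):
        raise ValueError("dbtype must be 'prot' or 'nucl'")
    return ["makeblastdb", "-in", fasta, "-dbtype", dbtype, "-parse_seqids",
            "-out", out, "-title", title]


def build_db(fasta, out, title, dbtype="prot"):
    """Refuse to build if IDs repeat; otherwise run makeblastdb."""
    with open(fasta) as fh:
        dups = duplicate_ids(fh)
    if dups:
        raise ValueError(f"duplicate IDs would clash with -parse_seqids: {dups[:5]}")
    subprocess.run(makeblastdb_command(fasta, out, title, dbtype), check=True)
```

Use `dbtype="nucl"` if the downloaded sequences are nucleotide rather than protein.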


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        Two possible options to consider if you are only interested in creating a db of sequences from a specific organism. In either case you can create your own blast db (makeblastdb) once you get the sequences together.

        1. If you don't mind downloading the nr BLAST database files (there are several), then you could use the blastdbcmd command to extract the sequences specific to your organism. Look for the section on extracting sequences with blastdbcmd in this manual: http://www.ncbi.nlm.nih.gov/books/NBK1763/

        From NCBI:
        Extract all human sequences from the nr database

        Although one cannot select GIs by taxonomy from a database, a combination of unix command line tools will accomplish this:

        $ blastdbcmd -db nr -entry all -outfmt "%g %T" | \
        awk ' { if ($2 == 9606) { print $1 } } ' | \
        blastdbcmd -db nr -entry_batch - -out human_sequences.txt

        2. You could also use the NCBI E-utilities to query for the sequence data you need. The manual is here: http://www.ncbi.nlm.nih.gov/books/NBK1058/
        Application #3 (retrieving large datasets) in that manual may work.
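In option 1, only the awk step of the pipeline does any filtering. It keeps the first column (`%g`, the GI) of lines whose second column (`%T`, the taxonomy ID) is 9606, the human taxid. The same filter can be written in Python, which may be easier to extend, for example to a set of taxids. `gis_for_taxid` and `main` are hypothetical names, and the input is assumed to be the "GI TAXID" lines produced by `-outfmt "%g %T"`.

```python
import sys


def gis_for_taxid(lines, taxid):
    """Yield the GI (first field) of every "GI TAXID" line whose taxid matches.

    Behaves like awk '{ if ($2 == 9606) { print $1 } }' on the plain integer
    taxids blastdbcmd prints; malformed lines are skipped.
    """
    wanted = str(taxid)
    for line in lines:
        fields = line.split()
        if len(fields) >= 2 and fields[1] == wanted:
            yield fields[0]


def main(argv=None, stdin=None, stdout=None):
    """Filter stdin to stdout; the taxid comes from the first argument."""
    argv = sys.argv[1:] if argv is None else argv
    stdin = sys.stdin if stdin is None else stdin
    stdout = sys.stdout if stdout is None else stdout
    taxid = int(argv[0]) if argv else 9606
    for gi in gis_for_taxid(stdin, taxid):
        stdout.write(gi + "\n")
```

Saved as, say, filter_taxid.py with a final `main()` call added, it could stand in for the awk stage: `blastdbcmd -db nr -entry all -outfmt "%g %T" | python3 filter_taxid.py 9606 | blastdbcmd -db nr -entry_batch - -out human_sequences.txt`.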

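Option 2's "retrieving large datasets" application follows an ESearch-then-EFetch pattern. The search result is first stored on the Entrez History server (`usehistory=y`). The records are then fetched in batches using the returned WebEnv and QueryKey. The Python sketch below follows that pattern using only the standard library. The function names, the batch size of 500, and the 0.4 s pause are illustrative choices, not taken from the manual. The pause is there because NCBI asks for no more than three requests per second without an API key.

```python
import time
import urllib.parse
import urllib.request
import xml.etree.ElementTree as ET

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/"


def esearch_url(db, term):
    """ESearch URL that keeps the result set on the Entrez History server."""
    query = urllib.parse.urlencode({"db": db, "term": term, "usehistory": "y"})
    return EUTILS + "esearch.fcgi?" + query


def parse_esearch(xml_text):
    """Extract (count, query_key, webenv) from an ESearch XML response."""
    root = ET.fromstring(xml_text)
    return int(root.findtext("Count")), root.findtext("QueryKey"), root.findtext("WebEnv")


def efetch_urls(db, count, query_key, webenv, batch=500):
    """One EFetch URL per batch of records, each returning FASTA text."""
    urls = []
    for start in range(0, count, batch):
        query = urllib.parse.urlencode({
            "db": db, "query_key": query_key, "WebEnv": webenv,
            "retstart": start, "retmax": batch,
            "rettype": "fasta", "retmode": "text",
        })
        urls.append(EUTILS + "efetch.fcgi?" + query)
    return urls


def download_fasta(db, term, out_path, batch=500, pause=0.4):
    """Run the search, then fetch every batch into out_path."""
    with urllib.request.urlopen(esearch_url(db, term)) as resp:
        count, query_key, webenv = parse_esearch(resp.read())
    with open(out_path, "w") as out:
        for url in efetch_urls(db, count, query_key, webenv, batch):
            time.sleep(pause)  # stay under the request-rate limit
            with urllib.request.urlopen(url) as resp:
                out.write(resp.read().decode())
```

A call such as `download_fasta("protein", "txid9606[Organism]", "organism.fa")` would be the starting point, with 9606 replaced by the taxid of your model organism. The resulting FASTA file can then go to makeblastdb.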