Deploying a local version of blast, having issues

  • Deploying a local version of blast, having issues

    Compiling ncbi-blast itself appears to have gone fine. Where I have issues is with multi-volume BLAST databases. The error messages are worrying, but maybe it's not a big deal after all?

    For instance:
    Code:
    update_blastdb.pl nt
    Connected to NCBI
    Downloading nt (16 volumes) ...
    Downloading nt.00.tar.gz... [OK]
    [...]
    Untarring the volumes one by one, from 00 to 15 (tar zxvpf nt.$N.tar.gz, based on the example on NCBI's website, which runs tar zxvpf on the 16S microbial database), gives:
    Code:
    nt.nal
    nt.00.nhd
    nt.00.nhi
    ....
    Followed by blastdbcheck

    Code:
    blastdbcheck -db nt
    Writing messages to <stdout> at verbosity (Summary)
    ISAM testing is ENABLED.
    Legacy testing is DISABLED.
    By default, testing 200 randomly sampled OIDs.
    
    Testing 16 volume(s).
      /data5/Programs/blast_db/nt.00 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.01 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.02 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.03 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.04 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.05 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.06 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.07 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.08 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.09 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.10 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.11 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.12 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.13 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.14 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.15 / MetaData:   [ERROR] caught exception.
     Result=FAILURE. 16 errors reported in 16 volume(s).
    Testing 1 alias(es).
     Result=SUCCESS. No errors reported for 1 alias(es).
    
    Total errors: 16
    Edit: Though if I ignore the error messages and run blastn without a query anyway:

    Code:
    blastn -db nt
    BLASTN 2.2.27+
    
    
    Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
    Miller (2000), "A greedy algorithm for aligning DNA sequences", J
    Comput Biol 2000; 7(1-2):203-14.
    
    
    
    Database: Nucleotide collection (nt)
               20,064,200 sequences; 50,694,274,412 total letters
    
    
    BLAST engine error: Empty CBlastQueryVector
    Does this mean it is working as intended? (...and there is indeed a newer version of ncbi blast, 2.2.28+, now that I think about it.) Strangely, though, nr (through the same workflow) gives me the following error message, even though both are extracted to the same path.

    Code:
    blastn -db nr
    BLAST Database error: No alias or index file found for nucleotide database [nr] in search path [/data5/Programs/blast_db:/Programs/blastdb::]
    Last edited by winsettz; 10-25-2013, 06:28 AM.

  • #2
    blastn -db nr throws an error because blastn is for nucleotides and nr is a protein db. blastp -db nr would probably display stats on your local nr.

    It's right there in your error message:

    Code:
    No alias or index file found for nucleotide database [nr]
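
The mismatch can also be spotted on disk before running anything. Below is a minimal sketch, assuming the usual BLAST file naming in which nucleotide databases carry .nal/.nin files (as in the nt.nal listing earlier in the thread) and protein databases carry .pal/.pin files. The name guess_db_type is a hypothetical helper, not part of BLAST.

```shell
# Hypothetical helper (not part of BLAST): guess whether a database is
# nucleotide or protein from the alias/index file extensions on disk.
# Assumption: nucleotide databases use .nal/.nin, protein ones .pal/.pin.
guess_db_type() {
    local dir="$1" name="$2" f
    for f in "$dir/$name.nal" "$dir/$name.nin" "$dir/$name".[0-9]*.nin; do
        # An unmatched glob stays as a literal pattern, so -e is false for it.
        if [ -e "$f" ]; then echo nucleotide; return 0; fi
    done
    for f in "$dir/$name.pal" "$dir/$name.pin" "$dir/$name".[0-9]*.pin; do
        if [ -e "$f" ]; then echo protein; return 0; fi
    done
    echo unknown
    return 1
}
```

For example, guess_db_type /data5/Programs/blast_db nr should print "protein" if nr was extracted there, which suggests blastp rather than blastn.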
    Last edited by rhinoceros; 10-25-2013, 09:12 AM.
    savetherhino.org



    • #3
      Originally posted by rhinoceros
      blastn -db nr throws an error because blastn is for nucleotides and nr is a protein db. blastp -db nr would probably display stats on your local nr.
      How silly of me. I thought nr was the non-redundant nucleotide database. I will test soon.

      Any thoughts on the blastdbcheck messages?



      • #4
        Originally posted by winsettz
        Any thoughts on the blastdbcheck messages?
        I only have local nr on this computer. Output of blastdbcheck -db nr looks very different, final three lines are:

        Code:
         Result=SUCCESS. No errors reported for 11 volume(s).
        Testing 1 alias(es).
         Result=SUCCESS. No errors reported for 1 alias(es).
        I don't know. Are you sure "tar zxvpf nt.$N.tar.gz" does what you think it does (not only the $ but also the -p flag)? I always write some simple for loop if I need to extract a bunch of files. What does your db folder look like (ls -la)?
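
A simple loop of the kind mentioned above might look like the following minimal sketch. It assumes the volume tarballs follow the nt.00.tar.gz naming shown in the first post. extract_volumes is a hypothetical helper name. It globs for the archives instead of relying on $N being set, and it stops at the first archive that fails to unpack.

```shell
# Hypothetical helper: extract every volume tarball of one database.
# Usage: extract_volumes <directory> <db name>, e.g. extract_volumes . nt
extract_volumes() {
    local dir="$1" name="$2" archive count=0
    for archive in "$dir/$name".*.tar.gz; do
        # If the glob matched nothing, the pattern is left unexpanded; skip it.
        [ -e "$archive" ] || continue
        # -C unpacks into the database directory; stop at the first failure
        # so a truncated download is not silently ignored.
        tar -xzf "$archive" -C "$dir" || { echo "failed: $archive" >&2; return 1; }
        count=$((count + 1))
    done
    echo "extracted $count volume(s) of $name"
}
```

If the reported count does not match the number of volumes update_blastdb.pl said it downloaded, some archives are missing or were not matched.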
        Last edited by rhinoceros; 10-25-2013, 09:29 AM.
        savetherhino.org



        • #5
          NCBI is probably partly responsible, since they refer to "Nucleotide collection (nr/nt)" in many BLAST-related places.



          • #6
            Originally posted by rhinoceros
            I only have local nr on this computer. Output of blastdbcheck -db nr looks very different, final three lines are:

            Code:
             Result=SUCCESS. No errors reported for 11 volume(s).
            Testing 1 alias(es).
             Result=SUCCESS. No errors reported for 1 alias(es).
            I don't know. Are you sure "tar zxvpf nt.$N.tar.gz" does what you think it does (not only the $ but also the -p flag)? I always write some simple for loop if I need to extract a bunch of files. What does your db folder look like (ls -la)?
            Indeed, my code is
            Code:
            update_blastdb.pl nr
            for N in {00..13}; do tar zxvpf nr.$N.tar.gz; done
            blastp -db nr
            Getting
            Code:
            blastp -db nr
            BLASTP 2.2.27+
            
            [...]
            Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
            excluding environmental samples from WGS projects
                       33,037,292 sequences; 11,526,254,694 total letters
            
            
            
              Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
            excluding environmental samples from WGS projects
                Posted date:  Oct 12, 2013  4:14 AM
              Number of letters in database: 11,526,254,694
              Number of sequences in database:  33,037,292
            
            
            
            Matrix: BLOSUM62
            Gap Penalties: Existence: 11, Extension: 1
            Neighboring words threshold: 11
            Window for multiple hits: 40
            Looks like I have nr and nt databases operational. Not sure what the glitch was, and still not sure why blastdbcheck is putting out strange messages.

            Code:
            blastdbcheck -db nr
            Writing messages to <stdout> at verbosity (Summary)
            ISAM testing is ENABLED.
            Legacy testing is DISABLED.
            By default, testing 200 randomly sampled OIDs.
            
            Testing 14 volume(s).
              /data5/Programs/blast_db/nr.00 / MetaData:   [ERROR] caught exception.
            ...



            • #7
              Hi All,

              I have come across the same error with the blastdbcheck command.
              It seems that it is caused by the missing taxonomy reference files.

              In my case, I was issuing
              Code:
              $ blastdbcheck -db refseq_protein
              and got
              Code:
              Writing messages to <stdout> at verbosity (Summary)
              ISAM testing is ENABLED.
              Legacy testing is DISABLED.
              TaxID testing is DISABLED.
              By default, testing 200 randomly sampled OIDs.
              
              Testing 4 volume(s).
                /usr/share/ncbi-blast/db/refseq_protein.00 / MetaData:   [ERROR] caught exception.
                /usr/share/ncbi-blast/db/refseq_protein.01 / MetaData:   [ERROR] caught exception.
                /usr/share/ncbi-blast/db/refseq_protein.02 / MetaData:   [ERROR] caught exception.
              When increasing the verbosity level
              Code:
              $ blastdbcheck -db refseq_protein -verbosity 4
              I got this
              Code:
              Writing messages to <stdout> at verbosity (Minutiae)
              ISAM testing is ENABLED.
              Legacy testing is DISABLED.
              TaxID testing is DISABLED.
              By default, testing 200 randomly sampled OIDs.
              
              Testing 4 volume(s).
               /usr/share/ncbi-blast/db/refseq_protein.00
                /usr/share/ncbi-blast/db/refseq_protein.00 / MetaData:   [ERROR] caught exception.
                /usr/share/ncbi-blast/db/refseq_protein.00 / MetaData: NCBI C++ Exception:
                  "/home/coremake/release_build/build/PrepareRelease_Linux32-Centos_JSID_01_69_130.14.18.6_9056__PrepareRelease_Linux32-Centos_1386773752/c++/compilers/unix/../../src/objtools/blast/seqdb_reader/seqdbimpl.cpp", line 1374: Error: BLASTDB::ncbi::CSeqDBImpl::GetTaxInfo() - Specified taxid was not found.
              
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample: <testing 200 randomly selected OIDs (200 unique)>
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample:       Status for OID 26856: PASS
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample:       Status for OID 71563: PASS
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample:       Status for OID 73841: PASS
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample:       Status for OID 80646: PASS
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample:       Status for OID 138349: PASS
              (many more lines ...)
              So it seems that this is causing the error
              Code:
              Error: BLASTDB::ncbi::CSeqDBImpl::GetTaxInfo() - Specified taxid was not found.
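
With verbosity 4 the exception text is buried among hundreds of per-OID lines. Below is a minimal sketch for pulling it out, assuming the messages keep the "Error: ..." form shown above. summarize_blastdbcheck_errors is a hypothetical helper name.

```shell
# Hypothetical helper: read blastdbcheck output on stdin and print each
# distinct "Error: ..." message once, prefixed by how often it occurred.
summarize_blastdbcheck_errors() {
    # Keep only the text from the last "Error: " on a line to end of line.
    sed -n 's/^.*\(Error: .*\)$/\1/p' | sort | uniq -c | sort -rn
}
```

It could be used as: blastdbcheck -db refseq_protein -verbosity 4 | summarize_blastdbcheck_errors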
              After adding the taxonomy database, the error is gone:
              Code:
              $ blastdbcheck -db refseq_protein
              Writing messages to <stdout> at verbosity (Summary)
              ISAM testing is ENABLED.
              Legacy testing is DISABLED.
              TaxID testing is DISABLED.
              By default, testing 200 randomly sampled OIDs.
              
              Testing 4 volume(s).
               Result=SUCCESS. No errors reported for 4 volume(s).
              Testing 1 alias(es).
               Result=SUCCESS. No errors reported for 1 alias(es).
              To add the taxonomy database, I downloaded the file
              ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz
              and extracted the two files taxdb.btd and taxdb.bti into the BLASTDB directory.
              Code:
              $ tar xzf taxdb.tar.gz
              $ sudo mv taxdb.btd taxdb.bti /usr/share/ncbi-blast/db
              You may need different commands on your system. More info here:
              http://www.ncbi.nlm.nih.gov/books/NB...l_Quick_start_
              Note that
              Code:
              $ update_blastdb.pl taxdb
              only downloads the tar.gz file. It does not extract the needed files.
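
The download, extract and verify steps can be wrapped up as in the minimal sketch below. It assumes the archive contains taxdb.btd and taxdb.bti at its top level, as described above. install_taxdb is a hypothetical helper name, and the target directory may need sudo on some systems.

```shell
# Hypothetical helper: unpack taxdb.tar.gz into the BLAST database directory
# and confirm that both taxonomy files are present and non-empty afterwards.
install_taxdb() {
    local archive="$1" dbdir="$2" f
    tar -xzf "$archive" -C "$dbdir" || return 1
    for f in taxdb.btd taxdb.bti; do
        if [ ! -s "$dbdir/$f" ]; then
            echo "missing or empty: $dbdir/$f" >&2
            return 1
        fi
    done
    echo "taxdb installed in $dbdir"
}
```

For example: install_taxdb taxdb.tar.gz /usr/share/ncbi-blast/db. Newer releases of update_blastdb.pl also list a --decompress option in their help text. Whether your installed version has it is worth checking before relying on it.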

              Hope it helps

              Carlos

