  • Deploying a local version of blast, having issues

    Compiling ncbi-blast itself appears to have gone fine. Where I have issues is with the multi-volume BLAST databases. The error messages are worrying, but maybe they are not a big deal after all?

    For instance:
    Code:
    update_blastdb.pl nt
    Connected to NCBI
    Downloading nt (16 volumes) ...
    Downloading nt.00.tar.gz... [OK]
    [...]
    Untarring iteratively, from 1->15 (tar zxvpf nt.$N.tar.gz for each volume, based on the example on NCBI's website, which runs tar zxvpf on the 16S microbial database), gives:
    Code:
    nt.nal
    nt.00.nhd
    nt.00.nhi
    ....
    Followed by blastdbcheck

    Code:
    blastdbcheck -db nt
    Writing messages to <stdout> at verbosity (Summary)
    ISAM testing is ENABLED.
    Legacy testing is DISABLED.
    By default, testing 200 randomly sampled OIDs.
    
    Testing 16 volume(s).
      /data5/Programs/blast_db/nt.00 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.01 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.02 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.03 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.04 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.05 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.06 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.07 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.08 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.09 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.10 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.11 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.12 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.13 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.14 / MetaData:   [ERROR] caught exception.
      /data5/Programs/blast_db/nt.15 / MetaData:   [ERROR] caught exception.
     Result=FAILURE. 16 errors reported in 16 volume(s).
    Testing 1 alias(es).
     Result=SUCCESS. No errors reported for 1 alias(es).
    
    Total errors: 16
    Edit: Though if I ignore the error messages, blastn still reads the database:

    Code:
    blastn -db nt
    BLASTN 2.2.27+
    
    
    Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
    Miller (2000), "A greedy algorithm for aligning DNA sequences", J
    Comput Biol 2000; 7(1-2):203-14.
    
    
    
    Database: Nucleotide collection (nt)
               20,064,200 sequences; 50,694,274,412 total letters
    
    
    BLAST engine error: Empty CBlastQueryVector
    Does this mean it is working as intended? (...and there is indeed a newer version of ncbi blast, 2.2.28+, now that I think about it). But strangely, nr (through the same workflow) gives me the following error message, even though both are extracted to the same path.

    Code:
    blastn -db nr
    BLAST Database error: No alias or index file found for nucleotide database [nr] in search path [/data5/Programs/blast_db:/Programs/blastdb::]
    Last edited by winsettz; 10-25-2013, 06:28 AM.

  • #2
    blastn -db nr throws an error because blastn is for nucleotides and nr is a protein db. blastp -db nr would probably display stats on your local nr.

    It's right there in your error message:

    Code:
    No alias or index file found for nucleotide database [nr]
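A quick way to avoid this mix-up is to look at the file extensions in the database directory: nucleotide databases use `.n*` files (for example `.nal`, `.nin`) and protein databases use `.p*` files (`.pal`, `.pin`). The sketch below is only an illustration; the function name `blast_db_type` and its arguments are made up here and are not part of BLAST+.

```shell
# Hypothetical helper (not part of BLAST+): guess whether a local BLAST
# database is nucleotide or protein from the files present in a directory.
# Usage: blast_db_type DIR NAME
blast_db_type() {
    dir=$1
    name=$2
    found_nucl=no
    found_prot=no
    # Single-volume databases have NAME.nin / NAME.pin; multi-volume ones
    # have an alias file (NAME.nal / NAME.pal) plus NAME.00.nin and so on.
    for f in "$dir/$name.nal" "$dir/$name.nin" "$dir/$name".[0-9]*.nin; do
        if [ -e "$f" ]; then found_nucl=yes; fi
    done
    for f in "$dir/$name.pal" "$dir/$name.pin" "$dir/$name".[0-9]*.pin; do
        if [ -e "$f" ]; then found_prot=yes; fi
    done
    case "$found_nucl/$found_prot" in
        yes/no)  echo nucleotide ;;
        no/yes)  echo protein ;;
        yes/yes) echo both ;;
        *)       echo missing; return 1 ;;
    esac
}
```

For the setup in this thread, something like `blast_db_type /data5/Programs/blast_db nr` should report `protein`, which means blastp (not blastn) is the program to use.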
    Last edited by rhinoceros; 10-25-2013, 09:12 AM.
    savetherhino.org


    • #3
      Originally posted by rhinoceros View Post
      blastn -db nr throws an error because blastn is for nucleotides and nr is a protein db. blastp -db nr would probably display stats on your local nr..
      How silly of me. I thought nr was non-redundant nucleotide. I will test soon.

      Any thoughts on the blastdbcheck messages?


      • #4
        Originally posted by winsettz View Post
        Any thoughts on the blastdbcheck messages?
        I only have local nr on this computer. Output of blastdbcheck -db nr looks very different, final three lines are:

        Code:
         Result=SUCCESS. No errors reported for 11 volume(s).
        Testing 1 alias(es).
         Result=SUCCESS. No errors reported for 1 alias(es).
        I don't know. Are you sure "tar zxvpf nt.$N.tar.gz" does what you think it does (not only the $ but also the -p flag)? I always write some simple for loop if I need to extract a bunch of files. What does your db folder look like (ls -la)?
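For what it's worth, the kind of simple loop meant here could look like the sketch below. It globs the archives instead of relying on a variable, since `nt.$N.tar.gz` expands to `nt..tar.gz` when `N` is unset. The function name `extract_blast_volumes` is hypothetical, and the sketch assumes the archives sit in the same directory the database should be extracted to.

```shell
# Hypothetical helper: extract every NAME.NN.tar.gz volume found in DIR.
# Usage: extract_blast_volumes DIR NAME
extract_blast_volumes() {
    dir=$1
    name=$2
    count=0
    for archive in "$dir/$name".*.tar.gz; do
        # If nothing matched, the pattern is left unexpanded; skip it.
        [ -e "$archive" ] || continue
        # -x extract, -z gunzip, -f archive file, -C target directory.
        tar -xzf "$archive" -C "$dir" || return 1
        count=$((count + 1))
    done
    echo "extracted $count archive(s) for $name"
    # Return non-zero if no archive was found at all.
    [ "$count" -gt 0 ]
}
```

Usage would be along the lines of `extract_blast_volumes /data5/Programs/blast_db nt`. The `-p` flag from the original command only preserves file permissions, so leaving it out should not matter for reading the database.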
        Last edited by rhinoceros; 10-25-2013, 09:29 AM.
        savetherhino.org


        • #5
          NCBI is probably partly responsible since they refer to "Nucleotide collection (nr/nt)" in many blast related things.


          • #6
            Originally posted by rhinoceros View Post
            I only have local nr on this computer. Output of blastdbcheck -db nr looks very different, final three lines are:

            Code:
             Result=SUCCESS. No errors reported for 11 volume(s).
            Testing 1 alias(es).
             Result=SUCCESS. No errors reported for 1 alias(es).
            I don't know. Are you sure "tar zxvpf nt.$N.tar.gz" does what you think it does (not only the $ but also the -p flag)? I always write some simple for loop if I need to extract a bunch of files. What does your db folder look like (ls -la)?
            Indeed, my code is
            Code:
            update_blastdb.pl nr
            for N in {00..13}; do tar zxvpf nr.$N.tar.gz; done
            blastp -db nr
            Getting
            Code:
            blastp -db nr
            BLASTP 2.2.27+
            
            [...]
            Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
            excluding environmental samples from WGS projects
                       33,037,292 sequences; 11,526,254,694 total letters
            
            
            
              Database: All non-redundant GenBank CDS translations+PDB+SwissProt+PIR+PRF
            excluding environmental samples from WGS projects
                Posted date:  Oct 12, 2013  4:14 AM
              Number of letters in database: 11,526,254,694
              Number of sequences in database:  33,037,292
            
            
            
            Matrix: BLOSUM62
            Gap Penalties: Existence: 11, Extension: 1
            Neighboring words threshold: 11
            Window for multiple hits: 40
            Looks like I have nr and nt databases operational. Not sure what the glitch was, and still not sure why blastdbcheck is putting out strange messages.

            Code:
            blastdbcheck -db nr
            Writing messages to <stdout> at verbosity (Summary)
            ISAM testing is ENABLED.
            Legacy testing is DISABLED.
            By default, testing 200 randomly sampled OIDs.
            
            Testing 14 volume(s).
              /data5/Programs/blast_db/nr.00 / MetaData:   [ERROR] caught exception.
            ...


            • #7
              Hi All,

              I have come across the same error with the blastdbcheck command.
              It seems that it is caused by the missing taxonomy reference files.

              In my case, I was issuing
              Code:
              $ blastdbcheck -db refseq_protein
              and got
              Code:
              Writing messages to <stdout> at verbosity (Summary)
              ISAM testing is ENABLED.
              Legacy testing is DISABLED.
              TaxID testing is DISABLED.
              By default, testing 200 randomly sampled OIDs.
              
              Testing 4 volume(s).
                /usr/share/ncbi-blast/db/refseq_protein.00 / MetaData:   [ERROR] caught exception.
                /usr/share/ncbi-blast/db/refseq_protein.01 / MetaData:   [ERROR] caught exception.
                /usr/share/ncbi-blast/db/refseq_protein.02 / MetaData:   [ERROR] caught exception.
              When increasing the verbosity level
              Code:
              $ blastdbcheck -db refseq_protein -verbosity 4
              I got this
              Code:
              Writing messages to <stdout> at verbosity (Minutiae)
              ISAM testing is ENABLED.
              Legacy testing is DISABLED.
              TaxID testing is DISABLED.
              By default, testing 200 randomly sampled OIDs.
              
              Testing 4 volume(s).
               /usr/share/ncbi-blast/db/refseq_protein.00
                /usr/share/ncbi-blast/db/refseq_protein.00 / MetaData:   [ERROR] caught exception.
                /usr/share/ncbi-blast/db/refseq_protein.00 / MetaData: NCBI C++ Exception:
                  "/home/coremake/release_build/build/PrepareRelease_Linux32-Centos_JSID_01_69_130.14.18.6_9056__PrepareRelease_Linux32-Centos_1386773752/c++/compilers/unix/../../src/objtools/blast/seqdb_reader/seqdbimpl.cpp", line 1374: Error: BLASTDB::ncbi::CSeqDBImpl::GetTaxInfo() - Specified taxid was not found.
              
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample: <testing 200 randomly selected OIDs (200 unique)>
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample:       Status for OID 26856: PASS
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample:       Status for OID 71563: PASS
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample:       Status for OID 73841: PASS
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample:       Status for OID 80646: PASS
                /usr/share/ncbi-blast/db/refseq_protein.00 / Sample:       Status for OID 138349: PASS
              (many more lines ...)
              So it seems that this is causing the error
              Code:
              Error: BLASTDB::ncbi::CSeqDBImpl::GetTaxInfo() - Specified taxid was not found.
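Since the verbose output runs to many lines, it can help to filter it down to the underlying exception messages. The following is a minimal sketch, assuming the exceptions appear on lines containing `Error: ` as in the output above; the function name `blastdbcheck_errors` is made up for illustration.

```shell
# Hypothetical helper: read blastdbcheck output on stdin and print each
# distinct "Error: ..." message once, prefixed by how often it occurred.
blastdbcheck_errors() {
    # Keep only the text from the last "Error: " on each matching line,
    # then count identical messages and strip uniq's leading padding.
    sed -n 's/.*\(Error: .*\)/\1/p' | sort | uniq -c | sort -rn | sed 's/^ *//'
}
```

It would be used as `blastdbcheck -db refseq_protein -verbosity 4 | blastdbcheck_errors`. The summary lines that only say `[ERROR] caught exception.` are skipped, because they do not contain the `Error: ` prefix.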
              After adding the taxonomy database the error is gone
              Code:
              $ blastdbcheck -db refseq_protein
              Writing messages to <stdout> at verbosity (Summary)
              ISAM testing is ENABLED.
              Legacy testing is DISABLED.
              TaxID testing is DISABLED.
              By default, testing 200 randomly sampled OIDs.
              
              Testing 4 volume(s).
               Result=SUCCESS. No errors reported for 4 volume(s).
              Testing 1 alias(es).
               Result=SUCCESS. No errors reported for 1 alias(es).
              To add the taxonomy database I downloaded the file
              ftp://ftp.ncbi.nlm.nih.gov/blast/db/taxdb.tar.gz
              and extracted the two files taxdb.btd and taxdb.bti into the BLASTDB directory.
              Code:
              $ tar xzf taxdb.tar.gz
              $ sudo mv taxdb.btd taxdb.bti /usr/share/ncbi-blast/db
              You may need different commands on your system. More info here

              Note that
              Code:
              $ update_blastdb.pl taxdb
              only downloads the tar.gz file. It does not extract the needed files.
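The extraction step can be wrapped in a small function that also checks the two taxonomy files ended up where BLAST looks for them. This is a sketch under the assumption that the archive contains `taxdb.btd` and `taxdb.bti` at its top level, as described above; `install_taxdb` is a hypothetical name.

```shell
# Hypothetical helper: unpack a downloaded taxdb.tar.gz into the BLAST
# database directory and confirm both taxonomy files are present.
# Usage: install_taxdb ARCHIVE DB_DIR
install_taxdb() {
    archive=$1
    db_dir=$2
    # Extract straight into the database directory.
    tar -xzf "$archive" -C "$db_dir" || return 1
    # Both files must exist and be non-empty.
    for f in taxdb.btd taxdb.bti; do
        if [ ! -s "$db_dir/$f" ]; then
            echo "missing or empty: $db_dir/$f" >&2
            return 1
        fi
    done
    echo "taxdb installed in $db_dir"
}
```

For example, `install_taxdb taxdb.tar.gz /usr/share/ncbi-blast/db`. If the directory is owned by root, as in my case, the extraction needs to run with sudo or as a user who can write there.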

              Hope it helps

              Carlos
