  • gene_x
    Senior Member
    • May 2010
    • 108

    BLAST database related question

    Hi, all,
    I'm downloading the nt database from the NCBI BLAST FTP site: ftp://ftp.ncbi.nlm.nih.gov/blast/db/

    The database is split into individual files, nt.00.tar.gz through nt.13.tar.gz. Do I need to put them back together somehow after downloading them individually?

    Like this?
    <code>
    cat nt.00 nt.01 ... nt.13
    </code>

    Or does it not matter whether I have a single file or multiple files?

    Also, what is the .md5 file accompanying each nt.*tar.gz file?

    Thanks.
  • rflrob
    Member
    • May 2010
    • 50

    #2
    If I recall correctly, you don't need to paste them together.

    The md5 files are checksums. If you're worried that a file didn't download properly, you can run an MD5 program on your own computer (md5sum on Linux, md5 on macOS and the BSDs) on the downloaded file, then check that the result matches the value in the corresponding .md5 file. I have never needed to do this.
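
For anyone who does want to run the check, here is a minimal sketch. It assumes GNU coreutils `md5sum` is available and that each .md5 file uses the usual `<hash>  <filename>` layout; the helper name `verify_md5` is made up for illustration.

```shell
# Sketch: verify one downloaded archive against its accompanying .md5 file.
# Assumptions: GNU `md5sum` is installed, and "<archive>.md5" sits next to
# the archive. `verify_md5` is a hypothetical helper, not part of BLAST.
verify_md5() {
    archive=$1                       # e.g. /data/blast/nt.00.tar.gz
    dir=$(dirname "$archive")
    base=$(basename "$archive")
    # Run from the archive's directory so the file name listed inside
    # the .md5 file resolves correctly.
    if (cd "$dir" && md5sum -c "$base.md5" >/dev/null 2>&1); then
        echo "OK: $base"
    else
        echo "FAILED: $base"
        return 1
    fi
}
```

Calling `verify_md5 nt.00.tar.gz` for each downloaded volume reports whether it matches its checksum; a failed volume would simply be downloaded again.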

    • GenoMax
      Senior Member
      • Feb 2008
      • 7142

      #3
      No, you should not merge the files. They all need to be in the same directory, though. Unless you are the worrying kind (or your network is not reliable), it may be OK to skip the md5sum checks, since they can take some time on large files.

      You only need to provide the name of the database, as in this minimal example (no volume numbers needed):

      Code:
      blastn -db nt -query query.fa -out results.out
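
The downloaded archives have to be unpacked before BLAST can use them. A rough sketch of extracting every volume into one directory follows; the helper name `extract_volumes` and both directory arguments are placeholders, not anything prescribed by NCBI.

```shell
# Sketch: unpack every downloaded nt volume into a single database directory.
# `extract_volumes`, the source directory, and the database directory are
# hypothetical names chosen for this example.
extract_volumes() {
    src_dir=$1      # where the nt.*.tar.gz archives were downloaded
    db_dir=$2       # directory that will hold all extracted volume files
    mkdir -p "$db_dir" || return 1
    for archive in "$src_dir"/nt.*.tar.gz; do
        [ -e "$archive" ] || continue        # the glob matched nothing
        tar -xzf "$archive" -C "$db_dir" || return 1
    done
}
```

After something like `extract_volumes ~/downloads ~/blastdb`, the database can be referenced by path and base name, for example `blastn -db ~/blastdb/nt -query query.fa -out results.out`, or by the bare name `nt` if the `BLASTDB` environment variable points at that directory.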

      • gene_x
        Senior Member
        • May 2010
        • 108

        #4
        Thanks for the replies!

        I have another question:
        In the README file here: ftp://ftp.ncbi.nlm.nih.gov/blast/db/README

        nr.*tar.gz | non-redundant protein sequence database with
                   | entries from GenPept, Swissprot, PIR, PDF, PDB,
                   | and NCBI RefSeq

        nt.*tar.gz | nucleotide sequence database, with entries
                   | from all traditional divisions of GenBank,
                   | EMBL, and DDBJ excluding bulk divisions (gss,
                   | sts, pat, est, and htg divisions. wgs entries
                   | are also excluded. Not non-redundant.

        So nr now refers to protein sequences, and I should use nt for DNA?

        • GenoMax
          Senior Member
          • Feb 2008
          • 7142

          #5
          Originally posted by gene_x
          So nr now refers to protein sequences, and I should use nt for DNA?

          The answer is in the text you quoted in post #4.
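
Going by the README excerpt in post #4, nr holds protein sequences and nt holds nucleotide sequences. A small sketch of how that maps onto the BLAST+ programs is below; `blast_command_for` is a made-up helper that only prints the command it would suggest, and it covers just the two simplest cases (protein query against nr, nucleotide query against nt).

```shell
# Sketch: print the BLAST+ command matching a query's sequence type.
# `blast_command_for` is a hypothetical helper; it only echoes a command
# line and does not run BLAST itself.
blast_command_for() {
    seq_type=$1     # "protein" or "nucleotide"
    query=$2        # path to the FASTA query file
    case $seq_type in
        protein)    echo "blastp -db nr -query $query" ;;   # nr: protein database
        nucleotide) echo "blastn -db nt -query $query" ;;   # nt: nucleotide database
        *)          echo "unknown sequence type: $seq_type" >&2; return 1 ;;
    esac
}
```

For instance, `blast_command_for nucleotide query.fa` prints `blastn -db nt -query query.fa`.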

          • gene_x
            Senior Member
            • May 2010
            • 108

            #6
            I know, but I read somewhere (http://openwetware.org/wiki/Wikiomic...utorial#blastn) that nr is also used to refer to nucleotides, which is why I'm confused.

            So people previously used nr for both proteins and nucleotides, and now it's just proteins?

            • GenoMax
              Senior Member
              • Feb 2008
              • 7142

              #7
              Originally posted by gene_x
              So people previously used nr for both proteins and nucleotides, and now it's just proteins?

              That seems to have changed at some point; I'm not sure when.

              This page at NCBI still refers to the old-style options, where "nr" could be used for either.

              • gene_x
                Senior Member
                • May 2010
                • 108

                #8
                Right, they need to update the old pages to clear things up.
