  • rbruenn
    Junior Member
    • Oct 2015
    • 5

    New member with a blast database problem

    Hello!

    I am a graduate student at UC Berkeley currently working with raw reads of several transcriptomes in an attempt to find and assemble reads that match a couple of genes I'm studying. This site has already been very useful to me (Thank you!!!!), but I haven't found any answers pertaining to my current problem.

    I was hoping I could get some help with a BLAST problem I'm having. I am working with standalone blast, and am building blast databases from fasta files of transcriptome raw reads. I have successfully used the command:

    makeblastdb -in COST_1_final.fasta -input_type fasta -dbtype nucl -out COST_1_final

    for several fasta files of raw reads (with varying names of course), but a few of the files result in multiple sets of database files, marked for example <filename>.00.nhr and <filename>.01.nhr, with what I believe is an alias file, <filename>.nal.

    The command as it runs gives the usual message:
    Building a new DB, current time: 10/15/2015 10:49:55
    New DB name: COST_u_final
    New DB title: COST_u_final.fasta
    Sequence type: Nucleotide
    Keep Linkouts: T
    Keep MBits: T
    Maximum file size: 1000000000B
    Adding sequences from FASTA; added 16313457 sequences in 937.429 seconds.

    but then results in multiple sets of files.

    Any ideas about how I can make just one set of database files, as has successfully happened with the rest of my fasta files? I would really appreciate any help!!
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    It is normal to get multiple files per blast database. That is how makeblastdb is supposed to work. Just make sure the files for a database stay together in the same directory and that you use the "basename" of the database when you run your searches. (A suggestion: name your database something other than your input file name.)


    • rbruenn
      Junior Member
      • Oct 2015
      • 5

      #3
      I've always gotten multiple files in the sense of .nhr, .nin, .nsq files, but I am getting 2 of each, like .00.nhr and .01.nhr. Why would that only happen some of the time?


      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        That is probably dependent on the size of the input fasta file. For example, the nt database has 32 fragment files now.
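
The "Maximum file size: 1000000000B" line in the log earlier in the thread is the per-volume limit, and makeblastdb starts a new numbered volume (.00, .01, ...) once a file would exceed it. A minimal sketch of raising that limit with the -max_file_sz option follows. The function name build_db, the DRY_RUN switch, and the file names in the usage note are hypothetical, and the largest accepted value depends on the BLAST+ version, so treat 4GB as an assumption to check against makeblastdb -help.

```shell
# Build a nucleotide BLAST database from a FASTA file.
# Usage: build_db <input.fasta> <db_basename> [max_file_sz]
# Set DRY_RUN=1 to print the command instead of running it.
build_db() {
    fasta=$1
    db=$2
    max_sz=${3:-1GB}   # 1GB corresponds to the default limit shown in the log

    # Assemble the makeblastdb command as the function's argument list.
    set -- makeblastdb -in "$fasta" -input_type fasta -dbtype nucl \
        -out "$db" -max_file_sz "$max_sz"

    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"      # show what would be run
    else
        "$@"           # run makeblastdb (must be on PATH)
    fi
}
```

For example, build_db COST_u_final.fasta COST_u_final 4GB asks for larger volumes. A multi-volume database searches the same way as a single-volume one, so this is a matter of tidiness rather than correctness.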

        There should be a database_name.nal file that enumerates all the file pieces if there is more than one.
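
The alias file is plain text, so the volumes it enumerates can be checked directly. The sketch below assumes the .nal file contains a line starting with DBLIST followed by the volume basenames, optionally in double quotes, and that the names contain no spaces. The function name list_volumes is hypothetical.

```shell
# Print one volume basename per line from the DBLIST line of an alias file.
# Usage: list_volumes <database_name.nal>
list_volumes() {
    grep '^DBLIST' "$1" |
        sed -e 's/^DBLIST[[:space:]]*//' |   # drop the keyword
        tr -d '"' |                          # drop optional quotes
        tr -s ' ' '\n'                       # one volume per line
}
```

Running list_volumes COST_u_final.nal on the database from this thread would be expected to print the .00 and .01 volume names, each of which should have its own .nhr, .nin and .nsq files in the same directory.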


        • rbruenn
          Junior Member
          • Oct 2015
          • 5

          #5
          Ahhh, interesting! So if I search using the command

          blastn -db COST_u_final.fasta -query Genefiles -outfmt 6 -out BLASTresults.txt

          will it search all of the files that were made?

          Thank you for your help! I had no idea this wasn't a problem since I've never seen this happen before.


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Yes, that is correct.

            Just to make things less confusing (to others, if needed later on) don't use the fasta file name as the -out database basename.
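
One detail worth checking before searching: -db takes the basename that was passed to -out when the database was built, not the name of the FASTA file. The log earlier in the thread reports "New DB name: COST_u_final", so that is the name blastn would look for. A small sketch of a pre-flight check follows. The function name db_exists is hypothetical, and it assumes a nucleotide database, which has either a .nal alias file (multi-volume) or a single .nin index file.

```shell
# Succeed if <basename> looks like a nucleotide BLAST database:
# a .nal alias file (multi-volume) or a .nin index (single volume).
# Usage: db_exists <db_basename>
db_exists() {
    [ -f "$1.nal" ] || [ -f "$1.nin" ]
}
```

A search could then be guarded like this: db_exists COST_u_final && blastn -db COST_u_final -query Genefiles -outfmt 6 -out BLASTresults.txt. Because the alias file points at every volume, one search covers all of them.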


            • rbruenn
              Junior Member
              • Oct 2015
              • 5

              #7
              Thank you for the advice, I will change that practice.


              • westerman
                Rick Westerman
                • Jun 2008
                • 1104

                #8
                Hum. I often use the fasta file name as the blastDB name. Keeps them together. The extensions are going to be different so there should be no confusion. @GenoMax: what do you use?


                • GenoMax
                  Senior Member
                  • Feb 2008
                  • 7142

                  #9
                  I generally drop the .fasta/.fa part when naming a blast db. (Like NCBI. They don't call their db's nt.fa or nr.fa).
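
Dropping the extension is easy to script when building many databases. The sketch below uses POSIX parameter expansion to strip a trailing .fasta or .fa; the function name db_basename is hypothetical, and other extensions are left untouched.

```shell
# Derive a database basename by removing a trailing .fasta or .fa.
# Usage: db_basename <file name>
db_basename() {
    name=$1
    name=${name%.fasta}   # strip ".fasta" if present
    name=${name%.fa}      # strip ".fa" if present
    printf '%s\n' "$name"
}
```

For example, db_basename COST_u_final.fasta gives COST_u_final, which can be passed to -out when building and to -db when searching.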

                  The idea of using just the "basename" when specifying a db index is new for some. I suppose keeping the .fasta/.fa may be more logical for them.

