  • npatel
    Junior Member
    • Feb 2013
    • 4

    Creating local blast+ database for mouse build 37

    I am trying to create a local database so that I can BLAST against the MGSCv37 mouse assembly. I'm on Windows 7 using the latest version of BLAST+, and I have downloaded the FASTA files from ftp://ftp.ncbi.nih.gov/genomes/M_mus...VE/BUILD.37.1/ .

    When I try to create the database for an individual chromosome I end up with 1 very long sequence. I assume this happens because the FASTA file on the NCBI website isn't in the correct format. Is there anything I can do to fix this?
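One way to check whether a single long sequence is expected is to count the records in the FASTA file before building the database. The following is a minimal Python sketch, not part of the original post; the function name `fasta_record_lengths` and the tiny in-memory example are made up for illustration. A chromosome file with one `>` header line will legitimately produce one sequence.

```python
import io

def fasta_record_lengths(handle):
    """Return a list of (header, length) pairs, one per FASTA record."""
    records = []
    header, length = None, 0
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, length))
            header, length = line[1:], 0
        elif header is not None:
            length += len(line)
    if header is not None:
        records.append((header, length))
    return records

# Tiny in-memory example standing in for a real chromosome file.
example = io.StringIO(">contig_a\nACGT\nAC\n>contig_b\nGGG\n")
for name, n in fasta_record_lengths(example):
    print(name, n)
```

For a real file, pass an open file handle instead of the `io.StringIO` object, for example `open("ref_chr18.fa")`.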
  • Apexy
    Member
    • Apr 2011
    • 62

    #2
    Hi Npatel,
    I'm not sure what 'long' refers to in this context. If you mean sequence length, that shouldn't be a surprise, because chromosomes are generally long anyway. Which of the files did you download?


    • npatel
      Junior Member
      • Feb 2013
      • 4

      #3
      I was working with chromosome 18. I downloaded mm_ref_chr18.fa.gz.

      I then ran:

      makeblastdb -in ref_chr18.fa -dbtype nucl -out ref_chr18.db

      which gave me:
      Building a new DB, current time: 02/05/2013 02:02:57
      New DB name: ref_char18.db
      New DB title: ref_chr18.fa
      Sequence type: Nucleotide
      Keep Linkouts: T
      Keep Mbits: T
      Maximum file size: 1000000000B
      Adding sequences from FASTA; added 1 sequences in 1.59614 seconds.

      So I assume my database is made at this point.

      From here I am trying to blast the sequence CCGAGGGTGTGTGTCCCGCAAAGCC which I know for a fact is on chromosome 18.

      To do that I input:
      blastn -query sequences.txt -db ref_char18.db -out output.txt

      Where the sequences.txt file is a notepad txt file with only CCGAGGGTGTGTGTCCCGCAAAGCC in it.

      That gives me an output of:
      BLASTN 2.2.27+


      Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
      Miller (2000), "A greedy algorithm for aligning DNA sequences", J
      Comput Biol 2000; 7(1-2):203-14.



      Database: ref_chr18.fa
      1 sequences; 90,772,031 total letters



      Query=
      Length=25


      ***** No hits found *****



      Lambda K H
      1.33 0.621 1.12

      Gapped
      Lambda K H
      1.28 0.460 0.850

      Effective search space used: 453860055


      Database: ref_chr18.fa
      Posted date: Feb 5, 2013 2:02 AM
      Number of letters in database: 90,772,031
      Number of sequences in database: 1



      Matrix: blastn matrix 1 -2
      Gap Penalties: Existence: 0, Extension: 2.5

      That's what I've got so far. I'm not sure where I've gone wrong. I hope this additional information will help you help me. Thanks for replying!


      • Apexy
        Member
        • Apr 2011
        • 62

        #4
        Hi Npatel,
        I decided to replicate your experiment on a Linux machine, which is what I have access to at the moment, with the following commands:
        ../ncbi-blast-2.2.25+/bin/makeblastdb -in mm_ref_chr18.fa -dbtype nucl
        ../ncbi-blast-2.2.25+/bin/blastn -query query.fa -db mm_ref_chr18.fa -out query.out

        And indeed there is no hit. A hit is reported only if the search thresholds are satisfied, so you may have to change the default parameters for this to show up as a hit. I have not worked out which ones to change. Just to confirm that the string exists as a substring on chr18, I used BLAT like so:
        ~/blat/blat mm_ref_chr18.fa -t=dna query.fa -q=dna -out=blast query.blast

        Eureka! It shows up:
        BLASTN 2.2.11 [blat]
        Reference: Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
        Query= string
        (25 letters)
        Database: mm_ref_chr18.fa
        4 sequences; 87,601,031 total letters
        Searching.done
        Score E
        Sequences producing significant alignments: (bits) Value
        gi|149269870|ref|NT_039674.7|Mm18_39714_37 50 3e-06
        >gi|149269870|ref|NT_039674.7|Mm18_39714_37
        Length = 73639148
        Score = 50 bits (128), Expect = 3e-06
        Identities = 25/25 (100%)
        Strand = Plus / Plus
        Query: 1 ccgagggtgtgtgtcccgcaaagcc 25
        |||||||||||||||||||||||||
        Sbjct: 383032 ccgagggtgtgtgtcccgcaaagcc 383056
        Database: mm_ref_chr18.fa

        I am not suggesting that BLAT is the right tool for your experiment. This is just a litmus test showing that the string exists on the chromosome and that the same BLAST experiment, performed elsewhere, gave the same result. So I think your setup is fine, and only the parameters need to be addressed to get the result you want.

        HTH
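The same kind of litmus test can be done without an aligner, by searching for the exact string on both strands. This is a minimal Python sketch under stated assumptions: the function names are made up for illustration, the subject below is a toy string rather than the real chromosome, and loading the FASTA sequence into a single string is left to the reader. The comparison is case-insensitive because genome FASTA files are often soft-masked in lowercase.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A, C, G, T only)."""
    complement = str.maketrans("ACGTacgt", "TGCAtgca")
    return seq.translate(complement)[::-1]

def find_exact(query, subject):
    """Return (strand, 1-based start) of the first exact match, or None."""
    subject_upper = subject.upper()
    query_upper = query.upper()
    pos = subject_upper.find(query_upper)
    if pos != -1:
        return ("plus", pos + 1)
    # Try the other strand by searching for the reverse complement.
    pos = subject_upper.find(reverse_complement(query_upper))
    if pos != -1:
        return ("minus", pos + 1)
    return None

query = "CCGAGGGTGTGTGTCCCGCAAAGCC"
# Made-up stand-in for a chromosome sequence, not real data.
toy_subject = "ttttt" + query.lower() + "ggggg"
print(find_exact(query, toy_subject))  # ('plus', 6)
```

An exact-match search only confirms presence. It says nothing about near matches, which is what BLAST or BLAT are for.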


        • npatel
          Junior Member
          • Feb 2013
          • 4

          #5
          Hi Apexy,

          You were right, it was a matter of changing the settings. I imported a saved search strategy from the NCBI web blast using the import_saved_strategy function and I am now getting the result I need. Thanks for your help!


          • Apexy
            Member
            • Apr 2011
            • 62

            #6
            Hi Npatel,
            I'm curious how you managed to get this to work. Did you end up doing the run on the web, or did you import the saved strategy on your local machine? I'm not familiar with this. Kindly clarify.

            Thanks


            • npatel
              Junior Member
              • Feb 2013
              • 4

              #7
              Sorry for the delay. I ran one instance on the web, saved the search strategy with the settings I wanted, and downloaded it. I then imported those settings into the standalone command-line blast using the import_saved_strategy function. Hope that helps!
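For readers who want to try the same route: in the BLAST+ command-line tools the relevant option is, as far as I know, spelled `-import_search_strategy`. The sketch below only assembles such a command in Python. The strategy file name `strategy.asn` is hypothetical, and the assumption that `-query`, `-db` and `-out` given on the command line take precedence over what is stored in the strategy file should be checked against `blastn -help` for your version.

```python
def blastn_with_strategy(strategy_file, query, db, out):
    """Build a blastn argument list that reuses a saved search strategy."""
    return [
        "blastn",
        "-import_search_strategy", strategy_file,  # file saved from web BLAST
        "-query", query,
        "-db", db,
        "-out", out,
    ]

# strategy.asn is a hypothetical file name used only for illustration.
cmd = blastn_with_strategy("strategy.asn", "sequences.txt",
                           "ref_char18.db", "output.txt")
print(" ".join(cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` on a machine where BLAST+ is installed.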


              • A.N.Other
                Member
                • Feb 2012
                • 26

                #8
                You might have more luck changing the -task flag to blastn-short. The default is megablast, which isn't optimised for finding things that small.

                "blastn -help" to get the command line options.
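A likely reason the default search found nothing is the seed word size: an exact word match of that length is needed before any alignment is extended, and a 25-base query cannot contain a 28-base word. The Python sketch below illustrates this. The default word sizes are as I understand them from the BLAST+ documentation and are worth confirming with `blastn -help`. The helper names are made up, and the file names are the ones used earlier in this thread.

```python
# Default word sizes per blastn task, as I understand the BLAST+ documentation.
DEFAULT_WORD_SIZE = {"megablast": 28, "blastn": 11, "blastn-short": 7}

def can_seed(query_length, task):
    """A query shorter than the word size cannot produce any seed match."""
    return query_length >= DEFAULT_WORD_SIZE[task]

def blastn_command(query, db, out, task):
    """Build a blastn argument list with an explicit -task."""
    return ["blastn", "-task", task, "-query", query, "-db", db, "-out", out]

query_length = len("CCGAGGGTGTGTGTCCCGCAAAGCC")  # 25 bases
for task in DEFAULT_WORD_SIZE:
    print(task, can_seed(query_length, task))

print(" ".join(blastn_command("sequences.txt", "ref_char18.db",
                              "output.txt", "blastn-short")))
```

Word size is only part of the picture, since scoring and e-value settings also differ between tasks, but it is enough to explain an empty megablast result for a query this short.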
