Creating local blast+ database for mouse build 37

  • Creating local blast+ database for mouse build 37

    I am trying to create a local database so that I can BLAST against the MGSCv37 mouse genome build. I'm on Windows 7 using the latest version of BLAST+, and I have downloaded the FASTA files from ftp://ftp.ncbi.nih.gov/genomes/M_mus...VE/BUILD.37.1/ .

    When I try to create the database for an individual chromosome, I end up with one very long sequence. I assume this happens because the FASTA file on the NCBI website isn't in the correct format. Is there anything I can do to fix this?

  • #2
    Hi Npatel,
    I don't understand what 'long' refers to in this context. If you mean sequence length, a long sequence shouldn't be a surprise, because chromosomes are generally long anyway. Which of the files did you download?
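One way to see what makeblastdb will make of a file is to count its FASTA records and their lengths before building the database. The following is a minimal Python sketch, assuming an uncompressed FASTA file; the function name `fasta_record_lengths` is illustrative and not part of BLAST+.

```python
from typing import Dict


def fasta_record_lengths(path: str) -> Dict[str, int]:
    """Map each FASTA record ID to the number of sequence letters it holds."""
    lengths: Dict[str, int] = {}
    current = None
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            if line.startswith(">"):
                # The record ID is the header text up to the first whitespace.
                parts = line[1:].split()
                current = parts[0] if parts else "record_%d" % (len(lengths) + 1)
                lengths[current] = 0
            elif current is not None:
                # Sequence lines are usually wrapped, so add them up.
                lengths[current] += len(line)
    return lengths
```

A single entry in the returned dictionary means the file really does hold one record, in which case a one-sequence database is the expected result rather than a formatting problem.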



    • #3
      I was working with chromosome 18. I downloaded mm_ref_chr18.fa.gz.

      I then ran:

      makeblastdb -in ref_chr18.fa -dbtype nucl -out ref_chr18.db

      which gave me:
      Building a new DB, current time: 02/05/2013 02:02:57
      New DB name: ref_char18.db
      New DB title: ref_chr18.fa
      Sequence type: Nucleotide
      Keep Linkouts: T
      Keep Mbits: T
      Maximum file size: 1000000000B
      Adding sequences from FASTA; added 1 sequences in 1.59614 seconds.

      So I assume my database has been built at this point.

      From here I am trying to blast the sequence CCGAGGGTGTGTGTCCCGCAAAGCC which I know for a fact is on chromosome 18.

      To do that I input:
      blastn -query sequences.txt -db ref_char18.db -out output.txt

      Here sequences.txt is a plain text file, created in Notepad, containing only CCGAGGGTGTGTGTCCCGCAAAGCC.

      That gives me an output of:
      BLASTN 2.2.27+


      Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
      Miller (2000), "A greedy algorithm for aligning DNA sequences", J
      Comput Biol 2000; 7(1-2):203-14.



      Database: ref_chr18.fa
      1 sequences; 90,772,031 total letters



      Query=
      Length=25


      ***** No hits found *****



      Lambda K H
      1.33 0.621 1.12

      Gapped
      Lambda K H
      1.28 0.460 0.850

      Effective search space used: 453860055


      Database: ref_chr18.fa
      Posted date: Feb 5, 2013 2:02 AM
      Number of letters in database: 90,772,031
      Number of sequences in database: 1



      Matrix: blastn matrix 1 -2
      Gap Penalties: Existence: 0, Extension: 2.5

      That's what I've got so far. I'm not sure where I've gone wrong. I hope this additional information helps you help me. Thanks for replying!



      • #4
        Hi Npatel,
        I decided to replicate your experiment on a Linux machine, which is what I have access to at the moment, using the following commands:
        ../ncbi-blast-2.2.25+/bin/makeblastdb -in mm_ref_chr18.fa -dbtype nucl
        ../ncbi-blast-2.2.25+/bin/blastn -query query.fa -db mm_ref_chr18.fa -out query.out

        And indeed there is no hit. A hit is reported only if the search thresholds are satisfied, so you may have to change the default parameters for this to show up as a hit. I have not worked out which ones to change. Just to confirm that the string exists as a substring of chr18, I used BLAT like so:
        ~/blat/blat mm_ref_chr18.fa -t=dna query.fa -q=dna -out=blast query.blast

        Eureka! It shows up
        BLASTN 2.2.11 [blat]
        Reference: Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
        Query= string
        (25 letters)
        Database: mm_ref_chr18.fa
        4 sequences; 87,601,031 total letters
        Searching.done
        Score E
        Sequences producing significant alignments: (bits) Value
        gi|149269870|ref|NT_039674.7|Mm18_39714_37 50 3e-06
        >gi|149269870|ref|NT_039674.7|Mm18_39714_37
        Length = 73639148
        Score = 50 bits (128), Expect = 3e-06
        Identities = 25/25 (100%)
        Strand = Plus / Plus
        Query: 1 ccgagggtgtgtgtcccgcaaagcc 25
        |||||||||||||||||||||||||
        Sbjct: 383032 ccgagggtgtgtgtcccgcaaagcc 383056
        Database: mm_ref_chr18.fa

        I am not suggesting that BLAT is the right tool for your experiment. This is just a litmus test showing that the string exists on the chromosome, and that the same BLAST experiment performed elsewhere gave the same result. So I think your setup is fine, and only the parameters need to be adjusted to get the result you want.

        HTH
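The same litmus test can be done without any alignment tool by searching for the query as an exact substring on both strands. This is a rough Python sketch, not what BLAT does internally; the helper names (`reverse_complement`, `read_fasta`, `find_exact`) are made up for illustration, and it assumes the chromosome fits in memory.

```python
from typing import Iterator, List, Tuple

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]


def read_fasta(path: str) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs, joining wrapped sequence lines."""
    name, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.strip()
            if line.startswith(">"):
                if name is not None:
                    yield name, "".join(chunks)
                name, chunks = line[1:], []
            elif line and name is not None:
                chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def find_exact(path: str, query: str) -> List[Tuple[str, int, str]]:
    """Return (header, 1-based start on the plus strand, strand) per exact match."""
    query = query.upper()
    targets = (("plus", query), ("minus", reverse_complement(query)))
    hits = []
    for name, seq in read_fasta(path):
        seq = seq.upper()  # ignore lowercase soft-masking, if any
        for strand, pattern in targets:
            start = seq.find(pattern)
            while start != -1:
                hits.append((name, start + 1, strand))
                start = seq.find(pattern, start + 1)
    return hits
```

Calling `find_exact("mm_ref_chr18.fa", "CCGAGGGTGTGTGTCCCGCAAAGCC")` would list every record and position where the 25-mer occurs exactly, which is enough to confirm that the sequence is present before tuning BLAST parameters.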



        • #5
          Hi Apexy,

          You were right, it was a matter of changing the settings. I saved a search strategy from the NCBI web BLAST and imported it with the -import_search_strategy option, and I am now getting the result I need. Thanks for your help!



          • #6
            Hi Npatel,
            I'm curious how you managed to get this to work. Did you end up doing the run on the web, or did you import the saved search strategy on your local machine? I'm not familiar with this. Kindly clarify.

            Thanks



            • #7
              Sorry for the delay. I ran one instance on the web, saved the search strategy with the specifications I wanted, and downloaded it. I then imported those specifications into the standalone BLAST command line using the -import_search_strategy option. Hope that helps!



              • #8
                You might have more luck changing the -task flag to blastn-short. The default is megablast, which isn't optimised for finding things that small.

                "blastn -help" to get the command line options.
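A likely reason the default search finds nothing is seed length: megablast's default word size is 28, which is longer than the 25-letter query, so no initial seed can be formed, whereas blastn-short defaults to a word size of 7. The Python sketch below only assembles the command line and checks that arithmetic; the function names are illustrative, and the file names are the ones used earlier in the thread.

```python
from typing import List

# Default seed (word) sizes documented for the blastn tasks in BLAST+.
MEGABLAST_WORD_SIZE = 28
BLASTN_SHORT_WORD_SIZE = 7


def can_seed(query_length: int, word_size: int) -> bool:
    """A query shorter than the word size cannot produce any seed match."""
    return query_length >= word_size


def short_query_command(query: str, db: str, out: str) -> List[str]:
    """Build a blastn argument list that uses the short-query task."""
    return [
        "blastn",
        "-task", "blastn-short",
        "-query", query,
        "-db", db,
        "-out", out,
    ]


cmd = short_query_command("sequences.txt", "ref_char18.db", "output.txt")
# To run it (requires BLAST+ on the PATH):
#   import subprocess; subprocess.run(cmd, check=True)
```

Here `can_seed(25, MEGABLAST_WORD_SIZE)` is false while `can_seed(25, BLASTN_SHORT_WORD_SIZE)` is true, which is consistent with the "No hits found" output above and with the web BLAST settings working once imported.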

