  • Creating local blast+ database for mouse build 37

    I am trying to create a local database so that I can BLAST against the MGSCv37 assembly. I'm on Windows 7 using the latest version of BLAST+, and I have downloaded the FASTA files from ftp://ftp.ncbi.nih.gov/genomes/M_mus...VE/BUILD.37.1/ .

    When I try to create the database for an individual chromosome, I end up with a single very long sequence. I assume this happens because the FASTA file on the NCBI website isn't in the correct format. Is there anything I can do to fix this?

  • #2
    Hi Npatel,
    I don't understand what 'long' refers to in this context. If you mean long in length, that shouldn't be a surprise, because chromosomes are generally long anyway. Which of the files did you download?
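One quick way to see whether "one very long sequence" is expected is to count the records and their lengths in the FASTA file before building the database; a single-chromosome file can legitimately hold one record tens of millions of bases long. This is a minimal Python sketch, assuming a plain (uncompressed) FASTA file; the helper name `fasta_record_lengths` is hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def fasta_record_lengths(lines: Iterable[str]) -> List[Tuple[str, int]]:
    """Return (identifier, sequence length) for each record in FASTA text."""
    records: List[Tuple[str, int]] = []
    name: Optional[str] = None
    length = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if name is not None:
                records.append((name, length))  # finish the previous record
            fields = line[1:].split()
            name = fields[0] if fields else ""  # first word of the defline
            length = 0
        else:
            length += len(line)  # wrapped sequence lines are summed
    if name is not None:
        records.append((name, length))
    return records
```

For example, `with open("ref_chr18.fa") as fh: recs = fasta_record_lengths(fh)` followed by `print(len(recs), sum(n for _, n in recs))`. If the counts agree with what makeblastdb reports, the file was read as intended and the single record is simply the whole chromosome.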

    • #3
      I was working with chromosome 18. I downloaded mm_ref_chr18.fa.gz.
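In this thread makeblastdb is run on ref_chr18.fa, an uncompressed copy of the downloaded mm_ref_chr18.fa.gz. Windows 7 does not ship with gunzip, so a few lines of Python are one way to unpack the archive. This is a minimal sketch; the function name `gunzip_file` and the paths in the usage line are illustrative.

```python
import gzip
import shutil


def gunzip_file(src: str, dst: str) -> None:
    """Decompress a .gz file (e.g. a downloaded FASTA) to a plain file."""
    # copyfileobj streams in chunks, so the whole chromosome never sits in memory at once.
    with gzip.open(src, "rb") as fin, open(dst, "wb") as fout:
        shutil.copyfileobj(fin, fout)
```

Usage: `gunzip_file("mm_ref_chr18.fa.gz", "ref_chr18.fa")`, then point `makeblastdb -in` at the resulting file.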

      I then ran:

      makeblastdb -in ref_chr18.fa -dbtype nucl -out ref_chr18.db

      which gave me:
      Building a new DB, current time: 02/05/2013 02:02:57
      New DB name: ref_char18.db
      New DB title: ref_chr18.fa
      Sequence type: Nucleotide
      Keep Linkouts: T
      Keep Mbits: T
      Maximum file size: 1000000000B
      Adding sequences from FASTA; added 1 sequences in 1.59614 seconds.

      So I assume my database is made at this point.

      From here I am trying to blast the sequence CCGAGGGTGTGTGTCCCGCAAAGCC which I know for a fact is on chromosome 18.

      To do that I input:
      blastn -query sequences.txt -db ref_char18.db -out output.txt

      Here sequences.txt is a Notepad .txt file containing only CCGAGGGTGTGTGTCCCGCAAAGCC.
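BLAST+ accepted the bare sequence (the report below shows an empty `Query=` line), but giving the query a FASTA defline makes the report easier to read. This is a minimal sketch; the function name `write_fasta` and the identifier `chr18_probe` are made-up examples.

```python
def write_fasta(path: str, name: str, seq: str, width: int = 60) -> None:
    """Write one sequence as FASTA with a '>name' defline and wrapped lines."""
    seq = "".join(seq.split()).upper()  # drop whitespace/newlines, normalise case
    # newline="\n" keeps Unix line endings even when run on Windows.
    with open(path, "w", newline="\n") as out:
        out.write(f">{name}\n")
        for start in range(0, len(seq), width):
            out.write(seq[start:start + width] + "\n")
```

Usage: `write_fasta("sequences.txt", "chr18_probe", "CCGAGGGTGTGTGTCCCGCAAAGCC")`.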

      That gives me an output of:
      BLASTN 2.2.27+


      Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
      Miller (2000), "A greedy algorithm for aligning DNA sequences", J
      Comput Biol 2000; 7(1-2):203-14.



      Database: ref_chr18.fa
      1 sequences; 90,772,031 total letters



      Query=
      Length=25


      ***** No hits found *****



      Lambda K H
      1.33 0.621 1.12

      Gapped
      Lambda K H
      1.28 0.460 0.850

      Effective search space used: 453860055


      Database: ref_chr18.fa
      Posted date: Feb 5, 2013 2:02 AM
      Number of letters in database: 90,772,031
      Number of sequences in database: 1



      Matrix: blastn matrix 1 -2
      Gap Penalties: Existence: 0, Extension: 2.5

      That's what I've gotten so far. Not sure where I've gone wrong. Hope this additional information will help you, help me. Thanks for replying!
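The report's own numbers suggest that the E-value cutoff is not what hides the hit. As a worked check, assume the standard Karlin-Altschul relation, the gapped lambda and K printed above, the reported effective search space as m'n', and a raw score S = 25 for a perfect 25-nt match under the 1/-2 reward/penalty shown:

```latex
E = K\,m'n'\,e^{-\lambda S}
  \approx 0.460 \times 453{,}860{,}055 \times e^{-1.28 \times 25}
  \approx (2.09 \times 10^{8}) \times (1.266 \times 10^{-14})
  \approx 2.6 \times 10^{-6}
```

An expected value around 10^-6 is far below blastn's default -evalue of 10, so a perfect match would be reported if the search found it at all. A likely reason it was not found (an inference, not something stated in the thread) is seeding. The default megablast task uses a word size of 28, which is longer than the 25-nt query, so no initial word hit can form.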

      • #4
        Hi Npatel,
        I decided to replicate your experiment on a Linux machine, which is what I have access to at the moment, with the following commands:
        ../ncbi-blast-2.2.25+/bin/makeblastdb -in mm_ref_chr18.fa -dbtype nucl
        ../ncbi-blast-2.2.25+/bin/blastn -query query.fa -db mm_ref_chr18.fa -out query.out

        And indeed there is no hit. A hit is reported only if the search thresholds are satisfied, so you may have to change the default parameters for this to show up. I haven't worked out which ones to change. Just to confirm that the string exists as a substring of chr18, I used BLAT like so:
        ~/blat/blat mm_ref_chr18.fa -t=dna query.fa -q=dna -out=blast query.blast

        Eureka! It shows up
        BLASTN 2.2.11 [blat]
        Reference: Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
        Query= string
        (25 letters)
        Database: mm_ref_chr18.fa
        4 sequences; 87,601,031 total letters
        Searching.done
        Score E
        Sequences producing significant alignments: (bits) Value
        gi|149269870|ref|NT_039674.7|Mm18_39714_37 50 3e-06
        >gi|149269870|ref|NT_039674.7|Mm18_39714_37
        Length = 73639148
        Score = 50 bits (128), Expect = 3e-06
        Identities = 25/25 (100%)
        Strand = Plus / Plus
        Query: 1 ccgagggtgtgtgtcccgcaaagcc 25
        |||||||||||||||||||||||||
        Sbjct: 383032 ccgagggtgtgtgtcccgcaaagcc 383056
        Database: mm_ref_chr18.fa

        I am not suggesting that BLAT is the right tool for your experiment. This is just a litmus test showing that the string exists on the chromosome, and the same experiment performed elsewhere gave the same result. So I think your experiment is fine; only the parameters need to be adjusted to get the result you want.
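The same litmus test can be done without BLAT: an exact substring search for the probe and its reverse complement. This is a minimal Python sketch, assuming the chromosome has already been read into a single string (for example by joining the non-header lines of ref_chr18.fa). `find_exact` is a hypothetical helper, and unlike BLAT or BLAST it reports only perfect matches.

```python
from typing import List, Tuple


def find_exact(genome: str, probe: str) -> List[Tuple[int, int, str]]:
    """Return sorted 1-based (start, end, strand) for exact probe matches on both strands."""
    genome = genome.upper()
    probe = probe.upper()
    if not probe:
        return []
    # Reverse complement of the probe; a match to it means a hit on the minus strand.
    revcomp = probe.translate(str.maketrans("ACGTN", "TGCAN"))[::-1]
    hits = []
    for strand, pattern in (("+", probe), ("-", revcomp)):
        pos = genome.find(pattern)
        while pos != -1:
            hits.append((pos + 1, pos + len(pattern), strand))
            pos = genome.find(pattern, pos + 1)  # step by one so overlapping hits count
    return sorted(hits)
```

Coordinates are relative to whatever sequence string is passed in, so they will only line up with BLAT's `Sbjct` positions if the same record is searched.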

        HTH

        • #5
          Hi Apexy,

          You were right, it was a matter of changing the settings. I saved a search strategy from NCBI web BLAST and loaded it into BLAST+ with the -import_search_strategy option, and I am now getting the result I need. Thanks for your help!

          • #6
            Hi Npatel,
            I'm curious how you managed to get this to work. Did you end up doing the run on the web, or did you import the saved strategy into BLAST on your local machine? I'm not familiar with this. Kindly clarify.

            Thanks

            • #7
              Sorry for the delay. I ran one search on the web, saved the search strategy with the settings I wanted to a local file, and then imported those settings into standalone command-line BLAST+ using the -import_search_strategy option. Hope that helps!

              • #8
                You might have more luck setting the -task flag to blastn-short. The default is megablast, which isn't optimised for finding matches that short.

                Run "blastn -help" to see the command-line options.

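Following that suggestion, here is a minimal sketch of the same search using the short-query task, driven from Python. It assumes the BLAST+ `blastn` binary is on the PATH, and the helper names are hypothetical. Typing the equivalent command directly in a shell works just as well. `blastn -help` describes blastn-short as optimized for sequences shorter than 50 bases, and it starts from much shorter seed words (7 by default, versus 28 for megablast).

```python
import subprocess
from typing import List


def blastn_short_cmd(query: str, db: str, out: str) -> List[str]:
    """Build a blastn command line that uses the blastn-short task."""
    return [
        "blastn",
        "-task", "blastn-short",  # short-query task instead of the default megablast
        "-query", query,
        "-db", db,
        "-out", out,
    ]


def run_blastn_short(query: str, db: str, out: str) -> None:
    # Assumes BLAST+ is installed and on PATH; raises CalledProcessError on failure.
    subprocess.run(blastn_short_cmd(query, db, out), check=True)
```

Usage: `run_blastn_short("sequences.txt", "ref_chr18.db", "output.txt")`, with the database name set to whatever was passed to `makeblastdb -out`.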