  • Creating local blast+ database for mouse build 37

    I am trying to create a local database so that I can BLAST against the MGSCv37 assembly. I'm on Windows 7 using the latest version of BLAST+, and I have downloaded the FASTA files from ftp://ftp.ncbi.nih.gov/genomes/M_mus...VE/BUILD.37.1/ .

    When I try to create the database for an individual chromosome I end up with 1 very long sequence. I assume this happens because the FASTA file on the NCBI website isn't in the correct format. Is there anything I can do to fix this?

  • #2
    Hi Npatel,
    I don't understand what 'long' refers to in this context. If you mean sequence length, it shouldn't be a surprise, because chromosomes are generally long anyway. Which of the files did you download?



    • #3
      I was working with chromosome 18. I downloaded mm_ref_chr18.fa.gz.

      I then ran:

      makeblastdb -in ref_chr18.fa -dbtype nucl -out ref_chr18.db

      which gave me:
      Building a new DB, current time: 02/05/2013 02:02:57
      New DB name: ref_char18.db
      New DB title: ref_chr18.fa
      Sequence type: Nucleotide
      Keep Linkouts: T
      Keep Mbits: T
      Maximum file size: 1000000000B
      Adding sequences from FASTA; added 1 sequences in 1.59614 seconds.

      So I assume my database is made at this point.

      From here I am trying to blast the sequence CCGAGGGTGTGTGTCCCGCAAAGCC which I know for a fact is on chromosome 18.

      To do that I input:
      blastn -query sequences.txt -db ref_char18.db -out output.txt

      where sequences.txt is a Notepad text file containing only CCGAGGGTGTGTGTCCCGCAAAGCC.

      That gives me an output of:
      BLASTN 2.2.27+


      Reference: Zheng Zhang, Scott Schwartz, Lukas Wagner, and Webb
      Miller (2000), "A greedy algorithm for aligning DNA sequences", J
      Comput Biol 2000; 7(1-2):203-14.



      Database: ref_chr18.fa
      1 sequences; 90,772,031 total letters



      Query=
      Length=25


      ***** No hits found *****



      Lambda      K        H
         1.33    0.621     1.12

      Gapped
      Lambda      K        H
         1.28    0.460    0.850

      Effective search space used: 453860055


      Database: ref_chr18.fa
      Posted date: Feb 5, 2013 2:02 AM
      Number of letters in database: 90,772,031
      Number of sequences in database: 1



      Matrix: blastn matrix 1 -2
      Gap Penalties: Existence: 0, Extension: 2.5

      That's what I've gotten so far. I'm not sure where I've gone wrong. I hope this additional information will help you help me. Thanks for replying!



      • #4
        Hi Npatel,
        I replicated your experiment on a Linux machine, which is what I have access to at the moment, with the following commands:
        ../ncbi-blast-2.2.25+/bin/makeblastdb -in mm_ref_chr18.fa -dbtype nucl
        ../ncbi-blast-2.2.25+/bin/blastn -query query.fa -db mm_ref_chr18.fa -out query.out

        And indeed there is no hit. A hit is reported only if the search thresholds are satisfied, so you may have to change the default parameters for this match to show up. I have not worked out which ones to change. Just to confirm that the string exists as a substring of chr18, I used BLAT like so:
        ~/blat/blat mm_ref_chr18.fa -t=dna query.fa -q=dna -out=blast query.blast

        Eureka! It shows up
        BLASTN 2.2.11 [blat]
        Reference: Kent, WJ. (2002) BLAT - The BLAST-like alignment tool
        Query= string
        (25 letters)
        Database: mm_ref_chr18.fa
        4 sequences; 87,601,031 total letters
        Searching.done
                                                             Score    E
        Sequences producing significant alignments:          (bits)  Value
        gi|149269870|ref|NT_039674.7|Mm18_39714_37             50    3e-06
        >gi|149269870|ref|NT_039674.7|Mm18_39714_37
        Length = 73639148
        Score = 50 bits (128), Expect = 3e-06
        Identities = 25/25 (100%)
        Strand = Plus / Plus
        Query: 1      ccgagggtgtgtgtcccgcaaagcc 25
                      |||||||||||||||||||||||||
        Sbjct: 383032 ccgagggtgtgtgtcccgcaaagcc 383056
        Database: mm_ref_chr18.fa

        I am not suggesting that BLAT is the right tool for your experiment. This is just a litmus test showing that the string exists on the chromosome and that the same BLAST experiment, performed elsewhere, gave the same result. So I think your setup is fine, and only the parameters need to be addressed to get the result you want.
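
The BLAT run above is used only to confirm that the 25-mer really occurs in the chromosome file. The same kind of check can be sketched without any aligner, as a plain exact-substring search on both strands. What follows is a minimal Python sketch and not something from the thread: the helper names (`read_fasta`, `reverse_complement`, `find_exact`) and the toy record are hypothetical, and it reports perfect matches only.

```python
import io

def read_fasta(handle):
    """Yield (header, sequence) pairs from an open FASTA text handle."""
    header, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Return the reverse complement of a DNA string (A/C/G/T only)."""
    return seq.translate(_COMPLEMENT)[::-1]

def find_exact(handle, query):
    """Return (header, strand, 1-based start) for every exact occurrence."""
    query = query.upper()
    targets = {"+": query, "-": reverse_complement(query)}
    hits = []
    for header, seq in read_fasta(handle):
        seq = seq.upper()  # reference files may use lower case for soft-masking
        for strand, target in targets.items():
            pos = seq.find(target)
            while pos != -1:
                hits.append((header, strand, pos + 1))
                pos = seq.find(target, pos + 1)
    return hits

# Toy demonstration on an in-memory record; the flanking bases are made up.
demo = io.StringIO(">toy_contig\nAAAACCGAGGGTGTGTGTCC\nCGCAAAGCCTTTT\n")
print(find_exact(demo, "CCGAGGGTGTGTGTCCCGCAAAGCC"))  # prints [('toy_contig', '+', 5)]
```

For the real data one could open the download with `gzip.open("mm_ref_chr18.fa.gz", "rt")` and pass that handle to `find_exact`. Note that each sequence is held fully in memory, which is acceptable for a single chromosome but not a general-purpose approach.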

        HTH



        • #5
          Hi Apexy,

          You were right, it was a matter of changing the settings. I imported a saved search strategy from the NCBI web blast using the import_saved_strategy function and I am now getting the result I need. Thanks for your help!



          • #6
            Hi Npatel,
            I'm curious how you managed to get this to work. Did you end up doing the run on the web, or did you import the saved strategy on your local machine? I'm not familiar with this function. Kindly clarify.

            Thanks



            • #7
              Sorry for the delay. I ran one instance on the web, saved the search strategy with the settings I wanted, and downloaded it. I then imported those settings into the standalone BLAST command line using the import_saved_strategy function. Hope that helps!



              • #8
                You might have more luck changing the -task flag to blastn-short. The default is megablast, which isn't optimised for finding sequences that small.

                Run "blastn -help" to see the command-line options.
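
One likely reason the default search reports nothing: blastn seeds alignments from exact word matches, and megablast's default word size (28) is longer than the 25-base query, so no seed can form. The sketch below makes that arithmetic explicit and assembles a blastn-short command line. It is a hedged illustration in Python: the word sizes are the BLAST+ defaults as I understand them (confirm with "blastn -help"), the helper names are hypothetical, and the file names are the ones given in post #3.

```python
import shlex

# Default seed (word) sizes per blastn task. These are assumptions taken from
# the BLAST+ documentation; confirm them for your version with "blastn -help".
DEFAULT_WORD_SIZE = {
    "megablast": 28,
    "dc-megablast": 11,
    "blastn": 11,
    "blastn-short": 7,
}

def can_seed(query_length, task):
    """A query shorter than the word size cannot contain a single seed word."""
    return query_length >= DEFAULT_WORD_SIZE[task]

def blastn_command(query, db, out, task):
    """Build the blastn argument list; nothing is executed here."""
    return ["blastn", "-task", task, "-query", query, "-db", db, "-out", out]

query_length = len("CCGAGGGTGTGTGTCCCGCAAAGCC")  # the 25-mer from post #3
for task in DEFAULT_WORD_SIZE:
    print(task, can_seed(query_length, task))

cmd = blastn_command("sequences.txt", "ref_char18.db", "output.txt", "blastn-short")
print(shlex.join(cmd))
```

The printed command can be pasted into a shell, or the list can be handed to `subprocess.run` on a machine where blastn is installed. Lowering `-word_size` by hand is an alternative, but blastn-short also adjusts the scoring and E-value defaults for short queries.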
