blast makeblastdb problem

  • blast makeblastdb problem

    Dear all,

    I recently ran into a problem with BLAST's makeblastdb.

    $ ./makeblastdb -in transcript.fa -dbtype nucl -hash_index -parse_seqids -out transcript

    makeblastdb protein
    Building a new DB, current time: 07/16/2015 22:06:50
    New DB name: transcript
    New DB title: transcript.fa
    Sequence type: Nucleotide
    Keep Linkouts: T
    Keep MBits: T
    Maximum file size: 1000000000B
    Segmentation fault (core dumped)

    I found that some of the IDs in transcript.fa are longer than 80 characters.
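
    A quick way to check how many IDs exceed 80 characters is a small awk filter. This is a minimal sketch, assuming the ID is the first whitespace-delimited word of each header line (how makeblastdb is generally understood to split a defline); the function name count_long_ids is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: count FASTA headers whose ID (the first
# whitespace-delimited word after ">") is longer than a limit.
# Usage: count_long_ids file.fa [limit]   (limit defaults to 80)
count_long_ids() {
  fasta=$1
  limit=${2:-80}
  awk -v limit="$limit" '
    /^>/ {
      id = substr($1, 2)               # first word of the header, without ">"
      if (length(id) > limit) n++
    }
    END { print n + 0 }                # print 0 when nothing matched
  ' "$fasta"
}
```

    For example, count_long_ids transcript.fa prints the number of headers over 80 characters.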

    Is there any solution for this?

    Thanks!!!

  • #2
    What is the size of the transcript.fa file, and how much RAM does this machine have? Do you get the seg fault right away or after some time?


    • #3
      transcript.fa is about 40 MB, and the machine has 300 GB of RAM.
      I got the seg fault right away.


      • #4
        The problem is likely something other than the 80 characters. Can you post an example of your FASTA sequence IDs?

        Just noticed that you have

        "makeblastdb protein"

        in your first post. Is this a nucleotide or a protein sequence?
        Last edited by GenoMax; 07-16-2015, 10:02 AM.


        • #5
          Here are some of the IDs:

          >1017.g16854.t1_RecName|_Full=Lethal(3)malignant_brain_tumor-like_protein_1|_Short=H-l(3)mbt|_Short=H-l(3)mbt_protein|_Short=L(3)mbt-like|_AltName|_Full=L(3)mbt_protein_homolog
          >1056.g17143.t1_RecName|_Full=Lethal(3)malignant_brain_tumor-like_protein_1|_Short=H-l(3)mbt|_Short=H-l(3)mbt_protein|_Short=L(3)mbt-like|_AltName|_Full=L(3)mbt_protein_homolog
          >1017.g16884.t1_RecName|_Full=PRELI_domain-containing_protein_1|_mitochondrial|_AltName|_Full=Px19-like_protein|_Flags|_Precursor_&gt|gi|969170|gb|AAC60046.1|_px19

          It is a nucleotide sequence (only A/C/G/T/N).
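
          To confirm that a FASTA file really contains only nucleotide characters, one can count the sequence lines containing anything other than A, C, G, T or N. A minimal sketch; the function name is hypothetical, and note that it also counts lines with IUPAC ambiguity codes (R, Y, etc.) or Windows line endings as "other".

```shell
#!/bin/sh
# Hypothetical helper: count sequence lines (non-header lines) that
# contain any character other than A, C, G, T or N (case-insensitive).
# A result of 0 means the file looks like plain nucleotide sequence.
count_non_acgtn_lines() {
  awk '
    /^>/ { next }                        # skip FASTA header lines
    toupper($0) ~ /[^ACGTN]/ { n++ }     # any non-ACGTN character on the line
    END { print n + 0 }
  ' "$1"
}
```

          For example, count_non_acgtn_lines transcript.fa should print 0 for a pure A/C/G/T/N file.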


          • #6
            The problem is with the format of your IDs. I am able to make a nucleotide database with your IDs (using BLAST v2.2.31), but if I try to retrieve the accession numbers I get these errors:
            Code:
            $ blastdbcmd -entry all -db ./transcript -outfmt '%a'
            Error: [blastdbcmd] FASTA-style ID LCL|1017.G16854.T1_RECNAME|_FULL=LETHAL(3)MALIGNANT_BRAIN_TUMOR-LIKE_PROTEIN_1|_SHORT=H-L(3)MBT|_SHORT=H-L(3)MBT_PROTEIN|_SHORT=L(3)MBT-LIKE|_ALTNAME|_FULL=L(3)MBT_PROTEIN_HOMOLOG has too many parts.
            Error: [blastdbcmd] FASTA-style ID LCL|1056.G17143.T1_RECNAME|_FULL=LETHAL(3)MALIGNANT_BRAIN_TUMOR-LIKE_PROTEIN_1|_SHORT=H-L(3)MBT|_SHORT=H-L(3)MBT_PROTEIN|_SHORT=L(3)MBT-LIKE|_ALTNAME|_FULL=L(3)MBT_PROTEIN_HOMOLOG has too many parts.
            Error: [blastdbcmd] FASTA-style ID LCL|1017.G16884.T1_RECNAME|_FULL=PRELI_DOMAIN-CONTAINING_PROTEIN_1|_MITOCHONDRIAL|_ALTNAME|_FULL=PX19-LIKE_PROTEIN|_FLAGS|_PRECURSOR_&GT has too many parts.
            If you can live with shortened header IDs, you can keep only the part before the first "|", e.g.:
            Code:
            $ awk -F '|' '/^>/ { print $1; next } { print }' your_file.fa > new_file.fa
            This gives you short IDs:

            Code:
            >1017.g16854.t1_RecName
            >1056.g17143.t1_RecName
            >1017.g16884.t1_RecName
            With these IDs, makeblastdb and blastdbcmd both work.
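
            If the full descriptions should not be lost, one option is to shorten the headers the same way (keep the text before the first "|") but also write a tab-separated table mapping each short ID to its original header, so hits can be traced back later. This is a sketch, not part of the original answer; the function name shorten_headers and the file names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: shorten FASTA headers to the text before the
# first "|" and record "short_id<TAB>full_header" lines in a map file.
# Usage: shorten_headers input.fa output.fa id_map.tsv
shorten_headers() {
  src=$1
  dst=$2
  map=$3
  awk -F '|' -v map="$map" '
    /^>/ {
      short = $1                                  # e.g. ">1017.g16854.t1_RecName"
      printf "%s\t%s\n", substr(short, 2), substr($0, 2) > map
      print short                                 # shortened header line
      next
    }
    { print }                                     # sequence lines unchanged
  ' "$src" > "$dst"
}
```

            For example, shorten_headers transcript.fa transcript_short.fa id_map.tsv, then build the database from transcript_short.fa. The short IDs must still be unique; cut -f1 id_map.tsv | sort | uniq -d lists any duplicates.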


            • #7
              Thank you so much!!!!!

              So the problem is that the IDs are too long, right?

              What should I do if I want to keep the IDs untouched?


              • #8
                What version of BLAST are you using? Have you tried the latest (v2.2.31)? I was able to build the database without problems with that version.

                The error I saw with your IDs is similar to the one in Peter Cock's blog entry (http://blastedbio.blogspot.com/2012/...argetonly.html), though that entry concerns a different command from the one I am using. The -target_only option works fine in 2.2.31.

                Perhaps the leading "_" in the name fields (e.g. _Short=H-l(3)mbt) is causing the problem. Let me see if I can find an easy way to remove those.

                Update: that does not seem to be the problem. It must be something else.

                Peter also participates in this forum, so he may chime in with a suggestion later today.
                Last edited by GenoMax; 07-17-2015, 09:18 AM.


                • #9
                  Sorry to dig up an old thread.

                  I am having the same problem with makeblastdb. I get a segmentation fault even when I run makeblastdb -help; the command doesn't seem to run at all.

                  Was there ever a solution to this problem?

                  Cheers.
