  • Kasfen
    Junior Member
    • Sep 2014
    • 4

    blast makeblastdb problem

    Dear all,

    I recently ran into a problem with BLAST's makeblastdb.

    $ ./makeblastdb -in transcript.fa -dbtype nucl -hash_index -parse_seqids -out transcript

    makeblastdb protein
    Building a new DB, current time: 07/16/2015 22:06:50
    New DB name: transcript
    New DB title: transcript.fa
    Sequence type: Nucleotide
    Keep Linkouts: T
    Keep MBits: T
    Maximum file size: 1000000000B
    Segmentation fault (core dumped)

    I found that some of the IDs in the transcript.fa file are longer than 80 characters.

    Is there any solution for this?

    Thanks!!!
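
One way to check the ID lengths mentioned above is to list every header longer than a chosen threshold. This is only a sketch: the 80-character default comes from the post, while the function name `long_headers` and the demo input are made up for illustration. It says nothing about whether length is the actual cause of the crash.

```shell
# Print "length<TAB>header" for every FASTA header longer than a limit.
# long_headers is a hypothetical helper name; the limit defaults to 80.
long_headers() {
    awk -v max="${1:-80}" '/^>/ {
        id = substr($0, 2)              # drop the leading ">"
        if (length(id) > max) printf "%d\t%s\n", length(id), id
    }'
}

# Demo on made-up input; for a real file: long_headers 80 < transcript.fa
printf '>short_id\nACGT\n>a_much_longer_identifier\nACGT\n' | long_headers 10
```

Sequence lines are ignored, so the output contains only the headers that exceed the limit.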
  • GenoMax
    Senior Member
    • Feb 2008
    • 7142

    #2
    What is the size of the transcript.fa file and how much RAM do you have on this machine? Do you get the seg fault right away or after some time?

    • Kasfen
      Junior Member
      • Sep 2014
      • 4

      #3
      The transcript.fa file is about 40 MB, and my machine has 300 GB of RAM.
      I got the seg fault right away.

      • GenoMax
        Senior Member
        • Feb 2008
        • 7142

        #4
        The problem is likely something other than the 80 characters. Can you post an example of your FASTA sequence IDs?

        Just noticed that you have

        "makeblastdb protein"

        in your first post. Is this nucleotide or protein sequence?
        Last edited by GenoMax; 07-16-2015, 10:02 AM.

        • Kasfen
          Junior Member
          • Sep 2014
          • 4

          #5
          Some of the IDs are listed below.

          >1017.g16854.t1_RecName|_Full=Lethal(3)malignant_brain_tumor-like_protein_1|_Short=H-l(3)mbt|_Short=H-l(3)mbt_protein|_Short=L(3)mbt-like|_AltName|_Full=L(3)mbt_protein_homolog
          >1056.g17143.t1_RecName|_Full=Lethal(3)malignant_brain_tumor-like_protein_1|_Short=H-l(3)mbt|_Short=H-l(3)mbt_protein|_Short=L(3)mbt-like|_AltName|_Full=L(3)mbt_protein_homolog
          >1017.g16884.t1_RecName|_Full=PRELI_domain-containing_protein_1|_mitochondrial|_AltName|_Full=Px19-like_protein|_Flags|_Precursor_&gt|gi|969170|gb|AAC60046.1|_px19

          These are nucleotide sequences (only a/t/c/g/n).

          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            The problem is with the format of your IDs. I am able to make a nucleotide database with your IDs (using BLAST v2.2.31), but if I try to retrieve the accession numbers I get this error:
            Code:
            $ blastdbcmd -entry all -db ./transcript -outfmt '%a'
            Error: [blastdbcmd] FASTA-style ID LCL|1017.G16854.T1_RECNAME|_FULL=LETHAL(3)MALIGNANT_BRAIN_TUMOR-LIKE_PROTEIN_1|_SHORT=H-L(3)MBT|_SHORT=H-L(3)MBT_PROTEIN|_SHORT=L(3)MBT-LIKE|_ALTNAME|_FULL=L(3)MBT_PROTEIN_HOMOLOG has too many parts.
            Error: [blastdbcmd] FASTA-style ID LCL|1056.G17143.T1_RECNAME|_FULL=LETHAL(3)MALIGNANT_BRAIN_TUMOR-LIKE_PROTEIN_1|_SHORT=H-L(3)MBT|_SHORT=H-L(3)MBT_PROTEIN|_SHORT=L(3)MBT-LIKE|_ALTNAME|_FULL=L(3)MBT_PROTEIN_HOMOLOG has too many parts.
            Error: [blastdbcmd] FASTA-style ID LCL|1017.G16884.T1_RECNAME|_FULL=PRELI_DOMAIN-CONTAINING_PROTEIN_1|_MITOCHONDRIAL|_ALTNAME|_FULL=PX19-LIKE_PROTEIN|_FLAGS|_PRECURSOR_&GT has too many parts.
            If you can live with shortened header IDs, you can keep only the text before the first "|", e.g.
            Code:
            $ awk -F "|" '{if (/^>/) print $1; else print $0;}' your_file.fa > new_file.fa
            This gives you short IDs:

            Code:
            >1017.g16854.t1_RecName
            >1056.g17143.t1_RecName
            >1017.g16884.t1_RecName

            With these, makeblastdb/blastdbcmd will work.
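
Truncating at the first "|" can make two different headers collapse into the same short ID, which is likely to cause trouble when the database is built with -parse_seqids. A quick sketch for checking this before rebuilding follows; the function name `duplicate_short_ids` and the demo headers are illustrative and not part of BLAST.

```shell
# List short IDs (the text before the first "|") that occur more than once.
duplicate_short_ids() {
    awk -F '|' '/^>/ { print substr($1, 2) }' | sort | uniq -d
}

# Demo on made-up headers; for a real file: duplicate_short_ids < your_file.fa
printf '>g1.t1_RecName|_Full=A\nACGT\n>g1.t1_RecName|_Full=B\nACGT\n>g2.t1_RecName|_Full=C\nACGT\n' | duplicate_short_ids
```

An empty result means every shortened ID is still unique.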

            • Kasfen
              Junior Member
              • Sep 2014
              • 4

              #7
              Thank you so much!!!!!

              The problem is due to the length of the IDs (too long), right?

              What should I do if I want to keep the IDs untouched?
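
If the full headers must be preserved, one possible workaround is to give each sequence a short placeholder ID for the database and keep a lookup table that maps each placeholder back to its original header. The sketch below assumes that approach is acceptable; the function name `rename_with_map`, the `seq` prefix, and the file names are all illustrative choices, and nothing here is specific to a BLAST version.

```shell
# rename_with_map IN OUT MAP
# Writes OUT with headers ">seq1", ">seq2", ... and MAP with
# "short_id<TAB>original header" so hits can be translated back later.
rename_with_map() {
    awk -v map="$3" '
        /^>/ {
            n++
            printf("seq%d\t%s\n", n, substr($0, 2)) > map
            print ">seq" n
            next
        }
        { print }
    ' "$1" > "$2"
}

# Demo on a temporary, made-up file.
tmp=$(mktemp -d)
printf '>1017.g16854.t1_RecName|_Full=Example\nACGT\n' > "$tmp/in.fa"
rename_with_map "$tmp/in.fa" "$tmp/out.fa" "$tmp/map.tsv"
cat "$tmp/out.fa" "$tmp/map.tsv"
```

The renamed file can then be passed to makeblastdb as before, and the first column of the map file used to restore the original names in the results.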

              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                What version of BLAST are you using? Have you tried the latest (v2.2.31)? I was able to build the database fine with that version.

                The error I saw with your IDs is similar to the one in Peter Cock's blog entry (http://blastedbio.blogspot.com/2012/...argetonly.html), though that entry is not about the command I am using. The -target_only option works fine in 2.2.31.

                Perhaps it is the leading "_" in the names that is causing the problem (e.g. _Short=H-l(3)mbt). Let me see if I can find a way to remove those easily.

                Update: That does not seem to be a problem. It must be something else.

                Peter also participates on this forum and he may come along with a suggestion later today.
                Last edited by GenoMax; 07-17-2015, 09:18 AM.

                • GSviral
                  Member
                  • Dec 2014
                  • 38

                  #9
                  Sorry to dig up an old thread.

                  I am having the same problem with the makeblastdb command. I get a segmentation fault even when I type makeblastdb -help. It's as if the command won't run at all.

                  Was there any eventual solution to this problem?

                  Cheers.
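
When a program crashes even on -help, it can help to confirm how it died before looking at the input data. As a rough sketch: the shell reports a process killed by a signal as exit status 128 plus the signal number, so 139 corresponds to signal 11 (SIGSEGV). The helper name `describe_status` is made up for illustration, and the check only runs if makeblastdb is found on the PATH.

```shell
# Translate a shell exit status into a short description.
describe_status() {
    if [ "$1" -gt 128 ]; then
        echo "killed by signal $(( $1 - 128 ))"
    elif [ "$1" -eq 0 ]; then
        echo "exited normally"
    else
        echo "exited with error code $1"
    fi
}

# Run the simplest possible invocation and report how it ended.
if command -v makeblastdb >/dev/null 2>&1; then
    status=0
    makeblastdb -version >/dev/null 2>&1 || status=$?
    describe_status "$status"
fi
```

A crash on an invocation that reads no input would suggest looking at the binary or its environment rather than at the FASTA file.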
