  • makeblastdb protein

    $ ./makeblastdb -in ../../Phosphosite_seq.fasta -input_type fasta -dbtype prot -title Phosphosite_seq_db -out Phosphosite_seq


    Building a new DB, current time: 03/19/2015 16:03:22
    New DB name: Phosphosite_seq
    New DB title: Phosphosite_seq_db
    Sequence type: Protein
    Keep Linkouts: T
    Keep MBits: T
    Maximum file size: 1000000000B

    volume: Phosphosite_seq

    file: Phosphosite_seq.pin
    file: Phosphosite_seq.phr
    file: Phosphosite_seq.psq

    BLAST Database creation error: FASTA-Reader: No residues given


    Any ideas?

  • #2
    Additionally

    $ head ../../Phosphosite_seq.fasta
    >CBLN1|mouse|Q9R171
    MLGVVELLLLGTAWLAGPARGQNETEPIVLEGKCLVVCDSNPTSDPTGTALGISVRSGSA
    KVAFSAIRSTNHEPSEMSNRTMIIYFDQVLVNIGNNFDSERSTFIAPRKGIYSFNFHVVK
    VYNRQTIQVSLMLNGWPVISAFAGDQDVTREAASNGVLIQMEKGDRAYLKLERGNLMGGW
    KYSTFSGFLVFPL
    >COX7A2|mouse|P48771
    MLRNLLALRQIAQRTISTTSRRHFENKVPEKQKLFQEDNGMPVHLKGGASDALLYRATMA
    LTLGGTAYAIYLLAMAAFPKKQN
    >FAM219A|mouse|Q9D772
    MMEEIDRFQDPAAASISDRDCDAREEKQRELARKGSLKNGSMGSPVNQQPKKNNVMARTR



    • #3
      I would expect that the problem is that one of your sequences has no residues in it, but I can't reproduce the problem with your test data:

      Code:
      lpritc@Totoro:~$ more test.fas
      >CBLN1|mouse|Q9R171
      MLGVVELLLLGTAWLAGPARGQNETEPIVLEGKCLVVCDSNPTSDPTGTALGISVRSGSA
      KVAFSAIRSTNHEPSEMSNRTMIIYFDQVLVNIGNNFDSERSTFIAPRKGIYSFNFHVVK
      VYNRQTIQVSLMLNGWPVISAFAGDQDVTREAASNGVLIQMEKGDRAYLKLERGNLMGGW
      KYSTFSGFLVFPL
      >COX7A2|mouse|P48771
      MLRNLLALRQIAQRTISTTSRRHFENKVPEKQKLFQEDNGMPVHLKGGASDALLYRATMA
      LTLGGTAYAIYLLAMAAFPKKQN
      >FAM219A|mouse|Q9D772
      MMEEIDRFQDPAAASISDRDCDAREEKQRELARKGSLKNGSMGSPVNQQPKKNNVMARTR
      lpritc@Totoro:~$ makeblastdb -in test.fas -input_type fasta -dbtype prot -title test_db -out test
      
      
      Building a new DB, current time: 03/19/2015 22:01:53
      New DB name:   test
      New DB title:  test_db
      Sequence type: Protein
      Deleted existing BLAST database with identical name.
      Keep Linkouts: T
      Keep MBits: T
      Maximum file size: 1000000000B
      Adding sequences from FASTA; added 3 sequences in 0.000775099 seconds.

      If I were you, I would inspect the complete input file for sequences with no residues, e.g.:

      Code:
      >some_sequence_id_1
      
      >some_sequence_id_2
      ACGHITNKLLSMNER
      This can happen if you've been using a tool that masks repeats.
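
Scanning a large FASTA file by eye is tedious, so the check can be scripted. What follows is only a minimal sketch: the function name `find_empty_records` is made up for illustration, and it assumes that a record counts as empty when its header line is followed by nothing but blank lines before the next header or the end of the file.

```shell
#!/bin/sh
# Hypothetical helper (not part of BLAST+): print "line_number: header" for
# every FASTA record that has no residues, i.e. a '>' line followed only by
# blank lines before the next '>' line or the end of the file.
find_empty_records() {
    awk '
        /^>/ {
            # A new header: report the previous record if it had no residues.
            if (header != "" && residues == 0) print header_line ": " header
            header = $0; header_line = NR; residues = 0
            next
        }
        # Any line with a non-whitespace character counts as sequence data.
        /[^[:space:]]/ { residues++ }
        END {
            # The last record has no following header, so check it here.
            if (header != "" && residues == 0) print header_line ": " header
        }
    ' "$1"
}
```

Calling `find_empty_records Phosphosite_seq.fasta` prints nothing if every record has at least one line of sequence.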
      Last edited by LeightonP; 03-19-2015, 02:06 PM.



      • #4
        Originally posted by LeightonP
        If I were you, I would inspect the complete input file for sequences with no residues (e.g.

        Code:
        >some_sequence_id_1
        
        >some_sequence_id_2
        ACGHITNKLLSMNER
        This can happen if you've been using a tool that masks repeats.
        I haven't used any program to mask repeats. It is a seq database listing the complete sequence of proteins used in the Phosphosite database. There should be no empty sequences.



        • #5
          Originally posted by ctstackh
          I haven't used any program to mask repeats. It is a seq database listing the complete sequence of proteins used in the Phosphosite database. There should be no empty sequences.
          One of the frustrations of bioinformatics is that datasets don't always contain exactly what you expect. Did you check that there are - as you expect - no empty sequences?

          I can reproduce your error with a fake dataset containing an empty sequence, and I still think that this is possibly a cause of your error:

          Code:
          lpritc@Totoro:~$ cat > test.fas
          >seq1
          ATGCTGTCAGCTAGCTGATCGATCGGC
          >seq2
          
          >seq3
          GHILKPNMACDEFGH
          lpritc@Totoro:~$ makeblastdb -in test.fas -input_type fasta -dbtype prot -title test_db -out test
          
          
          Building a new DB, current time: 03/20/2015 11:55:19
          New DB name:   test
          New DB title:  test_db
          Sequence type: Protein
          Keep Linkouts: T
          Keep MBits: T
          Maximum file size: 1000000000B
          
          volume: test
          
          file: test.pin
          file: test.phr
          file: test.psq
          
          BLAST Database creation error: FASTA-Reader: No residues given
          Last edited by LeightonP; 03-20-2015, 03:56 AM.



          • #6
            You can count the number of blank lines in a file (filename) with:

            Code:
            grep -c "^$" filename
            or, if you consider whitespace to be "blank":

            Code:
            grep -c "^\s*$" filename
            If the FASTA file you're building the database from contains any blank lines, that may be the problem. You can see the surrounding context of blank lines with:

            Code:
            grep -C 2 "^$" filename
            (note: capital 'C' this time, followed by the number of context lines to show around each match) - this should help you find any blank lines in your file and edit them.
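
To jump straight to the offending lines in an editor, it can help to list their line numbers. The sketch below is one way to do that; `blank_line_numbers` is a hypothetical name, and it uses the POSIX class `[[:space:]]` instead of `\s`, which is a GNU grep extension and may not work with other grep implementations.

```shell
#!/bin/sh
# Hypothetical helper: print the line number of every blank or
# whitespace-only line in the file given as the first argument.
blank_line_numbers() {
    # grep -n prefixes each match with "N:"; cut keeps only the number.
    grep -n '^[[:space:]]*$' "$1" | cut -d: -f1
}
```

Note that a record can be empty without any blank line at all, for example when one header line is immediately followed by another, so an empty result here does not rule out empty sequences.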



            • #7
              Do this (adjust file names accordingly):

              1. Using BBMap's reformat.sh, remove the line wrapping from Phosphosite_seq.txt and make the sequence names unique.

              Code:
              $ reformat.sh in=Phosphosite_seq.txt out=reform.fa uniquenames=t fastawrap=80000
              2. Build the database with makeblastdb

              Code:
              $ makeblastdb -in reform.fa -dbtype prot -out Phosphosite_seq -title Phosphosite_seq_db
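
Since the first step makes the sequence names unique, it may be worth checking whether the input contains repeated names to begin with. This is only a rough sketch: `duplicate_ids` is a hypothetical helper, and it assumes the identifier is the header text after `>` up to the first whitespace. The thread does not establish which of the two reformat.sh options resolved the error, so treat this as a diagnostic aid and not as an explanation of the fix.

```shell
#!/bin/sh
# Hypothetical helper: print each FASTA identifier that appears on more
# than one header line. The identifier is assumed to be the text after '>'
# up to the first whitespace character.
duplicate_ids() {
    grep '^>' "$1" |
        sed 's/^>//; s/[[:space:]].*$//' |
        sort |
        uniq -d
}
```

Running `duplicate_ids Phosphosite_seq.txt` prints one identifier per line, or nothing if all names are already unique.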



              • #8
                Originally posted by GenoMax
                Do this (adjust file names accordingly):

                1. Using BBMap's reformat.sh remove the line wrapping from the Phosphosite_seq.txt and make the sequence names unique.

                Code:
                $ reformat.sh in=Phosphosite_seq.txt out=reform.fa uniquenames=t fastawrap=80000
                2. Build the database with makeblastdb

                Code:
                $ makeblastdb -in reform.fa -dbtype prot -out Phosphosite_seq -title Phosphosite_seq_db
                Sweet! That worked. Thank you!
