**ctstackh** · 03-19-2015, 01:16 PM

Additionally, here are the first lines of the input file:

$ head ../../Phosphosite_seq.fasta
>CBLN1|mouse|Q9R171
MLGVVELLLLGTAWLAGPARGQNETEPIVLEGKCLVVCDSNPTSDPTGTALGISVRSGSA
KVAFSAIRSTNHEPSEMSNRTMIIYFDQVLVNIGNNFDSERSTFIAPRKGIYSFNFHVVK
VYNRQTIQVSLMLNGWPVISAFAGDQDVTREAASNGVLIQMEKGDRAYLKLERGNLMGGW
KYSTFSGFLVFPL
>COX7A2|mouse|P48771
MLRNLLALRQIAQRTISTTSRRHFENKVPEKQKLFQEDNGMPVHLKGGASDALLYRATMA
LTLGGTAYAIYLLAMAAFPKKQN
>FAM219A|mouse|Q9D772
MMEEIDRFQDPAAASISDRDCDAREEKQRELARKGSLKNGSMGSPVNQQPKKNNVMARTR

**LeightonP** · 03-19-2015, 02:04 PM

I would expect that the problem is that one of your sequences has no residues in it, but I can't reproduce the problem with your test data:

Code:

lpritc@Totoro:~$ more test.fas
>CBLN1|mouse|Q9R171
MLGVVELLLLGTAWLAGPARGQNETEPIVLEGKCLVVCDSNPTSDPTGTALGISVRSGSA
KVAFSAIRSTNHEPSEMSNRTMIIYFDQVLVNIGNNFDSERSTFIAPRKGIYSFNFHVVK
VYNRQTIQVSLMLNGWPVISAFAGDQDVTREAASNGVLIQMEKGDRAYLKLERGNLMGGW
KYSTFSGFLVFPL
>COX7A2|mouse|P48771
MLRNLLALRQIAQRTISTTSRRHFENKVPEKQKLFQEDNGMPVHLKGGASDALLYRATMA
LTLGGTAYAIYLLAMAAFPKKQN
>FAM219A|mouse|Q9D772
MMEEIDRFQDPAAASISDRDCDAREEKQRELARKGSLKNGSMGSPVNQQPKKNNVMARTR
lpritc@Totoro:~$ makeblastdb -in test.fas -input_type fasta -dbtype prot -title test_db -out test


Building a new DB, current time: 03/19/2015 22:01:53
New DB name:   test
New DB title:  test_db
Sequence type: Protein
Deleted existing BLAST database with identical name.
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B
Adding sequences from FASTA; added 3 sequences in 0.000775099 seconds.

If I were you, I would inspect the complete input file for sequences with no residues, e.g.:

Code:

>some_sequence_id_1

>some_sequence_id_2
ACGHITNKLLSMNER

This can happen if you've been using a tool that masks repeats.
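Checking a large file by eye is impractical, so a minimal sketch of an automated check may help. It assumes a plain FASTA file where every record starts with a `>` line; the function name `list_empty_records` is made up for this example and is not part of BLAST+.

```shell
# Print the header of every FASTA record that has no residue lines.
# Lines containing only whitespace are treated as empty.
list_empty_records() {
    awk '
        /^>/ {
            # A new header: report the previous record if it had no residues.
            if (header != "" && !has_residues) print header
            header = $0
            has_residues = 0
            next
        }
        /[^[:space:]]/ { has_residues = 1 }   # any non-blank sequence line
        END { if (header != "" && !has_residues) print header }
    ' "$1"
}
```

Calling `list_empty_records Phosphosite_seq.fasta` would print nothing if every record has at least one residue line.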

**ctstackh** · 03-19-2015, 02:54 PM

Originally posted by LeightonP:

If I were you, I would inspect the complete input file for sequences with no residues, e.g.:

Code:

>some_sequence_id_1

>some_sequence_id_2
ACGHITNKLLSMNER

This can happen if you've been using a tool that masks repeats.

I haven't used any program to mask repeats. It is a seq database listing the complete sequence of proteins used in the Phosphosite database. There should be no empty sequences.

**LeightonP** · 03-20-2015, 03:54 AM

Originally posted by ctstackh:

I haven't used any program to mask repeats. It is a seq database listing the complete sequence of proteins used in the Phosphosite database. There should be no empty sequences.

One of the frustrations of bioinformatics is that datasets don't always contain exactly what you expect. Did you check that there are, as you expect, no empty sequences?

I can reproduce your error with a fake dataset containing an empty sequence, and I still think that this is a possible cause of your error:

Code:

lpritc@Totoro:~$ cat > test.fas
>seq1
ATGCTGTCAGCTAGCTGATCGATCGGC
>seq2

>seq3
GHILKPNMACDEFGH
lpritc@Totoro:~$ makeblastdb -in test.fas -input_type fasta -dbtype prot -title test_db -out test


Building a new DB, current time: 03/20/2015 11:55:19
New DB name:   test
New DB title:  test_db
Sequence type: Protein
Keep Linkouts: T
Keep MBits: T
Maximum file size: 1000000000B

volume: test

file: test.pin
file: test.phr
file: test.psq

BLAST Database creation error: FASTA-Reader: No residues given
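If empty records like `seq2` above do turn out to be the cause, one option is to filter them out before building the database. The following is only a sketch under that assumption; `drop_empty_records` is a hypothetical helper, not a BLAST+ feature, and it also discards blank lines.

```shell
# Write a copy of a FASTA file to stdout, omitting records with no residues.
drop_empty_records() {
    awk '
        /^>/ { header = $0; next }   # hold the header until residues appear
        /[^[:space:]]/ {
            # First residue line of a record: emit the held header once.
            if (header != "") { print header; header = "" }
            print
        }
    ' "$1"
}
```

For example, `drop_empty_records test.fas > test.clean.fas` (the output name is arbitrary), after which `test.clean.fas` can be passed to `makeblastdb -in`. It is worth looking at which records were dropped first, since an unexpectedly empty record may point to a problem upstream.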

**LeightonP** · 03-20-2015, 04:03 AM

You can count the number of blank lines in a file (filename) with:

Code:

grep -c "^$" filename

or, if you consider whitespace to be "blank":

Code:

grep -c "^\s*$" filename

If the FASTA file you're building your database from contains any, that may be the problem. You can see the surrounding context of blank lines with:

Code:

grep -C 2 "^$" filename

(note: capital 'C' this time, followed by the number of context lines to show on each side of a match) - this should help you find any blank lines in your file and edit them.
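To jump straight to the offending lines in an editor, it can be handy to have their line numbers. A small sketch using `grep -n`, with the POSIX class `[[:space:]]` in place of `\s` for portability; the function name `blank_line_numbers` is invented for this example.

```shell
# Print the line number of every empty or whitespace-only line in a file.
blank_line_numbers() {
    # grep -n prefixes each match with "NUMBER:"; cut keeps only the number.
    grep -n '^[[:space:]]*$' "$1" | cut -d: -f1
}
```

Running `blank_line_numbers filename` prints one line number per blank line, and nothing if there are none.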

**GenoMax** · 03-20-2015, 04:13 AM

Do this (adjust file names accordingly):

1. Use BBMap's reformat.sh to remove the line wrapping from Phosphosite_seq.txt and make the sequence names unique.

Code:

$ reformat.sh in=Phosphosite_seq.txt out=reform.fa uniquenames=t fastawrap=80000

2. Build the database with makeblastdb

Code:

$ makeblastdb -in reform.fa -dbtype prot -out Phosphosite_seq -title Phosphosite_seq_db
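The thread does not establish which part of the reformatting step matters for this file. Since `uniquenames=t` is there to deal with repeated sequence names, a quick way to see whether the original file has any is sketched below. `duplicate_headers` is a hypothetical helper that compares whole header lines, which may not match how BBMap or BLAST decide that two names clash.

```shell
# Print each FASTA header line that occurs more than once in the file.
duplicate_headers() {
    # Keep header lines, sort them so repeats are adjacent,
    # then let uniq -d print one copy of each repeated line.
    grep '^>' "$1" | sort | uniq -d
}
```

For example, `duplicate_headers Phosphosite_seq.txt` prints nothing when every header line is unique.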

**ctstackh** · 03-20-2015, 08:13 AM

Originally posted by GenoMax:

Do this (adjust file names accordingly):

1. Use BBMap's reformat.sh to remove the line wrapping from Phosphosite_seq.txt and make the sequence names unique.

Code:

$ reformat.sh in=Phosphosite_seq.txt out=reform.fa uniquenames=t fastawrap=80000

2. Build the database with makeblastdb

Code:

$ makeblastdb -in reform.fa -dbtype prot -out Phosphosite_seq -title Phosphosite_seq_db

Sweet! That worked. Thank you!

makeblastdb protein