**mastal** · 07-05-2013, 03:33 AM

Odd values in blastx outputs

Are you using the old blast or blastplus?

Your results do look odd, the third column, which should be pident, seems to have only integer values.

What are you using to view the csv file with the results?

**AMCT** · 07-05-2013, 04:11 AM

Hi mastal

Thanks for your reply. I am using MySQL to handle the results, but didn't use the decimal data type for the percent_id or bitscore columns (so the decimal places were dropped).

Here are the same lines from the raw blast output:

contig4822,gi|281371382|ref|NP_001163830.1|,1.25,80,79,0,326,87,729,808,5e-16,86.3
contig5762,gi|326676144|ref|XP_001334811.4|,3.12,96,91,1,348,67,691,786,4e-33, 142
contig209,gi|190194343|ref|NP_001121707.1|,1.41,71,70,0,357,145,5,75,1e-21,92.4
contig320,gi|432885829|ref|XP_004074779.1|,1.71,117,115,0,352,2,275,391,4e-47, 189
contig3304,gi|348520766|ref|XP_003447898.1|,1.28,156,145,3,466,20,1084,1237,3e-24, 113

We are running blastx 2.2.25+

**rhinoceros** · 07-05-2013, 04:12 AM

Shouldn't the output have commas, since it's supposed to be comma-separated values? Also, the subject sequence ids look wrong (assuming your db is a subset of refseq_protein). Finally, why would you want only one match for each contig when, in all likelihood, many contigs ought to have numerous ORFs?

Edit: your second output looks proper.

**kmcarr** · 07-05-2013, 08:17 AM

Calculating a bit score (from which the e-value is derived) is far more complex than just the percent identity, especially in your case, where you are doing the BLAST search in amino acid sequence space. When aligning amino acid sequences, BLAST uses a scoring matrix with weighted scores (positive and negative) for each possible pair of aligned amino acids. This is unlike alignments in nucleotide space, which simply use a fixed reward for a match and a fixed penalty for a mismatch. There are also penalties for gap opens and extensions, which affect the final score. The number of identical aligned amino acids is just one factor in the bit score calculation, so while there will be a positive correlation between them, there is no direct linear relationship between % identity and bit score.
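To make the point concrete, here is a minimal Python sketch (not BLAST's actual implementation). The substitution scores are a small hand-picked subset in the style of BLOSUM62 and should be treated as illustrative. The `lam` and `k` defaults are assumed statistical parameters of the kind commonly quoted for gapped BLOSUM62 searches, and all function names are hypothetical. It compares two ungapped alignments that have the same percent identity but very different scores.

```python
import math

# Illustrative substitution scores in the style of BLOSUM62.
# Assumption: a hand-picked subset for demonstration, not the full matrix.
SCORES = {
    ("W", "W"): 11, ("C", "C"): 9, ("A", "A"): 4, ("S", "S"): 4,
    ("I", "V"): 3, ("K", "R"): 2, ("W", "G"): -2, ("D", "L"): -4,
}


def pair_score(a, b):
    """Look up the score for one aligned residue pair (order-independent)."""
    if (a, b) in SCORES:
        return SCORES[(a, b)]
    return SCORES[(b, a)]


def raw_score(query, subject):
    """Sum the per-position scores of an ungapped alignment."""
    return sum(pair_score(q, s) for q, s in zip(query, subject))


def percent_identity(query, subject):
    """Percentage of aligned positions with identical residues."""
    matches = sum(q == s for q, s in zip(query, subject))
    return 100.0 * matches / len(query)


def bit_score(raw, lam=0.267, k=0.041):
    """Normalise a raw score: S' = (lambda * S - ln K) / ln 2.

    lam and k are assumed values; real searches report their own.
    """
    return (lam * raw - math.log(k)) / math.log(2)


# Both alignments have 2 identical positions out of 4 (50% identity).
strong = ("WCIK", "WCVR")  # rare residues conserved, conservative mismatches
weak = ("ASWD", "ASGL")    # common residues conserved, costly mismatches

for name, (q, s) in (("strong", strong), ("weak", weak)):
    raw = raw_score(q, s)
    print(name, percent_identity(q, s), raw, round(bit_score(raw), 1))
```

Identical percent identity, different raw and bit scores: which residues match, and how costly the mismatches are, matters as much as how many positions match.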

**AMCT** · 07-07-2013, 02:54 AM

Thanks for the help so far!

I ran a second blastx with the same parameters using a subset of sequences and found that all of the output values are the same as in the first search except for the percent_ids and the number of mismatches.

output of first search:
contig4822,gi|281371382|ref|NP_001163830.1|,1.25,80,79,0,326,87,729,808,5e-16,86.3
contig5762,gi|326676144|ref|XP_001334811.4|,3.12,96,91,1,348,67,691,786,4e-33, 142
contig209,gi|190194343|ref|NP_001121707.1|,1.41,71,70,0,357,145,5,75,1e-21,92.4
contig320,gi|432885829|ref|XP_004074779.1|,1.71,117,115,0,352,2,275,391,4e-47, 189
contig3304,gi|348520766|ref|XP_003447898.1|,1.28,156,145,3,466,20,1084,1237,3e-24, 113

output of second search:
contig4822,gi|281371382|ref|NP_001163830.1|,51.25,80,39,0,323,84,729,808,5e-16,86.3
contig5762,gi|326676144|ref|XP_001334811.4|,72.92,96,24,1,348,67,691,786,4e-33, 142
contig209,gi|190194343|ref|NP_001121707.1|,61.97,71,27,0,357,145,5,75,1e-21,92.4
contig320,gi|432885829|ref|XP_004074779.1|,80.34,117,23,0,352,2,275,391,4e-47, 189
contig3304,gi|348520766|ref|XP_003447898.1|,34.62,156,93,3,466,20,1084,1237,3e-24, 113

I have looked through the results from my other blastx searches and there also appear to be cases where the e-values and percent_ids don't correspond properly. All of the blastx searches so far have been big (thousands of input sequences against big databases), so I have been running the searches on multiple threads/computers at the same time, usually in batches of 3000-5000 sequences per input file (this is why we only want the best hit for each query for now!). Perhaps this is a scale/computing problem on our end...
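One way to sanity-check rows like the ones above is to work backwards from pident and the alignment length to the number of identical positions each row implies. The Python sketch below is only an illustration: it assumes the twelve default columns of BLAST+ comma-separated tabular output in their usual order, and the helper name `implied_counts` is hypothetical.

```python
import csv
import io

# Assumed column order: the default for BLAST+ comma-separated tabular output.
COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def implied_counts(line):
    """Derive identical and gap-column counts implied by one result row."""
    fields = next(csv.reader(io.StringIO(line)))
    # Strip padding such as the leading space in ", 113".
    row = dict(zip(COLUMNS, (f.strip() for f in fields)))
    length = int(row["length"])
    mismatch = int(row["mismatch"])
    # pident is identical positions as a percentage of alignment length.
    identical = round(float(row["pident"]) * length / 100)
    # Whatever is neither identical nor a mismatch must be a gap column.
    gap_columns = length - identical - mismatch
    return {
        "qseqid": row["qseqid"],
        "identical": identical,
        "mismatch": mismatch,
        "gap_columns": gap_columns,
        "bitscore": float(row["bitscore"]),
    }


# The contig3304 rows from the first and second searches quoted above.
first = ("contig3304,gi|348520766|ref|XP_003447898.1|,"
         "1.28,156,145,3,466,20,1084,1237,3e-24, 113")
second = ("contig3304,gi|348520766|ref|XP_003447898.1|,"
          "34.62,156,93,3,466,20,1084,1237,3e-24, 113")

for label, line in (("first", first), ("second", second)):
    print(label, implied_counts(line))
```

For this pair, both rows imply the same number of gap columns and report the same bit score, but the first implies only 2 identical positions against 54 in the second. That suggests the alignment itself is unchanged and only the identity and mismatch counts differ between the two runs.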
