  • Odd_values_In_blastx_outputs

    Hi all

    I am getting some odd values in my blast outputs. The evalues in the outputs don't seem to correspond properly with the percent_ids (and number of mismatches). I am using Blast 2.2.25 and I have been submitting the jobs using the following command:

    #Resources
    #$ -pe orte 4

    #Run this command:
    blastx -query xaa -db /home/my_dir/refseq_dbs/refseq_proteins_eukaryotes.fasta -evalue 0.01 -outfmt 10 -max_target_seqs 1 -out out_a.txt -num_threads 4

    I have also tried using the -num_descriptions and the -num_alignments flags instead of the -max_target_seqs flag but this has not fixed the problem.
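One way to make the tabular output easier to audit is to request the columns explicitly, so the file layout does not depend on defaults. This is a minimal sketch, assuming your BLAST+ build accepts column specifiers after the format number and reusing the paths from the command above. The helper name `build_blastx_cmd` is hypothetical, and the command is only assembled here, not run:

```python
import shlex

# The twelve columns of the default tabular layout, spelled out explicitly.
COLUMNS = ("qseqid sseqid pident length mismatch gapopen "
           "qstart qend sstart send evalue bitscore")


def build_blastx_cmd(query, db, out, threads=4, evalue=0.01):
    """Assemble a blastx argument list (hypothetical helper, not part of BLAST)."""
    return [
        "blastx",
        "-query", query,
        "-db", db,
        "-evalue", str(evalue),
        "-outfmt", "10 " + COLUMNS,  # 10 = comma-separated values
        "-max_target_seqs", "1",
        "-out", out,
        "-num_threads", str(threads),
    ]


cmd = build_blastx_cmd(
    "xaa",
    "/home/my_dir/refseq_dbs/refseq_proteins_eukaryotes.fasta",
    "out_a.txt",
)
# Print a shell-safe version of the command for pasting into a job script.
print(" ".join(shlex.quote(part) for part in cmd))
```

Building the argument list in one place keeps every batch job on identical parameters, which matters when results from different runs are compared later.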

    Here are a few lines from the BLAST output:
    qseqid sseqid pident length mismatch gapopen qstart qend sstart send evalue bitscore

    contig4822 281371382 1 80 79 0 326 87 729 808 5.00E-016 86
    contig5762 326676144 3 96 91 1 348 67 691 786 4.00E-033 142
    contig209 190194343 1 71 70 0 357 145 5 75 1.00E-021 92
    contig320 432885829 2 117 115 0 352 2 275 391 4.00E-047 189
    contig3304 348520766 1 156 145 3 466 20 1084 1237 3.00E-024 113

    Am I missing something?

    Thanks in advance!

  • #2
    Odd values in blastx outputs

    Are you using the old blast or blastplus?

    Your results do look odd, the third column, which should be pident, seems to have only integer values.

    What are you using to view the csv file with the results?

    • #3
      Hi mastal

      Thanks for your reply. I am using MySQL to handle the results but didn't use the decimal data type for the percent_id or bitscore columns, so the decimal places were dropped.

      Here are the same lines from the raw blast output:

      contig4822,gi|281371382|ref|NP_001163830.1|,1.25,80,79,0,326,87,729,808,5e-16,86.3
      contig5762,gi|326676144|ref|XP_001334811.4|,3.12,96,91,1,348,67,691,786,4e-33, 142
      contig209,gi|190194343|ref|NP_001121707.1|,1.41,71,70,0,357,145,5,75,1e-21,92.4
      contig320,gi|432885829|ref|XP_004074779.1|,1.71,117,115,0,352,2,275,391,4e-47, 189
      contig3304,gi|348520766|ref|XP_003447898.1|,1.28,156,145,3,466,20,1084,1237,3e-24, 113

      We are running blastx 2.2.25+
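The truncation described above can be reproduced outside the database. This is a minimal sketch, assuming the twelve default tabular columns in the order listed in the first post; the `parse_hit` helper is hypothetical. It also derives the number of identical positions implied by pident and length, which may be a useful sanity check:

```python
import csv
import io

FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = {"length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send"}
FLOAT_FIELDS = {"pident", "evalue", "bitscore"}


def parse_hit(row):
    """Convert one CSV row into a dict with numeric columns typed (hypothetical helper)."""
    hit = dict(zip(FIELDS, row))
    for name in INT_FIELDS:
        hit[name] = int(hit[name])
    for name in FLOAT_FIELDS:
        hit[name] = float(hit[name])  # float() tolerates the leading space in " 142"
    return hit


# Two of the raw lines quoted above.
raw = """contig4822,gi|281371382|ref|NP_001163830.1|,1.25,80,79,0,326,87,729,808,5e-16,86.3
contig5762,gi|326676144|ref|XP_001334811.4|,3.12,96,91,1,348,67,691,786,4e-33, 142
"""

hits = [parse_hit(row) for row in csv.reader(io.StringIO(raw))]
for hit in hits:
    truncated = int(hit["pident"])  # what an integer column would store
    identities = round(hit["pident"] * hit["length"] / 100)
    print(hit["qseqid"], hit["pident"], truncated, identities)
```

For the first row this implies a single identical position in an 80-residue alignment, which sits oddly next to a bit score of 86.3.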

      • #4
        Shouldn't the output have commas since it's supposed to be comma-separated values? Also, the subject sequence ids look wrong (assuming your db is a subset of refseq_protein). Finally, why would you want to have only one match for each contig, when in all likelihood many contigs ought to have numerous ORFs?

        Edit: your second output looks proper.
        savetherhino.org

        • #5
          Calculating a bit score (from which the e-value is derived) is far more complex than just the percent identity, especially in your case, where you are doing the BLAST search in amino acid sequence space. When aligning amino acid sequences, BLAST uses a scoring matrix with weighted scores (positive and negative) for each possible pair of aligned amino acids. This is unlike alignments in nucleotide space, which by default use a single fixed reward for every match and a single fixed penalty for every mismatch. There are also penalties for gap opens and extensions which affect the final score. The number of identical aligned amino acids is just one factor in the bit score calculation, so while there will be a positive correlation between them, there is not a direct linear relationship between % identity and bit score.
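The link between raw score, bit score and e-value can be sketched with the standard Karlin-Altschul formulas, S' = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-S'). The lambda and K values below are the commonly quoted ones for gapped BLOSUM62 with gap costs 11/1 and are assumptions here, as are the raw scores and the search-space lengths. BLAST also applies length corrections that this sketch ignores:

```python
import math

# Assumed statistical parameters (gapped BLOSUM62, gap open 11, gap extend 1).
LAMBDA = 0.267
K = 0.041


def bit_score(raw_score, lam=LAMBDA, k=K):
    """Normalise a raw alignment score to bits: S' = (lambda*S - ln K) / ln 2."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_value(bits, query_len, db_len):
    """Expected number of chance hits scoring at least `bits`: E = m * n * 2^-S'."""
    return query_len * db_len * 2.0 ** (-bits)


# Hypothetical raw scores and search-space sizes, for illustration only.
for raw in (100, 200, 400):
    bits = bit_score(raw)
    print(raw, round(bits, 1), expect_value(bits, query_len=150, db_len=10**9))
```

Percent identity does not appear anywhere in these formulas: only the summed substitution scores and gap penalties feed the raw score, which is why two alignments with similar identity can have quite different e-values.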

          • #6
            Thanks for the help so far!

            I ran a second blastx with the same parameters using a subset of sequences and found that all of the output values are the same as in the first search except for the percent_ids and the number of mismatches.

            output of first search:
            contig4822,gi|281371382|ref|NP_001163830.1|,1.25,80,79,0,326,87,729,808,5e-16,86.3
            contig5762,gi|326676144|ref|XP_001334811.4|,3.12,96,91,1,348,67,691,786,4e-33, 142
            contig209,gi|190194343|ref|NP_001121707.1|,1.41,71,70,0,357,145,5,75,1e-21,92.4
            contig320,gi|432885829|ref|XP_004074779.1|,1.71,117,115,0,352,2,275,391,4e-47, 189
            contig3304,gi|348520766|ref|XP_003447898.1|,1.28,156,145,3,466,20,1084,1237,3e-24, 113

            output of second search:
            contig4822,gi|281371382|ref|NP_001163830.1|,51.25,80,39,0,323,84,729,808,5e-16,86.3
            contig5762,gi|326676144|ref|XP_001334811.4|,72.92,96,24,1,348,67,691,786,4e-33, 142
            contig209,gi|190194343|ref|NP_001121707.1|,61.97,71,27,0,357,145,5,75,1e-21,92.4
            contig320,gi|432885829|ref|XP_004074779.1|,80.34,117,23,0,352,2,275,391,4e-47, 189
            contig3304,gi|348520766|ref|XP_003447898.1|,34.62,156,93,3,466,20,1084,1237,3e-24, 113

            I have looked through the results from my other blastx searches and there also appear to be cases where the evalues and percent_ids don't correspond properly. All of the blastx searches so far have been big (thousands of input sequences against big databases), so I have been running the searches on multiple threads/computers at the same time, usually in batches of 3000-5000 sequences per input file. This is why we only want the best hit for each query for now! Perhaps this is a scale/computing problem on our end...
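Comparing two runs column by column can be automated rather than done by eye. This is a minimal sketch, assuming both files use the twelve default CSV columns and that a hit is identified by its (qseqid, sseqid) pair; the helper names are hypothetical, and three of the rows quoted above stand in for the real files:

```python
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def index_hits(text):
    """Map (qseqid, sseqid) to a dict of column values, kept as stripped strings."""
    hits = {}
    for line in text.strip().splitlines():
        values = [value.strip() for value in line.split(",")]
        hits[(values[0], values[1])] = dict(zip(FIELDS, values))
    return hits


def differing_fields(first, second):
    """For every hit present in both runs, list the columns whose values differ."""
    report = {}
    for key in sorted(first.keys() & second.keys()):
        changed = [f for f in FIELDS if first[key][f] != second[key][f]]
        if changed:
            report[key[0]] = changed
    return report


first_run = """contig4822,gi|281371382|ref|NP_001163830.1|,1.25,80,79,0,326,87,729,808,5e-16,86.3
contig5762,gi|326676144|ref|XP_001334811.4|,3.12,96,91,1,348,67,691,786,4e-33, 142
contig209,gi|190194343|ref|NP_001121707.1|,1.41,71,70,0,357,145,5,75,1e-21,92.4
"""
second_run = """contig4822,gi|281371382|ref|NP_001163830.1|,51.25,80,39,0,323,84,729,808,5e-16,86.3
contig5762,gi|326676144|ref|XP_001334811.4|,72.92,96,24,1,348,67,691,786,4e-33, 142
contig209,gi|190194343|ref|NP_001121707.1|,61.97,71,27,0,357,145,5,75,1e-21,92.4
"""

report = differing_fields(index_hits(first_run), index_hits(second_run))
for contig, fields in report.items():
    print(contig, fields)
```

On the rows quoted above, contig4822 also differs in qstart and qend (326 and 87 in the first search, 323 and 84 in the second), not only in pident and mismatch.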
