  • BLAST+ vs BLASTALL (legacy BLAST)

    Hi all,

    I have done a comparison of blastn as implemented in BLAST+ 2.2.25 (latest version) and BLASTALL (legacy BLAST) and observed non-trivial discrepancies in the results. In summary, BLAST+ gives more hits of the query to the subject/database, with lower (better) E-values. BLAST+ also often generates best hits (often with better E-values) to different sequences in the subject/database, compared to BLASTALL.

    Also of concern: with BLASTALL blastn, the default and tabular (.tsv) outputs differ in the number of hits they report. BLAST+ showed no such discrepancy, whichever output format was specified.

    If anyone has made similar observations, or has feedback on my observations and analysis, I would much appreciate hearing it.





    Below are examples of equivalent command lines I used for the different BLAST versions.

    The command lines used for BLAST+, blastn were:

    blastn -task blastn -db database -query query.fa -evalue 0.00001 -dust no -num_descriptions 1 -num_alignments 1 -num_threads 8 -out output.blastn (default output)

    blastn -task blastn -db database -query query.fa -evalue 0.00001 -dust no -num_descriptions 1 -num_alignments 1 -num_threads 8 -outfmt "10 qseqid qacc qlen qframe qstart qend qseq sseqid sacc slen sframe sstart send sseq pident nident length mismatch positive ppos gapopen gaps evalue bitscore score" -out output.blastn.csv (csv output)


    The command lines used for BLASTALL, blastn were:

    blastall -p blastn -i query.fa -d database -e 0.00001 -v 1 -b 1 -F F -o output.blastn -a 8 (default output)

    blastall -p blastn -i query.fa -d database -e 0.00001 -v 1 -b 1 -F F -m 8 -o output.blastn.tsv -a 2 (tsv output)
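
    One way to put numbers on the "different best hit" observation is to compare the top hit per query across the two runs. The following is only a rough sketch, not part of the original analysis: it assumes both programs were rerun to produce the standard 12-column tabular layout (blastall -m 8, BLAST+ -outfmt 6), and the function names and sample rows are made up for illustration.

```python
import csv
from io import StringIO


def best_hits(handle, delimiter="\t"):
    """Return {query: (subject, evalue, bitscore)} for the top-scoring hit per query.

    Assumes the 12-column tabular layout: query, subject, %identity, length,
    mismatches, gap opens, q.start, q.end, s.start, s.end, e-value, bit score.
    """
    best = {}
    for row in csv.reader(handle, delimiter=delimiter):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        # Keep the hit with the highest bit score seen so far for this query.
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def compare_best_hits(old, new):
    """Group queries by whether the two runs agree on the best subject."""
    shared = old.keys() & new.keys()
    return {
        "same": sorted(q for q in shared if old[q][0] == new[q][0]),
        "different": sorted(q for q in shared if old[q][0] != new[q][0]),
        "only_old": sorted(old.keys() - new.keys()),
        "only_new": sorted(new.keys() - old.keys()),
    }


# Made-up rows standing in for the two output files.
legacy = StringIO(
    "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t180\n"
    "q2\ts2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t150\n"
)
plus = StringIO(
    "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-45\t185\n"
    "q2\ts9\t96.0\t100\t4\t0\t1\t100\t1\t100\t1e-35\t160\n"
    "q3\ts3\t90.0\t50\t5\t0\t1\t50\t1\t50\t1e-06\t60\n"
)
report = compare_best_hits(best_hits(legacy), best_hits(plus))
print(report)
```

    With real files, the StringIO objects would be replaced by open file handles. The queries listed under "different", "only_old" and "only_new" are the ones worth inspecting by hand.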


    Thanks!

  • #2
    What version numbers? That can make a difference.

    I don't use -num_descriptions and -num_alignments, having found that they behave oddly (something at least partially addressed in a recent BLAST+ release). Have you tried -max_target_seqs instead?


    • #3
      I don't think you can look at the E-values and say that they are lower and therefore better, or draw conclusions from one program returning more or fewer hits. The statistics aren't comparable without calibrating the Karlin-Altschul parameters. I am suspicious of BLAST+: it is so fast that I suspect they tweaked the hash word size parameters in favor of speed rather than accuracy. You might want to compare the actual parameters that are used: look at which parameters blastall runs blastn with, then compare them with those of the equivalent blastn in BLAST+. There is a way to get them to print the actual parameters, not just the parameters of the wrapper. My understanding is that there isn't much difference between the two, and that where there was a difference it mostly came from the parametrization the wrappers used.
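
      To illustrate why E-values from two runs are not directly comparable, here is a small sketch of the standard Karlin-Altschul relations, E = K * m * n * exp(-lambda * S) and bit score S' = (lambda * S - ln K) / ln 2. It is a simplification: BLAST itself uses effective (length-corrected) query and database lengths, and the lambda and K values below are made up for illustration, not taken from either program.

```python
import math


def evalue(raw_score, query_len, db_len, lam, k):
    """Expected number of chance hits: E = K * m * n * exp(-lambda * S)."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def bit_score(raw_score, lam, k):
    """Normalised score S' = (lambda * S - ln K) / ln 2, so E = m * n * 2**(-S')."""
    return (lam * raw_score - math.log(k)) / math.log(2)


# Same raw alignment score and search space, two hypothetical parameter sets.
raw, m, n = 50, 100, 1_000_000
params_a = {"lam": 1.37, "k": 0.71}  # illustrative values only
params_b = {"lam": 1.28, "k": 0.46}  # illustrative values only

e_a = evalue(raw, m, n, **params_a)
e_b = evalue(raw, m, n, **params_b)
print(f"E with parameter set A: {e_a:.3g}")
print(f"E with parameter set B: {e_b:.3g}")
```

      The same raw score gives different E-values as soon as lambda, K or the search space differ, so a "better" E-value from one program may reflect nothing more than a different parametrization.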


      • #4
        Hi,

        Thanks maubp and rskr for your feedback.

        I am using blastn from blastall 2.2.23 and blastn from BLAST+ 2.2.25.

        Perhaps specifying the same hash word size for both would make the comparison more equivalent. Note that I have turned the DUST filter OFF for both.

        I'll try -max_target_seqs in BLAST+. Do you know what the equivalent parameter in BLASTALL is?

        I specified -num_descriptions and -num_alignments for BLAST+ blastn because legacy_blast.pl returned them as the equivalents of -v and -b in BLASTALL blastn.

        If anyone can let me know how to get both applications to print out all the actual default parameters they used, that'd be great.


        Cheers!


        • #5
          The difference in the number of hits between the default and tabular formats arises because the -b and -v parameters are honored only for the default format. In the tabular format, the -b and -v parameters are ignored.

          In BLAST+ this was remedied by the introduction of the -max_target_seqs parameter. The documentation suggests that for the default format the -num_descriptions and -num_alignments options should be used, but for XML and tabular output the -max_target_seqs option should be used instead.
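
          If rerunning the search is not convenient, an existing tabular file can be trimmed after the fact so that its hit counts are comparable with a default-format run limited by -b/-v. The sketch below is only one possible approach: it assumes tab-separated lines grouped by query and ordered best-first, and limit_subjects is a hypothetical helper, not part of BLAST.

```python
def limit_subjects(lines, max_subjects=1):
    """Yield tabular lines, keeping at most `max_subjects` distinct subjects per query.

    Assumes tab-separated lines with query and subject in the first two columns,
    ordered best-first within each query. Every HSP line of a kept subject is retained.
    """
    seen = {}  # query -> list of subjects already kept
    for line in lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = line.split("\t")[:2]
        subjects = seen.setdefault(query, [])
        if subject not in subjects:
            if len(subjects) >= max_subjects:
                continue  # this query already has its quota of subjects
            subjects.append(subject)
        yield line


# Made-up rows: q1 hits s1 (two HSPs) and s2, q2 hits s3.
rows = [
    "q1\ts1\t98.0\t100\t2\t0\t1\t100\t1\t100\t1e-40\t180\n",
    "q1\ts1\t90.0\t40\t4\t0\t120\t160\t200\t240\t1e-08\t60\n",
    "q1\ts2\t95.0\t100\t5\t0\t1\t100\t1\t100\t1e-30\t150\n",
    "q2\ts3\t99.0\t80\t1\t0\t1\t80\t1\t80\t1e-35\t140\n",
]
kept = list(limit_subjects(rows, max_subjects=1))
print(len(kept), "of", len(rows), "lines kept")
```

          Note that the limit here is on subject sequences rather than on lines, so a subject with several HSPs still contributes several lines.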

          As far as seeing different results between old and new BLASTs, have you figured out what type of sequences lead to different results?
