  • ElMichael
    Member
    • Jun 2009
    • 31

    blastall output: NCBI vs command line

    Sorry for the basic question. Are there any options in blastall that produce output in the same format as the online BLAST on the NCBI website?
    Specifically, I want the "Sequences producing significant alignments:" table with these columns: Accession, Description, Max score, Total score, Query coverage, E value, Max ident.


    But when I run blastall from the command line, e.g.
    blastall -p blastx -i input.fa -d /blast/db/nr -a 4 -b 5 -v 5 -e 1e-20 -o output.file
    I get only the Accession, Description, Score (bits) and E value columns.


    However, I also want the Query coverage and Max ident columns.
    I didn't find a solution in the blastall manual. Perhaps it depends on the -m parameter, but there are many options...
    Thanks in advance!

    UPD: -m 9 (tabular with comment lines; post-processed, sorted view) produces almost what I want, but it gives only a subject ID without a description, and something like gi|66734174|gb|AAY53484.1| isn't very helpful.
    Last edited by ElMichael; 03-09-2011, 02:26 PM.
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    Originally posted by ElMichael
    Sorry for the basic question. Are there any options in blastall that produce output in the same format as the online BLAST on the NCBI website?
    The NCBI website now uses BLAST+ rather than 'legacy' BLAST, so one option is to switch from the 'legacy' blastall binary to the BLAST+ blastx binary. Note that with BLAST+ you can request lots of extra columns in the tabular output, which may cover what you want.


    • colindaven
      Senior Member
      • Oct 2008
      • 417

      #3
      I haven't tried BLAST+ yet, but in the past we have used a combination of two runs:
      blastx -m 8 (tabular output)
      and blastx without the -m parameter (default pairwise output)
      to get coverage and protein hit names.


        • ElMichael
          Member
          • Jun 2009
          • 31

          #5
          maubp, colindaven, thanks for your advice!
          I tried BLAST+, but unfortunately the supported format specifiers don't include the subject description (I wonder why?!) or query coverage (it could be calculated, but again, why?!).
          I think I'll have to use a combination of two blastx runs, as colindaven suggested.
          (Though I still hope there is some magic option unknown to me that produces the required format.)


          • maubp
            Peter (Biopython etc)
            • Jul 2009
            • 1544

            #6
            Originally posted by ElMichael
            maubp, colindaven, thanks for your advice!
            I tried BLAST+, but unfortunately the supported format specifiers don't include the subject description (I wonder why?!) or query coverage (it could be calculated, but again, why?!).
            I'd like to have query length and subject length available as output columns too, since either percentage coverage could then be calculated easily.
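
A minimal sketch of the coverage calculation mentioned above, assuming the query length is known from somewhere (for example the input FASTA). The function names and the `qlen` argument are illustrative, not part of BLAST. The first function covers a single HSP; the second merges overlapping HSPs of one hit, which is closer in spirit to a per-hit coverage figure, although it is not guaranteed to match the number shown on the NCBI website.

```python
def query_coverage(qstart, qend, qlen):
    """Percent of the query spanned by one HSP (1-based, inclusive coordinates)."""
    if qlen <= 0:
        raise ValueError("qlen must be positive")
    # Reverse-frame blastx hits are reported with qstart > qend,
    # so take the absolute span.
    return 100.0 * (abs(qend - qstart) + 1) / qlen


def merged_query_coverage(hsps, qlen):
    """Percent of the query covered by the union of several (qstart, qend) HSPs."""
    if qlen <= 0:
        raise ValueError("qlen must be positive")
    # Normalise each HSP so that start <= end, then sweep left to right.
    intervals = sorted((min(s, e), max(s, e)) for s, e in hsps)
    covered = 0
    cur_start = cur_end = None
    for start, end in intervals:
        if cur_end is None or start > cur_end:
            # Disjoint from the current block: close it and open a new one.
            if cur_end is not None:
                covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
        else:
            # Overlapping: extend the current block.
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start + 1
    return 100.0 * covered / qlen


print(query_coverage(1, 300, 600))   # 50.0
print(query_coverage(300, 1, 600))   # 50.0 (reverse frame)
```

For blastx the query coordinates and `qlen` are both in nucleotides, so no conversion is needed as long as the two come from the same query sequence.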
            Originally posted by ElMichael
            I think I'll have to use a combination of two blastx runs, as colindaven suggested.
            (Though I still hope there is some magic option unknown to me that produces the required format.)
            You don't have to do that: run BLAST+ once with ASN.1 output, then use blast_formatter to convert it into any of the other output formats (text, HTML, XML, tabular).
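
The run-once, reformat-many workflow described above might look roughly like this. It is a sketch that only builds the command lines; the file names are placeholders, and it assumes the BLAST+ `blastx` and `blast_formatter` binaries are on the PATH if you go on to execute them.

```python
import shlex


def blastx_archive_cmd(query, db, archive, evalue="1e-20", threads=4):
    """Build a BLAST+ blastx command that saves results as a BLAST archive (ASN.1)."""
    return [
        "blastx",
        "-query", query,
        "-db", db,
        "-evalue", evalue,
        "-num_threads", str(threads),
        "-outfmt", "11",  # 11 = BLAST archive format (ASN.1)
        "-out", archive,
    ]


def reformat_cmd(archive, outfmt, out):
    """Build a blast_formatter command that converts the archive to another format."""
    return ["blast_formatter", "-archive", archive, "-outfmt", outfmt, "-out", out]


# Placeholder file names; the search itself only has to run once.
search = blastx_archive_cmd("input.fa", "/blast/db/nr", "output.asn")
as_text = reformat_cmd("output.asn", "0", "output.txt")   # pairwise text
as_table = reformat_cmd("output.asn", "7", "output.tsv")  # tabular with comments

for cmd in (search, as_text, as_table):
    print(shlex.join(cmd))
# To actually run one: subprocess.run(cmd, check=True)
```

Newer BLAST+ releases also accept column specifiers (such as `qlen`, `slen` and `stitle`) after the tabular format number; which ones are available depends on the installed version, so check `blastx -help` before relying on them.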


            • kmcarr
              Senior Member
              • May 2008
              • 1181

              #7
              This sounds like a job for Bio::SearchIO, although you need to be fairly comfortable with BioPerl already.
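
For readers who do not use BioPerl, the same idea (parse the BLAST report and build your own summary table) can be sketched with the Python standard library instead. This is not Bio::SearchIO; it is a plain `xml.etree` pass over BLAST XML output (`-m 7` in blastall, `-outfmt 5` in BLAST+). It assumes the report contains the `Iteration_query-len` element, and it summarises only the first HSP of each hit, so the coverage and identity values are per-HSP approximations.

```python
import xml.etree.ElementTree as ET


def summarize_blast_xml(xml_text):
    """Yield one summary dict per hit, based on the first HSP listed for that hit."""
    root = ET.fromstring(xml_text)
    for iteration in root.iter("Iteration"):
        qlen = int(iteration.findtext("Iteration_query-len"))
        for hit in iteration.iter("Hit"):
            hsp = hit.find("Hit_hsps/Hsp")
            if hsp is None:
                continue  # hit without alignments; nothing to summarise
            q_from = int(hsp.findtext("Hsp_query-from"))
            q_to = int(hsp.findtext("Hsp_query-to"))
            align_len = int(hsp.findtext("Hsp_align-len"))
            identity = int(hsp.findtext("Hsp_identity"))
            yield {
                "query": iteration.findtext("Iteration_query-def"),
                "accession": hit.findtext("Hit_accession"),
                "description": hit.findtext("Hit_def"),
                "bit_score": float(hsp.findtext("Hsp_bit-score")),
                "evalue": float(hsp.findtext("Hsp_evalue")),
                # abs() handles reverse-frame hits where from > to.
                "query_coverage": 100.0 * (abs(q_to - q_from) + 1) / qlen,
                "percent_identity": 100.0 * identity / align_len,
            }


# Usage (the file name is a placeholder):
# with open("output.xml") as handle:
#     for row in summarize_blast_xml(handle.read()):
#         print(row["accession"], row["description"], row["query_coverage"])
```

Each yielded dict carries the description, coverage and identity together, which is the combination the tabular output alone does not provide.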


              • ElMichael
                Member
                • Jun 2009
                • 31

                #8
                Originally posted by maubp
                You don't have to do that: run BLAST+ once with ASN.1 output, then use blast_formatter to convert it into any of the other output formats (text, HTML, XML, tabular).
                Thanks for the hint.

                kmcarr, that works terrifically! Exactly what I wanted. Thank you.


                • yifangt
                  Member
                  • Feb 2011
                  • 61

                  #9
                  follow up blast+

                  Hello,
                  I ran into a similar situation with BLAST, though I'm not familiar with BLAST+. Anyway, I tried:
                  Code:
                  blastall -p blastx -i all-EST-cleaned.fasta -d my-db -m 9 -B 3 -b 10  -o blast-output.txt
                  and I got this result:
                  Code:
                  # Fields: Query id, Subject id, % identity, alignment length, mismatches, gap openings, q. start, q. end, s. start, s. end, e-value, bit score
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q6IBW4|CNDH2_HUMAN	34.69	49	32	0	360	214	258	306	1.0	32.7
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q5T655|CC147_HUMAN	20.73	82	65	0	260	15	39	120	1.3	32.3
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9BRQ6|CHCH6_HUMAN	29.87	77	43	2	420	223	63	139	2.9	31.2
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9Y3L3|3BP1_HUMAN	35.48	62	39	2	414	232	7	62	2.9	31.2
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9LEM8|NAC2_CHLRE	38.46	39	24	0	387	271	1313	1351	3.8	30.8
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|P33424|POLN_HEVPA	39.34	61	37	2	423	241	1034	1090	5.0	30.4
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q81862|POLN_HEVCH	39.34	61	37	2	423	241	1034	1090	5.0	30.4
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q9UKP4|ATS7_HUMAN	39.47	38	23	0	423	310	1022	1059	5.0	30.4
                  1EB_RP_001_2009-03-27_0116=1EB_RP_001_A07_26MAR2009_032.seq	sp|Q2PC93|SSPO_CHICK	39.47	38	22	1	286	176	4073	4110	8.4	29.6
                  Now,
                  1) how can I add the annotation to the end of each subject entry, and
                  2) how can I reformat the subject entries as HTML links to NCBI if not using 1)?
                  Thanks!
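
One possible approach to both questions, sketched in Python as a post-processing step over the tabular output. It assumes the descriptions can be taken from the FASTA deflines of the database that was searched, and that an NCBI Protein URL of the form shown in `NCBI_PROTEIN_URL` resolves for your accessions; both are assumptions to verify, and all function names here are illustrative.

```python
import html

# Assumed URL pattern for linking an accession to NCBI Protein.
NCBI_PROTEIN_URL = "https://www.ncbi.nlm.nih.gov/protein/{}"


def accession_from_id(subject_id):
    """Pull the accession out of a pipe-delimited ID such as 'sp|Q6IBW4|CNDH2_HUMAN'."""
    parts = [p for p in subject_id.split("|") if p]
    if len(parts) >= 4 and parts[0] == "gi":
        return parts[3]  # gi|number|db|accession|
    if len(parts) >= 2:
        return parts[1]  # db|accession|name
    return subject_id


def load_descriptions(fasta_lines):
    """Map accession -> description using the '>' header lines of a FASTA file."""
    descriptions = {}
    for line in fasta_lines:
        if line.startswith(">"):
            seq_id, _, description = line[1:].strip().partition(" ")
            descriptions[accession_from_id(seq_id)] = description
    return descriptions


def annotate_line(line, descriptions):
    """Return a tabular hit line with the subject description appended as a last column."""
    if line.startswith("#") or not line.strip():
        return line  # comment and blank lines pass through unchanged
    fields = line.rstrip("\n").split("\t")
    accession = accession_from_id(fields[1])
    return "\t".join(fields + [descriptions.get(accession, "")])


def subject_link(subject_id):
    """Return an HTML anchor pointing the subject ID at its NCBI Protein page."""
    url = NCBI_PROTEIN_URL.format(accession_from_id(subject_id))
    return '<a href="{}">{}</a>'.format(html.escape(url, quote=True), html.escape(subject_id))


print(subject_link("sp|Q6IBW4|CNDH2_HUMAN"))
```

To annotate a whole report, read the database FASTA once with `load_descriptions`, then pass each line of `blast-output.txt` through `annotate_line` and write the result to a new file.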
