  • SDPA_Pet
    Senior Member
    • Apr 2013
    • 222

    BLAST convert xml to tabular format

    Hi, can anyone help me find a tool that converts BLAST XML output to tabular format? Ideally it should support BLAST+.

    Thank you.

    Ben
  • lindenb
    Senior Member
    • Apr 2010
    • 143

    #2
    See the Biostars question "Tools Parsing Ncbi Blast -M 7 Xml Output Format?": https://www.biostars.org/p/7290/


    • SDPA_Pet
      Senior Member
      • Apr 2013
      • 222

      #3
      Are there any other tools with a GUI? I am not familiar with Perl.


      • maubp
        Peter (Biopython etc)
        • Jul 2009
        • 1544

        #4
        Here's my Python script https://github.com/peterjc/galaxy_bl..._to_tabular.py with Galaxy wrapper https://github.com/peterjc/galaxy_bl...to_tabular.xml - does that count as a GUI?


        • SDPA_Pet
          Senior Member
          • Apr 2013
          • 222

          #5
          Originally posted by maubp:
          Here's my Python script https://github.com/peterjc/galaxy_bl..._to_tabular.py with Galaxy wrapper https://github.com/peterjc/galaxy_bl...to_tabular.xml - does that count as a GUI?
          Hi, thanks. I downloaded your script and saved it as blastxml_to_tabular.py. How should I use it?

          Should I type python blastxml_to_tabular.py -i nitrogen.xml -o nitrogen.txt?

          I am sorry about these newbie questions. I am a biology student and don't have any computer science background.

          I have attached the blastxml_to_tabular.py file. Please have a quick look and check whether the format is correct.

          • maubp
            Peter (Biopython etc)
            • Jul 2009
            • 1544

            #6
            It looks like you got the Galaxy XML file by mistake. This https://github.com/peterjc/galaxy_bl..._to_tabular.py is the pretty Human Readable page for the Python script, this https://raw.githubusercontent.com/pe..._to_tabular.py link will download the script itself ready to run.

            Are you using Linux, Mac OS X, or Windows? If Windows you will also need to install Python and running it is a little more complicated. Linux and the Mac should already have a suitable version of Python installed.


            • SDPA_Pet
              Senior Member
              • Apr 2013
              • 222

              #7
              I tried on Windows but got some errors.

              Can you give me an example of using this script to produce a 12-column txt file?


              • maubp
                Peter (Biopython etc)
                • Jul 2009
                • 1544

                #8
                What errors? If you copy/paste the message here it would be far easier to guide you - but I guess part of the problem is you are using -i which is not expected.

                Assuming you installed Python 2.7, you would run something like this - the default is the standard 12 column output:

                Code:
                C:\Python27\python blastxml_to_tabular.py -o nitrogen.txt nitrogen.xml
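
For readers curious what the conversion involves, here is a minimal sketch (not the linked blastxml_to_tabular.py script itself) of turning BLAST XML into the standard 12 columns using only the Python standard library. The function name blast_xml_to_rows is hypothetical and the code assumes Python 3. It simplifies several details: it takes the first word of the query description as the query ID, uses Hit_id as the subject ID, and does not reproduce BLAST's exact number formatting or coordinate handling for translated searches.

```python
import re
import xml.etree.ElementTree as ET


def blast_xml_to_rows(xml_text):
    """Return one 12-field tuple of strings per HSP found in BLAST XML text."""
    root = ET.fromstring(xml_text)
    rows = []
    for iteration in root.iter("Iteration"):
        # Assumption: the first word of the query description is the query ID.
        query_def = iteration.findtext("Iteration_query-def", default="")
        query_id = (query_def.split() or [""])[0]
        for hit in iteration.iter("Hit"):
            subject_id = hit.findtext("Hit_id", default="")
            for hsp in hit.iter("Hsp"):
                qseq = hsp.findtext("Hsp_qseq", default="")
                hseq = hsp.findtext("Hsp_hseq", default="")
                align_len = int(hsp.findtext("Hsp_align-len"))
                identical = int(hsp.findtext("Hsp_identity"))
                # Mismatches: aligned columns that differ and contain no gap.
                mismatch = sum(
                    1 for q, h in zip(qseq, hseq)
                    if q != h and q != "-" and h != "-"
                )
                # Gap openings: number of gap runs in either aligned sequence.
                gap_open = len(re.findall(r"-+", qseq)) + len(re.findall(r"-+", hseq))
                rows.append((
                    query_id,
                    subject_id,
                    "%.2f" % (100.0 * identical / align_len),
                    str(align_len),
                    str(mismatch),
                    str(gap_open),
                    hsp.findtext("Hsp_query-from"),
                    hsp.findtext("Hsp_query-to"),
                    hsp.findtext("Hsp_hit-from"),
                    hsp.findtext("Hsp_hit-to"),
                    hsp.findtext("Hsp_evalue"),
                    hsp.findtext("Hsp_bit-score"),
                ))
    return rows
```

Each row can then be written out with "\t".join(row), one HSP per line. This only works on real BLAST XML input; plain text BLAST output will fail at the parsing step.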


                • SDPA_Pet
                  Senior Member
                  • Apr 2013
                  • 222

                  #9
                  Hi, I want to convert XML to txt.

                  Should it be python blastxml_to_tabular.py nitrogen.xml -o nitrogen.txt?

                  Do I need to type "-i" in front of "nitrogen.xml"?

                  The error is: invalid data format


                  • maubp
                    Peter (Biopython etc)
                    • Jul 2009
                    • 1544

                    #10
                    In this case, either is fine:
                    Code:
                    python blastxml_to_tabular.py -o nitrogen.txt nitrogen.xml
                    Code:
                    python blastxml_to_tabular.py nitrogen.xml -o nitrogen.txt
                    Note that in general many command line tools are fussy about the exact order.

                    What does your file "nitrogen.xml" look like? Can you share it via http://gist.github.com or perhaps show the first ten lines here inside [ code ] and [ /code ] tags?

                    (The code tags are available via the forum's advanced editor view using the "#" icon.)
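
If opening the file in a text editor is awkward, the first ten lines can also be pulled out with a few lines of Python. This is only a sketch, assuming Python 3; the helper name first_lines is hypothetical, and "nitrogen.xml" is the file name used earlier in this thread.

```python
from itertools import islice


def first_lines(path, count=10):
    """Return the first `count` lines of a text file, without trailing newlines."""
    # errors="replace" avoids a crash if the file contains unexpected bytes.
    with open(path, "r", errors="replace") as handle:
        return [line.rstrip("\n") for line in islice(handle, count)]


if __name__ == "__main__":
    import os

    # Only print if the example file from the thread is actually present.
    if os.path.exists("nitrogen.xml"):
        for line in first_lines("nitrogen.xml"):
            print(line)
```

Reading with islice stops after ten lines, so this stays quick even on a very large BLAST output file.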


                    • SDPA_Pet
                      Senior Member
                      • Apr 2013
                      • 222

                      #11
                      BLAST-like file generated by MEGAN


                      Query=HZKDEPY02FP29T

                      >ref|YP_003433234.1| glutamine synthetase [Hydrogenobacter thermophilus TK-6]
                      ref|YP_005512249.1| glutamine synthetase, type I [Hydrogenobacter thermophilus TK-6]
                      ref|WP_012964213.1| glutamine synthetase [Hydrogenobacter thermophilus]
                      dbj|BAI70033.1| glutamine synthetase [Hydrogenobacter thermophilus TK-6]
                      gb|ADO45956.1| glutamine synthetase, type I [Hydrogenobacter thermophilus TK-6]
                      Length = 469

                      Score = 80.9 bits (198), Expect = 1e-13
                      Identities = 35/39 (89%), Positives = 39/39 (100%)
                      Frame = -2

                      Query: 161 PLTRERYGRDTRYVAQKAEQYLRQTGIGDTAYFGPEAEF 45
                      P+TRERYGRDTRY+AQKAEQYL+QTGIGDTAY+GPEAEF
                      Sbjct: 98 PITRERYGRDTRYIAQKAEQYLKQTGIGDTAYYGPEAEF 136

                      >ref|YP_003474070.1| glutamine synthetase [Thermocrinis albus DSM 14484]
                      ref|WP_012992349.1| glutamine synthetase [Thermocrinis albus]
                      gb|ADC89943.1| glutamine synthetase, type I [Thermocrinis albus DSM 14484]
                      Length = 469

                      Score = 80.9 bits (198), Expect = 1e-13
                      Identities = 35/39 (89%), Positives = 39/39 (100%)
                      Frame = -2

                      Query: 161 PLTRERYGRDTRYVAQKAEQYLRQTGIGDTAYFGPEAEF 45
                      P+TRERYGRDTRY+AQKAEQYL+QTGIGDTAY+GPEAEF
                      Sbjct: 98 PITRERYGRDTRYIAQKAEQYLKQTGIGDTAYYGPEAEF 136

                      >ref|YP_007499517.1| glutamine synthetase, type I [Hydrogenobaculum sp. HO]
                      ref|YP_007646578.1| glutamine synthetase, type I [Hydrogenobaculum sp. SN]
                      ref|WP_015418780.1| glutamine synthetase, type I [Hydrogenobaculum sp. HO]
                      gb|AEF18544.1| glutamine synthetase, type I [Hydrogenobaculum sp. 3684]
                      gb|AEG45832.1| glutamine synthetase, type I [Hydrogenobaculum sp. SHO]
                      gb|AGG14474.1| glutamine synthetase, type I [Hydrogenobaculum sp. HO]
                      gb|AGH92778.1| glutamine synthetase, type I [Hydrogenobaculum sp. SN]
                      Length = 469

                      Score = 80.1 bits (196), Expect = 2e-13
                      Identities = 35/39 (89%), Positives = 38/39 (97%)
                      Frame = -2

                      Query: 161 PLTRERYGRDTRYVAQKAEQYLRQTGIGDTAYFGPEAEF 45
                      P+TRERYGRDTRY+AQKAEQYL+QTGIGD AYFGPEAEF
                      Sbjct: 97 PITRERYGRDTRYIAQKAEQYLKQTGIGDVAYFGPEAEF 135


                      • maubp
                        Peter (Biopython etc)
                        • Jul 2009
                        • 1544

                        #12
                        BLAST XML output looks like this, and is designed for a computer to read:

                        Code:
                        <?xml version="1.0"?>
                        <!DOCTYPE BlastOutput PUBLIC "-//NCBI//NCBI BlastOutput/EN" "NCBI_BlastOutput.dtd">
                        <BlastOutput>
                          <BlastOutput_program>blastp</BlastOutput_program>
                          <BlastOutput_version>BLASTP 2.2.24+</BlastOutput_version>
                          <BlastOutput_reference>Stephen F. Altschul, Thomas L. Madden, Alejandro A. Sch&amp;auml;ffer, Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997), &quot;Gapped BLAST and PSI-BLAST: a new generation of protein database search programs&quot;, Nucleic Acids Res. 25:3389-3402.</BlastOutput_reference>
                          <BlastOutput_db>nr</BlastOutput_db>
                          <BlastOutput_query-ID>Query_1</BlastOutput_query-ID>
                          <BlastOutput_query-def>Sample</BlastOutput_query-def>
                          <BlastOutput_query-len>516</BlastOutput_query-len>
                        ...
                        Your results look like plain text BLAST output.
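
The difference between the two formats can be checked automatically. The following is a rough heuristic sketch, assuming Python 3; the function name looks_like_blast_xml is hypothetical, and it only inspects the start of the text rather than validating the XML.

```python
def looks_like_blast_xml(text):
    """Guess whether text is BLAST XML by inspecting its first characters."""
    head = text.lstrip()[:2000]
    # BLAST XML starts with an XML declaration followed by a BlastOutput element.
    return head.startswith("<?xml") and "<BlastOutput" in head
```

A file beginning with "BLASTX 2.2.20 [Feb-08-2009]" or "BLAST-like file generated by MEGAN" would return False here, which matches the "invalid data format" symptom: the script expects XML and is being given plain text.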


                        • SDPA_Pet
                          Senior Member
                          • Apr 2013
                          • 222

                          #13
                          The one I posted earlier is an extract from MEGAN.

                          I have the original file but it didn't work either. Here it is:

                          BLASTX 2.2.20 [Feb-08-2009]

                          Reference: Altschul, Stephen F., Thomas L. Madden, Alejandro A. Schaffer,
                          Jinghui Zhang, Zheng Zhang, Webb Miller, and David J. Lipman (1997),
                          "Gapped BLAST and PSI-BLAST: a new generation of protein database search
                          programs", Nucleic Acids Res. 25:3389-3402.
                          Query= HZKDEPY02G6265
                          (375 letters)

                          Database: /nfs/scratch/sdpapet/db/mpidatabase/NCBI05252013nr
                          25,805,290 sequences; 8,915,431,356 total letters



                          Score E
                          Sequences producing significant alignments: (bits) Value

                          ref|YP_003474070.1| glutamine synthetase [Thermocrinis albus DSM... 220 1e-55
                          ref|YP_003433234.1| glutamine synthetase [Hydrogenobacter thermo... 219 3e-55
                          ref|YP_002120801.1| glutamine synthetase, type I [Hydrogenobacul... 204 9e-51
                          ref|YP_007499517.1| glutamine synthetase, type I [Hydrogenobacul... 202 6e-50
                          ref|NP_213074.1| glutamine synthetase [Aquifex aeolicus VF5] >gi... 199 5e-49
                          ref|WP_008286412.1| glutamine synthetase [Hydrogenivirga sp. 128... 196 3e-48
                          ref|YP_002731353.1| glutamine synthetase, type I [Persephonella ... 187 1e-45
                          ref|YP_002728028.1| glutamine synthetase [Sulfurihydrogenibium a... 187 1e-45
                          ref|YP_001931244.1| glutamine synthetase, type I [Sulfurihydroge... 183 2e-44
                          ref|WP_007545780.1| glutamine synthetase, type I [Sulfurihydroge... 183 3e-44

                          >ref|YP_003474070.1| glutamine synthetase [Thermocrinis albus DSM 14484]
                          ref|WP_012992349.1| glutamine synthetase [Thermocrinis albus]
                          gb|ADC89943.1| glutamine synthetase, type I [Thermocrinis albus DSM 14484]
                          Length = 469

                          Score = 220 bits (561), Expect = 1e-55
                          Identities = 104/110 (94%), Positives = 107/110 (97%)
                          Frame = +2

                          Query: 26 KHGPALTAFTNPTINSYHRLVPGFEAPVRLAYSARNRSAAIRIPTYSQSPKAKRIEIRFP 205
                          KHGPALTAFTNPT+NSYHRLVPGFEAPVRLAYSARNRSAAIRIPTYSQSPKAKRIEIRFP
                          Sbjct: 303 KHGPALTAFTNPTVNSYHRLVPGFEAPVRLAYSARNRSAAIRIPTYSQSPKAKRIEIRFP 362

                          Query: 206 DPTCNPYLAFSAILMAAIDGVENKIHPGEPFDKDIYSLPPEELKDIPNCP 355
                          DPTCNPYLAFSAILMAAIDG+EN+IHPGEP DKDIYSLPPEELKDIP P
                          Sbjct: 363 DPTCNPYLAFSAILMAAIDGIENRIHPGEPLDKDIYSLPPEELKDIPQLP 412


                          • maubp
                            Peter (Biopython etc)
                            • Jul 2009
                            • 1544

                            #14
                            That file is also just plain text BLAST output.


                            • SDPA_Pet
                              Senior Member
                              • Apr 2013
                              • 222

                              #15
                              Oh, when I ran BLAST I used -m 7, and it says that is XML format. Is there any software to convert this to tabular format? I used the old BLAST, not BLAST+.
