  • tonybert
    Member
    • Aug 2012
    • 38

    retrieving <Hit_def> information from XML output

    Greetings, I am trying to retrieve the putative taxonomic identifications of 16S/18S rRNA genes from a HiSeq Illumina run using BLASTN (from the BLAST+ package). Thus far I have been relying on Biopython to parse the data. I'm able to retrieve the e-values, alignment lengths and so on for each query's hits using commands like the ones below.

    ###############################################
    >>> from Bio.Blast import NCBIXML
    >>> blast = NCBIXML.parse(open('16SxmlResults', 'rU'))
    >>> for record in blast:
    ...     print record.alignments[0].hsps[0].score
    ###############################################

    The above prints the score of the top high-scoring pair (HSP) of the top hit for each query to standard output.
    However, the piece of information I can't seem to access is located in the <Hit_def> element. It looks like this:

    <Hit_def>JR951091.270.2233 Bacteria;Proteobacteria;Alphaproteobacteria;Rickettsiales;mitochondria;Pisum sativum (pea)</Hit_def>

    I have looked through the Biopython Bio.Blast.Record documentation, as well as the tutorial, and can't find any way to retrieve this information. I have also tried using ElementTree to parse the data. This works, but I'm having a hard time looping through the whole file (there are ~4000 entries).

    If anyone has any suggestions or can provide some guidance, I would sincerely appreciate it. Thanks,

    -Tony
  • maubp
    Peter (Biopython etc)
    • Jul 2009
    • 1544

    #2
    You want the alignment's hit_def attribute, e.g.

    Code:
    from Bio.Blast import NCBIXML
    blast = NCBIXML.parse(open('16SxmlResults', 'rU'))
    for record in blast:
        for align in record.alignments:
            for hsp in align.hsps:
                print hsp.score, align.hit_def
    Tip: Explore dir(x) and help(x) at the Python prompt where x is an unfamiliar class.
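
    Once align.hit_def is available, the taxonomy still has to be pulled out of the string. The sketch below assumes every hit definition has the same layout as the <Hit_def> example in the first post: an accession, a space, then a semicolon-separated lineage. split_hit_def is a hypothetical helper written for illustration, not part of Biopython.

```python
def split_hit_def(hit_def):
    """Split 'ACCESSION rank1;rank2;...' into (accession, [ranks]).

    Assumption: the first space separates the accession from the lineage,
    as in the <Hit_def> example from the first post.
    """
    accession, _, lineage = hit_def.strip().partition(" ")
    # Drop empty pieces so a trailing ';' does not produce an empty rank.
    ranks = [rank.strip() for rank in lineage.split(";") if rank.strip()]
    return accession, ranks


example = ("JR951091.270.2233 Bacteria;Proteobacteria;Alphaproteobacteria;"
           "Rickettsiales;mitochondria;Pisum sativum (pea)")
accession, ranks = split_hit_def(example)
print(accession)   # JR951091.270.2233
print(ranks[-1])   # Pisum sativum (pea)
```

    Inside the loop over record.alignments, it could be called as split_hit_def(align.hit_def). If some hit definitions do not follow this layout, the returned list of ranks may be empty or incomplete, so it is worth checking a few records by hand first.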


    • a0909
      Junior Member
      • Nov 2014
      • 3

      #3
      I am very new to Python. The code above only prints the results; could you please tell me how to save them to a file (.csv or .txt)?

      Thanks


      • maubp
        Peter (Biopython etc)
        • Jul 2009
        • 1544

        #4
        Easy way: When you run BLAST+, rather than asking for XML output with
        Code:
        -outfmt 5
        ask for tabular output with
        Code:
        -outfmt 6
        (or ask for CSV with -outfmt 10 if you prefer).

        Hard way: Convert the BLAST XML into tabular format using a script like https://github.com/peterjc/galaxy_bl..._to_tabular.py
        Last edited by maubp; 12-09-2014, 08:41 AM. Reason: formatting
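
        If the XML already exists and rerunning BLAST is not an option, a rough sketch of the "hard way" using only the Python 3 standard library might look like the following. It is not the linked script: blast_xml_to_tsv and the four output columns are choices made for this illustration, and it assumes the usual BLAST XML element names (Iteration, Iteration_query-def, Hit, Hit_def, Hsp, Hsp_bit-score, Hsp_evalue).

```python
import csv
import xml.etree.ElementTree as ET


def blast_xml_to_tsv(xml_path, tsv_path):
    """Write one tab-separated row per HSP; return the number of rows."""
    rows = 0
    with open(tsv_path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t")
        writer.writerow(["query", "hit_def", "bit_score", "evalue"])
        # iterparse yields each element once its closing tag has been read;
        # clearing finished <Iteration> elements keeps memory use low.
        for _event, elem in ET.iterparse(xml_path, events=("end",)):
            if elem.tag != "Iteration":
                continue
            query = elem.findtext("Iteration_query-def")
            for hit in elem.iter("Hit"):
                hit_def = hit.findtext("Hit_def")
                for hsp in hit.iter("Hsp"):
                    writer.writerow([query, hit_def,
                                     hsp.findtext("Hsp_bit-score"),
                                     hsp.findtext("Hsp_evalue")])
                    rows += 1
            elem.clear()  # discard the processed <Iteration> subtree
    return rows
```

        A call such as blast_xml_to_tsv('16SxmlResults', 'hits.tsv') would then loop over every query in the file. Passing delimiter="," instead of "\t" would give CSV.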


        • bernardo_bello
          Member
          • May 2012
          • 49

          #5
          Originally posted by maubp
          You want the alignment's hit_def attribute, e.g.

          Code:
          from Bio.Blast import NCBIXML
          blast = NCBIXML.parse(open('16SxmlResults', 'rU'))
          for record in blast:
              for align in record.alignments:
                  for hsp in align.hsps:
                      print hsp.score, align.hit_def
          Tip: Explore dir(x) and help(x) at the Python prompt where x is an unfamiliar class.
          What does 'rU' refer to? A second input file?


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Originally posted by bernardo_bello
            What does 'rU' refer to? A second input file?
            No, it is the file mode. 'r' opens the file for reading. As for 'U', the Python 2 documentation for open() says:

            Python is usually built with universal newlines support; supplying 'U' opens the file as a text file, but lines may be terminated by any of the following: the Unix end-of-line convention '\n', the Macintosh convention '\r', or the Windows convention '\r\n'. All of these external representations are seen as '\n' by the Python program.
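
            In Python 3 the 'U' flag is no longer needed: text mode applies universal newlines by default (newline=None), and the 'U' mode itself was removed in Python 3.11. A small sketch showing the translation, using a throwaway temporary file created only for this example:

```python
import os
import tempfile

# Write one file that mixes all three line-ending conventions.
path = os.path.join(tempfile.mkdtemp(), "mixed_newlines.txt")
with open(path, "wb") as handle:
    handle.write(b"unix\nold mac\rwindows\r\n")

# Python 3 text mode: newline=None (the default) enables universal newlines,
# so every line ending is seen as '\n' by the program.
with open(path, "r", newline=None) as handle:
    lines = handle.readlines()

print(lines)  # ['unix\n', 'old mac\n', 'windows\n']
```

            So on a current Python, open('16SxmlResults') or open('16SxmlResults', 'r') gives the same newline handling that 'rU' gave in the code above.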
