
  • Blast2GO (b2g4pipe) output format

    Hello everyone,

    I'm using the command line version of Blast2GO and I was hoping that someone could help me with the output format.

    In the GUI version of Blast2GO you can get your output in a number of different formats (Genespring, etc.), but b2g4pipe seems to output the .annot format exclusively.

    Now I know I can just open my output file in the GUI version and save it in another format, but I have hundreds of small annotation files that I need in Genespring format, so that method would be a bit of a pain. So my questions are:

    1) Is there a way to get b2g4pipe to output Genespring format annotations (I'm told that there isn't in the current version but I'd be happy to discover this was wrong)?
    2) Assuming there isn't, does anyone know of a command line script to convert .annot format to Genespring format?

    Thanks so much for your help

  • #2
    I think b2g4pipe can produce a normal Blast2GO .dat file, but they warn against this as the files are much larger. You could try doing that, then using the Blast2GO GUI to convert this to their "Genespring format".
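For the second question, a command-line conversion could be sketched roughly as follows. This is only a minimal sketch under stated assumptions: it assumes each .annot line is tab-separated with the sequence ID first, the annotation ID (e.g. a GO term) second, and an optional description third. The output layout (one row per sequence, IDs joined by "; ") is a hypothetical stand-in, not a confirmed description of the Genespring export, so compare it against a file saved from the GUI before relying on it. The function names `read_annot` and `write_table` are made up for illustration.

```python
import io
import sys


def read_annot(handle):
    """Group annotation IDs by sequence ID.

    Assumes tab-separated lines: sequence ID, annotation ID,
    optional description. Lines with fewer than two fields are skipped.
    """
    terms = {}
    for line in handle:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 2 or not fields[0]:
            continue
        seq_id, term = fields[0], fields[1]
        bucket = terms.setdefault(seq_id, [])
        if term not in bucket:  # drop duplicate IDs for the same sequence
            bucket.append(term)
    return terms


def write_table(terms, handle, sep="; "):
    """Write one row per sequence: ID, tab, joined annotation IDs.

    This layout is an assumption, not the verified Genespring format.
    """
    for seq_id, ids in terms.items():
        handle.write(f"{seq_id}\t{sep.join(ids)}\n")


if __name__ == "__main__":
    # Usage: python annot_to_table.py < input.annot > output.txt
    write_table(read_annot(sys.stdin), sys.stdout)
```

Because it reads standard input and writes standard output, a shell loop over hundreds of small .annot files would be enough to batch the conversion.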


    • #3
      I often output in *.dat format but, yes, after a certain point (>200,000 sequences?) this bogs down.

      I am starting to turn away from B2Go. The single CPU mode of b2g4pipe is limiting and is often the slow part of the overall pipeline.


      • #4
        Originally posted by westerman
        I often output in *.dat format but, yes, after a certain point (>200,000 sequences?) this bogs down.

        I am starting to turn away from B2Go. The single CPU mode of b2g4pipe is limiting and is often the slow part of the overall pipeline.

        Interesting. Can you recommend a good alternative?


        • #5
          Originally posted by westerman
          I am starting to turn away from B2Go. The single CPU mode of b2g4pipe is limiting and is often the slow part of the overall pipeline.

          I don't recall finding b2g4pipe CPU-limited; rather, the bottleneck was the (local) database queries. Maybe you've got a quicker local Blast2GO database server than us? In any case, for us b2g4pipe was much quicker than running the BLAST searches against the NR database (or InterProScan).


          • #6
            I've been trying to track down the slowdown for a couple of months although not very extensively. Usually I just want the results and generally do not have the time/resources to do much exploration into performance concerns. Note that I am talking about very large projects -- hundreds of thousands of contigs -- and generating a DAT file. Thus it becomes unwieldy to run test scenarios.

            b2g4pipe, unless I am mistaken, is a single-CPU program. Thus it will be processing one contig at a time. It will have to process all contigs before it generates a file. My sysadmin swears that our local database server is not overloaded. And indeed I can run multiple instances of b2g4pipe without any complaints on his end. Thus I suspect b2g4pipe, either in its code or in how it handles network traffic, to be the slow part.

            Since I can do them in parallel, I do the BLAST searches outside of b2g4pipe and then feed the BLAST XML file into b2g4pipe. My BLAST searches take less time than b2g4pipe. I do not run InterProScan on a regular basis. That part seems to be even slower.
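Running the searches in parallel as described above first requires splitting the contigs into batches. A minimal sketch of that step, assuming plain FASTA input, might look like the following. The function names `read_fasta` and `split_round_robin` are hypothetical, and the BLAST and b2g4pipe invocations for each batch are deliberately left out.

```python
def read_fasta(lines):
    """Yield (header, sequence) tuples from an iterable of FASTA lines."""
    header, seq = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(seq)
            header, seq = line[1:], []
        elif header is not None:
            seq.append(line)
    if header is not None:
        yield header, "".join(seq)


def split_round_robin(records, n_chunks):
    """Distribute records over n_chunks lists, one record at a time."""
    chunks = [[] for _ in range(n_chunks)]
    for i, record in enumerate(records):
        chunks[i % n_chunks].append(record)
    return chunks


def format_fasta(records):
    """Render records back to FASTA text, one sequence line per record."""
    return "".join(f">{header}\n{seq}\n" for header, seq in records)
```

Round-robin assignment keeps the batches similar in record count without needing to know the total number of contigs up front; each batch can then be written out with `format_fasta` and searched as a separate job.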
