  • Romualdo
    Junior Member
    • Nov 2014
    • 6

    Blast2GO Beginner's Question

    Hi everybody, I'm new to NGS. I used the STAR/Cufflinks/Cuffcompare workflow and now I need to annotate my transcripts. I decided to use Blast2GO because it seems more intuitive.
    But it is not entirely intuitive for a total beginner like me, and I don't know which kind of BLAST I should run. The basic version of the program lets me use these methods:

    QBlast@NCBI. NCBI offers a public service that allows searching molecular sequence
    databases with the BLAST algorithm. The main advantages of making use of this service
    are its versatility and that no database maintenance is required. Therefore by selecting
    this option at Blast2GO no additional installations have to be done.

    Remote BLAST. Blast2GO will download the latest BLAST+ executable from NCBI and
    will use it to query NR or other databases remotely.

    Local BLAST against own database. It is possible to use the BLAST+ executable to query a
    local/own database.

    WWW-BLAST. Alternatively, BLAST can be done locally against a custom database. For
    this, you need to place a copy of your FASTA formatted custom DB plus a WWW-BLAST
    installation on a local BLAST server and indicate Blast2GO their location.

    My FASTA file has 16,450 sequences, and I want to search the full NCBI nr database. My computer has an i7-3770 and 8 GB of RAM.

    So the question is: with these resources, what is the safest and easiest way to run BLAST?
  • cement_head
    Senior Member
    • Mar 2012
    • 264

    #2
    Do this locally. Download the nr database and use BLASTX against it.

    BUT, it is MUCH faster if you use mpiBLAST on a cluster.

    The command for mpiBLAST (with the correct flags for B2GO) is something like:

    mpiblast -p blastx -d nr -i input.fa -v 20 -b 20 -I T -e 0.001 -m 7 -o output.xml


    • Romualdo
      Junior Member
      • Nov 2014
      • 6

      #3
      Is mpiBLAST much faster even on this single PC?


      • sarvidsson
        Senior Member
        • Jan 2015
        • 137

        #4
        No, MPI-BLAST only makes sense on a cluster (if you have access to one). Local multithreading BLAST will be faster than local MPI multiprocessing, and more memory-efficient.

        You might run into trouble with only 8 GB of RAM if you want to BLAST against the complete nr locally. Give it a try, but you may hit out-of-memory problems. The i7-3770 has 4 cores with hyperthreading, so be prepared for several days of BLASTing...
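
A local multithreaded search with BLAST+ might be set up roughly as in the sketch below. It only builds and prints the command line; it does not run the search. The helper name `build_blastx_command`, the file names `input.fa` and `output.xml`, and the thread count of 8 (4 cores with hyperthreading) are assumptions for illustration. The E-value and hit limit mirror the mpiBLAST example earlier in the thread, and `-outfmt 5` asks BLAST+ for XML output.

```python
import shlex

def build_blastx_command(query, db, out, threads=8, evalue=0.001, max_hits=20):
    """Build a BLAST+ blastx command line as a list of arguments.

    All default values are illustrative assumptions, not tuned settings.
    """
    return [
        "blastx",
        "-query", query,                    # FASTA file with the transcripts
        "-db", db,                          # name of the formatted protein database
        "-evalue", str(evalue),             # E-value cutoff
        "-max_target_seqs", str(max_hits),  # hits kept per query
        "-outfmt", "5",                     # 5 = XML output
        "-num_threads", str(threads),       # local multithreading
        "-out", out,
    ]

cmd = build_blastx_command("input.fa", "nr", "output.xml")
print(shlex.join(cmd))
```

Keeping the command as a list makes it easy to pass to `subprocess.run` once the database is in place; whether 8 GB of RAM is enough for nr is a separate question that this sketch does not answer.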


        • cement_head
          Senior Member
          • Mar 2012
          • 264

          #5
          Yes, correct - only if you have (access to) a cluster.


          • Will Nelson
            Member
            • Nov 2010
            • 16

            #6
            Blasting against nr is not easy. Even with 4 threads, to blast 16,000 sequences will take around 4,000 minutes, or 66 days. Performing GO assignment is also not easy. Importing 16,000 blastx results into the free version of Blast2GO, and then doing GO assignments, will take many days.


            • Will Nelson
              Member
              • Nov 2010
              • 16

              #7
              Sorry, my math was all wrong on my last post. Let me try again.

              In reality, it takes at least 5 minutes for blastx to align one transcript to nr. For 16,000 sequences, with 4 threads, that is (16,000 x 5)/4 = 20,000 minutes, or about 13.9 days. Then if you want to get GO terms by importing the results into the free version of Blast2GO, that takes several more days at least.
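
The arithmetic above can be checked with a short sketch. The function name `blast_days` is hypothetical, and the calculation assumes that run time scales perfectly with the number of threads, which is an optimistic simplification.

```python
def blast_days(n_sequences, minutes_per_sequence, threads):
    """Estimate wall-clock days for a BLAST run, assuming ideal thread scaling."""
    total_minutes = n_sequences * minutes_per_sequence / threads
    return total_minutes / (60 * 24)  # minutes per day

estimate = blast_days(16_000, 5, 4)
print(f"{estimate:.1f} days")  # 20,000 minutes is roughly 13.9 days
```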


              • GenoMax
                Senior Member
                • Feb 2008
                • 7142

                #8
                @Will Nelson: curious if you have a roughly equivalent spec computer as the OP. Did you actually time a search?


                • Romualdo
                  Junior Member
                  • Nov 2014
                  • 6

                  #9
                  Well, I'm having a hard time with this. Remote BLAST from Blast2GO Basic just takes too long for each sequence, so I got more speed by running BLAST+ blastx locally and importing the output XML into Blast2GO for the subsequent steps. But Will Nelson is right: it is impractical to do this on this computer. Our lab is about to buy a server with 128 GB of RAM. Until then I want to get more experience with this, so I made a sample of 100 sequences.
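
A small test set like the 100-sequence sample mentioned above could be drawn with a sketch along these lines. The function name `first_records` is hypothetical, and it simply keeps the first records in file order rather than sampling at random, which may or may not match how the sample in this thread was made.

```python
def first_records(lines, n):
    """Return the lines belonging to the first n FASTA records."""
    kept = []
    seen = 0
    for line in lines:
        if line.startswith(">"):
            seen += 1          # a new record starts at each header line
            if seen > n:
                break
        if seen > 0:           # skip anything before the first header
            kept.append(line)
    return kept

# Tiny in-memory demonstration with made-up records
demo = [">a", "ACGT", "TT", ">b", "GG", ">c", "CC"]
sample = first_records(demo, 2)
print(sample)
```

The same function accepts an open file handle, since it only iterates over lines.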

                  So I got this repeatedly when running the Blastx:

                  CFastaReader: Bad gap size at line ***
                  CFastaReader: Problem parsing gap mods at line ***

                  Here "***" stands for line numbers. These lines correspond to the sequence ID lines, which use this format:

                  >?_GroupUn999_2_939_+

                  What in this format is causing that error?


                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    #10
                    What format are your sequences in?

                    That error seems to indicate that there may be a problem with your fasta file. Can you try to replace the "?_" at the beginning of the header? Looks like that may be causing a problem.
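
The header replacement suggested above could be done with a sketch like the following. One plausible reading of the error is that the BLAST+ FASTA reader treats a line beginning with `>?` as a gap specification rather than a sequence header; that is an assumption here, not something confirmed in this thread. The function names and the replacement prefix `unk_` are hypothetical; a prefix is substituted rather than deleted so that IDs stay distinct from any headers that already lack the `?_`.

```python
def fix_header(line, replacement="unk_"):
    """Replace a leading '?_' in a FASTA header; leave other lines unchanged."""
    if line.startswith(">?_"):
        return ">" + replacement + line[3:]
    return line

def fix_fasta(lines, replacement="unk_"):
    """Apply fix_header to every line of a FASTA file."""
    return [fix_header(line, replacement) for line in lines]

# Header taken from the post above, followed by a made-up sequence line
fixed = fix_fasta([">?_GroupUn999_2_939_+", "ATGGCC"])
print(fixed)
```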


                    • cement_head
                      Senior Member
                      • Mar 2012
                      • 264

                      #11
                      For the new server, make absolutely sure that you buy ECC RAM. Anything less and you will have major problems.
