  • Blast2GO Beginner's Question

    Hi everybody, I'm new to NGS. I used the STAR/Cufflinks/Cuffcompare workflow and now I need to annotate my transcripts. I decided to use Blast2GO because it seems more intuitive.
    It is not entirely intuitive for a total beginner like me, though, and I don't know which kind of BLAST I should perform. The basic version of the program allows me to use these methods:

    QBlast@NCBI. NCBI offers a public service that allows searching molecular sequence
    databases with the BLAST algorithm. The main advantages of making use of this service
    are its versatility and that no database maintenance is required. Therefore by selecting
    this option at Blast2GO no additional installations have to be done.

    Remote BLAST. Blast2GO will download the latest BLAST+ executable from NCBI and
    will use it to query NR or other databases remotely.

    Local BLAST against own database. It is possible to use the BLAST+ executable to query a
    local/own database.

    WWW-BLAST. Alternatively, BLAST can be done locally against a custom database. For
    this, you need to place a copy of your FASTA formatted custom DB plus a WWW-BLAST
    installation on a local BLAST server and indicate their location to Blast2GO.

    My FASTA file has 16,450 sequences, and I want to use the full NCBI nr database. My computer has an i7-3770 and 8 GB of RAM.

    So the question is: with these resources, what is the safest and easiest way to BLAST?

  • #2
    Do this locally. Download the nr database and use BLASTX against it.

    BUT, it is MUCH faster if you use mpiBLAST on a cluster.

    The command for mpiBLAST (with the correct flags for B2GO) is something like:

    mpiblast -p blastx -d nr -i input.fa -v 20 -b 20 -I T -e 0.001 -m 7 -o output.xml


    • #3
      Is mpiBLAST much faster even on this PC alone?


      • #4
        No, mpiBLAST only makes sense on a cluster (if you have access to one). Local multithreaded BLAST will be faster than local MPI multiprocessing, and more memory-efficient.

        You might run into trouble with only 8 GB of RAM if you want to BLAST against the complete nr locally. Give it a try, but you may hit out-of-memory problems. The i7-3770 has 4 cores with hyperthreading, so be prepared for several days of BLASTing...
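For anyone who wants to try the local multithreaded route, here is a rough sketch of what the BLAST+ call could look like. It assumes BLAST+ is installed and that a formatted database called `nr` is on the BLAST database path. The e-value, hit count and XML output mirror the mpiBLAST command in #2; the file names, the thread count and the helper name `build_blastx_command` are placeholders made up for illustration.

```python
import shlex


def build_blastx_command(query, db="nr", out="output.xml",
                         threads=8, evalue=0.001, max_hits=20):
    """Return a BLAST+ blastx argument list that writes XML output.

    All default values are illustrative assumptions, not recommendations.
    """
    return [
        "blastx",
        "-query", query,                    # nucleotide FASTA with the transcripts
        "-db", db,                          # protein database name
        "-out", out,                        # result file to import into Blast2GO
        "-evalue", str(evalue),             # same cutoff as -e 0.001 in post #2
        "-outfmt", "5",                     # 5 = BLAST XML
        "-max_target_seqs", str(max_hits),  # keep up to 20 hits per query
        "-num_threads", str(threads),       # threads within a single process
    ]


cmd = build_blastx_command("input.fa")
# Print the command so it can be checked before running it.
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The list can be passed to `subprocess.run(cmd, check=True)` once the command looks right. Whether 8 GB of RAM is enough for nr is a separate question that this does not answer.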


        • #5
          Yes, correct - only if you have (access to) a cluster.


          • #6
            Blasting against nr is not easy. Even with 4 threads, to blast 16,000 sequences will take around 4,000 minutes, or 66 days. Performing GO assignment is also not easy. Importing 16,000 blastx results into the free version of Blast2GO, and then doing GO assignments, will take many days.


            • #7
              Sorry, my math was all wrong on my last post. Let me try again.

              In reality, it takes at least 5 minutes for blastx to align one transcript to nr. For 16,000 sequences, with 4 threads, that is (16,000 × 5)/4 = 20,000 minutes, or about 13.9 days. Then if you want to get GOs by importing into the free version of Blast2GO, that takes several more days at least.
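The arithmetic above can be redone for other inputs with a few lines of Python. This is only a back-of-the-envelope sketch: it assumes the 5 minutes per sequence quoted above and perfect scaling across threads, which real runs are unlikely to achieve. The function name is made up for illustration.

```python
def estimate_blast_time(n_sequences, minutes_per_sequence, n_parallel):
    """Return (total_minutes, total_days), assuming perfect parallel scaling."""
    total_minutes = n_sequences * minutes_per_sequence / n_parallel
    total_days = total_minutes / (60 * 24)  # 1440 minutes per day
    return total_minutes, total_days


minutes, days = estimate_blast_time(16_000, 5, 4)
print(f"{minutes:.0f} minutes, about {days:.1f} days")
# prints: 20000 minutes, about 13.9 days
```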


              • #8
                @Will Nelson: curious whether you have a computer with roughly the same specs as the OP's. Did you actually time a search?


                • #9
                  Well, I'm having a hard time with this. Remote BLAST in Blast2GO Basic just takes too long per sequence, so I got more speed by running BLAST+ blastx locally and importing the output XML into Blast2GO for the subsequent steps. But Will Nelson is right: it is impractical to do this on this computer. Our lab is about to buy a server with 128 GB of RAM. Until then I want to get more experience with this, so I made a sample of 100 sequences.

                  I get this repeatedly when running blastx:

                  CFastaReader: Bad gap size at line ***
                  CFastaReader: Problem parsing gap mods at line ***

                  Here "***" stands for line numbers. These lines match the sequence ID lines, which use this format:

                  >?_GroupUn999_2_939_+

                  What in this format is generating that error?


                  • #10
                    What format are your sequences in?

                    That error seems to indicate that there may be a problem with your FASTA file. Can you try removing or replacing the "?_" at the beginning of each header? It looks like that may be causing the problem.
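If you want to test that suggestion without editing the file by hand, something like the sketch below would strip the prefix from every header line. It assumes the leading "?_" really is what the FASTA reader trips over, which is a guess at this point. The function name and the "ACGT" sequence line are made up for the example; only the header comes from the post above.

```python
import io


def sanitize_fasta_headers(lines, bad_prefix="?_", replacement=""):
    """Yield FASTA lines with bad_prefix removed from the start of each header."""
    for line in lines:
        if line.startswith(">" + bad_prefix):
            # Keep the ">" marker, swap the prefix, keep the rest of the header.
            yield ">" + replacement + line[1 + len(bad_prefix):]
        else:
            # Sequence lines and headers without the prefix pass through unchanged.
            yield line


# Tiny in-memory example; the sequence line is invented for illustration.
sample = io.StringIO(">?_GroupUn999_2_939_+\nACGT\n")
cleaned = "".join(sanitize_fasta_headers(sample))
print(cleaned, end="")
```

For a real file, pass an open file handle instead of the `StringIO` object and write the yielded lines to a new file, so the original stays untouched.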


                    • #11
                      Regarding the new server: make absolutely sure that you buy ECC RAM. Anything less and you will have major problems.

