  • flacchy
    Member
    • Apr 2013
    • 33

    Tracking blastall

    Hi,
    I just started my PhD and I am working with a huge dataset (~7 million reads).
    I set up blastall against nt in my Bio-Linux shell, and since it is going to take forever I wanted to ask for some help on how to keep track of the analysis.
    Using the less command I can see what's in the output file, but is there a way to get some numbers out of it, such as how many reads have been processed already?
    Could someone help?

    P.S.: this is the command I used:
    blastall -d 'nt' -p 'blastn' -i contigs.fa -o contigs.fa.blastn -e 1e-06 -b 10 -v 10 -a 4

    Thanks
  • rhinoceros
    Senior Member
    • Apr 2013
    • 372

    #2
    How many months/years do you expect this query to take? Do you think you have enough HDD space for the output file? If it's impossible for you to run your query on some more powerful platform, at least split the input into smaller files.
    Last edited by rhinoceros; 05-08-2013, 01:26 AM.
    savetherhino.org

    • flacchy
      Member
      • Apr 2013
      • 33

      #3
      We do have enough space for the output file. I know somebody tried this before and it took 6 months, which is why I was wondering about a way to keep track...
      Do you know if there is a different way than clustering the data? Or a free platform I could use?

      • rhinoceros
        Senior Member
        • Apr 2013
        • 372

        #4
        If I were you, I'd run my blasts on Amazon EC2 or something similar. It's not that expensive..
        savetherhino.org

        • maubp
          Peter (Biopython etc)
          • Jul 2009
          • 1544

          #5
          How many sequences are in your contig FASTA file?

          Are your contigs from a transcriptome assembly, meaning each is not that long (typical genes)? Or genomic, meaning some could be very large? Either way, try smaller batches of 100 or 1000 sequences at a time; that should let you estimate how long searching the whole assembly will take.

          Does your computer have enough RAM for the NT database?

          Does your computer have multiple CPU cores? Have you tried running BLAST with multiple threads and/or multiple copies of BLAST on separate query files?

          Are you using the plain text output? If so what will you do with it - parse it? Perhaps a more compact and computer friendly output might be wiser, like the tabular output?
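A minimal sketch of the batching idea from this post, assuming the queries sit in an ordinary FASTA file. The function name `split_fasta` and the `batch_0001.fa` naming scheme are illustrative choices, not something specified in the thread.

```python
import os


def split_fasta(path, batch_size, out_dir, prefix="batch"):
    """Write consecutive groups of `batch_size` FASTA records to numbered
    files in `out_dir` and return the list of file paths created."""
    out_paths = []
    out = None
    n_records = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(">"):
                # A new record starts here; open a new file every batch_size records.
                if n_records % batch_size == 0:
                    if out is not None:
                        out.close()
                    name = "%s_%04d.fa" % (prefix, len(out_paths) + 1)
                    out_paths.append(os.path.join(out_dir, name))
                    out = open(out_paths[-1], "w")
                n_records += 1
            # Lines before the first header (if any) are skipped.
            if out is not None:
                out.write(line)
    if out is not None:
        out.close()
    return out_paths
```

Timing BLAST on the first batch of 100 or 1000 sequences and multiplying by the number of batches gives a rough estimate for the whole file. The batches can also be run as separate BLAST jobs on different cores or machines.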

          • flacchy
            Member
            • Apr 2013
            • 33

            #6
            Thanks maubp... so

            The metagenome has been sequenced with Illumina, and we know that the read lengths range between 15 and 99 bp.

            Do you suggest using software such as CD-HIT to split the file into smaller batches?

            We installed the nt db on the NX machine and we have an 8-core CPU. Could you help me a little more with how to run BLAST with multiple threads and/or multiple copies of BLAST on separate query files? (Is there some link I can look at?)

            As the output I set a FASTA file (I saw that in some workshops), so I told the program to write its output to a file named contigs.fa.blastn.

            • rhinoceros
              Senior Member
              • Apr 2013
              • 372

              #7
              Originally posted by flacchy
              "The metagenome has been sequenced with Illumina, and we know that the read lengths range between 15 and 99 bp.

              Do you suggest using software such as CD-HIT to split the file into smaller batches?"
              Why not assemble before doing anything else, or alternatively send the reads for BLAST to MG-RAST or IMG/M or some other online pipeline? But really, you should assemble first. What do you hope to gain from blasting reads that are just 15 nt long?
              Originally posted by flacchy
              "We installed the nt db on the NX machine and we have an 8-core CPU. Could you help me a little more with how to run BLAST with multiple threads and/or multiple copies of BLAST on separate query files? (Is there some link I can look at?)"
              See http://www.ncbi.nlm.nih.gov/books/NBK1762/. You have already set up 4 threads with the -a flag. In newer versions of BLAST (BLAST+), -num_threads replaces this flag, and really, for speed gains you should be using the latest version.
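To make the flag change concrete, here is a hedged sketch that builds the original poster's legacy command next to a roughly equivalent BLAST+ one. The helper name `blast_commands` is hypothetical. The option mapping (-a to -num_threads, -b to -num_alignments, -v to -num_descriptions) follows the BLAST+ documentation as far as I know. `-task blastn` is added on the assumption that the classic blastn algorithm is wanted, since the BLAST+ blastn program defaults to megablast.

```python
def blast_commands(db, query, out, evalue, threads, max_hits):
    """Return (legacy, modern) argument lists for the same nucleotide search.

    Nothing is executed here; the lists can be passed to subprocess.run()
    or joined with spaces and pasted into a shell.
    """
    legacy = [
        "blastall", "-p", "blastn", "-d", db, "-i", query, "-o", out,
        "-e", evalue, "-b", str(max_hits), "-v", str(max_hits),
        "-a", str(threads),            # -a: number of threads (legacy BLAST)
    ]
    modern = [
        "blastn", "-task", "blastn",   # assumption: classic blastn, not megablast
        "-db", db, "-query", query, "-out", out, "-evalue", evalue,
        "-num_alignments", str(max_hits),
        "-num_descriptions", str(max_hits),
        "-num_threads", str(threads),  # replaces -a in BLAST+
    ]
    return legacy, modern
```

Calling `blast_commands("nt", "contigs.fa", "contigs.fa.blastn", "1e-06", 4, 10)` reproduces the command from the first post as the legacy list. For tabular output, legacy BLAST uses `-m 8` and BLAST+ uses `-outfmt 6`; with tabular output in BLAST+, `-max_target_seqs` is used instead of the two `-num_*` display options.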
              Last edited by rhinoceros; 05-08-2013, 02:52 AM.
              savetherhino.org

              • maubp
                Peter (Biopython etc)
                • Jul 2009
                • 1544

                #8
                Originally posted by rhinoceros
                "Why not assemble before doing anything else, or alternatively send the reads for BLAST to MG-RAST or IMG/M or some other online pipeline? But really, you should assemble first. What do you hope to gain from blasting reads that are just 15 nt long?

                See http://www.ncbi.nlm.nih.gov/books/NBK1762/. You have already set up 4 threads with the -a flag. In newer versions of BLAST, -num_threads replaces this flag."
                I assumed from the filename contigs.fa in your question that you had already assembled the data. If not, you should do that first.

                • flacchy
                  Member
                  • Apr 2013
                  • 33

                  #9
                  I assembled these reads with Velvet; now I am trying to set up MetaVelvet to get better contigs, since the contigs I obtained are still short (some of them 41 nt).

                  At the same time we are running a search on the reads to see what kind of 'organisms' to expect from the data. Does that make sense?
                  Last edited by flacchy; 05-08-2013, 05:17 AM.

                  • GenoMax
                    Senior Member
                    • Feb 2008
                    • 7142

                    #10
                    Wouldn't it be preferable to use a resource like MG-RAST (http://metagenomics.anl.gov/) for this type of analysis? Assuming that the sample here is metagenomic, of course.

                    • flacchy
                      Member
                      • Apr 2013
                      • 33

                      #11
                      Yes, it is a metagenome (specifically marine viromes). I'll have a look. Thank you so much, this was of great help!

                      • flacchy
                        Member
                        • Apr 2013
                        • 33

                        #12
                        If anyone is curious, there is a script to keep track of BLAST (if you are dealing with huge data).
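The script mentioned in this post is not included in the thread, so the following is only a minimal sketch of one way to track progress, not the poster's script. It assumes the default pairwise text output of blastall, where each processed query starts a line with `Query=`, and a FASTA input where each record starts with `>`. The function names and the simple linear time estimate are illustrative.

```python
def count_lines_starting_with(path, prefix):
    """Count the lines in a text file that begin with `prefix`."""
    n = 0
    with open(path) as handle:
        for line in handle:
            if line.startswith(prefix):
                n += 1
    return n


def blast_progress(query_fasta, blast_output, elapsed_hours=None):
    """Compare queries reported in the output with records in the input."""
    total = count_lines_starting_with(query_fasta, ">")
    done = count_lines_starting_with(blast_output, "Query=")
    report = {
        "total": total,
        "done": done,
        "fraction": (float(done) / total) if total else 0.0,
        "eta_hours": None,
    }
    if elapsed_hours is not None and done > 0:
        # Linear extrapolation: assumes the remaining queries cost about
        # the same on average as the ones already finished.
        report["eta_hours"] = elapsed_hours * (total - done) / float(done)
    return report
```

For example, `blast_progress("contigs.fa", "contigs.fa.blastn", elapsed_hours=24)` returns the counts plus a rough number of hours remaining. With tabular output there is no `Query=` line; counting distinct IDs in the first column would work there, but it misses queries that had no hits.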

                        • kmcarr
                          Senior Member
                          • May 2008
                          • 1181

                          #13
                          Originally posted by flacchy
                          "Yes, it is a metagenome (specifically marine viromes). I'll have a look. Thank you so much, this was of great help!"
                          DO NOT use nt!! If your query sequences are from marine viruses don't search against the entire universe of DNA sequences.

                          One of the very first things you should do when setting up a BLAST experiment (yes, think of running BLAST as an in silico experiment) is choosing a database appropriate to your experimental system and objective. The nt database has DNA from every branch of the taxonomic tree and every species from aardvark to zyzzyva. I am hard pressed to think of a time when nt is the correct database to use. Construct a target database focused on the experiment and it will greatly speed up your BLAST.
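One possible way to build such a focused target set, sketched under loud assumptions: the function `filter_fasta` and the keyword list are hypothetical, and matching words in FASTA headers is a crude stand-in for a proper taxonomy-based selection or a curated viral collection. It only shows the mechanics of cutting a large FASTA file down to the records of interest.

```python
def filter_fasta(in_path, out_path, keywords):
    """Copy FASTA records whose header line contains any of `keywords`
    (case-insensitive) to `out_path`; return the number of records kept."""
    wanted = [k.lower() for k in keywords]
    kept = 0
    keep = False
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            if line.startswith(">"):
                # Decide once per record, based on its header line.
                header = line.lower()
                keep = any(k in header for k in wanted)
                if keep:
                    kept += 1
            if keep:
                dst.write(line)
    return kept
```

A call such as `filter_fasta("all_sequences.fa", "viral_subset.fa", ["virus", "phage"])` (file names are placeholders) writes the subset, which can then be indexed with `formatdb -i viral_subset.fa -p F` for legacy BLAST or `makeblastdb -in viral_subset.fa -dbtype nucl` for BLAST+.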
