  • tonybert
    Member
    • Aug 2012
    • 38

    GNU parallel + usearch piping

    Greetings, I am wondering if anyone knows how to pipe sequence data into usearch. I am trying to use GNU parallel to break up and distribute multiple smaller ublast jobs over our small server. I can do this with regular blast, but I get fatal errors when I try it with usearch-ublast.

    cat CAM_SMPL_001754_RMRNA_derep.fasta | parallel --block 100k --recstart '>' --pipe /public2/Tony/usearch -db /public2/Tony/CAMERA/RefSeqMicrobial/microbial.nonredundant.all.udb -top_hits_only -threads 16 -blast6out DATA

    Any suggestions would be helpful. Thanks.
  • tange
    Junior Member
    • Feb 2013
    • 7

    #2
    Given: usearch -cluster_fast seqs.fasta -id 0.9 -centroids nr.fasta

    You can do:

    cat CAM_SMPL_001754_RMRNA_derep.fasta | parallel --block 100k --recstart '>' --pipe "cat > {#}; usearch -cluster_fast {#} -id 0.9 -centroids {#}.out; cat {#}.out; rm {#} {#}.out"

    • tonybert
      Member
      • Aug 2012
      • 38

      #3
      Yes, this works for cluster_fast, but it's blast I am really after. Thanks.

      • tange
        Junior Member
        • Feb 2013
        • 7

        #4
        I am no expert in usearch, but if you show the command line you would run to do it without GNU Parallel, then I might be able to help you parallelize it.

        • tonybert
          Member
          • Aug 2012
          • 38

          #5
          Thanks, below is the usearch command I would like to pipe:

          usearch -ublast ./454reads.fa -db ./RefSeqmicrobes.udb -evalue 1e-5 -top_hits_only -blast6out ./454reads_refseq_results

          • tange
            Junior Member
            • Feb 2013
            • 7

            #6
            Extremely similar to the cluster_fast command:

            cat CAM_SMPL_001754_RMRNA_derep.fasta | parallel --block 100k --recstart '>' --pipe "cat > {#}; usearch -ublast ./{#} -db ./RefSeqmicrobes.udb -evalue 1e-5 -top_hits_only -blast6out ./{#}.out; cat {#}.out; rm {#} {#}.out"
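
The one-liner above saves each piped block to a file named after the job number ({#}), runs usearch on that file, prints the result, and cleans up. The same idea can be sketched as a bash function. This is only a sketch under assumptions: the function name ublast_block is hypothetical, bash and mktemp are assumed to be available, and the usearch flags are copied from post #5. It uses temporary files so that two runs in the same directory do not overwrite each other's numbered files, and it sends anything usearch prints on stdout to stderr so that only the hits reach the pipe.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of usearch or GNU parallel).
# Reads one FASTA block on stdin, runs ublast on it, prints the hits on stdout.
ublast_block() {
  local db=$1 query hits status
  query=$(mktemp) || return 1
  hits=$(mktemp) || { rm -f "$query"; return 1; }
  cat > "$query"                      # save the piped block to a file
  # Flags as posted in #5; usearch's own stdout is redirected to stderr
  usearch -ublast "$query" -db "$db" -evalue 1e-5 -top_hits_only \
          -blast6out "$hits" >&2
  status=$?
  cat "$hits"                         # emit only the tabular hits
  rm -f "$query" "$hits"
  return "$status"
}

# Possible usage with GNU parallel (bash functions must be exported first):
#   export -f ublast_block
#   parallel --block 100k --recstart '>' --pipe ublast_block ./RefSeqmicrobes.udb \
#     < 454reads.fa > 454reads_refseq_results
```

Returning the usearch exit status lets GNU parallel notice failed blocks, which the one-liner hides because its last command is rm.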

            • GisleVestergaard
              Junior Member
              • Sep 2014
              • 2

              #7
              Usearch has annoying default stdout output

              Originally posted by tange
              Given: usearch -cluster_fast seqs.fasta -id 0.9 -centroids nr.fasta

              You can do:

              cat CAM_SMPL_001754_RMRNA_derep.fasta | parallel --block 100k --recstart '>' --pipe "cat > {#}; usearch -cluster_fast {#} -id 0.9 -centroids {#}.out; cat {#}.out; rm {#} {#}.out"
              This works very well, except for the fact that usearch (even using -quiet) will print 6 lines to stdout!
              usearch v7.0.1090_i86linux32, 4.0Gb RAM (32.5Gb total), 8 cores
              (C) Copyright 2013 Robert C. Edgar, all rights reserved.


              Licensed to: [email protected]

              The best solution I have found is to add:
              grep -E "^>|^[A,C,G,T]" > tyt
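
A note on the filter above: inside a bracket expression the commas are literal characters, so [A,C,G,T] also matches a line starting with a comma, and the pattern only checks the first character of each line. A slightly stricter variant is sketched below. It assumes nucleotide FASTA output with only A, C, G, T, U and N (upper or lower case); the function name strip_banner is hypothetical, and the banner text is simulated from the lines quoted in this post rather than produced by a real usearch run.

```shell
#!/usr/bin/env bash
# Hypothetical helper: keep FASTA headers and pure nucleotide lines, drop the rest.
strip_banner() {
  grep -E '^>|^[ACGTUNacgtun]+$'
}

# Simulated usearch stdout: banner lines followed by two FASTA records.
printf '%s\n' \
  'usearch v7.0.1090_i86linux32, 4.0Gb RAM (32.5Gb total), 8 cores' \
  '(C) Copyright 2013 Robert C. Edgar, all rights reserved.' \
  '' \
  '>read1' 'ACGTTGCA' '>read2' 'ggnnacgt' | strip_banner
```

Sequences containing other IUPAC ambiguity codes, or protein sequences, would be dropped by this pattern, so the character class would need to be widened for such data.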

              • tange
                Junior Member
                • Feb 2013
                • 7

                #8
                Originally posted by GisleVestergaard
                This works very well, except for the fact that usearch (even using -quiet) will print 6 lines to stdout!
                usearch v7.0.1090_i86linux32, 4.0Gb RAM (32.5Gb total), 8 cores
                (C) Copyright 2013 Robert C. Edgar, all rights reserved.


                Licensed to: [email protected]

                The best solution I have found is to add:
                grep -E "^>|^[A,C,G,T]" > tyt
                Would this work with GNU Parallel 20140822:

                parallel --pipepart -a CAM_SMPL_001754_RMRNA_derep.fasta --block 100k --recstart '>' --cat "usearch -cluster_fast {} -id 0.9 -centroids {#}.out; tail -n +7 {#}.out; rm {#}.out"

                • GisleVestergaard
                  Junior Member
                  • Sep 2014
                  • 2

                  #9
                  Originally posted by tange
                  Would this work with GNU Parallel 20140822:

                  parallel --pipepart -a CAM_SMPL_001754_RMRNA_derep.fasta --block 100k --recstart '>' --cat "usearch -cluster_fast {} -id 0.9 -centroids {#}.out; tail -n +7 {#}.out; rm {#}.out"
                  Yes, this works and is faster than sed. Thanks!
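
For readers unfamiliar with the tail step: tail -n +7 prints its input starting at line 7, which discards a fixed six-line header without inspecting any line. A minimal sketch follows; the function name drop_header_lines is hypothetical, and the default of 6 is taken from the line count reported in post #7 for usearch v7.0.1090, so it is an assumption for any other version.

```shell
#!/usr/bin/env bash
# Hypothetical helper: drop the first N lines of stdin (default 6, as reported in #7).
drop_header_lines() {
  local n=${1:-6}
  tail -n "+$((n + 1))"               # "+7" means: start output at line 7
}

# Six stand-in banner lines followed by one FASTA record.
printf '%s\n' b1 b2 b3 b4 b5 b6 '>read1' 'ACGT' | drop_header_lines
```

Unlike a pattern filter, this depends on the banner length staying constant, so it is worth rechecking the count after a usearch upgrade.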
