
**tange** · 03-20-2014, 06:08 AM

Given: usearch -cluster_fast seqs.fasta -id 0.9 -centroids nr.fasta

You can do:

cat CAM_SMPL_001754_RMRNA_derep.fasta | parallel --block 100k --recstart '>' --pipe "cat > {#}; usearch -cluster_fast {#} -id 0.9 -centroids {#}.out; cat {#}.out; rm {#} {#}.out"
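For readers new to these options: `--pipe` splits stdin into blocks, `--block 100k` sets the approximate block size, `--recstart '>'` makes every block begin at a FASTA header, and `{#}` is the job number, used here as a temporary file name. The sketch below imitates only the record-boundary splitting, using plain awk so it runs without GNU Parallel or usearch. The helper name `split_fasta_records` is hypothetical, and it splits by record count rather than by bytes, which is a simplification.

```shell
# Hypothetical helper: split a FASTA stream on stdin into files
# <prefix>1, <prefix>2, ... holding at most <n> records each.
# It mimics only the record-boundary splitting of
# `parallel --pipe --recstart '>'`; GNU Parallel sizes blocks in bytes
# (--block) and runs one job per block concurrently.
split_fasta_records() {
    awk -v prefix="$1" -v n="$2" '
        /^>/ {
            # A header line starts a new record; open a new chunk
            # file after every n records.
            if (count % n == 0) {
                if (out != "") close(out)
                chunk++
                out = prefix chunk
            }
            count++
        }
        # Write every line to the current chunk file.
        out != "" { print > out }
    '
}

# Demo: three records, two records per chunk.
workdir=$(mktemp -d)
printf '>a\nACGT\n>b\nGGCC\nTTAA\n>c\nTTTT\n' |
    split_fasta_records "$workdir/chunk." 2
head "$workdir"/chunk.*
```

No record is ever cut in half, which is the property that lets each chunk be fed to usearch as a valid FASTA file.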

**tonybert** · 03-20-2014, 12:42 PM

Yes, this works for cluster_fast, but it's blast I am really after. Thanks.

**tange** · 03-21-2014, 02:52 AM

I am no expert in usearch, but if you show the command line you would run to do it without GNU Parallel, then I might be able to help you parallelize it.

**tonybert** · 03-23-2014, 04:33 PM

Thanks. Below is the usearch command I would like to pipe:

usearch -ublast ./454reads.fa -db ./RefSeqmicrobes.udb -evalue 1e-5 -top_hits_only -blast6out ./454reads_refseq_results

**tange** · 03-23-2014, 11:27 PM

Extremely similar to the cluster_fast command:

cat CAM_SMPL_001754_RMRNA_derep.fasta | parallel --block 100k --recstart '>' --pipe "cat > {#}; usearch -ublast ./{#} -db ./RefSeqmicrobes.udb -evalue 1e-5 -top_hits_only -blast6out ./{#}.out; cat {#}.out; rm {#} {#}.out"

**GisleVestergaard** · 09-19-2014, 02:55 AM

Usearch has annoying default stdout output

Originally posted by tange:

Given: usearch -cluster_fast seqs.fasta -id 0.9 -centroids nr.fasta

You can do:

cat CAM_SMPL_001754_RMRNA_derep.fasta | parallel --block 100k --recstart '>' --pipe "cat > {#}; usearch -cluster_fast {#} -id 0.9 -centroids {#}.out; cat {#}.out; rm {#} {#}.out"

This works very well, except for the fact that usearch (even using -quiet) will print 6 lines to stdout!
usearch v7.0.1090_i86linux32, 4.0Gb RAM (32.5Gb total), 8 cores
(C) Copyright 2013 Robert C. Edgar, all rights reserved.

USEARCH

http://drive5.com/usearch

Licensed to: [email protected]

The best solution I have found is to add:
grep -E "^>|^[A,C,G,T]" > tyt
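A small sketch of this filter on simulated output, so it runs without usearch. The function name `keep_fasta_lines` is hypothetical. Inside a bracket expression the commas are literal characters, so `[A,C,G,T]` also matches a leading comma; `[ACGT]` behaves the same on nucleotide data and is used here.

```shell
# Keep only FASTA header lines and lines that start with A, C, G or T.
# Caveat: sequence lines beginning with N or a lowercase base are
# dropped as well, so this assumes uppercase, unambiguous sequences.
keep_fasta_lines() {
    grep -E '^>|^[ACGT]'
}

# Simulated output: part of the banner quoted in the post, then one record.
printf '%s\n' \
    'usearch v7.0.1090_i86linux32, 4.0Gb RAM (32.5Gb total), 8 cores' \
    '(C) Copyright 2013 Robert C. Edgar, all rights reserved.' \
    '' \
    'http://drive5.com/usearch' \
    '' \
    '>seq1' \
    'ACGTACGT' | keep_fasta_lines
```

The filter does not depend on how many banner lines there are, at the cost of testing every line of the output.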

**tange** · 09-20-2014, 06:50 AM

Originally posted by GisleVestergaard:

This works very well, except for the fact that usearch (even using -quiet) will print 6 lines to stdout!
usearch v7.0.1090_i86linux32, 4.0Gb RAM (32.5Gb total), 8 cores
(C) Copyright 2013 Robert C. Edgar, all rights reserved.

USEARCH

http://drive5.com/usearch

Licensed to: [email protected]

The best solution I have found is to add:
grep -E "^>|^[A,C,G,T]" > tyt

Would this work with GNU Parallel 20140822:

parallel --pipepart -a CAM_SMPL_001754_RMRNA_derep.fasta --block 100k --recstart '>' --cat "usearch -cluster_fast {} -id 0.9 -centroids {#}.out; tail -n +7 {#}.out; rm {#}.out"
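To illustrate the `tail -n +7` part on its own: `tail -n +K` starts printing at line K, so `+7` drops exactly the first six lines. The sketch below uses placeholder banner lines instead of real usearch output, and the function name `strip_banner` is hypothetical. It assumes the banner always has the same number of lines.

```shell
# Drop a fixed-size banner from stdin.
# `tail -n +K` prints from line K onward, so K = banner_lines + 1.
strip_banner() {
    tail -n "+$(( $1 + 1 ))"
}

# Six placeholder banner lines followed by one FASTA record.
{
    for i in 1 2 3 4 5 6; do
        echo "banner line $i"
    done
    printf '>seq1\nACGT\n'
} | strip_banner 6
```

Unlike a pattern filter, this does not inspect line contents, so it would also remove real data if a usearch version printed fewer banner lines.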

**GisleVestergaard** · 09-21-2014, 10:33 PM

Originally posted by tange:

Would this work with GNU Parallel 20140822:

parallel --pipepart -a CAM_SMPL_001754_RMRNA_derep.fasta --block 100k --recstart '>' --cat "usearch -cluster_fast {} -id 0.9 -centroids {#}.out; tail -n +7 {#}.out; rm {#}.out"

Yes, this works and is faster than sed. Thanks!


GNU parallel + usearch piping
