Does anyone have experience running cd-hit-para.pl in an SGE environment?
I have a large number of sequences to cluster. Running cd-hit is very slow, even when I use 20 CPUs on our workstation. cd-hit-para.pl can run cd-hit in parallel by dividing the job into pieces and running them on a cluster. I have access to TACC (Texas Advanced Computing Center), which uses SGE. Has anyone done this before?