
CD-HIT ships with a nice utility (plot_len1.pl) that gives me a table of sequence frequencies for various length classes. So all the frequency information is in the .clstr file, but how do I extract only the information I want, and how do I then link it back to the original sequences? Say, for example, that I want to retrieve all sequences that occur 10 to 19 times in my input dataset: how would I do that?
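To make the goal concrete, here is a minimal Python sketch of one possible approach, not a CD-HIT feature. It assumes that "occurs 10 to 19 times" means "belongs to a cluster with 10 to 19 members" (which only equals exact duplicate counts when clustering at 100% identity), that the .clstr file has the usual layout of `>Cluster N` headers followed by member lines like `0	100nt, >seq1... *`, and that the IDs in the .clstr file are not truncated, so they match the first word of each FASTA header. The function names `parse_clstr`, `ids_in_size_range` and `filter_fasta`, and the file names in the usage comment, are hypothetical.

```python
import re
from collections import defaultdict

# Captures the sequence ID between ">" and the trailing "..." on a member line,
# e.g. "0\t100nt, >seq1... *" gives "seq1" (assumed .clstr layout).
MEMBER_RE = re.compile(r">(.+?)\.\.\.")


def parse_clstr(lines):
    """Return {cluster name: [sequence IDs]} from the lines of a .clstr file."""
    clusters = defaultdict(list)
    current = None
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            current = line[1:]  # e.g. "Cluster 0"
        elif current is not None:
            match = MEMBER_RE.search(line)
            if match:
                clusters[current].append(match.group(1))
    return dict(clusters)


def ids_in_size_range(clusters, lo, hi):
    """Collect the IDs of all members of clusters with lo..hi members (inclusive)."""
    return {
        seq_id
        for members in clusters.values()
        if lo <= len(members) <= hi
        for seq_id in members
    }


def filter_fasta(lines, wanted):
    """Yield the FASTA lines of records whose ID (first word of the header) is wanted."""
    keep = False
    for line in lines:
        if line.startswith(">"):
            header = line[1:].split()
            keep = bool(header) and header[0] in wanted
        if keep:
            yield line


# Usage with hypothetical file names:
# with open("input.clstr") as clstr, open("input.fasta") as fasta, \
#         open("selected.fasta", "w") as out:
#     wanted = ids_in_size_range(parse_clstr(clstr), 10, 19)
#     out.writelines(filter_fasta(fasta, wanted))
```

If the IDs in the .clstr file turn out to be shortened, the lookup against the FASTA headers will miss records; as far as I know, running cd-hit with `-d 0` keeps the header up to the first space, but please check that against your version.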