Hi everyone,
I'm currently looking at ways to reduce redundancy in de novo transcriptomes of some closely related species with the goal of searching for orthologs for phylogenetics afterwards.
I'm interested in using CD-HIT-EST, but I'm unsure what similarity threshold is best to use. 95% and 90% aren't that different in terms of how many clusters are formed. Is 90% too low a threshold, potentially merging distinct genes into one cluster and losing them? Since I'm looking for useful orthologs downstream, reducing redundancy as much as possible is important, but I don't want to sacrifice unique genes...
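In case it helps to make the comparison concrete, here is a minimal sketch of how the cluster counts at the two thresholds could be tallied. It assumes the standard CD-HIT `.clstr` output, where each cluster starts with a `>Cluster N` line followed by one line per member sequence. The function names and any file paths passed on the command line are hypothetical, not part of CD-HIT itself.

```python
import sys


def cluster_sizes(clstr_lines):
    """Return the number of member sequences in each cluster.

    Assumes CD-HIT .clstr layout: a '>Cluster N' header line,
    then one line per sequence assigned to that cluster.
    """
    sizes = []
    for line in clstr_lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">Cluster"):
            sizes.append(0)      # header line opens a new cluster
        elif sizes:
            sizes[-1] += 1       # member line counts toward the current cluster
    return sizes


def summarize(clstr_lines):
    """Summarize cluster count, total sequences, singletons and largest cluster."""
    sizes = cluster_sizes(clstr_lines)
    return {
        "clusters": len(sizes),
        "sequences": sum(sizes),
        "singletons": sum(1 for s in sizes if s == 1),
        "largest": max(sizes, default=0),
    }


if __name__ == "__main__":
    # Hypothetical usage: python summarize_clstr.py run_c95.clstr run_c90.clstr
    for path in sys.argv[1:]:
        with open(path) as handle:
            print(path, summarize(handle))
```

Comparing the number of singletons and the size of the largest clusters between the two runs, rather than only the total cluster count, might show more clearly what the lower threshold is actually merging.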