Hi all,
I am trying to come up with a conservative list of single-copy human genes -- i.e. I want to exclude all genes that may be part of a larger gene family or that share high sequence similarity with another gene. I've done quite a bit of searching but have not found anything I'm satisfied with.
My basic approach is to BLAST the human transcripts file against itself and filter out any transcript that has a significant hit to a transcript other than itself.
My question is, what blast parameters would best accomplish this? I would rather be more conservative.
Any ideas of suitable cut-offs for the following parameters?
1. e-value (1e-10)
2. hit length to query length ratio (50%)
3. Percent identity (90%)
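To make the filtering step concrete, here is a rough sketch of how I imagine applying the three cut-offs. It assumes the self-vs-self search was written as tab-separated output with the columns `qseqid sseqid pident length evalue qlen` (e.g. via `-outfmt "6 qseqid sseqid pident length evalue qlen"`), that a hit must pass all three cut-offs to count, and that both members of a passing pair get excluded. The function names are just placeholders of mine, and the default values are the tentative ones listed above, not recommendations.

```python
def multi_copy_ids(blast_lines, max_evalue=1e-10, min_cov=0.5, min_pident=90.0):
    """Return IDs involved in at least one non-self hit passing all cut-offs.

    Assumes each line has six tab-separated fields:
    qseqid, sseqid, pident, length, evalue, qlen.
    """
    flagged = set()
    for line in blast_lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        qseqid, sseqid, pident, length, evalue, qlen = line.split("\t")
        if qseqid == sseqid:
            continue  # a transcript always hits itself; ignore self-hits
        # Alignment length includes gaps, so this ratio is only approximate.
        coverage = int(length) / int(qlen)
        if (float(evalue) <= max_evalue
                and coverage >= min_cov
                and float(pident) >= min_pident):
            # Assumption: exclude both query and subject to stay conservative.
            flagged.update((qseqid, sseqid))
    return flagged


def single_copy_ids(all_ids, blast_lines, **cutoffs):
    """IDs from all_ids with no qualifying hit to any other sequence."""
    return set(all_ids) - multi_copy_ids(blast_lines, **cutoffs)


if __name__ == "__main__":
    # Made-up rows purely for illustration, not real BLAST results.
    rows = [
        "tx1\ttx1\t100.0\t1000\t0.0\t1000",
        "tx1\ttx2\t95.0\t600\t1e-50\t1000",
        "tx3\ttx3\t100.0\t800\t0.0\t800",
        "tx3\ttx4\t85.0\t700\t1e-30\t800",
    ]
    print(sorted(single_copy_ids(["tx1", "tx2", "tx3", "tx4"], rows)))
```

One caveat with this sketch: it compares transcript IDs, so alternative isoforms of the same gene would flag each other. I would presumably need to map transcripts to gene IDs first (or keep one transcript per gene) before running it.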
Thanks