Hello,
I'm working on creating a core set of orthologous genes in ~40 C. diff strains in order to create a binary tree at the end.
I was told to use reciprocal blast searches. I understand this and have both a script to perform them and handle the output.
My issue is: when doing this reciprocal blast in a pairwise fashion, how do I decide which strains to compare with each other? For example, should I compare strain1 with all other strains, then strain2 with all others, and so on, or should I compare each one with a sensible reference genome (for example, the one against which these genomes were originally assembled)?
Obviously the first method would involve an incredible number of runs. The latter would take fewer, but which is more sensible and will give meaningful results?
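To put rough numbers on the two options, here is a minimal sketch of the counting. It assumes that one reciprocal comparison costs two blast runs (A against B, then B against A) and that the reference is itself one of the 40 strains; the strain names and both helper functions are hypothetical, made up only for illustration.

```python
from itertools import combinations


def all_vs_all_pairs(strains):
    """Every unordered pair of strains (option 1: each strain vs all others)."""
    return list(combinations(strains, 2))


def reference_pairs(strains, reference):
    """Each non-reference strain paired with the reference (option 2)."""
    return [(reference, s) for s in strains if s != reference]


# Hypothetical strain labels; the real names would come from the genome files.
strains = [f"strain{i}" for i in range(1, 41)]

pairs_all = all_vs_all_pairs(strains)
pairs_ref = reference_pairs(strains, "strain1")

# Assumption: a reciprocal search is two blast runs per pair, one per direction.
runs_all = 2 * len(pairs_all)
runs_ref = 2 * len(pairs_ref)

print(f"all-vs-all: {len(pairs_all)} pairs, {runs_all} blast runs")
print(f"vs reference: {len(pairs_ref)} pairs, {runs_ref} blast runs")
```

With 40 strains that comes to 780 pairs (1560 runs) for all-vs-all, against 39 pairs (78 runs) when everything is compared with a single reference.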
I'm a master's student, by the way, so whilst I think I know everything, I possibly don't...
Thanks!