Hello all,
I've been trying to figure out how the taxonomy IDs are set up for NCBI BLAST results, but have been having some trouble. I want to include or exclude results from BLAST searches based on the taxa of the hits returned for each query sequence. In other words, I want to be able to pull out all the sequences that blast to E. coli or Strongylocentrotus purpuratus or, more generally, to Eukaryotes or any other group, and then import them into a program like Blast2GO to run some stats on them. I thought perhaps I could pull out a list of sequence names to use for filtering with some Unix or Python commands, but I haven't been able to figure out how to use the taxonomy IDs to do that. Has anyone done this before with a decent-sized data set (a few thousand sequences), and do you have any good suggestions?
Thanks in advance for any advice.
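To make the goal concrete, the filtering step could look something like the sketch below. It assumes the search was run with tabular output that includes a taxonomy column, for example `-outfmt "6 qseqid sseqid evalue staxids"` in BLAST+, which in turn assumes the database carries taxonomy information. The function name, the column layout and the sample rows are made up for illustration and are not real results.

```python
import csv
import io


def queries_hitting_taxa(rows, wanted_taxids, taxid_column=3):
    """Return the set of query IDs that have at least one hit to a wanted taxid.

    rows          -- iterable of lists, one per tabular BLAST hit line
    wanted_taxids -- taxonomy IDs to keep (ints or strings)
    taxid_column  -- zero-based index of the staxids column (an assumption
                     that matches the example -outfmt string above)
    """
    wanted = {str(t) for t in wanted_taxids}
    keep = set()
    for row in rows:
        if len(row) <= taxid_column:
            continue  # skip blank or malformed lines
        # A single hit can list several taxids separated by ';'
        hit_taxids = set(row[taxid_column].split(";"))
        if hit_taxids & wanted:
            keep.add(row[0])  # column 0 is the query sequence ID
    return keep


# Invented example rows: qseqid, sseqid, evalue, staxids
SAMPLE = (
    "contig1\tsubjA\t1e-50\t562\n"
    "contig2\tsubjB\t1e-30\t7668\n"
    "contig3\tsubjC\t1e-20\t9606;7668\n"
    "contig4\tsubjD\t1e-10\t9606\n"
)

rows = list(csv.reader(io.StringIO(SAMPLE), delimiter="\t"))
ecoli_ids = queries_hitting_taxa(rows, {562})     # 562 = Escherichia coli
urchin_ids = queries_hitting_taxa(rows, {7668})   # 7668 = S. purpuratus
```

The resulting sets of query names could then be written to a file and used to subset the sequences before import. Selecting a whole group such as Eukaryotes (taxid 2759) would additionally require expanding that ID to all of its descendant taxids, for instance from `nodes.dmp` in the NCBI taxonomy dump, which this sketch does not attempt.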