Hello,
I inherited an RNA-Seq analysis from someone else, and I'm trying to make sense of one of the steps they performed. The project looked for differentially expressed genes using DESeq, which works nicely. The DESeq output is then compared to a GTF file for the reference genome, and the protein IDs for each of the hits are extracted. A program called fastacmd is then used to get the amino acid sequences for those protein IDs. That all makes sense to me. However, those protein sequences are then BLASTed against the KEGG database, and the three-letter code of the best hit for each sequence is used to assign the host organism to that gene. This doesn't really make sense to me, since the header of the FASTA file generated by fastacmd already contains the organism name, so I'm hoping someone else can help. The work was originally done about a year ago, so no one can quite remember the logic behind it, and the BLAST results from KEGG gave some interesting output, so I want to be able to validate it.
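To make the step I'm asking about concrete, here is a minimal sketch of what I understand it to be doing. This is not the original code. It assumes the BLAST results are in the standard 12-column tabular format (query ID in column 1, subject ID in column 2, bit score in column 12) and that the KEGG subject IDs look like `org:gene`, with the organism code before the colon. The function name and the example rows are made up for illustration.

```python
import csv
import io


def best_hit_organisms(blast_tabular):
    """Map each query ID to the organism code of its highest-bit-score hit.

    Assumes 12-column tabular BLAST output and subject IDs of the
    form "org:gene" (both are assumptions about the inherited pipeline).
    """
    best = {}  # query ID -> (bit score, organism code)
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject, bitscore = row[0], row[1], float(row[11])
        organism = subject.split(":", 1)[0]
        if query not in best or bitscore > best[query][0]:
            best[query] = (bitscore, organism)
    return {query: organism for query, (_, organism) in best.items()}


# Made-up example rows, not real results.
example = io.StringIO(
    "prot1\tmmu:00002\t95.0\t200\t10\t0\t1\t200\t1\t200\t1e-90\t380\n"
    "prot1\thsa:00001\t98.5\t200\t3\t0\t1\t200\t1\t200\t1e-100\t400\n"
    "prot2\teco:00003\t60.0\t150\t60\t2\t1\t150\t5\t154\t1e-30\t150\n"
)
result = best_hit_organisms(example)
print(result)
```

The resulting codes could then be compared against the organism name in each fastacmd header to see where the two disagree, which is the part I would like to validate.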
Can anyone offer some insight into their logic, or perhaps suggest a better way?
Thanks