Hi all,
I've been using a HISAT2 -> StringTie -> Ballgown pipeline for differential gene expression analysis in a non-model organism. Supplying RefSeq annotations during StringTie assembly has been an effective way to attach gene/transcript names, but because my study species lacks a well-annotated genome, I'm left with a lot of what I think are "mystery" genes. I would like to identify them, as many have interesting expression patterns.
To this end, I merged the assemblies across all treatment groups and biological replicates, then ran BLASTX on the merged assembly against the Swiss-Prot/UniProt database to see if I could uncover the identity of any of the mystery genes. But I'm now stuck on how to use the BLASTX output. Other than manually searching and matching BLASTX hits to individual genes by their "MSTRG" IDs, which would be laborious for ~1000 hits, I'm not sure how to move forward.
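To make the goal concrete, here is a rough sketch of the matching I'd like to automate. It rests on several assumptions that may not hold for my data: that the BLASTX output is in the 12-column tabular format (`-outfmt 6`), that the query IDs are StringTie transcript IDs of the form `MSTRG.<gene>.<transcript>`, and that keeping the single highest-bitscore hit per gene is a sensible rule. The function names and the toy rows (including the subject IDs) are made up for illustration.

```python
import csv
import io


def best_hits(handle):
    """Keep the highest-bitscore hit per query from BLAST tabular (outfmt 6) rows."""
    best = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        evalue, bitscore = float(row[10]), float(row[11])
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


def gene_id(transcript_id):
    """Assumed ID scheme: MSTRG.12.3 (transcript) -> MSTRG.12 (gene)."""
    if transcript_id.startswith("MSTRG.") and transcript_id.count(".") >= 2:
        return transcript_id.rsplit(".", 1)[0]
    return transcript_id  # leave any other ID untouched


def best_hit_per_gene(hits):
    """Collapse per-transcript hits to one (subject, evalue, bitscore) per gene."""
    per_gene = {}
    for query, hit in hits.items():
        gene = gene_id(query)
        if gene not in per_gene or hit[2] > per_gene[gene][2]:
            per_gene[gene] = hit
    return per_gene


# Toy rows standing in for a real BLASTX output file (hypothetical subject IDs).
toy = (
    "MSTRG.1.1\tsp|X00001|TOYA\t85.0\t200\t30\t0\t1\t600\t1\t200\t1e-100\t350\n"
    "MSTRG.1.1\tsp|X00002|TOYB\t60.0\t180\t72\t0\t1\t540\t1\t180\t1e-40\t150\n"
    "MSTRG.1.2\tsp|X00003|TOYC\t90.0\t210\t21\t0\t1\t630\t1\t210\t1e-120\t400\n"
    "MSTRG.2.1\tsp|X00004|TOYD\t70.0\t100\t30\t0\t1\t300\t1\t100\t1e-30\t120\n"
)

annotations = best_hit_per_gene(best_hits(io.StringIO(toy)))
for gene, (subject, evalue, bitscore) in sorted(annotations.items()):
    print(gene, subject, evalue, bitscore, sep="\t")
```

With a real file, `io.StringIO(toy)` would be replaced by an open file handle, and the resulting gene-to-hit table could then be joined to the Ballgown results on the gene ID column.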
Any help would be greatly appreciated.