Hi all,
I'm looking for advice on enriching an NGS annotation pipeline with better predictive scores for splice-site variants. According to the CMGS, one should use a consensus of at least three tools, and they name NNSPLICE (available on fruitfly.org), NetGene2, Alex Dong Li's splice site finder (which appears to no longer exist!), GeneSplicer, and the Human Splice Finder (HSF) database.
Aside from GeneSplicer and NetGene2, these are mostly web-based applications that don't seem to offer any option for local installation or high-throughput use. GeneSplicer and NetGene2 both require large amounts of sequence information to make a prediction and, despite positive reviews, haven't shown great sensitivity or specificity in some cursory testing I've done.
I've found MaxEntScan (not named by the CMGS, but used in the splice-site prediction toolkits of both HSF and Alamut) to be the only tool that requires relatively little sequence data and still makes reliable predictions from it.
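To make "relatively little sequence data" concrete: MaxEntScan's 5' (donor) model scores a 9-mer (the last 3 exonic bases plus the first 6 intronic bases) and its 3' (acceptor) model scores a 23-mer (the last 20 intronic bases plus the first 3 exonic bases). Below is a minimal Python sketch of pulling those windows for the reference and variant alleles. It assumes plus-strand sequence and 0-based coordinates, and the function names are hypothetical helpers, not part of MaxEntScan itself; the scoring step is left to the tool.

```python
def donor_window(seq, exon_end):
    """Return the 9-mer for a 5' (donor) site: 3 exonic + 6 intronic bases.

    exon_end is the 0-based index of the last exonic base
    (a coordinate convention assumed for this sketch).
    """
    start, end = exon_end - 2, exon_end + 7
    if start < 0 or end > len(seq):
        raise ValueError("not enough flanking sequence for a 9-mer")
    return seq[start:end]


def acceptor_window(seq, exon_start):
    """Return the 23-mer for a 3' (acceptor) site: 20 intronic + 3 exonic bases.

    exon_start is the 0-based index of the first exonic base.
    """
    start, end = exon_start - 20, exon_start + 3
    if start < 0 or end > len(seq):
        raise ValueError("not enough flanking sequence for a 23-mer")
    return seq[start:end]


def apply_snv(seq, pos, alt):
    """Return seq with the single base at 0-based index pos replaced by alt."""
    return seq[:pos] + alt + seq[pos + 1:]
```

Scoring the reference window and the window from `apply_snv(...)` with the same model gives a ref/alt pair per variant, which is the unit a pipeline would compare.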
So, to the point: has anyone done work to bring better splice-site variant scoring algorithms into a high-throughput analysis pipeline? Do you have any thoughts on how one might actually leverage the predictions these web apps make? (E.g. run the splice regions for every gene through the web tools and store the outputs in a local database -- oh, the horror of the thought!)
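For what it's worth, the "local database" idea could be sketched roughly as below, using Python's standard `sqlite3` module. The table layout, the function names, and the rule for calling a site "disrupted" (a relative score drop of at least 10% in at least three tools) are all assumptions made up for illustration; the CMGS consensus requirement does not fix these numbers, and the scores would still have to come from the tools themselves.

```python
import sqlite3


def open_cache(path=":memory:"):
    """Open (or create) a local cache of per-tool splice scores."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS splice_scores ("
        " variant TEXT NOT NULL,"
        " tool TEXT NOT NULL,"
        " ref_score REAL NOT NULL,"
        " alt_score REAL NOT NULL,"
        " PRIMARY KEY (variant, tool))"
    )
    return conn


def store_score(conn, variant, tool, ref_score, alt_score):
    """Insert or overwrite the ref/alt score pair for one variant and tool."""
    conn.execute(
        "INSERT OR REPLACE INTO splice_scores VALUES (?, ?, ?, ?)",
        (variant, tool, ref_score, alt_score),
    )
    conn.commit()


def consensus_disrupted(conn, variant, min_tools=3, min_drop=0.10):
    """True if at least min_tools tools show a relative drop >= min_drop.

    Both thresholds are illustrative assumptions, not published cut-offs.
    """
    rows = conn.execute(
        "SELECT ref_score, alt_score FROM splice_scores WHERE variant = ?",
        (variant,),
    ).fetchall()
    calls = sum(
        1 for ref, alt in rows if ref > 0 and (ref - alt) / ref >= min_drop
    )
    return calls >= min_tools
```

The primary key on (variant, tool) means re-running a tool simply overwrites its earlier row, so the cache can be filled incrementally.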
Thanks!