Dear All,
I apologize if an answer to the following basic question is already buried somewhere in the forum; if so, a quick search did not reveal it.
I'm looking for the right statistical model to compute the required sequencing depth for detecting a rare isoform with a certain probability in RNA-Seq data. Or, in other words, I would like to compute the sensitivity of an RNA-Seq experiment for finding minority isoforms at a given sequencing depth and isoform characteristic.
The problem is closely related to differential expression analysis, but I have serious problems combining the right models (Poisson, beta-binomial) at the right positions. Perhaps one of the statistically minded people working on RNA-Seq has an idea. Of course, partial solutions or pointers to caveats are also very welcome.
Here is a contrived example with rough numbers: let's assume that I want to look for a rare isoform that accounts for only n (=10) of the N (=100,000) overall mRNA transcripts per cell. How many reads of length r (=100bp) do I need to sequence from a library derived from the total mRNA of k (=1,000,000) cells so that I obtain at least m (=3) reads from my rare isoform with probability P >= p (=0.999)?
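To make the numbers above concrete, here is a minimal sketch of the most naive framing, not a vetted answer. It assumes that every read is an independent draw from the transcript pool, that all transcripts are equally likely to yield a read (no length, GC, or amplification bias), and that the pool of k*N molecules is so large that sampling without replacement can be ignored. Under those assumptions the number of reads from the rare isoform is approximately Poisson with mean R*n/N for R total reads. The function names `poisson_tail` and `required_reads` are made up for this illustration.

```python
import math


def poisson_tail(lam, m):
    """Return P(X >= m) for X ~ Poisson(lam)."""
    term = math.exp(-lam)  # P(X = 0)
    cdf = 0.0
    for i in range(m):
        cdf += term
        term *= lam / (i + 1)  # P(X = i + 1) from P(X = i)
    return 1.0 - cdf


def required_reads(fraction, m, p):
    """Smallest read count R with P(at least m isoform reads) >= p.

    Assumption: isoform read count ~ Poisson(R * fraction).
    """
    lo, hi = 0, 1
    # Grow the upper bound until it satisfies the target probability.
    while poisson_tail(hi * fraction, m) < p:
        hi *= 2
    # Bisection: tail(lo) < p <= tail(hi) holds throughout.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if poisson_tail(mid * fraction, m) >= p:
            hi = mid
        else:
            lo = mid
    return hi


if __name__ == "__main__":
    n, N, m, p = 10, 100_000, 3, 0.999
    print(required_reads(n / N, m, p))
```

With the example values this comes out at roughly 1.1 x 10^5 reads. Real libraries are overdispersed relative to Poisson, so this is best read as an optimistic lower bound.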
Bonus points: of course, a useful estimate may also depend on how easily I can distinguish the rare isoform from its more abundant brethren originating from the same gene. After all, I may receive reads from my rare isoform with probability P, but only ones that are indistinguishable from the other isoforms, since the isoforms are identical for most of their sequence. For simplicity, let's assume that all isoforms of the gene are L (=1000bp) long and can be differentiated from each other by one single stretch of length l (=200bp), which encodes an alternatively spliced exon and uniquely tags an isoform.
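One rough way to fold distinguishability into such an estimate is sketched below. It assumes that read start positions are uniform along the transcript, and that the distinguishing stretch lies far enough from both transcript ends that no overlapping read is clipped. It also assumes that a read counts as informative if it overlaps the stretch by at least `min_overlap` bases. The function `informative_fraction` and the `min_overlap` parameter are hypothetical and only serve the illustration.

```python
def informative_fraction(L, l, r, min_overlap=1):
    """Fraction of read start positions that give an isoform-specific read.

    L: transcript length, l: length of the distinguishing stretch,
    r: read length, min_overlap: bases of the stretch a read must cover.
    Assumptions: uniform read starts, stretch not near the transcript ends.
    """
    if not 1 <= min_overlap <= min(l, r):
        raise ValueError("min_overlap must be between 1 and min(l, r)")
    starts_total = L - r + 1
    # Reads overlapping the stretch by >= min_overlap bases.
    starts_informative = l + r - 2 * min_overlap + 1
    return min(starts_informative, starts_total) / starts_total


if __name__ == "__main__":
    q = informative_fraction(L=1000, l=200, r=100)
    # Under a Poisson model the expected informative count is R * (n/N) * q,
    # so the depth needed for a fixed detection probability scales by 1/q.
    print(q, 1 / q)
```

For the example values, any overlap of at least one base gives q = 299/901, so about three times as many reads are needed as when every isoform read counts. Requiring the read to lie entirely within the stretch (`min_overlap=100`) lowers q to 101/901.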
I realize this is a complex example, but perhaps it's not without merit. Also, who better to ask it than you guys. Anyways, thanks for any insights!
Cheers, Sven
--
Sven-Eric Schelhorn - http://mpi-inf.mpg.de/~sven
Max Planck Institute for Informatics, Saarbrücken
D3 - Computational Biology & Applied Algorithmics