Hey guys,
I have some problems understanding the need for statistical models when dealing with differential expression in RNA-Seq. Of course I already used tools like DESeq2 or NOISeq. Nevertheless, I also want at least partially understand what these tools are doing. Unfortunately I don't have a good statistical background and found no tutorial, which is explaining the usage of statistical models in a for me understandable manner. I think the best would be if one could explain it for the Poisson model as this one seems to be easier to understand than a NB.
So what I know is that after sequencing I align my reads to the reference genome, followed by generation of read counts for each annotated gene. Of course I cannot directly use these counts for testing DE cause of different library sizes as well as technical and biological variation.
So what I read most of the time is that people fit statistical models to the count data. Like for example a Poisson model (as this one is accounting for the technical variance).
Question 1: Is the model fitted on the read counts of all genes? Or is each gene getting its own model?
Question 2: In the case of the Poisson model, where do I get the lambda? Should be calculated from my count data?
Question 3: If I have constructed my Poisson model. What is it now used for? Do I use it to change my count data? Is it used in the statistical test? This is the step where I have absolutely no clue what is going on.
I tried to read different publications including the DESeq publications or in the case of Poisson the Marioni paper from 2008. But with my little statistical knowledge I do not get the key idea of these statistical models and how they can help me when dealing with DE in RNA-Seq.
I really hope someone can explain this general concept in a really easy way so I can understand it.
Cheers
Mario
I have some problems understanding the need for statistical models when dealing with differential expression in RNA-Seq. Of course I already used tools like DESeq2 or NOISeq. Nevertheless, I also want at least partially understand what these tools are doing. Unfortunately I don't have a good statistical background and found no tutorial, which is explaining the usage of statistical models in a for me understandable manner. I think the best would be if one could explain it for the Poisson model as this one seems to be easier to understand than a NB.
So what I know is that after sequencing I align my reads to the reference genome, followed by generation of read counts for each annotated gene. Of course I cannot directly use these counts for testing DE cause of different library sizes as well as technical and biological variation.
So what I read most of the time is that people fit statistical models to the count data. Like for example a Poisson model (as this one is accounting for the technical variance).
Question 1: Is the model fitted on the read counts of all genes? Or is each gene getting its own model?
Question 2: In the case of the Poisson model, where do I get the lambda? Should be calculated from my count data?
Question 3: If I have constructed my Poisson model. What is it now used for? Do I use it to change my count data? Is it used in the statistical test? This is the step where I have absolutely no clue what is going on.
I tried to read different publications including the DESeq publications or in the case of Poisson the Marioni paper from 2008. But with my little statistical knowledge I do not get the key idea of these statistical models and how they can help me when dealing with DE in RNA-Seq.
I really hope someone can explain this general concept in a really easy way so I can understand it.
Cheers
Mario
Comment