  • Rachel Hillmer
    Junior Member
    • Jun 2012
    • 8

    Spliceform-aware input for DESeq/edgeR/baySeq

    I am looking for a way to quantify and statistically evaluate spliceforms across a set of RNA-Seq experiments.

    My current understanding is that the input to DESeq/edgeR/baySeq should simply be counts of reads mapped to a gene locus. Since Cufflinks assigns spliceform abundances one library at a time, any systematic errors inherent in that sample are confounded with the spliceform quantification problem. As I understand it, these systematic errors really should be corrected for with a generalized linear model, which takes as input all the samples of interest/relevance. (Cufflinks aficionados, please correct me if I am wrong!)

    Also as I understand it, the developers of DESeq/edgeR/baySeq have come to the conclusion (along with many others) that RPKM/FPKM is not a sufficient correction to be able to compare different genes within the same library to each other. There appear to be additional biases (beyond length) that affect the transformation from mRNA to RNA-Seq sequences. Therefore, it has been suggested that instead, it is only reasonable, for now, to restrict ourselves to comparing abundances of the same gene between different samples.

    I find this solution somewhat unsatisfying, though. If I have two spliceforms which, by definition, originate from the same genetic locus, but have different lengths and varying expression levels in the two (or more) conditions I am surveying, then the noise associated with those two expression levels is different. Moreover, given the current model for mean-variance relationships (negative binomial), noise, unlike expression level, is not linear. So I would not expect the noise from two genes with the same average expression level, one containing many differently regulated spliceforms and the other containing a single spliceform, to follow the same distribution. Ideally, I would want a generalized linear model that can simultaneously correct for systematic (non-biological) errors in the sample collection process and estimate spliceform abundances as well.

    Is there a good reason such a model is unnecessary? Is there a good reason to be content with locus-level abundances?

    Thanks for your input!
    ~Rachel
  • chadn737
    Senior Member
    • Jan 2009
    • 392

    #2
    Check out DEXSeq. It looks at differential exon usage and is based on DESeq.


    • Gordon Smyth
      Member
      • Apr 2011
      • 91

      #3
      Originally posted by Rachel Hillmer
      I find this solution somewhat unsatisfying, though. If I have two spliceforms which, by definition, originate from the same genetic locus, but have different lengths and varying expression levels in the two (or more) conditions I am surveying, then the noise associated with those two expression levels is different. Moreover, given the current model for mean-variance relationships (negative binomial), noise, unlike expression level, is not linear. So I would not expect the noise from two genes with the same average expression level, one containing many differently regulated spliceforms and the other containing a single spliceform, to follow the same distribution.
      ~Rachel
      It is perfectly possible for a sum of negative binomial random variables to also be negative binomial distributed, even when the means are different. So it is perfectly possible for a quadratic mean-variance relationship to hold both for separate isoforms of varying expression levels and for the total aggregate count for the whole gene region. This requires only that the level of biological variability be comparable between the isoforms.

      Even if this relationship were not satisfied exactly, the quadratic variance function would likely still be more realistic than assuming biological variability to be absent, which is what one is doing by using a Poisson distribution.

      Gordon
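
One concrete setting in which the point above holds exactly can be checked by simulation. The sketch below assumes something the thread does not state: that all isoforms of a gene share the same per-sample biological factor, drawn from a gamma distribution with mean 1 and variance equal to the dispersion, and that counts are Poisson given that factor. Under this assumption each isoform count and the gene-level total are both negative binomial with the same dispersion, so both follow the variance function mean + dispersion * mean^2. The function names, isoform means and dispersion value are hypothetical and chosen only for illustration.

```python
import numpy as np


def simulate_isoform_counts(isoform_means, dispersion, n_samples, seed=0):
    """Simulate isoform counts under a shared gamma-Poisson model.

    Assumption (illustrative only): every isoform of the gene is scaled by
    the same per-sample biological factor with mean 1 and variance
    `dispersion`. Returns an array of shape (n_samples, n_isoforms).
    """
    rng = np.random.default_rng(seed)
    means = np.asarray(isoform_means, dtype=float)
    # Gamma factor with mean 1 and variance equal to the dispersion.
    factor = rng.gamma(shape=1.0 / dispersion, scale=dispersion, size=n_samples)
    # Given the factor, each isoform count is Poisson with mean factor * mean.
    return rng.poisson(np.outer(factor, means))


def nb_variance(mean, dispersion):
    """Quadratic negative binomial variance function."""
    return mean + dispersion * mean ** 2


if __name__ == "__main__":
    isoform_means = [20.0, 50.0, 130.0]  # hypothetical isoform expression levels
    dispersion = 0.1                     # hypothetical biological dispersion
    counts = simulate_isoform_counts(isoform_means, dispersion, n_samples=200_000)
    gene_total = counts.sum(axis=1)

    for j, mu in enumerate(isoform_means):
        print(f"isoform {j}: empirical var {counts[:, j].var():.1f}, "
              f"NB var {nb_variance(mu, dispersion):.1f}")
    mu_total = sum(isoform_means)
    print(f"gene total: empirical var {gene_total.var():.1f}, "
          f"NB var {nb_variance(mu_total, dispersion):.1f}, "
          f"Poisson var {mu_total:.1f}")
```

If the isoforms instead fluctuated independently with a common dispersion, the variance of the total would be the sum of the isoform variances, mean + dispersion * (sum of squared isoform means). That is still far above the Poisson variance, but it corresponds to a smaller effective dispersion for the gene-level count.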


      • Gordon Smyth
        Member
        • Apr 2011
        • 91

        #4
        Originally posted by chadn737
        Check out DEXSeq. It looks at differential exon usage and is based on DESeq.
        Or the function spliceVariants() in the edgeR package.

        This and DEXSeq are designed to test for differential splicing though -- they don't attempt to quantify the expression levels of the different isoforms in absolute terms.

        Gordon
