  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #31
    Originally posted by pengchy:
    Hi dpryan,

    Sorry, I don't follow your meaning.
    The p-value or adjusted p-value will be used to detect differentially expressed genes. If the differences in p-value are caused by gene length, the ordering of p-values will misguide the follow-up biology experiments.

    Thank you.
    If the biologists (I am one myself) are thrown off by this, then they should find a different profession. This concept is incredibly common in molecular biology and in whatever other sub-field your colleagues might work in.

    The p-value isn't being caused by gene length; rather, longer genes will tend to have more reads, meaning that there can be more evidence for or against differential expression. Other factors affecting that are within-group variability, between-group variability (i.e., fold change) and general level of expression (depending on exactly how the statistics are done). The p-value, then, is just one factor that should affect their decision of which follow-up experiment(s) to perform.


    • pengchy
      Senior Member
      • Feb 2009
      • 116

      #32
      Hi dpryan,

      Thank you for your reply.

      I agree with all of your viewpoints.

      From your explanation, it can be concluded that the p-value is influenced not only by gene length but also by other factors. Here the focus is gene length: if we can reduce its influence at the DEG detection step, why not try?


      • capricy
        Senior Member
        • Apr 2012
        • 125

        #33
        After reading so many posts, I feel length-adjusted counts should be a better way of doing DE analysis. Even though length has an equal effect across samples, as the manual suggests, when we do DE analysis we do care about the top of the DE list, which really does involve comparison between genes.

        Since we cannot get the exact counts of genes simply by doing RNA-seq, the steps of length normalization and rounding give approximate data.

        Just my two cents.


        • simonandrews
          Simon Andrews
          • May 2009
          • 870

          #34
          Originally posted by pengchy:
          Hi dpryan,

          Thank you for your reply.

          I agree with all of your viewpoints.

          From your explanation, it can be concluded that the p-value is influenced not only by gene length but also by other factors. Here the focus is gene length: if we can reduce its influence at the DEG detection step, why not try?
          The underlying problem here is the way that RNA-Seq data is collected: by random sampling of fragmented cDNA. This intrinsically means that if you have 2 genes with a copy number of 100, but one is 10X the length of the other, then on average you will have 10X the number of RNA-Seq reads from the longer gene even though they exist at the same expression level.
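
To make the sampling argument concrete, here is a minimal Python sketch. The gene names, lengths, and the total of 110,000 reads are made-up numbers for illustration only, and the model assumes every base in the fragment pool is equally likely to yield a read, which ignores real-world effects such as GC or positional bias.

```python
import random
from collections import Counter

# Hypothetical transcripts: same copy number, 10x difference in length.
genes = {
    "short_gene": {"copies": 100, "length": 1_000},
    "long_gene": {"copies": 100, "length": 10_000},
}

# Under uniform fragment sampling, each gene's chance of producing a read
# is proportional to copies * length (its share of the cDNA pool).
names = list(genes)
weights = [genes[g]["copies"] * genes[g]["length"] for g in names]

rng = random.Random(0)  # fixed seed so the run is repeatable
reads = Counter(rng.choices(names, weights=weights, k=110_000))

# Ratio of long-gene reads to short-gene reads; should sit close to 10.
ratio = reads["long_gene"] / reads["short_gene"]
print(dict(reads), round(ratio, 2))
```

Both genes are expressed at the same level, yet the longer one collects roughly ten times the reads purely because it contributes ten times as much sequence to the pool.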

          This difference in observation then passes through to the statistical analysis where what matters is how accurately the expression of each gene is measured, as well as the level of change in expression. The more observations you have the more accurately you can infer your true expression level and the easier it becomes to detect differential expression at a given fold change. You'll therefore find that it's easier to detect changes in longer genes for the same fold change and the associated p-values will therefore be lower.
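
A rough way to see this effect is a Wald-style test on the log ratio of two counts. The sketch below assumes pure Poisson sampling noise with no biological variability, so it is not what DESeq or similar tools compute (they fit negative binomial models across replicates); the counts are invented, and `wald_p_value` is a hypothetical helper name.

```python
import math

def wald_p_value(count_a, count_b):
    """Two-sided p-value for log(count_b / count_a) under a Poisson model.

    The variance of the log ratio of two Poisson counts is approximately
    1/count_a + 1/count_b (delta method). Biological variability is ignored.
    """
    z = math.log(count_b / count_a) / math.sqrt(1 / count_a + 1 / count_b)
    # For a standard normal z, the two-sided tail area is erfc(|z| / sqrt(2)).
    return math.erfc(abs(z) / math.sqrt(2))

# Same 2-fold change, but the long gene has 10x the reads of the short gene.
p_short = wald_p_value(50, 100)
p_long = wald_p_value(500, 1000)
print(f"short gene p = {p_short:.2e}, long gene p = {p_long:.2e}")
```

The fold change is identical in both cases; only the number of observations differs, and that alone drives the p-value for the longer gene far lower.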

          You mentioned the idea of correcting for this observation bias, and I guess you could do this, but the problem is that you can only do it by making the well-observed data worse. There's nothing you can do to make the poorly observed (shorter) genes better. Pretty much all of the statistical approaches use some direct transformation of read counts in their tests, since this provides the most direct and relevant measure. You could run your statistics on counts which have been length normalised (RPKM), but all you end up doing is mixing together, at the same value in your data, observations with very different levels of noise: a high RPKM could be a long gene with a large number of observations, for which you can be very sure of the value, or a short gene with few observations, where the true expression level is not known with any certainty. Taking this approach won't help you improve your analysis (quite the opposite) and won't make it any fairer; it will just put the biases in a different place.
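
The point about RPKM hiding the number of observations can be shown with a small worked example. The read counts, gene lengths, and library size below are invented, and the noise estimate assumes simple Poisson sampling (relative noise of 1/sqrt(count)), which understates the variability in real data.

```python
import math

def rpkm(reads, length_bp, total_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return reads / ((length_bp / 1_000) * (total_reads / 1_000_000))

def poisson_cv(reads):
    """Relative sampling noise (sd / mean) of a Poisson count."""
    return 1 / math.sqrt(reads)

total = 10_000_000  # hypothetical library size

# A long, well-observed gene and a short, poorly observed gene.
long_rpkm, long_cv = rpkm(500, 10_000, total), poisson_cv(500)
short_rpkm, short_cv = rpkm(25, 500, total), poisson_cv(25)

print(f"long gene:  RPKM = {long_rpkm:.1f}, relative noise = {long_cv:.3f}")
print(f"short gene: RPKM = {short_rpkm:.1f}, relative noise = {short_cv:.3f}")
```

The two genes end up with the same RPKM, but the short gene's value rests on 25 reads and is several times noisier than the long gene's, which rests on 500. A test run on RPKM alone has no way to tell these cases apart.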

          I guess the ultimate solution to this will come when we lose the length restriction on sequence read levels so that every transcript is read in its entirety, but I'm not holding my breath for this.

          One thing we have been doing to help make DE analysis fairer is to use the intensity difference analysis approach in SeqMonk to help to order the hits coming out of DE analysis. This doesn't change the set of hits you'd get out of something like DESeq but it can be useful in helping to prioritise which are the most interesting. The basic approach is that we construct a local distribution of differences for genes with similar average expression to the gene being tested. We can then compute z-scores for each DE hit using the local level of noise to provide an improved 'fold change' type measure which we've found to be useful in ranking hits and selecting the top hits to follow up.
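
The local z-score idea can be sketched roughly as follows. This is not SeqMonk's implementation: the function name `local_z_scores`, the fixed window of 50 neighbours on each side, and the simulated log2 expression values are all assumptions made for illustration.

```python
import random
import statistics

def local_z_scores(a, b, window=50):
    """z-score of each gene's difference (b - a), measured against the
    differences of up to `window` genes on either side of it when genes
    are ranked by average expression."""
    n = len(a)
    order = sorted(range(n), key=lambda i: (a[i] + b[i]) / 2)
    diffs = [b[i] - a[i] for i in order]
    z = [0.0] * n
    for rank, gene in enumerate(order):
        lo, hi = max(0, rank - window), min(n, rank + window + 1)
        neighbours = diffs[lo:rank] + diffs[rank + 1:hi]  # exclude the gene itself
        if len(neighbours) < 2:
            continue
        sd = statistics.pstdev(neighbours)
        if sd > 0:
            z[gene] = (diffs[rank] - statistics.mean(neighbours)) / sd
    return z

# Simulated log2 expression in two conditions. Genes 0 and 1 both differ by
# exactly 1.0, one at low expression (noisy region) and one at high.
rng = random.Random(1)
a, b = [3.0, 11.0], [4.0, 12.0]
for _ in range(500):
    level = rng.uniform(2, 12)
    sd = 2 / level  # assumed: noise shrinks as expression rises
    a.append(level + rng.gauss(0, sd))
    b.append(level + rng.gauss(0, sd))

z = local_z_scores(a, b)
print(f"low-expression gene z = {z[0]:.2f}, high-expression gene z = {z[1]:.2f}")
```

Both test genes have the same raw difference, but the one sitting among quiet, highly expressed neighbours gets the larger z-score, which is the kind of re-ranking described above.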


          • capricy
            Senior Member
            • Apr 2012
            • 125

            #35
            I feel the length differences between most genes are not huge, generally within 10-fold. How much noise could that introduce?


            • smurmu
              Junior Member
              • Oct 2016
              • 6

              #36
              I am stuck on normalizing RNA-seq data using DESeq.
              I used a command like "counts( data, normalized=TRUE ) " but got the error "Error in .local(object, ...) : unused argument (normalized = TRUE)".
              How can I fix this problem?

