**dpryan** · 05-12-2013, 11:39 AM

Originally posted by pengchy

Hi dpryan,

Sorry, I can't catch your meaning.
The p-value or adjusted p-value will be used to detect differentially expressed genes. If the difference in p-values is caused by gene length, the ordering of the p-values will misguide the follow-up biology experiments.

Thank you.

If the biologists (I am one myself) are thrown off by this, then they should find a different profession. This concept is incredibly common in molecular biology or in whatever other sub-field your colleagues might work.

The p-value isn't being caused by gene length; rather, longer genes will tend to have more reads, meaning that there can be more evidence for or against differential expression. Other factors affecting that are within-group variability, between-group variability (i.e., fold change) and general level of expression (depending on exactly how the statistics are done). The p-value, then, is just one factor that should affect their decision of which follow-up experiment(s) to perform.

**pengchy** · 05-12-2013, 04:21 PM

Hi dpryan,

Thank you for your reply.

I agree with all of your viewpoints.

From your explanation, it can be concluded that the p-value is influenced not only by gene length but also by other factors. Here the focus is gene length: if we can reduce its influence at the DEG detection step, why not try?

**capricy** · 09-21-2013, 10:28 AM

After reading so many posts, I feel length-adjusted counts should be a better way of doing DE analysis. Even though length has an equal effect across samples, as the manual suggests, when we do DE analysis we do care about the top DE list, which really involves comparison between genes.

Since we cannot get the exact counts of genes simply by doing RNA-seq, the length normalization and rounding steps yield approximate data.

Just my two cents.

**simonandrews** · 09-22-2013, 07:16 AM

Originally posted by pengchy

Hi dpryan,

Thank you for your reply.

I agree with all of your viewpoints.

From your explanation, it can be concluded that the p-value is influenced not only by gene length but also by other factors. Here the focus is gene length: if we can reduce its influence at the DEG detection step, why not try?

The underlying problem here is the way that RNA-Seq data is collected: by random sampling of fragmented cDNA. This intrinsically means that if you have 2 genes with a copy number of 100, but one is 10X the length of the other, then on average you will have 10X the number of RNA-Seq reads from the longer gene even though they exist at the same expression level.

This difference in observation then passes through to the statistical analysis where what matters is how accurately the expression of each gene is measured, as well as the level of change in expression. The more observations you have the more accurately you can infer your true expression level and the easier it becomes to detect differential expression at a given fold change. You'll therefore find that it's easier to detect changes in longer genes for the same fold change and the associated p-values will therefore be lower.
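To put rough numbers on the two paragraphs above, here is a minimal sketch. It assumes reads are sampled in proportion to copy number times transcript length and that the only noise is Poisson counting noise, which ignores the biological variability that real DE tools also model. The function names and the sampling rate are made up for illustration.

```python
import math

def expected_reads(copy_number, length_kb, reads_per_copy_per_kb):
    """Expected read count when fragments are sampled uniformly along transcripts."""
    return copy_number * length_kb * reads_per_copy_per_kb

def poisson_cv(mean_count):
    """Relative counting noise (standard deviation / mean) of a Poisson count."""
    return 1.0 / math.sqrt(mean_count)

rate = 1.0  # hypothetical sampling rate: reads per transcript copy per kb

# Two genes at the same expression level (100 copies), one 10X longer.
short_gene = expected_reads(100, 1, rate)
long_gene = expected_reads(100, 10, rate)

print("expected reads:", short_gene, long_gene)
print("relative noise:", poisson_cv(short_gene), poisson_cv(long_gene))
print("noise ratio short/long:", poisson_cv(short_gene) / poisson_cv(long_gene))
```

Under these assumptions the longer gene gets 10X the reads, and its relative counting noise is smaller by a factor of sqrt(10), roughly 3.2. That is why the same fold change is easier to detect in the longer gene.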

You mentioned the idea of correcting for this observation bias, and I guess you could do this, but the problem would be that you can only do this by making the well-observed data worse. There's nothing you can do to make the poorly observed (shorter) genes better. Pretty much all of the statistical approaches use some direct transformation of read counts in their statistical tests, since this provides the most direct and relevant measure. You could run your statistics on counts which have been length normalised (RPKM), but all you end up doing is mixing together different observation levels with very different levels of noise at the same value in your data. That is, a high RPKM value could be a long gene with a large number of observations, for which you can be very sure of the value, or a short gene with a low number of observations, where the true expression level is not known with any certainty. Taking this approach won't help you improve your analysis (quite the opposite) and won't make it any fairer; it will just put the biases in a different place.
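A small worked example of the RPKM point, assuming the usual definition (reads per kilobase of transcript per million mapped reads) and, again, pure Poisson counting noise. The two genes and the library size are invented numbers, and `rpkm_sd` is a hypothetical helper, not part of any package.

```python
import math

def rpkm(count, length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped_reads)

def rpkm_sd(count, length_bp, total_mapped_reads):
    """Poisson standard deviation of the count, expressed in RPKM units."""
    return rpkm(math.sqrt(count), length_bp, total_mapped_reads)

total = 20_000_000  # hypothetical library size (mapped reads)

# (raw count, transcript length in bp)
long_gene = (2000, 10_000)
short_gene = (100, 500)

for name, (count, length_bp) in [("long", long_gene), ("short", short_gene)]:
    value = rpkm(count, length_bp, total)
    sd = rpkm_sd(count, length_bp, total)
    print(f"{name}: count={count} RPKM={value:.2f} +/- {sd:.2f}")
```

Both genes come out at the same RPKM, but the short gene's value rests on 20X fewer reads and carries several times the uncertainty. Once the counts are replaced by RPKM, that difference is no longer visible to the statistical test.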

I guess the ultimate solution to this will come when we lose the length restriction on sequence reads so that every transcript is read in its entirety, but I'm not holding my breath for this.

One thing we have been doing to help make DE analysis fairer is to use the intensity difference analysis approach in SeqMonk to help to order the hits coming out of DE analysis. This doesn't change the set of hits you'd get out of something like DESeq, but it can be useful in helping to prioritise which are the most interesting. The basic approach is that we construct a local distribution of differences for genes with similar average expression to the gene being tested. We can then compute z-scores for each DE hit using the local level of noise to provide an improved 'fold change' type measure, which we've found to be useful in ranking hits and selecting the top hits to follow up.
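The idea of a local z-score can be sketched roughly as follows. This is only an illustration of the description above, not SeqMonk's actual implementation: the fixed window of neighbouring genes, the exclusion of the tested gene from its own local distribution, and the function name `local_z_scores` are all assumptions made for the example.

```python
import statistics

def local_z_scores(values_a, values_b, window=5):
    """Z-score each gene's difference against genes of similar average expression.

    values_a / values_b: per-gene (log-scale) expression in two conditions.
    window: number of neighbours taken on each side in the expression ranking.
    """
    n = len(values_a)
    avg = [(a + b) / 2 for a, b in zip(values_a, values_b)]
    diff = [b - a for a, b in zip(values_a, values_b)]
    order = sorted(range(n), key=lambda i: avg[i])  # genes ranked by average expression

    z = [0.0] * n
    for rank, gene in enumerate(order):
        lo = max(0, rank - window)
        hi = min(n, rank + window + 1)
        # Local distribution of differences, excluding the gene being tested.
        neighbours = [diff[order[r]] for r in range(lo, hi) if r != rank]
        if not neighbours:
            continue
        local_mean = statistics.fmean(neighbours)
        local_sd = statistics.pstdev(neighbours)
        if local_sd > 0:
            z[gene] = (diff[gene] - local_mean) / local_sd
    return z

# Toy data: five genes, log2 expression in two conditions.
cond_a = [1.0, 2.0, 3.0, 4.0, 5.0]
cond_b = [1.0, 2.5, 3.0, 3.5, 7.0]
scores = local_z_scores(cond_a, cond_b, window=2)

# Rank genes by how unusual their difference is relative to local noise.
ranking = sorted(range(len(scores)), key=lambda i: abs(scores[i]), reverse=True)
print(scores)
print(ranking)
```

A gene's score is large only when its difference stands out from genes measured at a similar level, so the same raw difference ranks lower in a noisy, low-expression neighbourhood than in a tightly measured one.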

**capricy** · 09-27-2013, 01:04 PM

I feel the length differences between most genes are not huge, generally within 10 times. How much noise could that introduce?

**smurmu** · 10-19-2016, 05:32 AM

I am stuck on the normalization of RNA-seq data using DESeq.
I used the command "counts( data, normalized=TRUE )" but got the error "Error in .local(object, ...) : unused argument (normalized = TRUE)".
How can I get rid of this problem?
