
**mbblack** · 05-30-2012, 04:50 AM

In the complete absence of replicates, I don't think any statistical tool is going to be worth a dang for differential gene expression. All you can do is look at simple differences in counts, with no means at all of assessing the significance of those differences. The statistics cannot compensate for a complete lack of adequate data for the analysis in question, and without some minimal number of replicates (3 is really the minimum, 4 or more would be far better), there is no way to assign statistical significance.

I know the vignettes for tools like edgeR talk about good performance "...even for experiments with minimal levels of biological replication" (quoting from the edgeR manual), but note the use of the word "minimal". A complete absence of replication is not minimal replication, and without any replication you cannot perform statistical tests of significance for differences.

And since you have no statistical power at all, comparing different analytical tools seems pointless to me.
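The argument above can be seen directly in the sample variance, which divides by n - 1: with one library per condition there are zero degrees of freedom left. A minimal Python sketch, using made-up counts for one hypothetical gene (the numbers and the helper name `sample_variance` are illustrative assumptions, not from any real data set):

```python
import statistics

# Made-up read counts for one hypothetical gene (illustration only).
single_sample = [250]               # one library per condition: no replicates
three_replicates = [250, 310, 190]  # three libraries for the same condition


def sample_variance(counts):
    """Return the unbiased sample variance, or None when it is undefined."""
    try:
        return statistics.variance(counts)  # divides by n - 1
    except statistics.StatisticsError:
        # Fewer than two data points: n - 1 = 0, so no estimate exists.
        return None


print(sample_variance(single_sample))
print(sample_variance(three_replicates))
```

With a single value the function has nothing to estimate from and returns `None`; only the replicated case yields a number that a test could use.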

**lexa** · 05-30-2012, 04:58 AM

I have to agree with mbblack. You should try to gain more statistical power by getting at least 3 replicates per treatment. Otherwise your comparison is not really meaningful.

**mrfox** · 05-30-2012, 06:21 AM

Many thanks, mbblack and lexa.

Lack of replicates is indeed an issue for some of my projects. Unfortunately, these collaborators will not proceed to sequence replicates until they find something interesting in the current data.
They even wish to have a short, "reliable" list of differentially expressed (DE) or differentially spliced genes that makes sense, which we cannot deliver without replicates. It is really a dilemma.

It is important for biologists to discuss with bioinformaticians before they submit the samples for sequencing.

**mgogol** · 05-30-2012, 06:25 AM

What you could do is run both tools (edgeR and DESeq) and show them the resulting gene lists for each, plus the intersection (Venn diagram?).

**lexa** · 05-30-2012, 06:27 AM

That's hard. Anyway, you could try to get a 'reliable' gene set by running several methods and taking the overlap between them. Maybe keep only the genes supported by at least 2 different methods. Then do a literature search for the genes you found; some of them may already be described.
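The overlap idea suggested in the last two posts could be sketched as follows. This is only an illustration: the gene names, the method labels, and the helper `genes_supported_by` are hypothetical placeholders, and in practice each set would be read from the output of the corresponding tool.

```python
from collections import Counter

# Hypothetical gene lists, one per method (placeholders, not real results).
calls_by_method = {
    "edgeR": {"geneA", "geneB", "geneC", "geneD"},
    "DESeq": {"geneB", "geneC", "geneE"},
    "fold_change_rank": {"geneC", "geneD", "geneF"},
}


def genes_supported_by(calls, min_methods=2):
    """Return genes reported by at least `min_methods` methods, sorted."""
    support = Counter(gene for genes in calls.values() for gene in genes)
    return sorted(g for g, n in support.items() if n >= min_methods)


# Genes called by every method (the centre of the Venn diagram).
in_all = sorted(set.intersection(*calls_by_method.values()))
# Genes called by at least two of the methods.
in_two_or_more = genes_supported_by(calls_by_method, min_methods=2)

print(in_all)
print(in_two_or_more)
```

Keep in mind that all methods see the same unreplicated data, so agreement between them narrows the list but does not by itself add statistical support.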

**Tom Bair** · 05-30-2012, 07:07 AM

edgeR does mention a method for dealing with lack of replication by assigning a variance value:

"Simply pick a reasonable dispersion value, based on your experience with similar data, and use that. Although subjective, this is still more defensible than assuming Poisson variation. Typical values are dispersion=0.4 for human data, dispersion=0.1 for data on genetically identical model organisms or dispersion=0.01 for technical replicates."

More detail is in the User Guide. It's an option anyway, though replication is always better.
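To see what an assumed dispersion means in practice: under the negative binomial model, a count with mean mu and dispersion phi has variance mu + phi * mu^2, and phi = 0 reduces to Poisson. The sketch below plugs in the three values quoted above for a hypothetical gene with an expected count of 100; the mean and the dictionary labels are assumptions for illustration. As far as I am aware, later editions of the edgeR User Guide give these typical figures as BCV (the square root of the dispersion), so check which quantity your version of the guide refers to.

```python
import math

# Values quoted above from the edgeR User Guide; labels are shorthand.
QUOTED_DISPERSIONS = {
    "technical replicates": 0.01,
    "identical model organisms": 0.1,
    "human": 0.4,
}


def nb_variance(mean, dispersion):
    """Negative binomial variance: mean + dispersion * mean**2."""
    return mean + dispersion * mean ** 2


mean_count = 100.0  # hypothetical expected count for one gene
poisson_sd = math.sqrt(nb_variance(mean_count, 0.0))  # dispersion 0 is Poisson

for label, phi in QUOTED_DISPERSIONS.items():
    sd = math.sqrt(nb_variance(mean_count, phi))
    print(f"{label}: dispersion={phi}, BCV={math.sqrt(phi):.2f}, "
          f"sd={sd:.1f} (Poisson sd={poisson_sd:.1f})")
```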

**mrfox** · 05-30-2012, 07:10 AM

As I recall, I tried that a long time ago and found that the result is sensitive to the chosen dispersion value.
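That sensitivity is easy to reproduce in a toy setting. The sketch below is not edgeR's exact test; it is a simple normal approximation, assuming equal library sizes and a shared mean under the null, with the variance of the difference taken as 2 * (mu + phi * mu^2). The counts and the function name `approx_two_sample_p` are hypothetical.

```python
import math


def approx_two_sample_p(count_a, count_b, dispersion):
    """Two-sided p-value from a normal approximation to the NB difference.

    Assumes equal library sizes and a shared mean under the null.
    Illustrative Wald-style test only, not edgeR's exact test.
    """
    mean = (count_a + count_b) / 2.0
    var_diff = 2.0 * (mean + dispersion * mean ** 2)
    z = (count_a - count_b) / math.sqrt(var_diff)
    return math.erfc(abs(z) / math.sqrt(2.0))


# Hypothetical counts for one gene in two unreplicated libraries.
count_a, count_b = 400, 100

for phi in (0.0, 0.01, 0.1, 0.4):
    p = approx_two_sample_p(count_a, count_b, phi)
    print(f"dispersion={phi}: p={p:.3g}")
```

With these made-up counts, the same fourfold difference falls below p = 0.05 at dispersion 0.1 and above it at dispersion 0.4, so the call depends entirely on a value that the data cannot confirm.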

**mbblack** · 05-30-2012, 08:01 AM

Originally posted by mrfox:

Many thanks, mbblack and lexa.

Lack of replicates is indeed an issue for some of my projects. Unfortunately, these collaborators will not proceed to sequence replicates until they find something interesting in the current data.
They even wish to have a short, "reliable" list of differentially expressed (DE) or differentially spliced genes that makes sense, which we cannot deliver without replicates. It is really a dilemma.

It is important for biologists to discuss with bioinformaticians before they submit the samples for sequencing.

You need to discuss this with them. Without replicates, there is no way to actually give them the answers they seek. "Reliable" list of DE genes? That cannot possibly be derived without some statistical significance assigned to the results, and you cannot have any statistically significant results without replicates. At best, all you could give them would be a ranked list of simple differences in gene counts or RPKM for mapped genes, with no hint of what the variance about those differences might be.

They really need to do a proper pilot study, with 3-5 replicates to see just what they have to work with. Otherwise, all you can tell them is what is different, but with no statistical ranking of significance nor any idea of how variable those differences may be.

It is not that you have minimal statistical power without replicates, you have none. All you have is simple numeric differences of some count or normalized values, and nothing more. And you have no idea at all if those differences are real biological differences, or random experimental noise.

And there is nothing unique to RNAseq data about that - you cannot compute statistics on a simple difference between two single numbers.
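For reference, the "ranked list of simple differences" described above might look like the sketch below. Everything in it is a stated assumption: the gene lengths, counts, library totals, and the pseudocount of 1 added to each RPKM to avoid division by zero are made up for illustration. The output is a descriptive ranking only and carries no significance.

```python
import math

# Hypothetical data: gene -> (length in bp, count in sample A, count in sample B).
genes = {
    "geneA": (2000, 400, 100),
    "geneB": (1000, 50, 55),
    "geneC": (1500, 10, 80),
}
# Hypothetical total mapped reads per library.
total_mapped_a = 20_000_000
total_mapped_b = 25_000_000


def rpkm(count, length_bp, total_mapped):
    """Reads per kilobase of transcript per million mapped reads."""
    return count * 1e9 / (length_bp * total_mapped)


def ranked_log2_fold_changes(genes, total_a, total_b, pseudo=1.0):
    """Rank genes by |log2(RPKM_B / RPKM_A)|, largest first.

    `pseudo` is an arbitrary offset that avoids division by zero.
    """
    rows = []
    for name, (length, a, b) in genes.items():
        ra = rpkm(a, length, total_a)
        rb = rpkm(b, length, total_b)
        rows.append((name, math.log2((rb + pseudo) / (ra + pseudo))))
    return sorted(rows, key=lambda row: abs(row[1]), reverse=True)


ranking = ranked_log2_fold_changes(genes, total_mapped_a, total_mapped_b)
for name, lfc in ranking:
    print(f"{name}\t{lfc:+.2f}")
```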

**mrfox** · 05-30-2012, 08:12 AM

I could not agree more. Inferring a short list of DE genes from an expensive (compared to array data) RNA-Seq run for even a single pair of samples is some collaborators' dream. Some even prefer to spend money on sequencing more cell line types rather than replicates. I find it hard to persuade them.

Without replicates, all we can provide is a list of DE genes based on statistical models such as Poisson, but this will never reflect the truth without sufficient replicates.


**chrisbala** · 05-23-2013, 05:12 AM

edgeR without replicates

Knowing that it is unwise to do experiments without replication, I find myself in exactly that situation (pooled samples).

I've analysed these data with older versions of DESeq, but now would also like to try edgeR. I can't seem to decipher exactly how one does this analysis without replicates based on the vignette. Is anyone able to help me out or share a script?

It's pretty clear that both the DESeq and edgeR camps are now strongly discouraging such efforts (does DESeq2 even still incorporate such analyses?), but I still need to give it a go in this case.

Thanks!

**mbblack** · 05-23-2013, 06:08 AM

To be honest, my opinion is that the first option mentioned in the edgeR vignette is really the only valid approach to follow in that situation. To quote from page 18:

"1. Be satisfied with a descriptive analysis, that might include an MDS plot and an analysis of fold changes. Do not attempt a significance analysis. This may be the best advice."

In other words, make your argument for significantly differentially expressed genes based solely on the magnitude of measured differences between samples and accept that you cannot perform any reliable or valid statistical significance testing. I just think it is pointless to spend a lot of time running algorithms or code on a data set that fundamentally cannot be analyzed statistically.

Basically, what is the point of the effort if the stats are meaningless or open to vigorous negative criticism?

**chrisbala** · 05-23-2013, 07:20 AM

Thanks, option 1 is basically what we are doing, but we are also trying to scrutinize the data in as many ways as possible. We pooled 10 individuals per library, and our results seem not hopeless in that we can see some of the things we know we should see, and these do hold up to DESeq stats ("working without replicates"). But it's the novel stuff that is more problematic. We'll be finding out via qPCR and in situs, I suppose, how well these stats hold up. But yes, not so optimistic. I should also say that we have 3 groups, not 2, so we at least have a bit more information on variability.


differential gene expression without replicates: edgeR, DESeq?
