Multiple DGE libraries comparison. (EdgeR baySeq DESeq)

RockChalkJayhawk replied

04-14-2010, 08:56 AM
Originally posted by Simon Anders

Let's say you have four different conditions, each with two replicates: A1, A2, B1, B2, C1, C2, D1, D2.

Then, you can use DESeq as it is to do contrasts such as 'A vs B', 'A vs C', 'B vs D', etc.

What does not yet work is interaction contrasts. So, let's say, you have the conditions wt, A, B, AB; where wt is the wild-type, A and B are some treatments or mutations, and AB is the combination of both. You now might either be interested in simple contrasts, as above, which is fine. Or you want to check for interactions, i.e., you say that the effect of A is given by the difference dA between A and wt (dA = A - wt) and similar for B (dB = B - wt) and AB (dAB = AB - wt), and you now wonder whether dAB = dA + dB. To check this, you need an interaction contrast, and this is something that I am currently working on, but I might still need a few weeks.

For now, you can variance-stabilize the data (function 'getVarianceStabilizedData') and then use an ordinary linear model or the 'limma' package to check for interaction. That works fine, but the power is reduced.

Just using the raw count data, scaled by the library sizes, is not a good idea if you intend to feed it to a linear model. Ordinary least squares regression requires homoscedasticity, and count data is heteroscedastic. The point of a variance-stabilizing transformation is precisely to remedy that.

Whoops! I just saw it in your vignette. That's what I need since I'm actually doing a time-course. It'll just make it easier to generate some figures.
Simon Anders replied

04-14-2010, 08:50 AM
Let's say you have four different conditions, each with two replicates: A1, A2, B1, B2, C1, C2, D1, D2.

Then, you can use DESeq as it is to do contrasts such as 'A vs B', 'A vs C', 'B vs D', etc.

What does not yet work is interaction contrasts. So, let's say, you have the conditions wt, A, B, AB; where wt is the wild-type, A and B are some treatments or mutations, and AB is the combination of both. You now might either be interested in simple contrasts, as above, which is fine. Or you want to check for interactions, i.e., you say that the effect of A is given by the difference dA between A and wt (dA = A - wt) and similar for B (dB = B - wt) and AB (dAB = AB - wt), and you now wonder whether dAB = dA + dB. To check this, you need an interaction contrast, and this is something that I am currently working on, but I might still need a few weeks.

For now, you can variance-stabilize the data (function 'getVarianceStabilizedData') and then use an ordinary linear model or the 'limma' package to check for interaction. That works fine, but the power is reduced.

Just using the raw count data, scaled by the library sizes, is not a good idea if you intend to feed it to a linear model. Ordinary least squares regression requires homoscedasticity, and count data is heteroscedastic. The point of a variance-stabilizing transformation is precisely to remedy that.
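
To make the arithmetic of the interaction contrast concrete, here is a minimal sketch in plain Python. It is not DESeq or limma code: the function name interaction_contrast and all numbers are made up for illustration, and the values are assumed to be variance-stabilized expression values for a single gene with two replicates per condition.

```python
from statistics import mean

# Hypothetical variance-stabilized values for one gene (illustrative numbers only).
vst = {
    "wt": [5.0, 5.2],
    "A":  [6.1, 5.9],
    "B":  [5.6, 5.8],
    "AB": [7.9, 8.1],
}

def interaction_contrast(values):
    """Return dAB - (dA + dB), where dX = mean(X) - mean(wt)."""
    m = {cond: mean(reps) for cond, reps in values.items()}
    d_a = m["A"] - m["wt"]    # effect of A alone
    d_b = m["B"] - m["wt"]    # effect of B alone
    d_ab = m["AB"] - m["wt"]  # effect of the combination
    return d_ab - (d_a + d_b)

contrast = interaction_contrast(vst)
print(f"interaction contrast: {contrast:.2f}")
```

A contrast near zero is what you would expect if the two effects simply add up. Whether a non-zero value is significant still has to be decided by a proper test, for example a linear model with an interaction term fitted to the variance-stabilized data.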
RockChalkJayhawk replied

04-14-2010, 08:36 AM
Multi-sample comparison

Simon,

Let's say I want to compare 4 samples, each with 2 replicates (an ANOVA, for example). As of now, DESeq doesn't have this functionality, correct?

Is there a way I can export the entire matrix of normalized values rather than making the 1x1 comparison? For example, would it be correct to multiply all of the values in T1b by 0.5587394, or T2 by 1.5823096, etc.? (These are the column names and sizeFactors from the DESeq vignette.)
Simon Anders replied

04-08-2010, 10:40 PM
If you followed the example in the vignette, you have come across the line

resSig <- res[ res$padj < .1, ]

This is where we select the genes that we want to consider significant. res$padj is the Benjamini-Hochberg-adjusted p value, and selecting all genes with padj below 0.1 controls the FDR at 10%. To get 1%, just put .01 here.
gen2prot replied

04-08-2010, 02:20 PM
Hi Simon,

How can I set the FDR at 1% instead of 10%? Is there a way of doing this through the nbinomTest function? I am not an R programmer, so I do not know.

Thanks
Abhijit
Simon Anders replied

04-08-2010, 11:00 AM
The same lines.

A false discovery rate (FDR) of 10% means that (in expectation) 10% of your hits are false positives. To get FDR control at 10%, you adjust the raw p values for multiple testing with an FDR-controlling method, such as the one by Benjamini and Hochberg or the one by Storey, and then take all those genes with an adjusted p value below the desired FDR. DESeq does this for you, using the Benjamini-Hochberg method.

Controlling at FDR 10% is a common, but, of course, arbitrary choice. It is up to you to decide how stringent you want to be.

The concept of FDR was introduced by Benjamini and Hochberg in 1995 (J Roy Stat Soc B 57 289). Since then, there has been a lot of research on how best to adjust p values to allow for FDR control, and how well this works in case of correlation.
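
For readers who want to see what the Benjamini-Hochberg adjustment does, here is a minimal sketch in plain Python. It is an illustration of the textbook step-up procedure, not DESeq's own code; the function name benjamini_hochberg and the raw p values are made up for the example.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p values, in the input order."""
    n = len(pvalues)
    # Indices of the p values, sorted from the largest p value to the smallest.
    order = sorted(range(n), key=lambda i: pvalues[i], reverse=True)
    adjusted = [0.0] * n
    running_min = 1.0
    for position, idx in enumerate(order):
        rank = n - position  # rank of this p value in ascending order
        # Scale by n / rank, then enforce monotonicity with a running minimum.
        running_min = min(running_min, pvalues[idx] * n / rank)
        adjusted[idx] = running_min
    return adjusted

raw_p = [0.001, 0.008, 0.039, 0.041, 0.2]  # made-up raw p values
padj = benjamini_hochberg(raw_p)

# Indices of the hits at two different FDR thresholds.
hits_10 = [i for i, q in enumerate(padj) if q < 0.10]
hits_01 = [i for i, q in enumerate(padj) if q < 0.01]
print(len(hits_10), len(hits_01))
```

With these made-up numbers, the 10% threshold keeps four of the five tests and the 1% threshold keeps only one, which illustrates that a smaller FDR cutoff is the more stringent choice.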
gen2prot replied

04-08-2010, 10:25 AM
Hi Simon,

Thank you for your help. I got DESeq to work. However, I am trying to figure out a good value to use for the FDR cutoff. This may be a very stupid question, but I'll ask anyway. Is an FDR of 10% more stringent than an FDR of 1%? I am thinking in terms of p-values, where smaller numbers mean higher stringency. Should I think of FDR along the same lines, or the opposite?

Thanks
Abhijit
lpachter replied

04-07-2010, 07:52 AM
Dear Wolfgang,

Thank you for pointing out my typo, namely that "measurement" should have been "estimate".
Wolfgang Huber replied

04-07-2010, 01:52 AM
Originally posted by lpachter

I'd like to clarify one point here:

...for a measurement to be useful (whatever units it is in) it is important to know its variance. ....

While we're at it...: there is also no such thing as "the variance of a measurement". There is the sample variance of a set of measurements; and there is the variance of a statistical ensemble (physicists' language) or distribution in a probabilistic model (mathematicians' language), whose "true" value we often do not know, but which we can aim to estimate from data.

The choice of ensemble matters: repeated measurements on the same sample using the same machine will have smaller variance than repeated measurements using different machines, or different biological replicates, e.g. different cell passages.
Simon Anders replied

04-06-2010, 11:21 AM
Hi Abhijit,

Have you followed the instructions on http://www-huber.embl.de/users/anders/DESeq/ ?

If so, please send me the output from the installation. The real error message should be hidden in there.

Cheers
Simon
gen2prot replied

04-06-2010, 10:15 AM
Hi Simon,

I am trying to install DESeq on Mac OS X Leopard. I get the error message "In getDependencies(pkgs, dependencies, available, lib) : package ‘DESeq’ is not available". Any clue why this is happening? I found that DESeq needs Biobase to be installed. I have done that, and yet I still get this error. Any ideas?

Thanks
Abhijit
Boel replied

04-05-2010, 04:35 AM
Originally posted by Cole Trapnell

One more thing: when cuffdiff reports the results of differential testing, it's being performed on the actual counts (after log transformation) for the reasons that Simon suggests.

Is there a way to output the actual counts?
steven replied

04-05-2010, 01:28 AM
Originally posted by lpachter

There is no such thing as "FPKM data" (or for that matter "RPKM data"). The only data in an RNA-Seq experiment is the read counts.

I have heard from people working on alignment that "the only data is the read sequences" and from people working on base calling that "the only data is the raw images"... At least there is one consensus: that certainly makes a lot of data.
RockChalkJayhawk replied

04-03-2010, 09:39 AM
Originally posted by Cole Trapnell

One more thing: when cuffdiff reports the results of differential testing, it's being performed on the actual counts (after log transformation) for the reasons that Simon suggests.

Cole,

Do you think it would be a good idea to use quantile normalization on tag counts before running cuffdiff, or is this unnecessary given that it doesn't actually use RPKM values (which I didn't know, but am glad to hear)?
Cole Trapnell replied

04-01-2010, 11:00 AM
One more thing: when cuffdiff reports the results of differential testing, it's being performed on the actual counts (after log transformation) for the reasons that Simon suggests.