understanding design formula in DESeq2

descostes


05-09-2019, 08:13 AM

Hi,

I am trying to understand the role of an interaction term in the design formula of DESeq2. I have read this explanation: http://bioconductor.org/packages/dev...l#interactions

This contains the following paragraph:

The key point to remember about designs with interaction terms is that, unlike for a design ~genotype + condition, where the condition effect represents the overall effect controlling for differences due to genotype, by adding genotype:condition, the main condition effect only represents the effect of condition for the reference level of genotype (I, or whichever level was defined by the user as the reference level). The interaction terms genotypeII.conditionB and genotypeIII.conditionB give the difference between the condition effect for a given genotype and the condition effect for the reference genotype.
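To make the quoted paragraph concrete, here is a minimal base R sketch of which coefficients each design creates. The sample table (three genotypes, two conditions, two replicates per cell) is hypothetical, and `model.matrix()` is used only to list the columns. DESeq2 reports the interaction columns with a dot, as in the quoted genotypeII.conditionB, whereas base R uses a colon.

```r
# Hypothetical sample table: 3 genotypes x 2 conditions, 2 replicates per cell
coldata <- data.frame(
  genotype  = factor(rep(c("I", "II", "III"), each = 4),
                     levels = c("I", "II", "III")),
  condition = factor(rep(rep(c("A", "B"), each = 2), times = 3),
                     levels = c("A", "B"))
)

# Additive design: one condition coefficient shared by all genotypes
mm_add <- model.matrix(~ genotype + condition, data = coldata)

# Interaction design: conditionB now refers to the reference genotype (I) only,
# and each interaction column holds the extra condition effect of one genotype
mm_int <- model.matrix(~ genotype + condition + genotype:condition,
                       data = coldata)

print(colnames(mm_add))
print(colnames(mm_int))
```

The interaction design has two more columns than the additive one, one for each non-reference genotype.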

I would be grateful if someone could confirm the following statements, so that I know whether I understand this correctly:

1) design = ~ condition + genotype + condition:genotype

This does not look at differential expression between conditions (typically WT vs. KO). Instead, it detects the genes that are differentially expressed between conditions AND differentially expressed between genotypes.

2) design = ~ condition + genotype

This detects differentially expressed genes while correcting for the genotype effect. In other words, it compares all the samples of condition A with all the samples of condition B, but corrects for the effect of the genotype (just as one can correct for a batch effect).

3) design = ~ condition

Same as above, but without correcting for the genotype effect.
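What the conditionB coefficient means under each of the three designs can be checked on a toy example. The sketch below is only an illustration under stated assumptions: it uses ordinary `lm()` on made-up, noise-free log2-scale values for a single gene as a stand-in for the negative binomial GLM that DESeq2 fits, and the cell means in `mu` are invented.

```r
# Invented log2 expression of one gene per genotype/condition cell.
# The condition effect (B - A) is 1 in genotype I, 2 in II and 3 in III.
mu <- c(IA = 5, IB = 6, IIA = 7, IIB = 9, IIIA = 4, IIIB = 7)

d <- data.frame(
  genotype  = factor(rep(c("I", "II", "III"), each = 4),
                     levels = c("I", "II", "III")),
  condition = factor(rep(rep(c("A", "B"), each = 2), times = 3),
                     levels = c("A", "B"))
)
d$y <- unname(mu[paste0(d$genotype, d$condition)])

fit1 <- lm(y ~ condition + genotype + condition:genotype, data = d)
fit2 <- lm(y ~ condition + genotype, data = d)
fit3 <- lm(y ~ condition, data = d)

# Design 1: conditionB is the effect in the reference genotype (I) only;
# the interaction terms are the differences from that reference effect
print(coef(fit1)[c("conditionB", "conditionB:genotypeII", "conditionB:genotypeIII")])

# Designs 2 and 3: a single condition effect across all genotypes
print(coef(fit2)["conditionB"])
print(coef(fit3)["conditionB"])
```

In this balanced toy design, designs 2 and 3 give the same point estimate for conditionB. What changes is how much of the remaining variation is attributed to genotype rather than left in the residuals.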

I would like also to know if the following statement is correct:

Now considering batches instead of genotypes: if one uses a batch effect correction package such as sva, we can say that:

1) (~condition + USAGE OF SVA) is equivalent in principle to (~condition + batch). The difference is that each package uses a different method.

Question:

If the above statements are true, is it correct to say that the following code is equivalent to a pairwise (2 by 2) comparison within each genotype using only ~condition:

```r
results(dds, contrast=c("group", "IB", "IA"))
results(dds, contrast=c("group", "IIB", "IIA"))
results(dds, contrast=c("group", "IIIB", "IIIA"))
```

or is it only subselecting genes that are different between all genotypes AND different between conditions for genotype X (X=c("I", "II", "III"))?
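As a rough illustration of what such group contrasts compute, the sketch below builds a combined `group` factor and compares the per-genotype B vs A differences with the same quantities derived from the interaction design. It assumes that `group` is genotype and condition pasted together with a `~ group` design, and it again uses `lm()` on invented noise-free values rather than DESeq2 itself.

```r
# Invented log2 expression of one gene per genotype/condition cell
mu <- c(IA = 5, IB = 6, IIA = 7, IIB = 9, IIIA = 4, IIIB = 7)

d <- data.frame(
  genotype  = factor(rep(c("I", "II", "III"), each = 4),
                     levels = c("I", "II", "III")),
  condition = factor(rep(rep(c("A", "B"), each = 2), times = 3),
                     levels = c("A", "B"))
)
d$y <- unname(mu[paste0(d$genotype, d$condition)])

# Combined factor with levels IA, IB, IIA, IIB, IIIA, IIIB
d$group <- factor(paste0(d$genotype, d$condition))

# One coefficient per group (no intercept), so contrasts are plain differences
b <- coef(lm(y ~ 0 + group, data = d))
per_genotype <- c(I   = b[["groupIB"]]   - b[["groupIA"]],
                  II  = b[["groupIIB"]]  - b[["groupIIA"]],
                  III = b[["groupIIIB"]] - b[["groupIIIA"]])

# The same effects expressed with the interaction parameterization
bi <- coef(lm(y ~ genotype + condition + genotype:condition, data = d))
from_interaction <- c(
  I   = bi[["conditionB"]],
  II  = bi[["conditionB"]] + bi[["genotypeII:conditionB"]],
  III = bi[["conditionB"]] + bi[["genotypeIII:conditionB"]]
)

print(per_genotype)
print(from_interaction)
```

Each group contrast is simply the condition effect within one genotype; no filtering on genotype differences is involved. In DESeq2 the estimates could still differ somewhat from three separate ~condition analyses on subsets, since the dispersions would be estimated from all samples rather than from each subset.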

Thanks a lot in advance.
Tags: deseq2
Wolfgang Huber


05-19-2019, 11:57 AM

Hi Nicolas

Assertions 2-4 seem OK, but 1 is not correct. The best I could come up with to explain this is in the recent book: https://www.huber.embl.de/msmb/Chap-...ec:multifactor

In particular, note that model formulae are not detecting any genes. They are a concise way of specifying a model with multiple parameters ("betas"), and the next step is saying which particular one of these parameters, or linear combination of them ("contrasts") you care about, and *then* you look for genes with a large value of this (univariate) parameter.
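The two-step logic described here (first fit a model with several betas, then pick one beta or a linear combination of them) can be sketched in base R. This is only an illustration on invented noise-free values with `lm()`, not DESeq2's own fitting. In DESeq2 the analogous steps would go through `resultsNames()` and `results()`, which accepts a coefficient name or a contrast.

```r
# Invented log2 expression of one gene per genotype/condition cell
mu <- c(IA = 5, IB = 6, IIA = 7, IIB = 9, IIIA = 4, IIIB = 7)

d <- data.frame(
  genotype  = factor(rep(c("I", "II", "III"), each = 4),
                     levels = c("I", "II", "III")),
  condition = factor(rep(rep(c("A", "B"), each = 2), times = 3),
                     levels = c("A", "B"))
)
d$y <- unname(mu[paste0(d$genotype, d$condition)])

# Step 1: the formula only specifies the model; fitting yields the betas
fit  <- lm(y ~ genotype + condition + genotype:condition, data = d)
beta <- coef(fit)
print(names(beta))

# Step 2: choose the parameter, or linear combination, of interest.
# A contrast is a weight vector aligned with names(beta).
effect_in_II   <- setNames(c(0, 0, 0, 1, 1, 0), names(beta))  # B vs A in genotype II
interaction_II <- setNames(c(0, 0, 0, 0, 1, 0), names(beta))  # II effect minus I effect

print(sum(effect_in_II * beta))
print(sum(interaction_II * beta))
```

Only after a contrast has been chosen does it make sense to ask which genes show a large value for it.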

Sorry, I didn't understand the "Question".

Hope this helps (a little) -
Wolfgang

Last edited by Wolfgang Huber; 05-19-2019, 12:12 PM.

Wolfgang Huber
EMBL