Hello,
I am analyzing some RNA-Seq data with edgeR, and I have a question about how to set up the GLM model.matrix. My experiment is a 2X2 factorial design where both WT and KO animals are subjected to either of two treatments. The 4 groups are WT-Oil, WT-E2, KO-Oil, and KO-E2. I am interested in determining which genes are DE in the WT-E2 group (our phenotype after treating with E2 is rescued in similarly treated KO animals).
Here is the code that I used
> data.frame(Sample=colnames(a),genotype2,trt2)
Sample genotype2 trt2
1 X1_WT_Oil_140710 WT OIL
2 X5_WT_Oil_140710 WT OIL
3 X1_WT_Oil_141109 WT OIL
4 X2_WT_Oil_141109 WT OIL
5 X2_WT_E2_140710 WT E2
6 X6_WT_E2_140710 WT E2
7 X3_WT_E2_141109 WT E2
8 X4_WT_E2_141109 WT E2
9 X3_KO_Oil_140710 KO OIL
10 X8_KO_Oil_140710 KO OIL
11 X4_KO_E2_140710 KO E2
12 X5_KO_E2_141109 KO E2
13 X6_KO_E2_141109 KO E2
14 X7_KO_E2_141109 KO E2
> design2 <- model.matrix(~0+genotype2:trt2)
> design2
genotype2KO:trt2E2 genotype2WT:trt2E2 genotype2KO:trt2OIL genotype2WT:trt2OIL
1 0 0 0 1
2 0 0 0 1
3 0 0 0 1
4 0 0 0 1
5 0 1 0 0
6 0 1 0 0
7 0 1 0 0
8 0 1 0 0
9 0 0 1 0
10 0 0 1 0
11 1 0 0 0
12 1 0 0 0
13 1 0 0 0
14 1 0 0 0
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$genotype2
[1] "contr.treatment"
attr(,"contrasts")$trt2
[1] "contr.treatment"
> b <- estimateGLMCommonDisp(b,design2,verbose=TRUE)
Disp = 0.20378 , BCV = 0.4514
> fit <- glmFit(b,design2)
> lrt <- glmLRT(fit,contrast=c(-1,3,-1,-1))
This worked just fine, but now I would like to reanalyze this data taking batch effects into account. The last part of the sample IDs (140710 and 141109) indicate the two separate dates in which the samples were collected/sequenced. The new data frame would look like this:
> data.frame(Sample=colnames(a),genotype2,trt2,chip2)
Sample genotype2 trt2 chip2
1 X1_WT_Oil_140710 WT OIL 1
2 X5_WT_Oil_140710 WT OIL 1
3 X1_WT_Oil_141109 WT OIL 2
4 X2_WT_Oil_141109 WT OIL 2
5 X2_WT_E2_140710 WT E2 1
6 X6_WT_E2_140710 WT E2 1
7 X3_WT_E2_141109 WT E2 2
8 X4_WT_E2_141109 WT E2 2
9 X3_KO_Oil_140710 KO OIL 1
10 X8_KO_Oil_140710 KO OIL 1
11 X4_KO_E2_140710 KO E2 1
12 X5_KO_E2_141109 KO E2 2
13 X6_KO_E2_141109 KO E2 2
14 X7_KO_E2_141109 KO E2 2
The manual doesn't describe how to set up the model.matrix when there are two factors plus batch effects, and I'm struggling setting this up.
Could somebody guide me as to how to properly define the model.matrix as well as properly define the correct contrasts in the glmLRT function? Thanks!
I am analyzing some RNA-Seq data with edgeR, and I have a question about how to set up the GLM model.matrix. My experiment is a 2X2 factorial design where both WT and KO animals are subjected to either of two treatments. The 4 groups are WT-Oil, WT-E2, KO-Oil, and KO-E2. I am interested in determining which genes are DE in the WT-E2 group (our phenotype after treating with E2 is rescued in similarly treated KO animals).
Here is the code that I used
> data.frame(Sample=colnames(a),genotype2,trt2)
Sample genotype2 trt2
1 X1_WT_Oil_140710 WT OIL
2 X5_WT_Oil_140710 WT OIL
3 X1_WT_Oil_141109 WT OIL
4 X2_WT_Oil_141109 WT OIL
5 X2_WT_E2_140710 WT E2
6 X6_WT_E2_140710 WT E2
7 X3_WT_E2_141109 WT E2
8 X4_WT_E2_141109 WT E2
9 X3_KO_Oil_140710 KO OIL
10 X8_KO_Oil_140710 KO OIL
11 X4_KO_E2_140710 KO E2
12 X5_KO_E2_141109 KO E2
13 X6_KO_E2_141109 KO E2
14 X7_KO_E2_141109 KO E2
> design2 <- model.matrix(~0+genotype2:trt2)
> design2
genotype2KO:trt2E2 genotype2WT:trt2E2 genotype2KO:trt2OIL genotype2WT:trt2OIL
1 0 0 0 1
2 0 0 0 1
3 0 0 0 1
4 0 0 0 1
5 0 1 0 0
6 0 1 0 0
7 0 1 0 0
8 0 1 0 0
9 0 0 1 0
10 0 0 1 0
11 1 0 0 0
12 1 0 0 0
13 1 0 0 0
14 1 0 0 0
attr(,"assign")
[1] 1 1 1 1
attr(,"contrasts")
attr(,"contrasts")$genotype2
[1] "contr.treatment"
attr(,"contrasts")$trt2
[1] "contr.treatment"
> b <- estimateGLMCommonDisp(b,design2,verbose=TRUE)
Disp = 0.20378 , BCV = 0.4514
> fit <- glmFit(b,design2)
> lrt <- glmLRT(fit,contrast=c(-1,3,-1,-1))
This worked just fine, but now I would like to reanalyze this data taking batch effects into account. The last part of the sample IDs (140710 and 141109) indicate the two separate dates in which the samples were collected/sequenced. The new data frame would look like this:
> data.frame(Sample=colnames(a),genotype2,trt2,chip2)
Sample genotype2 trt2 chip2
1 X1_WT_Oil_140710 WT OIL 1
2 X5_WT_Oil_140710 WT OIL 1
3 X1_WT_Oil_141109 WT OIL 2
4 X2_WT_Oil_141109 WT OIL 2
5 X2_WT_E2_140710 WT E2 1
6 X6_WT_E2_140710 WT E2 1
7 X3_WT_E2_141109 WT E2 2
8 X4_WT_E2_141109 WT E2 2
9 X3_KO_Oil_140710 KO OIL 1
10 X8_KO_Oil_140710 KO OIL 1
11 X4_KO_E2_140710 KO E2 1
12 X5_KO_E2_141109 KO E2 2
13 X6_KO_E2_141109 KO E2 2
14 X7_KO_E2_141109 KO E2 2
The manual doesn't describe how to set up the model.matrix when there are two factors plus batch effects, and I'm struggling setting this up.
Could somebody guide me as to how to properly define the model.matrix as well as properly define the correct contrasts in the glmLRT function? Thanks!
Comment