Hello,

I would like to use a GLM to account for the bias between two different protocols and obtain the true differentially expressed genes (DEGs).

I have three conditions, say C1, C2 and C3.

Protocol 1 was used to obtain three replicates of C1 and three of C2.

Protocol 2 was used to obtain three replicates of C2 and three of C3.

I am interested in finding the DEGs between C1 and C3, using C2 (measured with both protocols) to remove the protocol bias. See the table below.

           | C1 | C2 | C3 |
Protocol 1 | X  | X  |    |
Protocol 2 |    | X  | X  |

Here are my two vectors for the design matrix:

Protocol = p1 p1 p1 p1 p2 p1 p2 p1 p2 p2 p2 p2

Conditions= c1 c1 c1 c2 c2 c2 c2 c2 c2 c3 c3 c3

In R code we have:

Protocol <- factor(c(rep("p1", 3), rep(c("p1", "p2"), 3), rep("p2", 3)))
Conditions <- factor(c(rep("c1", 3), rep("c2", 6), rep("c3", 3)))
data.frame(Protocol, Conditions)
   Protocol Conditions
1        p1         c1
2        p1         c1
3        p1         c1
4        p1         c2
5        p2         c2
6        p1         c2
7        p2         c2
8        p1         c2
9        p2         c2
10       p2         c3
11       p2         c3
12       p2         c3

design <- model.matrix(~Protocol+Conditions)
design
   (Intercept) Protocolp2 Conditionsc2 Conditionsc3
1            1          0            0            0
2            1          0            0            0
3            1          0            0            0
4            1          0            1            0
5            1          1            1            0
6            1          0            1            0
7            1          1            1            0
8            1          0            1            0
9            1          1            1            0
10           1          1            0            1
11           1          1            0            1
12           1          1            0            1
attr(,"assign")
[1] 0 1 2 2
attr(,"contrasts")
attr(,"contrasts")$Protocol
[1] "contr.treatment"

attr(,"contrasts")$Conditions
[1] "contr.treatment"
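For reference, here is a quick base-R sanity check on this design. It is only a sketch: it rebuilds the two factors from above, tabulates how many samples fall in each protocol/condition cell, and checks that the design matrix has full column rank. The variable names `layout_counts`, `design_rank` and `is_full_rank` are mine, introduced for illustration; a full-rank design says the coefficients are estimable, not that the model is the right one for the question.

```r
# Rebuild the two factors exactly as listed above.
Protocol   <- factor(c(rep("p1", 3), rep(c("p1", "p2"), 3), rep("p2", 3)))
Conditions <- factor(c(rep("c1", 3), rep("c2", 6), rep("c3", 3)))

# Additive design: one protocol effect plus condition effects.
design <- model.matrix(~ Protocol + Conditions)

# How many samples sit in each protocol/condition cell.
# Only C2 is observed under both protocols.
layout_counts <- table(Protocol, Conditions)
print(layout_counts)

# A design matrix is full rank when its rank equals its number of columns,
# i.e. every coefficient can be estimated from the data.
design_rank  <- qr(design)$rank
is_full_rank <- design_rank == ncol(design)
print(is_full_rank)
```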

After estimating the dispersion and running glmFit and glmLRT, I have:

...

lrt <- glmLRT(fit)

topTags(lrt)

Coefficient: Conditionsc3

...results here...
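To make explicit which coefficient is being tested, here is a sketch of how I understand the call could be written. The assumptions are mine: the counts are simulated purely so the snippet runs (they are not my data), `estimateDisp` stands in for whichever dispersion step is actually used, and the edgeR part only executes if the package is installed. By default `glmLRT` tests the last column of the design, which here is `Conditionsc3`; naming it through `coef`, or passing a `contrast` vector with a 1 in that position, should test the same hypothesis.

```r
# Rebuild the design from the post.
Protocol   <- factor(c(rep("p1", 3), rep(c("p1", "p2"), 3), rep("p2", 3)))
Conditions <- factor(c(rep("c1", 3), rep("c2", 6), rep("c3", 3)))
design     <- model.matrix(~ Protocol + Conditions)

# Locate the coefficient of interest by name rather than by position.
coef_name    <- "Conditionsc3"
coef_index   <- match(coef_name, colnames(design))
# Contrast vector with a 1 only at that coefficient.
contrast_vec <- as.numeric(colnames(design) == coef_name)

if (requireNamespace("edgeR", quietly = TRUE)) {
  # Simulated counts (an assumption, for illustration only): 500 genes x 12 samples.
  set.seed(1)
  counts <- matrix(rnbinom(500 * 12, mu = 50, size = 10), ncol = 12)

  y   <- edgeR::DGEList(counts = counts)
  y   <- edgeR::estimateDisp(y, design)
  fit <- edgeR::glmFit(y, design)

  # Two ways of requesting the same test.
  lrt_by_coef     <- edgeR::glmLRT(fit, coef = coef_name)
  lrt_by_contrast <- edgeR::glmLRT(fit, contrast = contrast_vec)
  print(edgeR::topTags(lrt_by_coef, n = 3))
}
```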

My questions are:

1 - Is my design correct when using "model.matrix(~Protocol+Conditions)"? Where did Conditionsc1 go in the design table?

2 - Is the coefficient "Conditionsc3" the correct one for this analysis? How should the contrast be specified in the glmLRT function?

Any comments, tips or help is greatly appreciated.

Thank you very much,

## Comment