Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • edgeR GLM model - factorial design with batch effects

    Hello,

    I am analyzing some RNA-Seq data with edgeR, and I have a question about how to set up the GLM model.matrix. My experiment is a 2X2 factorial design where both WT and KO animals are subjected to either of two treatments. The 4 groups are WT-Oil, WT-E2, KO-Oil, and KO-E2. I am interested in determining which genes are DE in the WT-E2 group (our phenotype after treating with E2 is rescued in similarly treated KO animals).

    Here is the code that I used

    > data.frame(Sample=colnames(a),genotype2,trt2)
    Sample genotype2 trt2
    1 X1_WT_Oil_140710 WT OIL
    2 X5_WT_Oil_140710 WT OIL
    3 X1_WT_Oil_141109 WT OIL
    4 X2_WT_Oil_141109 WT OIL
    5 X2_WT_E2_140710 WT E2
    6 X6_WT_E2_140710 WT E2
    7 X3_WT_E2_141109 WT E2
    8 X4_WT_E2_141109 WT E2
    9 X3_KO_Oil_140710 KO OIL
    10 X8_KO_Oil_140710 KO OIL
    11 X4_KO_E2_140710 KO E2
    12 X5_KO_E2_141109 KO E2
    13 X6_KO_E2_141109 KO E2
    14 X7_KO_E2_141109 KO E2

    > design2 <- model.matrix(~0+genotype2:trt2)
    > design2
    genotype2KO:trt2E2 genotype2WT:trt2E2 genotype2KO:trt2OIL genotype2WT:trt2OIL
    1 0 0 0 1
    2 0 0 0 1
    3 0 0 0 1
    4 0 0 0 1
    5 0 1 0 0
    6 0 1 0 0
    7 0 1 0 0
    8 0 1 0 0
    9 0 0 1 0
    10 0 0 1 0
    11 1 0 0 0
    12 1 0 0 0
    13 1 0 0 0
    14 1 0 0 0
    attr(,"assign")
    [1] 1 1 1 1
    attr(,"contrasts")
    attr(,"contrasts")$genotype2
    [1] "contr.treatment"

    attr(,"contrasts")$trt2
    [1] "contr.treatment"

    > b <- estimateGLMCommonDisp(b,design2,verbose=TRUE)
    Disp = 0.20378 , BCV = 0.4514
    > fit <- glmFit(b,design2)
    > lrt <- glmLRT(fit,contrast=c(-1,3,-1,-1))

    This worked just fine, but now I would like to reanalyze this data taking batch effects into account. The last part of the sample IDs (140710 and 141109) indicate the two separate dates in which the samples were collected/sequenced. The new data frame would look like this:

    > data.frame(Sample=colnames(a),genotype2,trt2,chip2)
    Sample genotype2 trt2 chip2
    1 X1_WT_Oil_140710 WT OIL 1
    2 X5_WT_Oil_140710 WT OIL 1
    3 X1_WT_Oil_141109 WT OIL 2
    4 X2_WT_Oil_141109 WT OIL 2
    5 X2_WT_E2_140710 WT E2 1
    6 X6_WT_E2_140710 WT E2 1
    7 X3_WT_E2_141109 WT E2 2
    8 X4_WT_E2_141109 WT E2 2
    9 X3_KO_Oil_140710 KO OIL 1
    10 X8_KO_Oil_140710 KO OIL 1
    11 X4_KO_E2_140710 KO E2 1
    12 X5_KO_E2_141109 KO E2 2
    13 X6_KO_E2_141109 KO E2 2
    14 X7_KO_E2_141109 KO E2 2

    The manual doesn't describe how to set up the model.matrix when there are two factors plus batch effects, and I'm struggling setting this up.

    Could somebody guide me as to how to properly define the model.matrix as well as properly define the correct contrasts in the glmLRT function? Thanks!
    Last edited by rhcr56; 01-13-2015, 04:38 PM. Reason: highlight important changes

  • #2
    I never got the fixation with treating a simple factorial design as if it's not. You can skip the contrasts in your first design and simply use:
    Code:
    design2 <- model.matrix(~genotype2*trt2)
    You then don't need any contrasts:
    Code:
    fit <- glmFit(b,design2)
    lrt <- glmLRT(fit) #You'll get the last coefficient, which will be the interaction, by default.
    BTW, just make sure that KO and OIL are the base levels of their respective factors.

    This also lends itself nicely to adding a batch (not that it's significantly different using the "collapse the groups and make contrasts" example, where you will still contrast the same groups):
    Code:
    design = model.matrix(~batch+genotype2*trt2)

    Comment


    • #3
      Thanks for the reply. This was my first inclination, however I'm confused about the last coefficient. The code you provided was used below

      > design3 <- model.matrix(~genotype2*trt2)
      > fit <- glmFit(b,design3)
      > lrt2 <- glmLRT(fit)
      > topTags(lrt2)
      Coefficient: genotype2WT:trt2OIL
      logFC logCPM LR PValue FDR
      1600002K03Rik -8.515051 2.068219 37.67803 8.343907e-10 8.445590e-06
      Mettl21c 9.151695 2.339162 36.03935 1.933725e-09 8.445590e-06
      Il10ra 9.719101 2.009320 35.98865 1.984707e-09 8.445590e-06
      Mpo 5.802613 3.754349 34.41918 4.443238e-09 1.265384e-05
      Cd177 7.962947 2.954040 34.20659 4.956072e-09 1.265384e-05
      Hspa1a -4.576308 4.852461 33.63713 6.641285e-09 1.413044e-05
      Tpte 7.864546 2.116211 32.57592 1.146264e-08 2.090457e-05
      Bace2 -10.589050 3.180579 31.81367 1.696943e-08 2.707897e-05
      Hspa1b -4.314080 4.794562 30.50166 3.335810e-08 4.731661e-05
      Fuom -8.057760 1.793066 30.06869 4.170094e-08 5.323542e-05

      If I am interpreting this correctly, this will identify genes that are DE in the WT-Oil group compared to every other group. I am interested in those DE genes in the WT-E2 group. Do I need to use a different coefficient? When I used the contrasts, I got the appropriate test.

      > topTags(lrt,n=343)
      Coefficient: -1*genotype2KO:trt2E2 3*genotype2WT:trt2E2 -1*genotype2KO:trt2OIL -1*genotype2WT:trt2OIL

      Thanks

      Comment


      • #4
        I'll reiterate my previous "BTW":
        BTW, just make sure that KO and OIL are the base levels of their respective factors.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          Recent Advances in Sequencing Technologies
          by seqadmin







          Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.

          Long-Read Sequencing
          Long-read sequencing has...
          12-02-2024, 01:49 PM
        • seqadmin
          Genetic Variation in Immunogenetics and Antibody Diversity
          by seqadmin



          The field of immunogenetics explores how genetic variations influence immune responses and susceptibility to disease. In a recent SEQanswers webinar, Oscar Rodriguez, Ph.D., Postdoctoral Researcher at the University of Louisville, and Ruben Martínez Barricarte, Ph.D., Assistant Professor of Medicine at Vanderbilt University, shared recent advancements in immunogenetics. This article discusses their research on genetic variation in antibody loci, antibody production processes,...
          11-06-2024, 07:24 PM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 12-02-2024, 09:29 AM
        0 responses
        148 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 12-02-2024, 09:06 AM
        0 responses
        51 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 12-02-2024, 08:03 AM
        0 responses
        42 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 11-22-2024, 07:36 AM
        0 responses
        73 views
        0 likes
        Last Post seqadmin  
        Working...
        X