Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • edgeR GLM model - factorial design with batch effects

    Hello,

    I am analyzing some RNA-Seq data with edgeR, and I have a question about how to set up the GLM model.matrix. My experiment is a 2X2 factorial design where both WT and KO animals are subjected to either of two treatments. The 4 groups are WT-Oil, WT-E2, KO-Oil, and KO-E2. I am interested in determining which genes are DE in the WT-E2 group (our phenotype after treating with E2 is rescued in similarly treated KO animals).

    Here is the code that I used

    > data.frame(Sample=colnames(a),genotype2,trt2)
    Sample genotype2 trt2
    1 X1_WT_Oil_140710 WT OIL
    2 X5_WT_Oil_140710 WT OIL
    3 X1_WT_Oil_141109 WT OIL
    4 X2_WT_Oil_141109 WT OIL
    5 X2_WT_E2_140710 WT E2
    6 X6_WT_E2_140710 WT E2
    7 X3_WT_E2_141109 WT E2
    8 X4_WT_E2_141109 WT E2
    9 X3_KO_Oil_140710 KO OIL
    10 X8_KO_Oil_140710 KO OIL
    11 X4_KO_E2_140710 KO E2
    12 X5_KO_E2_141109 KO E2
    13 X6_KO_E2_141109 KO E2
    14 X7_KO_E2_141109 KO E2

    > design2 <- model.matrix(~0+genotype2:trt2)
    > design2
    genotype2KO:trt2E2 genotype2WT:trt2E2 genotype2KO:trt2OIL genotype2WT:trt2OIL
    1 0 0 0 1
    2 0 0 0 1
    3 0 0 0 1
    4 0 0 0 1
    5 0 1 0 0
    6 0 1 0 0
    7 0 1 0 0
    8 0 1 0 0
    9 0 0 1 0
    10 0 0 1 0
    11 1 0 0 0
    12 1 0 0 0
    13 1 0 0 0
    14 1 0 0 0
    attr(,"assign")
    [1] 1 1 1 1
    attr(,"contrasts")
    attr(,"contrasts")$genotype2
    [1] "contr.treatment"

    attr(,"contrasts")$trt2
    [1] "contr.treatment"

    > b <- estimateGLMCommonDisp(b,design2,verbose=TRUE)
    Disp = 0.20378 , BCV = 0.4514
    > fit <- glmFit(b,design2)
    > lrt <- glmLRT(fit,contrast=c(-1,3,-1,-1))

    This worked just fine, but now I would like to reanalyze this data taking batch effects into account. The last part of the sample IDs (140710 and 141109) indicate the two separate dates in which the samples were collected/sequenced. The new data frame would look like this:

    > data.frame(Sample=colnames(a),genotype2,trt2,chip2)
    Sample genotype2 trt2 chip2
    1 X1_WT_Oil_140710 WT OIL 1
    2 X5_WT_Oil_140710 WT OIL 1
    3 X1_WT_Oil_141109 WT OIL 2
    4 X2_WT_Oil_141109 WT OIL 2
    5 X2_WT_E2_140710 WT E2 1
    6 X6_WT_E2_140710 WT E2 1
    7 X3_WT_E2_141109 WT E2 2
    8 X4_WT_E2_141109 WT E2 2
    9 X3_KO_Oil_140710 KO OIL 1
    10 X8_KO_Oil_140710 KO OIL 1
    11 X4_KO_E2_140710 KO E2 1
    12 X5_KO_E2_141109 KO E2 2
    13 X6_KO_E2_141109 KO E2 2
    14 X7_KO_E2_141109 KO E2 2

    The manual doesn't describe how to set up the model.matrix when there are two factors plus batch effects, and I'm struggling setting this up.

    Could somebody guide me as to how to properly define the model.matrix as well as properly define the correct contrasts in the glmLRT function? Thanks!
    Last edited by rhcr56; 01-13-2015, 04:38 PM. Reason: highlight important changes

  • #2
    I never got the fixation with treating a simple factorial design as if it's not. You can skip the contrasts in your first design and simply use:
    Code:
    design2 <- model.matrix(~genotype2*trt2)
    You then don't need any contrasts:
    Code:
    fit <- glmFit(b,design2)
    lrt <- glmLRT(fit) #You'll get the last coefficient, which will be the interaction, by default.
    BTW, just make sure that KO and OIL are the base levels of their respective factors.

    This also lends itself nicely to adding a batch (not that it's significantly different using the "collapse the groups and make contrasts" example, where you will still contrast the same groups):
    Code:
    design = model.matrix(~batch+genotype2*trt2)

    Comment


    • #3
      Thanks for the reply. This was my first inclination, however I'm confused about the last coefficient. The code you provided was used below

      > design3 <- model.matrix(~genotype2*trt2)
      > fit <- glmFit(b,design3)
      > lrt2 <- glmLRT(fit)
      > topTags(lrt2)
      Coefficient: genotype2WT:trt2OIL
      logFC logCPM LR PValue FDR
      1600002K03Rik -8.515051 2.068219 37.67803 8.343907e-10 8.445590e-06
      Mettl21c 9.151695 2.339162 36.03935 1.933725e-09 8.445590e-06
      Il10ra 9.719101 2.009320 35.98865 1.984707e-09 8.445590e-06
      Mpo 5.802613 3.754349 34.41918 4.443238e-09 1.265384e-05
      Cd177 7.962947 2.954040 34.20659 4.956072e-09 1.265384e-05
      Hspa1a -4.576308 4.852461 33.63713 6.641285e-09 1.413044e-05
      Tpte 7.864546 2.116211 32.57592 1.146264e-08 2.090457e-05
      Bace2 -10.589050 3.180579 31.81367 1.696943e-08 2.707897e-05
      Hspa1b -4.314080 4.794562 30.50166 3.335810e-08 4.731661e-05
      Fuom -8.057760 1.793066 30.06869 4.170094e-08 5.323542e-05

      If I am interpreting this correctly, this will identify genes that are DE in the WT-Oil group compared to every other group. I am interested in those DE genes in the WT-E2 group. Do I need to use a different coefficient? When I used the contrasts, I got the appropriate test.

      > topTags(lrt,n=343)
      Coefficient: -1*genotype2KO:trt2E2 3*genotype2WT:trt2E2 -1*genotype2KO:trt2OIL -1*genotype2WT:trt2OIL

      Thanks

      Comment


      • #4
        I'll reiterate my previous "BTW":
        BTW, just make sure that KO and OIL are the base levels of their respective factors.

        Comment

        Latest Articles

        Collapse

        • seqadmin
          The Impact of AI in Genomic Medicine
          by seqadmin



          Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
          02-26-2024, 02:07 PM
        • seqadmin
          Multiomics Techniques Advancing Disease Research
          by seqadmin


          New and advanced multiomics tools and technologies have opened new avenues of research and markedly enhanced various disciplines such as disease research and precision medicine1. The practice of merging diverse data from various ‘omes increasingly provides a more holistic understanding of biological systems. As Maddison Masaeli, Co-Founder and CEO at Deepcell, aptly noted, “You can't explain biology in its complex form with one modality.”

          A major leap in the field has
          ...
          02-08-2024, 06:33 AM

        ad_right_rmr

        Collapse

        News

        Collapse

        Topics Statistics Last Post
        Started by seqadmin, 02-23-2024, 04:11 PM
        0 responses
        57 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 02-21-2024, 08:52 AM
        0 responses
        67 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 02-20-2024, 08:57 AM
        0 responses
        56 views
        0 likes
        Last Post seqadmin  
        Started by seqadmin, 02-14-2024, 09:19 AM
        0 responses
        65 views
        0 likes
        Last Post seqadmin  
        Working...
        X