Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • edgeR model.matrix issue for multiple factors

    Hi, there,

    I've recently started to analyze RNAseq data myself. With the help of guidebooks from a friend and the packages and forums like this, I was able to get DESeq and DESeq2 done.
    I then tried to do edgeR on the same set of data but I got some issues on the multifactor comparisons.

    Background:

    Samples were collected from wild-type (WT) or knockout (KO) mice and the mice were fed on control diet (CD) or sugar diet (SD). So I got 4 groups of samples: WT_CD, KO_CD,WT_SD,KO_SD. There are 5 animals per group and hence I have total 20 samples.

    I was also able to do comparison of any 2 groups with edgeR (e.g. WT_CD vs KO_CD or WT_CD vs WT_SD) but I got stuck when I tried to more complicated comparisons with a different design.matrix.

    So basically

    A:
    I'd like to compare the DEGs ("W") from (WT_CD vs WT_SD) with the DEGs ("X") from (KO_CD vs KO_SD) and figure out the difference between W and X.

    B:
    Similarly, I'd like to compare the DEGs ("Y") from (WT_CD vs KO_CD) with the DEGs ("Z") from (WT_SD vs KO_SD) and figure out the difference between Y and Z.

    Q1: Is it possible to do this kind of comparison in edgeR? Or I should just directly compare these DEGs?

    The input file is one merged reads-count file that I used for DESeq.

    So here are my codes:

    > library(edgeR)
    Loading required package: limma
    > rawdata <- read.delim("/Users/NGS/all20_concat.sam.count.txt", check.names=FALSE, stringsAsFactors=FALSE)
    > y <- DGEList(counts=rawdata[,2:21], genes=rawdata[,1])
    > y <- calcNormFactors(y)

    > Genotype <- factor(c("WT","WT","WT","WT","WT","KO","KO","KO","KO","KO","WT","WT","WT","WT","WT","KO","KO","KO","KO","KO"))

    > Diet <- factor(c("CD","CD","CD","CD","CD","CD","CD","CD","CD","CD","SD","SD","SD","SD","SD","SD","SD","SD","SD","SD"))

    > data.frame(Sample=colnames(y),Genotype,Diet)
    Sample Genotype Diet
    1 01.sam.count WT CD
    2 02.sam.count WT CD
    3 03.sam.count WT CD
    4 04.sam.count WT CD
    5 05.sam.count WT CD
    6 06.sam.count KO CD
    7 07.sam.count KO CD
    8 08.sam.count KO CD
    9 09.sam.count KO CD
    10 10.sam.count KO CD
    11 11.sam.count WT SD
    12 12.sam.count WT SD
    13 13.sam.count WT SD
    14 14.sam.count WT SD
    15 15.sam.count WT SD
    16 16.sam.count KO SD
    17 17.sam.count KO SD
    18 18.sam.count KO SD
    19 19.sam.count KO SD
    20 20.sam.count KO SD

    # below is the potential issue

    > design <- model.matrix(~0+Genotype+Diet+Genotypeiet)
    > colnames(design)
    [1] "GenotypeKO" "GenotypeWT" "DietSD"
    "GenotypeWTietSD"

    # Q2: I'm not sure that was the right matrix for later comparisons like: (GenotypeKOietSD-GenotypeKOietCD)-(GenotypeWTietSD-GenotypeWTietCD) ?

    # Q3: Is this (GenotypeKOietSD-GenotypeKOietCD)-(GenotypeWTietSD-GenotypeWTietCD) comparison the right one to answer my question A mentioned above?

    Setup

    > sessionInfo()
    R version 3.1.3 (2015-03-09)
    Platform: x86_64-apple-darwin13.4.0 (64-bit)
    Running under: OS X 10.10.2 (Yosemite)

    locale:
    [1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

    attached base packages:
    [1] stats graphics grDevices utils datasets methods base

    other attached packages:
    [1] edgeR_3.8.6 limma_3.22.7

    loaded via a namespace (and not attached):
    [1] tools_3.1.3

    Any input will be much appreciated.
    Last edited by neokao; 03-17-2015, 08:07 AM.

  • #2
    I'm not quite sure what you are trying to do. Are you simply trying to test the interaction: (KO_CD - KO_SD) - (WT_CD - WT_SD)?

    Have you looked at Section 3.3.1 of the edgeR User's Guide? The instructions there make it very easy to make any comparison from an experiment such as yours.

    Comment


    • #3
      Thanks for the reply. I guess that I did not express it clearly.
      Alright. So after I did simple DEG analysis, I got the DEGs from the (WT_SD vs WT_CD) comparison and also DEGs from the (KO_SD vs KO_CD) comparison. Say in the 1st DEG list, I have gene A, B, C, D, E, F, G, H, I, J up-regulated and gene N, O, P, Q, R, S, T, U, V down-regulated by SD in WT mice.
      Then in the 2nd DEG list, I have gene E, F, G, H, I, J, K, L, M up-regulated and gene N, O, P, Q, W, X,Y, Z down-regulated by SD in KO mice.
      If I’d like to know which genes are specifically up-regulated by SD only in KO mice but not up-regulated by SD in WT mice, am I supposed to do
      (KO_SD-KO_CD) - (WT_SD-WT_CD) for edgeR? Or i should just directly compare the DEG lists manually to pull out “K, L, M"?


      Thanks for suggestions.
      Last edited by neokao; 03-25-2015, 03:49 PM.

      Comment


      • #4
        Originally posted by neokao View Post
        I got the DEGs from the (WT_SD vs WT_CD) comparison and also DEGs from the (KO_SD vs KO_CD) comparison. If I’d like to know which genes are specifically up-regulated by SD only in KO mice but not up-regulated by SD in WT mice, am I supposed to do (KO_SD-KO_CD) - (WT_SD-WT_CD) for edgeR? Or i should just directly compare the DEG lists manually?
        Computing the contrast is not the same thing so, yes, you just could just compare the DEG lists. The same answer would apply in any of the packages (edgeR, DESEq2 etc).

        If you have a large number of DE genes, and you want a more stringent list, then you could select genes that are DE in the KO mice and not in the WT mice and also DE for the interaction. This ensures that you only have genes for which the KO effect is substantially greater than the WT effect.
        Last edited by Gordon Smyth; 03-25-2015, 05:21 PM.

        Comment


        • #5
          Wow...Thanks for your quick reply. That was clear.

          Comment

          Latest Articles

          Collapse

          • seqadmin
            Choosing Between NGS and qPCR
            by seqadmin



            Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
            10-18-2024, 07:11 AM
          • seqadmin
            Non-Coding RNA Research and Technologies
            by seqadmin




            Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

            Nobel Prize for MicroRNA Discovery
            This week,...
            10-07-2024, 08:07 AM

          ad_right_rmr

          Collapse

          News

          Collapse

          Topics Statistics Last Post
          Started by seqadmin, 11-01-2024, 06:09 AM
          0 responses
          12 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 10-30-2024, 05:31 AM
          0 responses
          14 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 10-24-2024, 06:58 AM
          0 responses
          24 views
          0 likes
          Last Post seqadmin  
          Started by seqadmin, 10-23-2024, 08:43 AM
          0 responses
          52 views
          0 likes
          Last Post seqadmin  
          Working...
          X