Hi, there,
I've recently started to analyze RNAseq data myself. With the help of guidebooks from a friend and the packages and forums like this, I was able to get DESeq and DESeq2 done.
I then tried to do edgeR on the same set of data but I got some issues on the multifactor comparisons.
Background:
Samples were collected from wild-type (WT) or knockout (KO) mice and the mice were fed on control diet (CD) or sugar diet (SD). So I got 4 groups of samples: WT_CD, KO_CD,WT_SD,KO_SD. There are 5 animals per group and hence I have total 20 samples.
I was also able to do comparison of any 2 groups with edgeR (e.g. WT_CD vs KO_CD or WT_CD vs WT_SD) but I got stuck when I tried to more complicated comparisons with a different design.matrix.
So basically
A:
I'd like to compare the DEGs ("W") from (WT_CD vs WT_SD) with the DEGs ("X") from (KO_CD vs KO_SD) and figure out the difference between W and X.
B:
Similarly, I'd like to compare the DEGs ("Y") from (WT_CD vs KO_CD) with the DEGs ("Z") from (WT_SD vs KO_SD) and figure out the difference between Y and Z.
Q1: Is it possible to do this kind of comparison in edgeR? Or I should just directly compare these DEGs?
The input file is one merged reads-count file that I used for DESeq.
So here are my codes:
> library(edgeR)
Loading required package: limma
> rawdata <- read.delim("/Users/NGS/all20_concat.sam.count.txt", check.names=FALSE, stringsAsFactors=FALSE)
> y <- DGEList(counts=rawdata[,2:21], genes=rawdata[,1])
> y <- calcNormFactors(y)
> Genotype <- factor(c("WT","WT","WT","WT","WT","KO","KO","KO","KO","KO","WT","WT","WT","WT","WT","KO","KO","KO","KO","KO"))
> Diet <- factor(c("CD","CD","CD","CD","CD","CD","CD","CD","CD","CD","SD","SD","SD","SD","SD","SD","SD","SD","SD","SD"))
> data.frame(Sample=colnames(y),Genotype,Diet)
Sample Genotype Diet
1 01.sam.count WT CD
2 02.sam.count WT CD
3 03.sam.count WT CD
4 04.sam.count WT CD
5 05.sam.count WT CD
6 06.sam.count KO CD
7 07.sam.count KO CD
8 08.sam.count KO CD
9 09.sam.count KO CD
10 10.sam.count KO CD
11 11.sam.count WT SD
12 12.sam.count WT SD
13 13.sam.count WT SD
14 14.sam.count WT SD
15 15.sam.count WT SD
16 16.sam.count KO SD
17 17.sam.count KO SD
18 18.sam.count KO SD
19 19.sam.count KO SD
20 20.sam.count KO SD
# below is the potential issue
> design <- model.matrix(~0+Genotype+Diet+Genotypeiet)
> colnames(design)
[1] "GenotypeKO" "GenotypeWT" "DietSD"
"GenotypeWTietSD"
# Q2: I'm not sure that was the right matrix for later comparisons like: (GenotypeKOietSD-GenotypeKOietCD)-(GenotypeWTietSD-GenotypeWTietCD) ?
# Q3: Is this (GenotypeKOietSD-GenotypeKOietCD)-(GenotypeWTietSD-GenotypeWTietCD) comparison the right one to answer my question A mentioned above?
Setup
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.8.6 limma_3.22.7
loaded via a namespace (and not attached):
[1] tools_3.1.3
Any input will be much appreciated.
I've recently started to analyze RNAseq data myself. With the help of guidebooks from a friend and the packages and forums like this, I was able to get DESeq and DESeq2 done.
I then tried to do edgeR on the same set of data but I got some issues on the multifactor comparisons.
Background:
Samples were collected from wild-type (WT) or knockout (KO) mice and the mice were fed on control diet (CD) or sugar diet (SD). So I got 4 groups of samples: WT_CD, KO_CD,WT_SD,KO_SD. There are 5 animals per group and hence I have total 20 samples.
I was also able to do comparison of any 2 groups with edgeR (e.g. WT_CD vs KO_CD or WT_CD vs WT_SD) but I got stuck when I tried to more complicated comparisons with a different design.matrix.
So basically
A:
I'd like to compare the DEGs ("W") from (WT_CD vs WT_SD) with the DEGs ("X") from (KO_CD vs KO_SD) and figure out the difference between W and X.
B:
Similarly, I'd like to compare the DEGs ("Y") from (WT_CD vs KO_CD) with the DEGs ("Z") from (WT_SD vs KO_SD) and figure out the difference between Y and Z.
Q1: Is it possible to do this kind of comparison in edgeR? Or I should just directly compare these DEGs?
The input file is one merged reads-count file that I used for DESeq.
So here are my codes:
> library(edgeR)
Loading required package: limma
> rawdata <- read.delim("/Users/NGS/all20_concat.sam.count.txt", check.names=FALSE, stringsAsFactors=FALSE)
> y <- DGEList(counts=rawdata[,2:21], genes=rawdata[,1])
> y <- calcNormFactors(y)
> Genotype <- factor(c("WT","WT","WT","WT","WT","KO","KO","KO","KO","KO","WT","WT","WT","WT","WT","KO","KO","KO","KO","KO"))
> Diet <- factor(c("CD","CD","CD","CD","CD","CD","CD","CD","CD","CD","SD","SD","SD","SD","SD","SD","SD","SD","SD","SD"))
> data.frame(Sample=colnames(y),Genotype,Diet)
Sample Genotype Diet
1 01.sam.count WT CD
2 02.sam.count WT CD
3 03.sam.count WT CD
4 04.sam.count WT CD
5 05.sam.count WT CD
6 06.sam.count KO CD
7 07.sam.count KO CD
8 08.sam.count KO CD
9 09.sam.count KO CD
10 10.sam.count KO CD
11 11.sam.count WT SD
12 12.sam.count WT SD
13 13.sam.count WT SD
14 14.sam.count WT SD
15 15.sam.count WT SD
16 16.sam.count KO SD
17 17.sam.count KO SD
18 18.sam.count KO SD
19 19.sam.count KO SD
20 20.sam.count KO SD
# below is the potential issue
> design <- model.matrix(~0+Genotype+Diet+Genotypeiet)
> colnames(design)
[1] "GenotypeKO" "GenotypeWT" "DietSD"
"GenotypeWTietSD"
# Q2: I'm not sure that was the right matrix for later comparisons like: (GenotypeKOietSD-GenotypeKOietCD)-(GenotypeWTietSD-GenotypeWTietCD) ?
# Q3: Is this (GenotypeKOietSD-GenotypeKOietCD)-(GenotypeWTietSD-GenotypeWTietCD) comparison the right one to answer my question A mentioned above?
Setup
> sessionInfo()
R version 3.1.3 (2015-03-09)
Platform: x86_64-apple-darwin13.4.0 (64-bit)
Running under: OS X 10.10.2 (Yosemite)
locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] edgeR_3.8.6 limma_3.22.7
loaded via a namespace (and not attached):
[1] tools_3.1.3
Any input will be much appreciated.
Comment