Hi All,
I was wondering if you could look over my R code for differential gene expression using EdjeR. I am looking to determine differential gene expression between wild type (WT) cells and knockout cells (KO). Three biological replicates were grown for each cell line and RNA was harvested. The paired end reads were mapped using STAR. Exon counts were obtained using feature counts. The exon counts were then used for the R code. Would this be sufficient to determine differential gene expression between WT and KO?
library("edgeR") library("gdata") library("heatmaply") library("ggplot2") library("genefilter") library("methylumi")
setwd("/Users/jwlandry2/Desktop/RNA-Seq\ ESC\ Data\ Analysis/R_Studio_Data_Sets")
counts=read.delim("BPTFKD_ESC_RNA_Seq_Counts_Final2.txt", header=T, row.names="Geneid")
head(counts)
group <- factor(c("WT","WT","WT","KO","KO","KO"))
dge = DGEList(counts=counts,group=group)
keep <- rowSums(cpm(dge)>2) >= 3 dge <- dge[keep, , keep.lib.sizes=FALSE]
Normalization
y <- calcNormFactors(dge) design <- model.matrix(~group) y <- estimateDisp(y,design) y$common.dispersion
plotMDS(y) plotBCV(y)
quasi-likelihood F-test
fit <- glmQLFit(y, design) qlf <- glmQLFTest(fit, coef=2) topTags(qlf)
output to text file
df <- qlf$table write.csv(df, 'qlf.csv')
Get normalizaed CPMs
mtx <- cpm(y, log = TRUE, normalized.lib.sizes = TRUE) mtx_to_plot <- varFilter(mtx, var.cutoff= 0.95)
Correlation matrix
IAC <- mtx_to_plot %>% cor(. , use = "pairwise.complete.obs", method = "pearson") heatmaply(IAC) %>% layout(margin=list(l=200,b=150))
Dendrogram
plot(hclust(as.dist(1-IAC), method="ward"))
I was wondering if you could look over my R code for differential gene expression using EdjeR. I am looking to determine differential gene expression between wild type (WT) cells and knockout cells (KO). Three biological replicates were grown for each cell line and RNA was harvested. The paired end reads were mapped using STAR. Exon counts were obtained using feature counts. The exon counts were then used for the R code. Would this be sufficient to determine differential gene expression between WT and KO?
library("edgeR") library("gdata") library("heatmaply") library("ggplot2") library("genefilter") library("methylumi")
setwd("/Users/jwlandry2/Desktop/RNA-Seq\ ESC\ Data\ Analysis/R_Studio_Data_Sets")
counts=read.delim("BPTFKD_ESC_RNA_Seq_Counts_Final2.txt", header=T, row.names="Geneid")
head(counts)
group <- factor(c("WT","WT","WT","KO","KO","KO"))
dge = DGEList(counts=counts,group=group)
keep <- rowSums(cpm(dge)>2) >= 3 dge <- dge[keep, , keep.lib.sizes=FALSE]
Normalization
y <- calcNormFactors(dge) design <- model.matrix(~group) y <- estimateDisp(y,design) y$common.dispersion
plotMDS(y) plotBCV(y)
quasi-likelihood F-test
fit <- glmQLFit(y, design) qlf <- glmQLFTest(fit, coef=2) topTags(qlf)
output to text file
df <- qlf$table write.csv(df, 'qlf.csv')
Get normalizaed CPMs
mtx <- cpm(y, log = TRUE, normalized.lib.sizes = TRUE) mtx_to_plot <- varFilter(mtx, var.cutoff= 0.95)
Correlation matrix
IAC <- mtx_to_plot %>% cor(. , use = "pairwise.complete.obs", method = "pearson") heatmaply(IAC) %>% layout(margin=list(l=200,b=150))
Dendrogram
plot(hclust(as.dist(1-IAC), method="ward"))