Hi,
I' am relatively new to bioinformatics.
I' am assisting in analyzing 7 RNA-seq samples. Two conditions: (1) experimental - 6 samples, and (2) control - 1 sample.
The input for DESeq consisted of read counts. See below for the R code for DESeq:
---------------------------------------------------------------
library(DESeq)
readCounts<-read.csv("readCounts_IndividualDigEx",row.names=1,header=TRUE)
#which data are from MPSNT and Control
group<-c(rep("MPNST",4),"Control",rep("MPNST",2))
#annotate the data
cds <- newCountDataSet(readCounts,group)
#calculate library sizes and associated affect on DE testing
cds <- estimateSizeFactors( cds )
cds <- estimateVarianceFunctions( cds )
#test for DE
res <- nbinomTest( cds, "MPNST", "Control" )
#there is no expression for a gene accross all samples... need to remove those genes
whichIsNA<-which(is.na(res[,8]))
res<-res[-whichIsNA,]
#get list of DE genes, then display some of up and down-regulated genes
resSig <- res[ res$pval < .05, ]
#up-regulated
head(resSig[order(resSig$foldChange,-resSig$baseMean),])
#down-regulated
head(resSig[order(-resSig$foldChange,-resSig$baseMean),])
---------------------------------------------------------------
See the following for example data:
a3 a4 a5 a6 a7 a8 a9
MKL2 1521 991 1058 286 1965 1376 1026
OOSP1 0 0 1 3 0 1 0
CHL1 12 11 14 5154 3565 20 4
My problem is the following. The list of DE genes severely under represents up-regulated genes (low expression in control, high in experiment). For example, with a pval<0.05 (FDR corrected): 215 genes were DE, 211 were down-regulated and 4 were up regulated.
The issue does not seem related to the biology of the samples, but perhaps an error in setting up the tool. For example, I added two mock genes to test for DE:
a3 a4 a5 a6 a7 a8 a9
test1 10000 10000 10000 10000 1000 10000 10000
test2 5000 5000 5000 5000 1000 5000 5000
The mock genes were concatenated onto the other genes. The following are the p-values for each mock gene: test1: 0.25, test2: 0.30, and the adjusted p-values for both are 1.
I understand that the tool confers much less confidence to expression measurements from conditions in which there is only one replicate, but the above outcome seems rather incorrect.
Moreover, here is a sample gene which DESeq found to be significantly up-regluated (p-value = 0.0016):
a3 a4 a5 a6 a7 a8 a9
PAX7 4346 1 0 19 0 773 86
Any advice would be greatly appreciated!
Charlie
I' am relatively new to bioinformatics.
I' am assisting in analyzing 7 RNA-seq samples. Two conditions: (1) experimental - 6 samples, and (2) control - 1 sample.
The input for DESeq consisted of read counts. See below for the R code for DESeq:
---------------------------------------------------------------
library(DESeq)
readCounts<-read.csv("readCounts_IndividualDigEx",row.names=1,header=TRUE)
#which data are from MPSNT and Control
group<-c(rep("MPNST",4),"Control",rep("MPNST",2))
#annotate the data
cds <- newCountDataSet(readCounts,group)
#calculate library sizes and associated affect on DE testing
cds <- estimateSizeFactors( cds )
cds <- estimateVarianceFunctions( cds )
#test for DE
res <- nbinomTest( cds, "MPNST", "Control" )
#there is no expression for a gene accross all samples... need to remove those genes
whichIsNA<-which(is.na(res[,8]))
res<-res[-whichIsNA,]
#get list of DE genes, then display some of up and down-regulated genes
resSig <- res[ res$pval < .05, ]
#up-regulated
head(resSig[order(resSig$foldChange,-resSig$baseMean),])
#down-regulated
head(resSig[order(-resSig$foldChange,-resSig$baseMean),])
---------------------------------------------------------------
See the following for example data:
a3 a4 a5 a6 a7 a8 a9
MKL2 1521 991 1058 286 1965 1376 1026
OOSP1 0 0 1 3 0 1 0
CHL1 12 11 14 5154 3565 20 4
My problem is the following. The list of DE genes severely under represents up-regulated genes (low expression in control, high in experiment). For example, with a pval<0.05 (FDR corrected): 215 genes were DE, 211 were down-regulated and 4 were up regulated.
The issue does not seem related to the biology of the samples, but perhaps an error in setting up the tool. For example, I added two mock genes to test for DE:
a3 a4 a5 a6 a7 a8 a9
test1 10000 10000 10000 10000 1000 10000 10000
test2 5000 5000 5000 5000 1000 5000 5000
The mock genes were concatenated onto the other genes. The following are the p-values for each mock gene: test1: 0.25, test2: 0.30, and the adjusted p-values for both are 1.
I understand that the tool confers much less confidence to expression measurements from conditions in which there is only one replicate, but the above outcome seems rather incorrect.
Moreover, here is a sample gene which DESeq found to be significantly up-regluated (p-value = 0.0016):
a3 a4 a5 a6 a7 a8 a9
PAX7 4346 1 0 19 0 773 86
Any advice would be greatly appreciated!
Charlie
Comment