Hi Simon and others,
I have an unusual dataset to work with that does not derive from RNAseq, but produces comparable count data for two conditions, and I would like to identify the genes that change significantly between them.
I have worked with similar datasets using DESeq with good results. However, for the current dataset I do not have a complete biological replicate: because of the way the experiments were carried out and the associated cost, replicates could only be obtained for a subset of genes (~1000 out of 8000). I have split the dataset into two CountDataSets (rep.cds and nonrep.cds) and used DESeq on the replicated genes as usual:
rep.cds<-estimateSizeFactors(rep.cds)
rep.cds<-estimateDispersions(rep.cds)
rep.res<-nbinomTest(rep.cds, "cond1", "cond2")
I then wanted to use the model fitted to the replicate data to estimate dispersions in the non-replicated dataset. I used the following syntax, which includes a bit of trial-and-error fiddling to circumvent error messages:
nonrep.cds<-estimateSizeFactors(nonrep.cds)
nonrep.cds<-estimateDispersions(nonrep.cds, method="blind", sharingMode="fit-only", fitType="local")
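# evaluate the dispersion fit from the replicated genes at the mean normalized counts of the non-replicated genes
fData(nonrep.cds)<-as.data.frame(fitInfo(rep.cds)$dispFun(rowMeans(counts(nonrep.cds, normalized=TRUE))))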
fvarLabels(nonrep.cds)<-"disp_blind"
nonrep.res<-nbinomTest(nonrep.cds, "cond1", "cond2")
all.res<-rbind(rep.res, nonrep.res)
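One caveat with the rbind step, sketched here on made-up numbers rather than the real data: nbinomTest applies the Benjamini-Hochberg adjustment within each call, so the padj columns of rep.res and nonrep.res are each corrected for their own subset only. Assuming the combined table keeps the usual pval column, it could be re-adjusted across all genes with base R's p.adjust. The toy data frames and the padj.all column name below are hypothetical, for illustration only.

```r
# Toy stand-ins for rep.res and nonrep.res (hypothetical values, only the
# columns needed here); real nbinomTest output has more columns.
rep.res <- data.frame(id = c("g1", "g2", "g3"),
                      pval = c(0.001, 0.04, 0.50),
                      stringsAsFactors = FALSE)
nonrep.res <- data.frame(id = c("g4", "g5", "g6", "g7"),
                         pval = c(0.0005, 0.03, 0.20, 0.90),
                         stringsAsFactors = FALSE)

# Per-subset Benjamini-Hochberg adjustment, mimicking what each
# nbinomTest call does on its own set of genes
rep.res$padj <- p.adjust(rep.res$pval, method = "BH")
nonrep.res$padj <- p.adjust(nonrep.res$pval, method = "BH")

# Combine, then re-adjust the raw p-values across all genes at once
all.res <- rbind(rep.res, nonrep.res)
all.res$padj.all <- p.adjust(all.res$pval, method = "BH")
print(all.res)
```

Whether a single adjustment over all genes is preferable to keeping the two subsets separate is a judgment call; the sketch only shows how the combined adjustment would be computed.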
I get biologically plausible results, which I am happy with - certainly much happier than ignoring the replicates and just using the method="blind" approach for non-replicated data. What I would like to know is:
1. Is my approach sane?
2. If it is, am I going about it in the right way?
I'd be grateful for any comments you could offer.
Cheers,
Roy.