  • Roy
    Member
    • Oct 2009
    • 17

    DESeq with incomplete replicates

    Hi Simon and others,

    I have an unusual dataset to work with that does not derive from RNAseq, but produces comparable count data for two conditions for which I would like to identify the significantly changing genes.

    I have worked with similar datasets using DESeq with good results. However, for the current dataset I do not have a complete biological replicate - because of the way the experiments were carried out and the associated cost, replicates could only be obtained for a subset of genes (~1000 out of 8000). I have split the dataset into two CountDataSets (rep.cds and nonrep.cds) and used DESeq on the replicated genes as usual:

    rep.cds<-estimateSizeFactors(rep.cds)

    rep.cds<-estimateDispersions(rep.cds)

    rep.res<-nbinomTest(rep.cds, "cond1", "cond2")


    I then wanted to use the model fitted to the replicate data to estimate dispersions in the non-replicated dataset. I used the following syntax, which includes a bit of trial-and-error fiddling to circumvent error messages:

    nonrep.cds<-estimateSizeFactors(nonrep.cds)

    nonrep.cds<-estimateDispersions(nonrep.cds, method="blind", sharingMode="fit-only", fitType="local")

    fData(nonrep.cds)<-as.data.frame(fitInfo(rep.cds)$dispFun(rowMeans(counts(rep.cds, normalized=T))))

    fvarLabels(nonrep.cds)<-"disp_blind"

    nonrep.res<-nbinomTest(nonrep.cds, "cond1", "cond2")

    all.res<-rbind(rep.res, nonrep.res)

    I get biologically plausible results, which I am happy with - certainly much happier than ignoring the replicates and just using the method="blind" approach for non-replicated data. What I would like to know is:
    1. Is my approach sane?
    2. If it is, am I going about it in the right way?

    I'd be grateful for any comments you could offer.
    Cheers,
    Roy.
  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #2
    Your approach should be fine (even though without knowing more about the technique you used -- how come you have replicates for some but not all genes? -- it is hard to say for sure.)


    This line is odd:

    Originally posted by Roy
    fData(nonrep.cds)<-as.data.frame(fitInfo(rep.cds)$dispFun(rowMeans(counts(rep.cds, normalized=T))))
    You should apply the dispFun from rep.cds on the rowMeans of the counts from nonrep.cds. Your version should give an error as the number of genes is different between rep.cds and nonrep.cds.

    Simon


    • Roy
      Member
      • Oct 2009
      • 17

      #3
      Hi Simon,

      Thanks for the quick response, much appreciated.

      Originally posted by Simon Anders
      Your approach should be fine (even though without knowing more about the technique you used -- how come you have replicates for some but not all genes? -- it is hard to say for sure.)
      I'm actually analysing data from transposon sequencing (called Tn-seq or TraDIS in the literature). The counts for each "gene" actually correspond to different bacterial mutants, and I'm using DESeq to assess their relative abundance before and after a selective screen as a measure of fitness - this is analogous to looking for differential expression of transcripts between two conditions. Some of the experiments are limited in the number of mutants that can be screened at once (since the total number of bacterial cells is restricted, and you need a reasonable number of each mutant to avoid stochastic effects), so we divided the mutants into subgroups and screened each separately, before combining the extracted DNA for sequencing. As the screens are expensive, it was not possible to perform replicates for the full set of mutants, only a subset.

      Originally posted by Simon Anders
      You should apply the dispFun from rep.cds on the rowMeans of the counts from nonrep.cds. Your version should give an error as the number of genes is different between rep.cds and nonrep.cds.
      Sorry, that was a typo, yes, it should be the rowMeans from nonrep.cds.
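
A minimal base-R sketch of why the row means have to come from nonrep.cds. Everything here is a hypothetical stand-in so that it runs without DESeq: disp_fun plays the role of fitInfo(rep.cds)$dispFun (its coefficients are made up), and the two matrices play the role of the normalized counts in rep.cds and nonrep.cds.

```r
# Hypothetical stand-in for fitInfo(rep.cds)$dispFun: a mean-dispersion
# trend fitted on the replicated rows. The coefficients are made up.
disp_fun <- function(mu) 0.05 + 2 / mu

set.seed(1)
# Hypothetical normalized counts: 1000 replicated rows (4 samples)
# and 7000 non-replicated rows (2 samples).
rep_counts    <- matrix(rnbinom(1000 * 4, mu = 200, size = 10), ncol = 4)
nonrep_counts <- matrix(rnbinom(7000 * 2, mu = 200, size = 10), ncol = 2)

# Evaluate the trend at the row means of the NON-replicated data,
# which gives one dispersion value per non-replicated row.
nonrep_disp <- disp_fun(rowMeans(nonrep_counts))

# Using the replicated rows instead yields the wrong number of values.
wrong_disp <- disp_fun(rowMeans(rep_counts))

length(nonrep_disp) == nrow(nonrep_counts)  # TRUE
length(wrong_disp)  == nrow(nonrep_counts)  # FALSE
```

In the real workflow the same idea would presumably amount to passing rowMeans(counts(nonrep.cds, normalized=T)) to the dispFun taken from rep.cds.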

      Another concern - since I am analysing the replicated and non-replicated rows separately, the p-value adjustments do not take into account the total number of tests. Should I re-run the p.adjust method on the p-values in the combined all.res table?

      Cheers,
      Roy.


      • Simon Anders
        Senior Member
        • Feb 2010
        • 995

        #4
        Originally posted by Roy
        Another concern - since I am analysing the replicated and non-replicated rows separately, the p-value adjustments do not take into account the total number of tests. Should I re-run the p.adjust method on the p-values in the combined all.res table?
        Might be better. Shouldn't make too much of a difference, though.
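
A rough sketch of re-adjusting over the combined table, using base R only. The two data frames are hypothetical stand-ins for rep.res and nonrep.res with random p-values. The column names pval and padj and the choice of method="BH" are assumptions here and should be checked against the actual nbinomTest output.

```r
set.seed(42)
# Hypothetical stand-ins for the two result tables; only the columns
# needed here are mocked up, and the p-values are random.
rep.res    <- data.frame(id = paste0("rep", 1:1000), pval = runif(1000))
nonrep.res <- data.frame(id = paste0("non", 1:7000), pval = runif(7000))

# Adjust within each table separately, as happens when the two
# subsets are tested independently.
rep.res$padj    <- p.adjust(rep.res$pval,    method = "BH")
nonrep.res$padj <- p.adjust(nonrep.res$pval, method = "BH")

# Combine, then re-adjust over all 8000 tests at once.
all.res <- rbind(rep.res, nonrep.res)
all.res$padj_combined <- p.adjust(all.res$pval, method = "BH")

# How far apart are the two sets of adjusted values?
summary(abs(all.res$padj_combined - all.res$padj))
```

Keeping the per-table padj next to padj_combined makes it easy to see, on the real data, whether the re-adjustment changes which rows pass a given threshold.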
