  • DESeq with incomplete replicates

    Hi Simon and others,

    I have an unusual dataset that does not come from RNA-seq but produces comparable count data for two conditions, and I would like to identify the genes that change significantly between them.

    I have worked with similar datasets using DESeq with good results. However, for the current dataset I do not have a complete biological replicate: because of the way the experiments were carried out and the associated cost, replicates could only be obtained for a subset of genes (~1000 out of 8000). I have split the dataset into two CountDataSets (rep.cds and nonrep.cds) and used DESeq on the replicated genes as usual:

    rep.cds<-estimateSizeFactors(rep.cds)
    rep.cds<-estimateDispersions(rep.cds)
    rep.res<-nbinomTest(rep.cds, "cond1", "cond2")


    I then wanted to use the model fitted to the replicate data to estimate dispersions in the non-replicated dataset. I used the following syntax, which includes a bit of trial-and-error fiddling to circumvent error messages:

    nonrep.cds<-estimateSizeFactors(nonrep.cds)
    nonrep.cds<-estimateDispersions(nonrep.cds, method="blind", sharingMode="fit-only", fitType="local")
    # Replace the blind dispersion estimates with values from the replicate-based fit
    fData(nonrep.cds)<-as.data.frame(fitInfo(rep.cds)$dispFun(rowMeans(counts(rep.cds, normalized=T))))
    fvarLabels(nonrep.cds)<-"disp_blind"
    nonrep.res<-nbinomTest(nonrep.cds, "cond1", "cond2")

    all.res<-rbind(rep.res, nonrep.res)
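Because rep.cds and nonrep.cds are separate CountDataSets, each estimateSizeFactors call normalizes its subset only against the genes in that subset. The base-R sketch below illustrates the median-of-ratios idea behind DESeq's size factors on a made-up 3-gene count matrix. The helper name size_factors_mor and the toy data are hypothetical; this is an illustration of the technique, not DESeq's own code.

```r
# Hypothetical helper: median-of-ratios size factors in base R
# (an illustration of the idea, not DESeq's implementation).
size_factors_mor <- function(counts) {
  # Per-gene log geometric mean across samples; any zero count makes it -Inf
  log_geo_means <- rowMeans(log(counts))
  keep <- is.finite(log_geo_means)
  # Each sample's size factor is the median ratio of its counts to those means
  apply(counts, 2, function(cnts) {
    exp(median(log(cnts[keep]) - log_geo_means[keep]))
  })
}

# Made-up counts: 3 genes x 2 samples, sample B sequenced twice as deeply
toy <- matrix(c(10, 20, 30,
                20, 40, 60),
              ncol = 2,
              dimnames = list(paste0("gene", 1:3), c("A", "B")))
sf <- size_factors_mor(toy)
norm_counts <- sweep(toy, 2, sf, "/")  # divide each column by its size factor
print(sf)
print(norm_counts)
```

Here both samples end up with identical normalized counts, because sample B differs from A only by a constant sequencing-depth factor.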

    I get biologically plausible results, which I am happy with; I am certainly much happier than I would be ignoring the replicates and just using the method="blind" approach for non-replicated data. What I would like to know is:
    1. Is my approach sane?
    2. If it is, am I going about it in the right way?

    I'd be grateful for any comments you could offer.
    Cheers,
    Roy.

  • #2
    Your approach should be fine (even though without knowing more about the technique you used -- how come you have replicates for some but not all genes? -- it is hard to say for sure.)


    This line is odd:

    Originally posted by Roy
    fData(nonrep.cds)<-as.data.frame(fitInfo(rep.cds)$dispFun(rowMeans(counts(rep.cds, normalized=T))))
    You should apply the dispFun from rep.cds to the rowMeans of the counts from nonrep.cds. Your version should give an error, as the number of genes differs between rep.cds and nonrep.cds.

    Simon
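Simon's point can be seen with plain vectors: the fitted dispersion function has to be evaluated at the mean normalized counts of the genes whose dispersions are being filled in, so the result has one value per row of nonrep.cds. The sketch below assumes a dispersion-mean curve of the form asymptDisp + extraPois / mean (the shape of DESeq's parametric fit) with made-up coefficients. make_disp_fun, the coefficient values and both toy matrices are hypothetical stand-ins for fitInfo(rep.cds)$dispFun and counts(..., normalized=TRUE).

```r
# Hypothetical dispersion-mean curve: disp(mu) = asymptDisp + extraPois / mu,
# with made-up coefficients standing in for fitInfo(rep.cds)$dispFun.
make_disp_fun <- function(asympt_disp, extra_pois) {
  function(means) asympt_disp + extra_pois / means
}
disp_fun <- make_disp_fun(asympt_disp = 0.05, extra_pois = 2)

# Toy normalized count matrices with different numbers of genes (rows)
rep_norm    <- matrix(c(100, 120, 50, 40), nrow = 2)               # 2 replicated genes
nonrep_norm <- matrix(c(10, 200, 1000, 12, 180, 950), nrow = 3)    # 3 non-replicated genes

# Correct: evaluate the fit at the means of the non-replicated genes
nonrep_disp <- disp_fun(rowMeans(nonrep_norm))

# Wrong: this yields one value per replicated gene, so its length
# does not match the number of rows in the non-replicated set
wrong_disp <- disp_fun(rowMeans(rep_norm))

print(data.frame(disp_blind = nonrep_disp))
```

The mismatch in length between wrong_disp and the rows of nonrep_norm is what makes the assignment to fData(nonrep.cds) fail in the original line.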



    • #3
      Hi Simon,

      Thanks for the quick response, much appreciated.

      Originally posted by Simon Anders
      Your approach should be fine (even though without knowing more about the technique you used -- how come you have replicates for some but not all genes? -- it is hard to say for sure.)
      I'm actually analysing data from transposon sequencing (called Tn-seq or TraDIS in the literature). The counts for each "gene" actually correspond to different bacterial mutants, and I'm using DESeq to assess their relative abundance before and after a selective screen as a measure of fitness; this is analogous to looking for differential expression of transcripts between two conditions. Some of the experiments are limited in the number of mutants that can be screened at once (the total number of bacterial cells is restricted, and you need a reasonable number of each mutant to avoid stochastic effects), so we divided the mutants into subgroups and screened each separately before combining the extracted DNA for sequencing. As the screens are expensive, it was not possible to perform replicates for the full set of mutants, only for a subset.

      Originally posted by Simon Anders
      You should apply the dispFun from rep.cds on the rowMeans of the counts from nonrep.cds. Your version should give an error as the number of genes is different between rep.cds and nonrep.cds.
      Sorry, that was a typo; yes, it should be the rowMeans from nonrep.cds.

      Another concern: since I am analysing the replicated and non-replicated rows separately, the p-value adjustments do not take into account the total number of tests. Should I re-run p.adjust on the p-values in the combined all.res table?

      Cheers,
      Roy.



      • #4
        Originally posted by Roy
        Another concern: since I am analysing the replicated and non-replicated rows separately, the p-value adjustments do not take into account the total number of tests. Should I re-run p.adjust on the p-values in the combined all.res table?
        Might be better. Shouldn't make too much of a difference, though.
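A minimal sketch of re-adjusting after combining, assuming each result table carries its raw p-values in a column named pval (as in nbinomTest output) and using Benjamini-Hochberg, which is assumed here to match how the padj column is produced. The two-row tables are made up purely for illustration.

```r
# Made-up stand-ins for the two nbinomTest result tables (only the columns used here)
rep.res    <- data.frame(id = c("g1", "g2"), pval = c(0.01, 0.04))
nonrep.res <- data.frame(id = c("g3", "g4"), pval = c(0.02, 0.50))

# Each table adjusted on its own, as happens when the subsets are tested separately
rep.res$padj    <- p.adjust(rep.res$pval, method = "BH")
nonrep.res$padj <- p.adjust(nonrep.res$pval, method = "BH")

all.res <- rbind(rep.res, nonrep.res)

# Keep the per-subset values for comparison, then re-adjust over all tests at once
all.res$padj_separate <- all.res$padj
all.res$padj <- p.adjust(all.res$pval, method = "BH")
print(all.res)
```

With these toy numbers the joint adjustment raises padj for g1 and g2 and leaves g3 and g4 unchanged. How much it matters for the real data depends on the p-value distributions in the two subsets.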

