  • jmgrindheim
    Junior Member
    • Feb 2016
    • 3

    DESeq2 question: complex multifactor experiment

    Hello,

    I have a complex experimental setup.

    What I've done:

    countTable = read.table("HTSeq_Table.txt", header=TRUE, row.names=1)

    And I have a design like this, where the sample names are the row names.

    > design
                   geno      age  libType
    S1534          Ezh1      P14  single
    S1536          Ezh1      P14  single
    S8633          Ezh1      P14  single
    S1532          Ezh12     P14  single
    S8631          Ezh12     P14  single
    S1141          Ezh12del  P14  single
    S1142          Ezh12del  P14  single
    S1541          Wt        P14  single
    S1547          Wt        P14  single
    S8Wrep1        Wt        W8   paired
    S8Wrep2        Wt        W8   paired
    SE18rep1       Wt        E18  paired
    SE18rep2       Wt        E18  paired
    P0.expt1.bio1  Wt        P0   single
    P0.expt1.bio2  Wt        P0   single
    P0.expt2.bio1  Wt        P0   single
    P0.expt2.bio2  Wt        P0   single

    So, I have multiple timepoints and both single-end and paired-end sequencing, but only one timepoint (P14) has multiple genotypes.

    To make a DESeqDataSet, I ran

    > cds=DESeqDataSetFromMatrix(countData=countTable,colData=designnogroup, design=~geno+age+libType)
    Error in DESeqDataSet(se, design = design, ignoreRank) :
    the model matrix is not full rank, so the model cannot be fit as specified.
    one or more variables or interaction terms in the design formula
    are linear combinations of the others and must be removed

    1. I'm not sure how to deal with a time-course experiment where only one timepoint has multiple genotypes.
    2. Do I need to include single/paired-end (libType) in the design?

    Thank you so much!

    Addendum: I found the answer, in the manual of course: section 3.12.1, "Linear combinations".
    Last edited by jmgrindheim; 02-01-2016, 05:35 PM. Reason: Found Answer
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Without playing around with the matrix, it looks like "libType" is confounded with the "W8" and "E18" ages, since those are the only paired-end samples. You're going to have to either realign the PE data as SE (just ignore read 2) or accept that the W8 and E18 estimates might be confounded by a batch effect (they likely will be regardless, though realigning will help minimize that). Either way, use ~geno+age instead of ~geno+age+libType.
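The confounding described above can be checked in base R without DESeq2. This is a minimal sketch, assuming the sample sheet from post #1 is retyped by hand; `is_full_rank` is a hypothetical helper written for this illustration, not a DESeq2 function.

```r
# Sample sheet retyped from post #1 (17 samples, same row order).
design <- data.frame(
  geno    = factor(c(rep("Ezh1", 3), rep("Ezh12", 2), rep("Ezh12del", 2), rep("Wt", 10))),
  age     = factor(c(rep("P14", 9), rep("W8", 2), rep("E18", 2), rep("P0", 4))),
  libType = factor(c(rep("single", 9), rep("paired", 4), rep("single", 4)))
)

# Cross-tabulate age against library type: paired-end libraries
# occur only at W8 and E18, and those ages have no single-end libraries.
print(table(design$age, design$libType))

# Hypothetical helper: TRUE when the model matrix for `formula`
# has as many independent columns as it has columns in total.
is_full_rank <- function(formula, data) {
  mm <- model.matrix(formula, data)
  qr(mm)$rank == ncol(mm)
}

full    <- is_full_rank(~ geno + age + libType, design)
reduced <- is_full_rank(~ geno + age, design)
print(c(with_libType = full, without_libType = reduced))
```

The design with libType should come out rank deficient, because the libType column can be rebuilt from the age columns, while ~geno+age should be full rank. The same check can be run on any candidate design before handing it to DESeq2.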

    • jmgrindheim
      Junior Member
      • Feb 2016
      • 3

      #3
      Thanks Devon, that was really helpful.

      Also, do you know what the effect will be if I put ~age+geno vs ~geno+age? It's kinda hard to decide what's more important. I want to see if the genotype causes gene expression to look like a different timepoint. So besides differential expression, I really want normalized count values to create a heatmap, and I don't see an easy way to get them in DESeq2. In the original DESeq, I would take the baseMeanA or baseMeanB values from a differential expression test, but I haven't yet found an equivalent in DESeq2.

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        The order only affects plotting and what's output by results() by default (it'll default to whichever variable you specify last); the actual statistics will be the same. For normalized counts, just use counts(dds, normalized=TRUE). That's vastly more meaningful than creating heatmaps with the group means.
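To make "normalized counts" concrete: as far as I know, DESeq2 divides each sample's raw counts by a per-sample size factor, which by default is estimated with a median-of-ratios approach. The base R sketch below reimplements that idea on made-up numbers; it is an illustration only, not DESeq2's own code, and `toy_counts` is hypothetical data.

```r
# Made-up count matrix: 4 genes x 3 samples.
# Sample s2 is sequenced 2x as deeply as s1, and s3 4x as deeply.
toy_counts <- matrix(
  c( 10,  20,  40,
    100, 200, 400,
      5,  10,  20,
     50, 100, 200),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("s1", "s2", "s3"))
)

# Reference "sample": per-gene geometric mean across samples (on the log scale).
log_geo_means <- rowMeans(log(toy_counts))

# Size factor per sample: median ratio of its counts to the reference,
# skipping genes whose reference is not finite (a zero count in some sample).
size_factors <- apply(toy_counts, 2, function(col) {
  keep <- is.finite(log_geo_means)
  exp(median((log(col) - log_geo_means)[keep]))
})

# Normalized counts: divide each column by its size factor.
norm_counts <- sweep(toy_counts, 2, size_factors, "/")
print(size_factors)
print(norm_counts)
```

In this toy case the samples differ only in depth, so every gene ends up with the same normalized value in all three samples. With a real DESeqDataSet, the counts(dds, normalized=TRUE) call from the post returns the per-sample matrix directly once size factors have been estimated.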

        • jmgrindheim
          Junior Member
          • Feb 2016
          • 3

          #5
          So you think that plotting count values for individual biological replicates is better than plotting them for the replicates grouped?

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            Yes, you don't lose the variance that way. If you have a crazy number of groups/samples then that might not be feasible, of course (though then the heatmap likely won't tell you much anyway).
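A small base R sketch of the point about losing variance, using made-up normalized counts and two hypothetical groups, A and B. The two genes have identical group means, so a heatmap of group means would draw them the same way, even though one of them is far noisier across replicates.

```r
# Made-up normalized counts: 2 genes x 6 samples (3 replicates per group).
norm_counts <- rbind(
  tight = c(100, 101,  99, 200, 199, 201),
  noisy = c( 10, 250,  40,  50, 500,  50)
)
colnames(norm_counts) <- paste0(rep(c("A", "B"), each = 3), 1:3)
group <- factor(rep(c("A", "B"), each = 3))

# Per-group means: what a heatmap of grouped replicates would display.
group_means <- t(apply(norm_counts, 1, function(x) tapply(x, group, mean)))

# Per-group standard deviations: the information the grouped view discards.
group_sds <- t(apply(norm_counts, 1, function(x) tapply(x, group, sd)))

print(group_means)
print(group_sds)
```

Plotting the per-sample matrix (for example with base R's heatmap()) keeps the replicate-to-replicate spread visible, which is what the group means hide.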
