Design for normalization using DESeq

Ayaka



03-07-2015, 05:21 PM

Hi all, I am analyzing 16S sequencing data from human fecal samples. I processed my data with QIIME and have been using the R package phyloseq for the analysis. I want to use phyloseq's phyloseq_to_deseq2 function to pass the data to the R package DESeq2, so that I can normalize my data, stabilize the variance, and avoid rarefaction.

My question is that I am not certain which design I should use. Below are 10 rows of my sample variables.

#SampleID  Treatment  Treatment1  Time  Sex  Age  Individual
1.1        P          P0          T0    F    33   1
1.2        P          P1          T1    F    33   1
2.1        O          O0          T0    F    28   2
2.2        O          O1          T1    F    28   2
3.1        Control    C0          T0    M    24   3
3.2        Control    C1          T1    M    24   3
4.1        Control    C0          T0    M    28   4
4.2        Control    C1          T1    M    28   4
5.1        O+P        OP0         T0    M    24   5
5.2        O+P        OP1         T1    M    24   5

I had n=40 individuals, which I randomly assigned to 4 groups: 3 treatments (O, P, and the combination O+P) and a control group. For each individual I sequenced a fecal sample before the treatment (T0) and after it (T1), so in total I ended up with 80 libraries from 80 samples.
What I want to compare is 1) the difference in composition/abundance between T0 and T1 within each treatment, and 2) the difference in composition/abundance between the treatments.

At first I used Treatment1 for the design, which is a variable that combines the treatment and the time. Afterwards I saw in tutorials that people use these variables separately and also include a patient variable, so I used the design ~ Individual + Time + Treatment.
But R throws this error:

error in DESeqDataSet(se, design = design, ignoreRank) :
the model matrix is not full rank, so the model cannot be fit as specified.
one or more variables or interaction terms in the design formula
are linear combinations of the others and must be removed

This also happens when I put only Treatment and Individual in the design, but other combinations, such as Time and Individual or Treatment and Time, work just fine.
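One way to see what the error is describing: in this layout every individual belongs to exactly one treatment, so the Treatment indicator columns can be rebuilt by adding up Individual indicator columns. The sketch below is only an illustration, written in plain Python rather than R so the linear algebra is explicit. It uses the 10 rows from the table above, and its dummy coding (intercept plus one 0/1 column per non-reference level) is an assumption meant to resemble a typical treatment-contrast model matrix, not DESeq2's actual code. The helper names `design_matrix`, `matrix_rank`, and `report` are hypothetical.

```python
from fractions import Fraction

# The 10 example rows from the post: (Individual, Treatment, Time).
samples = [
    ("1", "P", "T0"), ("1", "P", "T1"),
    ("2", "O", "T0"), ("2", "O", "T1"),
    ("3", "Control", "T0"), ("3", "Control", "T1"),
    ("4", "Control", "T0"), ("4", "Control", "T1"),
    ("5", "O+P", "T0"), ("5", "O+P", "T1"),
]
FIELDS = {"Individual": 0, "Treatment": 1, "Time": 2}


def design_matrix(rows, terms):
    """Intercept plus 0/1 dummy columns, dropping the first level of each term."""
    columns = [[1] * len(rows)]
    for term in terms:
        idx = FIELDS[term]
        levels = sorted({row[idx] for row in rows})
        for level in levels[1:]:
            columns.append([1 if row[idx] == level else 0 for row in rows])
    # Transpose the list of columns into a list of sample rows.
    return [list(row) for row in zip(*columns)]


def matrix_rank(matrix):
    """Rank by exact Gaussian elimination (Fractions avoid rounding issues)."""
    m = [[Fraction(v) for v in row] for row in matrix]
    rank = 0
    for col in range(len(m[0])):
        pivot = next((r for r in range(rank, len(m)) if m[r][col] != 0), None)
        if pivot is None:
            continue  # this column depends on the columns already processed
        m[rank], m[pivot] = m[pivot], m[rank]
        for r in range(rank + 1, len(m)):
            factor = m[r][col] / m[rank][col]
            if factor:
                m[r] = [a - factor * b for a, b in zip(m[r], m[rank])]
        rank += 1
    return rank


def report(terms):
    """Return (number of columns, rank) for the design built from `terms`."""
    x = design_matrix(samples, terms)
    return len(x[0]), matrix_rank(x)


for terms in (["Individual", "Time", "Treatment"],
              ["Individual", "Treatment"],
              ["Individual", "Time"],
              ["Treatment", "Time"]):
    n_cols, rank = report(terms)
    status = "full rank" if rank == n_cols else "NOT full rank"
    print(" + ".join(terms), "->", n_cols, "columns, rank", rank, "=", status)
```

On this toy layout the two designs that contain both Individual and Treatment have more columns than rank, while Individual + Time and Treatment + Time are full rank, which matches the pattern of which designs fail. The DESeq2 vignette covers this situation in its discussion of model matrices that are not full rank, including the case of individuals nested within groups.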

I thought it was a good idea to add the individuals as a variable in the design, considering that the samples in the same treatment-time group are the biological replicates but show great variability in their abundance counts (the differences in microbiota composition among individuals are very large in some cases).
I thought that adding the Individual variable to the design would help to account for that variability. However, maybe by adding this variable the model would use up more degrees of freedom, and the weight of the Treatment and Time variables in explaining the changes in abundance could become insignificant if the changes are subtle?

I have been trying to understand this by myself by reading the DESeq papers, but I lack the statistical knowledge, and I would like to ask for your help in understanding the error and what the best design for the analysis would be.

Thank you! Cheers