Dear all,
I have a question that is similar to others I've seen, but I am still not sure how to solve it.
Data is:
* RNA-seq from an Illumina HiSeq machine
* Mus musculus
* 24 samples
Input is count data from htseq-count.
I have different types of mice (control, mutant1, mutant2), and two parts of the brain (part1, part2) were sequenced from each mouse.
So the data are paired, i.e. part1 and part2 from the same mouse form a pair.
Here is what my sample table looks like (ordered by condition); the pairs column identifies the individual mouse:
sampleName fileName condition pairs
Control_part1_1 Control_part1_1_htseq.txt Control_part1 10
Control_part1_2 Control_part1_2_htseq.txt Control_part1 12
Control_part2_1 Control_part2_1_htseq.txt Control_part2 10
Control_part2_2 Control_part2_2_htseq.txt Control_part2 12
mutant1_part1_4 mutant1_part1_4_htseq.txt mutant1_part1 1
mutant1_part1_5 mutant1_part1_5_htseq.txt mutant1_part1 2
mutant1_part1_1 mutant1_part1_1_htseq.txt mutant1_part1 3
mutant1_part1_2 mutant1_part1_2_htseq.txt mutant1_part1 4
mutant1_part1_3 mutant1_part1_3_htseq.txt mutant1_part1 5
mutant1_part2_1 mutant1_part2_1_htseq.txt mutant1_part2 1
mutant1_part2_2 mutant1_part2_2_htseq.txt mutant1_part2 2
mutant1_part2_3 mutant1_part2_3_htseq.txt mutant1_part2 3
mutant1_part2_4 mutant1_part2_4_htseq.txt mutant1_part2 4
mutant1_part2_5 mutant1_part2_5_htseq.txt mutant1_part2 5
mutant2_part1_1 mutant2_part1_1_htseq.txt mutant2_part1 6
mutant2_part1_2 mutant2_part1_2_htseq.txt mutant2_part1 7
mutant2_part1_3 mutant2_part1_3_htseq.txt mutant2_part1 8
mutant2_part1_4 mutant2_part1_4_htseq.txt mutant2_part1 9
mutant2_part2_1 mutant2_part2_1_htseq.txt mutant2_part2 6
mutant2_part2_2 mutant2_part2_2_htseq.txt mutant2_part2 7
mutant2_part2_3 mutant2_part2_3_htseq.txt mutant2_part2 8
mutant2_part2_4 mutant2_part2_4_htseq.txt mutant2_part2 9
# I first tried unpaired analysis
se <- DESeqDataSetFromHTSeqCount(sampleTable=sampleTable,
                                 directory=".",
                                 design=~condition)
se1 <- DESeq(se)
# then paired analysis, taking into account the "pairs" column
se.p <- se1
design(se.p) <- formula(~pairs+condition)
se.p <- DESeq(se.p)
and I get the following error message:
"Error in checkFullRank(modelMatrix) :
the model matrix is not full rank, so the model cannot be fit as specified.
One or more variables or interaction terms in the design formula are linear
combinations of the others and must be removed.
See the section 'Model matrix not full rank' in vignette('DESeq2')
"
So I went to section "3.12 Model matrix not full rank" of the vignette, and now I am a bit confused: I don't understand why it fails (I don't see the linear combination), and I don't know how to modify the design correctly so that it takes the paired information into account.
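One way to see the linear combination is to build the same model matrix in base R from the sample table and compare its rank with its number of columns (as far as I know, this rank-versus-columns comparison is essentially what the full-rank check reports). The vectors below are transcribed from the table in row order; treating pairs as a factor is an assumption that matches the error, since a numeric pairs column would be fit as a single slope instead of one term per mouse.

```r
# Rebuild the two design columns from the sample table, in table order
genotype <- rep(c("Control", "mutant1", "mutant2"), times = c(4, 10, 8))
part <- c(rep(c("part1", "part2"), each = 2),
          rep(c("part1", "part2"), each = 5),
          rep(c("part1", "part2"), each = 4))
coldata <- data.frame(
  condition = factor(paste(genotype, part, sep = "_")),
  pairs     = factor(c(10, 12, 10, 12, 1:5, 1:5, 6:9, 6:9))
)

mm <- model.matrix(~ pairs + condition, data = coldata)
ncol(mm)     # 16: intercept + 10 mouse dummies + 5 condition dummies
qr(mm)$rank  # 14: two columns are redundant

# Each mouse belongs to exactly one genotype, so adding up the mouse columns
# of a genotype reproduces that genotype's condition columns.
mut2_mice <- rowSums(mm[, c("pairs6", "pairs7", "pairs8", "pairs9")])
mut2_cond <- mm[, "conditionmutant2_part1"] + mm[, "conditionmutant2_part2"]
all(mut2_mice == mut2_cond)  # TRUE

ctrl_mice <- mm[, "pairs10"] + mm[, "pairs12"]
ctrl_cond <- mm[, "(Intercept)"] -
  rowSums(mm[, c("conditionmutant1_part1", "conditionmutant1_part2",
                 "conditionmutant2_part1", "conditionmutant2_part2")])
all(ctrl_mice == ctrl_cond)  # TRUE
```

These two identities account for the two missing ranks: the mouse term already encodes genotype, so the genotype component of condition cannot be estimated on top of it.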
The example in the vignette adds an "ind.n" column, which renumbers the "ind" column (corresponding to my "pairs" column) within each group, as follows:
## grp ind cnd ind.n
## 1 X 1 A 1
## 2 X 1 B 1
## 3 X 2 A 2
## 4 X 2 B 2
## 5 Y 3 A 1
## 6 Y 3 B 1
## 7 Y 4 A 2
## 8 Y 4 B 2
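The vignette's remedy can be checked the same way on this toy table. The sketch below transcribes the four columns from the excerpt above and compares the naive formula ~ grp + ind + cnd with the formula the vignette pairs with ind.n, ~ grp + grp:ind.n + grp:cnd; using qr() for the rank check is my own diagnostic choice.

```r
# Toy sample table from the vignette excerpt: 2 groups x 2 individuals x 2 conditions
coldata <- data.frame(
  grp   = factor(rep(c("X", "Y"), each = 4)),
  ind   = factor(rep(1:4, each = 2)),
  cnd   = factor(rep(c("A", "B"), times = 4)),
  ind.n = factor(rep(c(1, 1, 2, 2), times = 2))  # individual renumbered within group
)

# Naive design: the grpY column equals ind3 + ind4, so one column is redundant
mm_bad <- model.matrix(~ grp + ind + cnd, data = coldata)
c(ncol(mm_bad), qr(mm_bad)$rank)    # 6 5

# Vignette design: individuals nested in group via ind.n, one condition effect per group
mm_good <- model.matrix(~ grp + grp:ind.n + grp:cnd, data = coldata)
colnames(mm_good)
c(ncol(mm_good), qr(mm_good)$rank)  # 6 6
```

The key idea is that ind.n only distinguishes individuals within a group, so grp:ind.n never reproduces the group column itself.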
If I modify my sample table the same way, I get the following (I only copy these 3 columns, as the full table gets messy):
condition pairs pairs.n
Control_part1 10 1
Control_part1 12 2
Control_part2 10 1
Control_part2 12 2
mutant1_part1 1 1
mutant1_part1 2 2
mutant1_part1 3 3
mutant1_part1 4 4
mutant1_part1 5 5
mutant1_part2 1 1
mutant1_part2 2 2
mutant1_part2 3 3
mutant1_part2 4 4
mutant1_part2 5 5
mutant2_part1 6 1
mutant2_part1 7 2
mutant2_part1 8 3
mutant2_part1 9 4
mutant2_part2 6 1
mutant2_part2 7 2
mutant2_part2 8 3
mutant2_part2 9 4
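For reference, a base-R sketch of the vignette's recipe applied to this table might look like the following. It assumes condition is split into two new factors, genotype and part (names chosen for illustration, not columns of the original table), and derives pairs.n by renumbering the mice within each genotype. Because the genotypes have different numbers of mice (2, 5 and 4), some genotype:pairs.n columns contain no samples and are dropped, as the vignette does for unbalanced groups.

```r
# Columns transcribed from the sample table, in table order
genotype <- factor(rep(c("Control", "mutant1", "mutant2"), times = c(4, 10, 8)))
part <- factor(c(rep(c("part1", "part2"), each = 2),
                 rep(c("part1", "part2"), each = 5),
                 rep(c("part1", "part2"), each = 4)))
pairs <- c(10, 12, 10, 12, 1:5, 1:5, 6:9, 6:9)

# Renumber mice 1..k within each genotype (reproduces the pairs.n column above)
pairs.n <- factor(ave(pairs, genotype, FUN = function(x) as.integer(factor(x))))
coldata <- data.frame(genotype, part, pairs.n)

# Mice nested in genotype, plus a part2-vs-part1 effect for each genotype
mm <- model.matrix(~ genotype + genotype:pairs.n + genotype:part, data = coldata)

# Control has only 2 mice and mutant2 only 4, so columns such as
# genotypeControl:pairs.n3 contain no samples; drop all-zero columns
all.zero <- apply(mm, 2, function(x) all(x == 0))
mm <- mm[, !all.zero]

c(ncol(mm), qr(mm)$rank)  # 14 14: full rank
```

If a matrix like this were used, the vignette's route is to add genotype and part to the colData, build the matrix from colData(dds) so that its rows match the samples, and pass it as DESeq(dds, full = mm) instead of setting design(). The genotype:part coefficients would then be the part2-vs-part1 effect within each genotype, and their differences compare that effect between genotypes; a genotype-vs-genotype comparison within a single brain part is not meaningful in this design, because genotype differences are absorbed by the mouse terms.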
Is it correct to use this design? Any suggestions?
Thanks!