good to know
Dear Simon,
I really appreciate your confirmation!
One more quick question.
If I have multiple disease given design. (e.g. paired sample)
how can I handle ?? could you please give some comments for this ? Please refer the following link!
with best regards,
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
same for me`
hello sadiexiaoyu
did you get your answers?
I also have exactly same question with you! And I totally agree with your point. As far as I understand your points is right, in that way, we should be able to test the effect of treatment by adjusting with subject(for paired sample).
Could you please some one (know better than me ), confirm this question?
Thanks in advance
Leave a comment:
-
Originally posted by Simon Anders View PostAbout your first question (post #18): You answered this yourself in the your next post. You need a dsign fram with one row for each column in your count data set. So each "sample" appears twice. (And hence, one maybe should not call it a "sample" but that's just semantics.)
You second question (post #19): Yes, this code is correct.
And for the third (post #20): I am not an expert on edgeR but I think this is correct, too.
Thank you!
Best,
Sadiexiaoyu
Leave a comment:
-
About your first question (post #18): You answered this yourself in the your next post. You need a dsign fram with one row for each column in your count data set. So each "sample" appears twice. (And hence, one maybe should not call it a "sample" but that's just semantics.)
You second question (post #19): Yes, this code is correct.
And for the third (post #20): I am not an expert on edgeR but I think this is correct, too.
Leave a comment:
-
Originally posted by Simon Anders View PostSure, we fixed the issue with the dispersion estimator long ago.
Let's say you have 2n samples, which come in n pairs. For example, we have a treatment and a control sample from each of n subjects. Then you proceed as follows:
Make a design frame, i.e., a table describing the 2n samples, with one row per sample and two columns, say, "subject" (with levels "Subject_1", "Subject_2", "Subject_3", ...) and "treatment" (with levels "no" and "yes") and pass this as second argument to "newCountDataSet" as shown in the vignette's section on GLMs. Then fit two GLMs, namely "count ~ subject" as reduced model and "count ~ subject + treatment" as full model and compare them, again exactly as shown in the vignette (where we have "libType" instead of "subject"). Voilà, a paired test.
AFAIK, it works the same way in edgeR, only that you don't specify the reduced formula explicitely but rather, edgeR simply drops the last term in the full formula to get the reduced one, i.e., "treatment" has to come after "subject".
You said "edgeR simply drops the last term in the full formula to get the reduced one", is that mean, for example, if I set
design<-model.matrix(~subject+treatment) in edge R, then edgeR just get "treatment" with adjusting "subject" differences;
if I set
design<-model.matrix(~treatment+subject) in edgeR, then edgeR will get "subject" with adjusting "treatment" differences.
Am I right?
Actually, I used
design<-model.matrix(~subject+treatment) in edgeR, and I get the following results:
> topTags(lrt)
Coefficient: treatment n
genes logFC logCPM LR PValue FDR
15498 gene10 -3.460666 8.556419 400.1173 5.192712e-89 9.278857e-85
20675 gene8 4.608737 7.967542 394.4721 8.796487e-88 7.859221e-84
15990 gene5 -3.287753 5.762309 377.7188 3.904981e-84 2.325937e-80
731 gene71 -3.515135 7.259228 376.8333 6.087180e-84 2.719295e-80
17873 gene889 -3.560003 7.921179 363.5712 4.698794e-81 1.679255e-77
334 gene34 -3.320911 8.088810 362.5978 7.654637e-81 2.279679e-77
22052 gene2272 -3.718390 8.846441 356.9188 1.319823e-79 3.369132e-76
8133 gene89 -3.004279 6.273042 336.3495 3.980106e-75 8.890063e-72
23453 gene275 -3.576274 7.439596 330.8462 6.287517e-74 1.248352e-70
5681 gene56 -3.205917 6.550010 330.3990 7.868486e-74 1.406020e-70
> colnames(design)
[1] "(Intercept)" "sample2" "sample3" "treatmentn"
I think the logFC is the foldchange between treatment n vs treatment b, using sample1 as the baseline for adjusting the differences among individuals. Is that right?
Best,
Sadiexiaoyu
Leave a comment:
-
Originally posted by Simon Anders View PostSure, we fixed the issue with the dispersion estimator long ago.
Let's say you have 2n samples, which come in n pairs. For example, we have a treatment and a control sample from each of n subjects. Then you proceed as follows:
Make a design frame, i.e., a table describing the 2n samples, with one row per sample and two columns, say, "subject" (with levels "Subject_1", "Subject_2", "Subject_3", ...) and "treatment" (with levels "no" and "yes") and pass this as second argument to "newCountDataSet" as shown in the vignette's section on GLMs. Then fit two GLMs, namely "count ~ subject" as reduced model and "count ~ subject + treatment" as full model and compare them, again exactly as shown in the vignette (where we have "libType" instead of "subject"). Voilà, a paired test.
AFAIK, it works the same way in edgeR, only that you don't specify the reduced formula explicitely but rather, edgeR simply drops the last term in the full formula to get the reduced one, i.e., "treatment" has to come after "subject".
I tried to use DESeq to do this paired analysis in the following (using DESeq manual "multi-factor comparison" as reference), is this right?
> x<-read.delim("matrix for DESeq_n_b_new.txt",row.names=1,stringsAsFactors=FALSE)
> head(x)
X1b X2b X3b X1n X2n X3n
gene1 1562 1380 1533 821 799 820
gene2 74 79 83 51 57 45
gene3 997 837 1230 700 513 705
gene4 83 59 81 49 71 67
gene5 517 456 512 301 322 307
gene6 0 0 2 0 0 0
> design=data.frame(row.names=colnames(x),treatment=c("b","b","b","n","n","n"),subject=c("sample1","sample2","sample3"))
> design
treatment subject
X1b b sample1
X2b b sample2
X3b b sample3
X1n n sample1
X2n n sample2
X3n n sample3
> cdsFull=newCountDataSet(x,design)
and using
> fit1=fitNbinomGLMs(cdsFilt,count~subject+treatment)
> fit0=fitNbinomGLMs(cdsFilt,count~subject)
> pvalsGLM=nbinomGLMTest(fit1,fit0)
> padjGLM=p.adjust(pvalsGLM,method="BH")
Leave a comment:
-
Originally posted by Simon Anders View PostSure, we fixed the issue with the dispersion estimator long ago.
Let's say you have 2n samples, which come in n pairs. For example, we have a treatment and a control sample from each of n subjects. Then you proceed as follows:
Make a design frame, i.e., a table describing the 2n samples, with one row per sample and two columns, say, "subject" (with levels "Subject_1", "Subject_2", "Subject_3", ...) and "treatment" (with levels "no" and "yes") and pass this as second argument to "newCountDataSet" as shown in the vignette's section on GLMs. Then fit two GLMs, namely "count ~ subject" as reduced model and "count ~ subject + treatment" as full model and compare them, again exactly as shown in the vignette (where we have "libType" instead of "subject"). Voilà, a paired test.
AFAIK, it works the same way in edgeR, only that you don't specify the reduced formula explicitely but rather, edgeR simply drops the last term in the full formula to get the reduced one, i.e., "treatment" has to come after "subject".
Thank you for your answer. But I still did not get it clearly.
1) Do you mean using design like this:
>design=data.frame(subject=c("sample1","sample2","sample3"),treatment=c("b,n","b,n","b,n"))
> design
subject treatment
1 sample1 b,n
2 sample2 b,n
3 sample3 b,n
However, my count_data "sample1b“, ”sample1n","sample2b","sample2n","sample3b","sample3n",how can I combine my count_data with this design?
Best,
Sadiexiaoyu
Leave a comment:
-
Originally posted by Simon Anders View PostSure, we fixed the issue with the dispersion estimator long ago.
Let's say you have 2n samples, which come in n pairs. For example, we have a treatment and a control sample from each of n subjects. Then you proceed as follows:
Make a design frame, i.e., a table describing the 2n samples, with one row per sample and two columns, say, "subject" (with levels "Subject_1", "Subject_2", "Subject_3", ...) and "treatment" (with levels "no" and "yes") and pass this as second argument to "newCountDataSet" as shown in the vignette's section on GLMs. Then fit two GLMs, namely "count ~ subject" as reduced model and "count ~ subject + treatment" as full model and compare them, again exactly as shown in the vignette (where we have "libType" instead of "subject"). Voilà, a paired test.
AFAIK, it works the same way in edgeR, only that you don't specify the reduced formula explicitely but rather, edgeR simply drops the last term in the full formula to get the reduced one, i.e., "treatment" has to come after "subject".
Thank you for your answer. But I still did not get it clearly.
1) Do you mean using design like this:
> design=data.frame(subject=c("sample1","sample2","sample3"),treatment=c("b,n","b,n","b,n"))
> design
subject treatment
1 sample1 b,n
2 sample2 b,n
3 sample3 b,n
However, my count_data "sample1b“, ”sample1n","sample2b","sample2n","sample3b","sample3n",could you tell me how can I combine my count_data with this design?
2) I tried to use DESeq to do this paired analysis in the following (using DESeq manual "multi-factor comparison" as reference), is this right?
> x<-read.delim("matrix for DESeq_n_b_new.txt",row.names=1,stringsAsFactors=FALSE)
> head(x)
X1b X2b X3b X1n X2n X3n
gene1 1562 1380 1533 821 799 820
gene2 74 79 83 51 57 45
gene3 997 837 1230 700 513 705
gene4 83 59 81 49 71 67
gene5 517 456 512 301 322 307
gene6 0 0 2 0 0 0
> design=data.frame(row.names=colnames(x),treatment=c("b","b","b","n","n","n"),subject=c("sample1","sample2","sample3"))
> design
treatment subject
X1b b sample1
X2b b sample2
X3b b sample3
X1n n sample1
X2n n sample2
X3n n sample3
> cdsFull=newCountDataSet(x,design)
and using
> fit1=fitNbinomGLMs(cdsFilt,count~subject+treatment)
> fit0=fitNbinomGLMs(cdsFilt,count~subject)
> pvalsGLM=nbinomGLMTest(fit1,fit0)
> padjGLM=p.adjust(pvalsGLM,method="BH")
3)You said "edgeR simply drops the last term in the full formula to get the reduced one", is that mean, for example, if I set
design<-model.matrix(~subject+treatment) in edge R, then edgeR just get "treatment" with adjusting "subject" differences;
if I set
design<-model.matrix(~treatment+subject) in edgeR, then edgeR will get "subject" with adjusting "treatment" differences.
Am I right?
Actually, I used
design<-model.matrix(~subject+treatment) in edgeR, and I get the following results:
> topTags(lrt)
Coefficient: treatment n
genes logFC logCPM LR PValue FDR
15498 gene10 -3.460666 8.556419 400.1173 5.192712e-89 9.278857e-85
20675 gene8 4.608737 7.967542 394.4721 8.796487e-88 7.859221e-84
15990 gene5 -3.287753 5.762309 377.7188 3.904981e-84 2.325937e-80
731 gene71 -3.515135 7.259228 376.8333 6.087180e-84 2.719295e-80
17873 gene889 -3.560003 7.921179 363.5712 4.698794e-81 1.679255e-77
334 gene34 -3.320911 8.088810 362.5978 7.654637e-81 2.279679e-77
22052 gene2272 -3.718390 8.846441 356.9188 1.319823e-79 3.369132e-76
8133 gene89 -3.004279 6.273042 336.3495 3.980106e-75 8.890063e-72
23453 gene275 -3.576274 7.439596 330.8462 6.287517e-74 1.248352e-70
5681 gene56 -3.205917 6.550010 330.3990 7.868486e-74 1.406020e-70
> colnames(design)
[1] "(Intercept)" "sample2" "sample3" "treatmentn"
I think the logFC is the foldchange between treatment n vs treatment b, using sample1 as the baseline for adjusting the differences among individuals. Is that right?
Best,
Sadiexiaoyu
Leave a comment:
-
Originally posted by Simon Anders View PostSure, we fixed the issue with the dispersion estimator long ago.
Let's say you have 2n samples, which come in n pairs. For example, we have a treatment and a control sample from each of n subjects. Then you proceed as follows:
Make a design frame, i.e., a table describing the 2n samples, with one row per sample and two columns, say, "subject" (with levels "Subject_1", "Subject_2", "Subject_3", ...) and "treatment" (with levels "no" and "yes") and pass this as second argument to "newCountDataSet" as shown in the vignette's section on GLMs. Then fit two GLMs, namely "count ~ subject" as reduced model and "count ~ subject + treatment" as full model and compare them, again exactly as shown in the vignette (where we have "libType" instead of "subject"). Voilà, a paired test.
AFAIK, it works the same way in edgeR, only that you don't specify the reduced formula explicitely but rather, edgeR simply drops the last term in the full formula to get the reduced one, i.e., "treatment" has to come after "subject".
Thank you for your answer. But I still did not get it clearly.
1) Do you mean using design like this:
> design=data.frame(subject=c("sample1","sample2","sample3"),treatment=c("b,n","b,n","b,n"))
> design
subject treatment
1 sample1 b,n
2 sample2 b,n
3 sample3 b,n
However, my count_data "sample1b“, ”sample1n","sample2b","sample2n","sample3b","sample3n",could you tell me how can I combine my count_data with this design?
2) I tried to use DESeq to do this paired analysis in the following (using DESeq manual "multi-factor comparison" as reference), is this right?
> x<-read.delim("matrix for DESeq_n_b_new.txt",row.names=1,stringsAsFactors=FALSE)
> head(x)
X1b X2b X3b X1n X2n X3n
gene1 1562 1380 1533 821 799 820
gene2 74 79 83 51 57 45
gene3 997 837 1230 700 513 705
gene4 83 59 81 49 71 67
gene5 517 456 512 301 322 307
gene6 0 0 2 0 0 0
> design=data.frame(row.names=colnames(x),treatment=c("b","b","b","n","n","n"),subject=c("sample1","sample2","sample3"))
> design
treatment subject
X1b b sample1
X2b b sample2
X3b b sample3
X1n n sample1
X2n n sample2
X3n n sample3
> cdsFull=newCountDataSet(x,design)
and using
> fit1=fitNbinomGLMs(cdsFilt,count~subject+treatment)
> fit0=fitNbinomGLMs(cdsFilt,count~subject)
> pvalsGLM=nbinomGLMTest(fit1,fit0)
> padjGLM=p.adjust(pvalsGLM,method="BH")
3)You said "edgeR simply drops the last term in the full formula to get the reduced one", is that mean, for example, if I set
design<-model.matrix(~subject+treatment) in edge R, then edgeR just get "treatment" with adjusting "subject" differences;
if I set
design<-model.matrix(~treatment+subject) in edgeR, then edgeR will get "subject" with adjusting "treatment" differences.
Am I right?
Actually, I used
design<-model.matrix(~subject+treatment) in edgeR, and I get the following results:
> topTags(lrt)
Coefficient: treatment n
genes logFC logCPM LR PValue FDR
15498 gene10 -3.460666 8.556419 400.1173 5.192712e-89 9.278857e-85
20675 gene8 4.608737 7.967542 394.4721 8.796487e-88 7.859221e-84
15990 gene5 -3.287753 5.762309 377.7188 3.904981e-84 2.325937e-80
731 gene71 -3.515135 7.259228 376.8333 6.087180e-84 2.719295e-80
17873 gene889 -3.560003 7.921179 363.5712 4.698794e-81 1.679255e-77
334 gene34 -3.320911 8.088810 362.5978 7.654637e-81 2.279679e-77
22052 gene2272 -3.718390 8.846441 356.9188 1.319823e-79 3.369132e-76
8133 gene89 -3.004279 6.273042 336.3495 3.980106e-75 8.890063e-72
23453 gene275 -3.576274 7.439596 330.8462 6.287517e-74 1.248352e-70
5681 gene56 -3.205917 6.550010 330.3990 7.868486e-74 1.406020e-70
> colnames(design)
[1] "(Intercept)" "sample2" "sample3" "treatmentn"
I think the logFC is the foldchange between treatment n vs treatment b, using sample1 as the baseline for adjusting the differences among individuals. Is that right?
Best,
Sadiexiaoyu
Leave a comment:
-
Originally posted by sadiexiaoyu View PostI tried to use edgeR to analyse paired samples, and I also would like to use DESeq, but I just found "multi-factor designs" using GLM in the DESeq manual. This cannot be used for paired samples, right? Could you give me any suggestion to analyse paired samples using DESeq? I noticed this thread was posted long time ago, so maybe there is new method....?
Let's say you have 2n samples, which come in n pairs. For example, we have a treatment and a control sample from each of n subjects. Then you proceed as follows:
Make a design frame, i.e., a table describing the 2n samples, with one row per sample and two columns, say, "subject" (with levels "Subject_1", "Subject_2", "Subject_3", ...) and "treatment" (with levels "no" and "yes") and pass this as second argument to "newCountDataSet" as shown in the vignette's section on GLMs. Then fit two GLMs, namely "count ~ subject" as reduced model and "count ~ subject + treatment" as full model and compare them, again exactly as shown in the vignette (where we have "libType" instead of "subject"). Voilà, a paired test.
AFAIK, it works the same way in edgeR, only that you don't specify the reduced formula explicitely but rather, edgeR simply drops the last term in the full formula to get the reduced one, i.e., "treatment" has to come after "subject".
Leave a comment:
-
Originally posted by Simon Anders View PostCole, if cufflinks can handle paired samples, I'd be really impressed, but I wonder if you simply were to fast with your reply and overlooked the fact that the question was about paired samples. If not, please correct me.
For readers unfamiliar with the issue, a quick reminder of text-book knowledge with an example for a paired t test:
Imagine you have 5 sample, for which you measure some quantitative trait before and after a certain treatment. Let's say this trait varies according to a normal distribution and the data looks like this
Code:S1 S2 S3 S4 S5 before treatment 4.00 7.30 11.13 8.50 9.50 after treatment 4.72 8.44 12.04 9.66 10.65
However, we did not use the sample pairing here. If we want to do this, we first take the differences and then the average, i.e., instead of subtracting the averages, we subtract each treated value from the corresponding untreated value.
The differences are:
Code:0.72 1.13 0.91 1.15 1.14
To come back to DEGs: Whenever samples are paired (i.e., the same sample is measured twice under different conditions) and unless the difference between the samples is typically much smaller than the effect of the treatment, we dramatically loose statistical power if our method is unable to make use of the pairing information.
To my knowledge, none of the currently released tools can do this, though.
We have recently expanded our DESeq package to be able to fit generalized linear models (GLMs), and these can be used to model the pairing. Unfortunately, our method to estimate the dispersion (which, for count data, takes the role of the standard deviation of the differences in the example above) does not work for paired designs. We have some ideas how to get around this and are testing them at the moment, but it does not yet work as well as we would like it.
As far as I know, the edgeR people seem to pursue similar ideas.
DESeq offers a function for a "variance stabilizing transformation" which translates the count data onto a continuous scale such that it becomes approximately homoscedastic. This allows then to use tools that worked well for microarrays, such as pairwise t-tests or Smyth's 'limma' package. However, the transformation costs power and introduces bias in case the library sizes are too different. Still, it may a good way to get started.
Simon
I tried to use edgeR to analyse paired samples, and I also would like to use DESeq, but I just found "multi-factor designs" using GLM in the DESeq manual. This cannot be used for paired samples, right? Could you give me any suggestion to analyse paired samples using DESeq? I noticed this thread was posted long time ago, so maybe there is new method....?
Thank you!
Best,
Sadiexiaoyu
Leave a comment:
-
Simon,
Nevermind, I think I know understand the issue now.
Thanks,
Praful
Leave a comment:
-
Simon,
I am sorry if this sounds stupid, but how does the experimental design of the poster differ from the pasillaGenes example in the DESeq documentation? I am probably not able to get a handle on what the poster might mean by paired samples, could you help me understand what we mean by a paired sample experimental design?
Thanks,
PrafulLast edited by aggp11; 01-11-2012, 11:06 AM.
Leave a comment:
-
Hi Sheng
edgeR can indeed be used to analyse RNA-seq data from paired designs, but your question is far, far too vague for me to be able to give you any sensible answer about how many samples you need.
In general, the answer is "as many as you can afford".
Cheers
Davis
Leave a comment:
Latest Articles
Collapse
-
by seqadmin
Innovations in next-generation sequencing technologies and techniques are driving more precise and comprehensive exploration of complex biological systems. Current advancements include improved accessibility for long-read sequencing and significant progress in single-cell and 3D genomics. This article explores some of the most impactful developments in the field over the past year.
Long-Read Sequencing
Long-read sequencing has seen remarkable advancements,...-
Channel: Articles
12-02-2024, 01:49 PM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, 12-02-2024, 09:29 AM
|
0 responses
164 views
0 likes
|
Last Post
by seqadmin
12-02-2024, 09:29 AM
|
||
Started by seqadmin, 12-02-2024, 09:06 AM
|
0 responses
59 views
0 likes
|
Last Post
by seqadmin
12-02-2024, 09:06 AM
|
||
Started by seqadmin, 12-02-2024, 08:03 AM
|
0 responses
48 views
0 likes
|
Last Post
by seqadmin
12-02-2024, 08:03 AM
|
||
Started by seqadmin, 11-22-2024, 07:36 AM
|
0 responses
78 views
0 likes
|
Last Post
by seqadmin
11-22-2024, 07:36 AM
|
Leave a comment: