Originally posted by Michael Love
View Post
Seqanswers Leaderboard Ad
Collapse
Announcement
Collapse
No announcement yet.
X
-
Thanks again Michael for the swift reply and great work
I need to read more to understand the use of "~1" in different function
Leave a comment:
-
hi Wei,
Yes, there is a bug in v1.4, that if you've run the LRT pipeline, then you can't use the contrast argument if the LFC is already contained in resultsNames, but you can get it with the 'name' argument. And it's the same result table using 'name'. I've fixed this in the next release (v1.6 released in October).
( and yes to your question for Devon )
Leave a comment:
-
Originally posted by dpryan View PostI'd personally go with the combined design (i.e., the last one you showed). Especially since you only have 3 samples per group, you'll likely get a bit better variance estimates with that. If you wanted to do a classic 1-way anova like test, then you'd just use nbinomLRT() and compare that against a design of ~1.
Thanks a lot Devon! just one question about "design of ~1", you mentioned. you mean: dds <- nbinomLRT(dds, reduced = ~ 1) ?
Leave a comment:
-
Thanks a lot Michael for the detail and explanation!
I think LRT test makes more sense for me (compare all levels). Sorry for the question before to get the comparison with shRNA_A vs control. I got it with the "name".
> resultsNames(dds)
[1] "Intercept" "condition_shRNAA_vs_control" "condition_shRNAB_vs_control"
> resA <- results(dds, name="condition_shRNA_A_vs_control")
But with "contrast", I had one error:
> resA <- results(dds, contrast=c("condition","shRNA_A","control"))
Error in normalizeSingleBracketSubscript(j, x) :
subscript contains invalid names
Leave a comment:
-
I haven't seen any big difference in the ranking or set of genes {adj. p < alpha} between the Wald and the LRT. Some differences are:
The Wald statistic appeared more compatible with the LFC shrinkage, producing uniform p-values under the null (the numerator and denominator shrink at nearly the same rate), while after LFC shrinkage the LRT statistic under the null piles up a bit near 0 (p-values piling up at 1).
The Wald allow for easy testing of |beta| > theta, for theta > 0.
The standard Wald pipeline has LFC shrinkage, for which we have to estimate the MLE betas to estimate the prior, so I'm not sure if it's even faster one way or the other.
Leave a comment:
-
Michael: Quick (largely) unrelated question for you. Given an experiment with design ~condition, where condition just has two levels, is there much of a benefit to using a Wald test over an LRT other than the ease of computation? My understanding had always been that the Wald test approximated an LRT and was often preferred since you don't have to fit the alternate model.
Leave a comment:
-
Q1:
In my previous response I gave the answer to this one:
results(dds) gives the LRT p-values and adjusted p-values (note that the LFC column is only for one of the comparisons, see ?results for more information).
Code:LRT p-value: '~ condition' vs '~ 1'
Q2:
The likelihood ratio test compares the explanatory power of two models.
See http://en.wikipedia.org/wiki/Likelihood-ratio_test for background on this statistical test.
the full model (~ condition) includes the intercept and the information in 'condition' (note that in R formula, ~ condition is equivalent to ~ 1 + condition)
the reduced model (~ 1) is just an intercept term
resB is a comparison of one level vs the control, whereas the LRT tests all levels of condition.
Q3:
Which method you use depends on the biological question you want to ask.
Use the LRT if you want to describe the set of genes changing from any level of condition.
Use the Wald tests if you want to describe the set of genes from a single pair of levels (e.g. shRNA_B vs control).
Leave a comment:
-
Thanks a lot Devon and Michael!
Like you recommended I run all the samples together to get better variance estimates. I tried to my design as ~ condition+type. I had one error"the model matrix is not full rank, i.e. one or more variables in the design formula are linear combinations of the others", I guess this is a wrong design.
Here is my code:
> countdata = as.matrix(read.table("data.txt", header=T, row.names=1))
> head (countdata)
untreated1 untreated2 untreated3 treatedA1 treatedA2 treatedA3 treatedB1 treatedB2 treatedB3
ENSMUSG00000000001 1233 1091 1259 1234 1118 1082 1454 1086 1143
ENSMUSG00000000003 0 0 0 0 0 0 0 0 0
ENSMUSG00000000028 76 68 61 75 81 82 93 74 88
ENSMUSG00000000031 2 3 5 2 2 2 12 5 2
ENSMUSG00000000037 24 31 38 27 26 14 47 43 45
ENSMUSG00000000049 6 7 10 11 19 16 20 11 15
> coldata= read.table ("design.txt", header = T, row.names=1)
>
> coldata
condition type
untreated1 control untreated
untreated2 control untreated
untreated3 control untreated
treatedA1 shRNA_A treated
treatedA2 shRNA_A treated
treatedA3 shRNA_A treated
treatedB1 shRNA_B treated
treatedB2 shRNA_B treated
treatedB3 shRNA_B treated
> dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ condition + type)
Error in DESeqDataSet(se, design = design, ignoreRank) :
the model matrix is not full rank, i.e. one or more variables in the design formula are linear combinations of the others
Then I do it like Michael's code, I delete the type column in coldata file.
> coldata
condition
untreated1 control
untreated2 control
untreated3 control
treatedA1 shRNA_A
treatedA2 shRNA_A
treatedA3 shRNA_A
treatedB1 shRNA_B
treatedB2 shRNA_B
treatedB3 shRNA_B
> dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ condition)
> design(dds) = ~ condition
> dds = DESeq(dds, test="LRT", reduced = ~ 1)
estimating size factors
estimating dispersions
gene-wise dispersion estimates
mean-dispersion relationship
final dispersion estimates
fitting model and testing
> res <- results(dds)
> head(res)
log2 fold change: condition shRNA B vs control
LRT p-value: '~ condition' vs '~ 1'
DataFrame with 6 rows and 6 columns
baseMean log2FoldChange lfcSE stat pvalue
<numeric> <numeric> <numeric> <numeric> <numeric>
ENSMUSG00000000001 1183.648139 -0.1000936 0.09353996 1.191909 0.55103647
ENSMUSG00000000003 0.000000 NA NA NA NA
ENSMUSG00000000028 77.301980 0.1806468 0.21423655 1.095024 0.57838715
ENSMUSG00000000031 3.688155 0.7430438 0.92477711 2.323728 0.31290243
ENSMUSG00000000037 32.345099 0.4137095 0.32571884 6.944625 0.03104515
ENSMUSG00000000049 12.603642 0.8593232 0.53167437 3.999843 0.13534587
padj
<numeric>
ENSMUSG00000000001 0.9509148
ENSMUSG00000000003 NA
ENSMUSG00000000028 0.9573557
ENSMUSG00000000031 NA
ENSMUSG00000000037 0.3663522
ENSMUSG00000000049 NA
######question1: res only include one comparison "condition shRNA B vs control", how can I get the result "condition shRNA A vs control"? (sorry I checked the ?results, but did not manage to get it)
######question2: LRT p-value: '~ condition' vs '~ 1', I do not quite understand '~ 1'?
when I do two comparisons individually, then this is nbinomWaldTest test:
dds = DESeq(dds)
resA <- results(dds, contrast=c("condition","shRNA_A","control"))
resB <- results(dds, contrast=c("condition","shRNA_B","control"))
The result resB is slightly different from the res (with LRT test) in both number of genes and fold change number.
######question3: which method shall I use?
Thanks again for your time and help
Leave a comment:
-
Yes, as Devon states, if you want to test (control vs shRNA_A) and (control vs shRNA_B) at once you can use the LRT,
with a condition vector like so: c("control", ..., "shRNA_A", ..., "shRNA_B", ... )
you can use:
Code:design(dds) = ~ condition dds = DESeq(dds, test="LRT", reduced = ~ 1)
If you want to make the two comparisons individually, it's best to run all the samples together and produce results tables like so:
Code:dds = DESeq(dds) results(dds, contrast=c("condition","shRNA_A","control")) results(dds, contrast=c("condition","shRNA_B","control"))
Leave a comment:
-
I'd personally go with the combined design (i.e., the last one you showed). Especially since you only have 3 samples per group, you'll likely get a bit better variance estimates with that. If you wanted to do a classic 1-way anova like test, then you'd just use nbinomLRT() and compare that against a design of ~1.
Leave a comment:
-
Deseq2 design
Dear all,
I have one question about the design of the condition: I am doing gene silencing screening with 2 different shRNA (targeted to the same gene). Before I do the RNA-seq, I know both of these two shRNA bring my gene of interest down 50% compare with control (bench data), and I want to see the differential expression genes in the knock down condition. So I have 9 samples in total with 3 biological replicates per condition:
untreated1 control
untreated2 control
untreated3 control
treatedA1 shRNA_A
treatedA2 shRNA_A
treatedA3 shRNA_A
treatedB1 shRNA_B
treatedB2 shRNA_B
treatedB3 shRNA_B
First, I did the Deseq2 analysis with two groups comparison: control vs shRNA_A and control vs shRNA_B.
design as follows:
condition
untreated1 control
untreated2 control
untreated3 control
treatedA1 shRNA_A
treatedA2 shRNA_A
treatedA3 shRNA_A
condition
untreated1 control
untreated2 control
untreated3 control
treatedA1 shRNA_B
treatedA2 shRNA_B
treatedA3 shRNA_B
From the two dataset, I can see my gene goes down 50% (consistent with my bench data), but there are different number of regulated genes (with padj < 0.1) in the two dataset:
control vs shRNA_A 167 genes
control vs shRNA_B 431 genes
there are 50 common regulated genes in the two lists, but the non-overlapped genes have similar expression trend in the two shRNA condition (just because of the padj, those genes are not choose in the list).
Is my analysis design correct? or shall I do another type of design like this?
condition type
untreated1 control untreated
untreated2 control untreated
untreated3 control untreated
treatedA1 shRNA_A treated
treatedA2 shRNA_A treated
treatedA3 shRNA_A treated
treatedB1 shRNA_B treated
treatedB2 shRNA_B treated
treatedB3 shRNA_B treated
ideally I would like to see the genes in two shRNA treated condition together (like anova test), but I do not know how to design it?
Thanks in advance for any advice or comments!
Best,
WeiTags: None
Latest Articles
Collapse
-
by seqadmin
Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...-
Channel: Articles
09-23-2024, 06:35 AM -
-
by seqadmin
During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.
Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...-
Channel: Articles
09-09-2024, 10:59 AM -
ad_right_rmr
Collapse
News
Collapse
Topics | Statistics | Last Post | ||
---|---|---|---|---|
Started by seqadmin, Yesterday, 04:51 AM
|
0 responses
8 views
0 likes
|
Last Post
by seqadmin
Yesterday, 04:51 AM
|
||
Started by seqadmin, 10-01-2024, 07:10 AM
|
0 responses
13 views
0 likes
|
Last Post
by seqadmin
10-01-2024, 07:10 AM
|
||
Started by seqadmin, 09-30-2024, 08:33 AM
|
0 responses
18 views
0 likes
|
Last Post
by seqadmin
09-30-2024, 08:33 AM
|
||
Started by seqadmin, 09-26-2024, 12:57 PM
|
0 responses
16 views
0 likes
|
Last Post
by seqadmin
09-26-2024, 12:57 PM
|
Leave a comment: