Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • dpryan
    replied
    Originally posted by Michael Love View Post
    I haven't seen any big difference in the ranking or set of genes {adj. p < alpha} between the Wald and the LRT. Some differences are:

    The Wald statistic appeared more compatible with the LFC shrinkage, producing uniform p-values under the null (the numerator and denominator shrink at nearly the same rate), while after LFC shrinkage the LRT statistic under the null piles up a bit near 0 (p-values piling up at 1).

    The Wald allow for easy testing of |beta| > theta, for theta > 0.

    The standard Wald pipeline has LFC shrinkage, for which we have to estimate the MLE betas to estimate the prior, so I'm not sure if it's even faster one way or the other.
    Thanks, that makes sense.

    Leave a comment:


  • Wei-HD
    replied
    Thanks again Michael for the swift reply and great work
    I need to read more to understand the use of "~1" in different function

    Leave a comment:


  • Michael Love
    replied
    hi Wei,

    Yes, there is a bug in v1.4, that if you've run the LRT pipeline, then you can't use the contrast argument if the LFC is already contained in resultsNames, but you can get it with the 'name' argument. And it's the same result table using 'name'. I've fixed this in the next release (v1.6 released in October).

    ( and yes to your question for Devon )

    Leave a comment:


  • Wei-HD
    replied
    Originally posted by dpryan View Post
    I'd personally go with the combined design (i.e., the last one you showed). Especially since you only have 3 samples per group, you'll likely get a bit better variance estimates with that. If you wanted to do a classic 1-way anova like test, then you'd just use nbinomLRT() and compare that against a design of ~1.

    Thanks a lot Devon! just one question about "design of ~1", you mentioned. you mean: dds <- nbinomLRT(dds, reduced = ~ 1) ?

    Leave a comment:


  • Wei-HD
    replied
    Thanks a lot Michael for the detail and explanation!

    I think LRT test makes more sense for me (compare all levels). Sorry for the question before to get the comparison with shRNA_A vs control. I got it with the "name".

    > resultsNames(dds)
    [1] "Intercept" "condition_shRNAA_vs_control" "condition_shRNAB_vs_control"
    > resA <- results(dds, name="condition_shRNA_A_vs_control")

    But with "contrast", I had one error:

    > resA <- results(dds, contrast=c("condition","shRNA_A","control"))

    Error in normalizeSingleBracketSubscript(j, x) :
    subscript contains invalid names

    Leave a comment:


  • Michael Love
    replied
    I haven't seen any big difference in the ranking or set of genes {adj. p < alpha} between the Wald and the LRT. Some differences are:

    The Wald statistic appeared more compatible with the LFC shrinkage, producing uniform p-values under the null (the numerator and denominator shrink at nearly the same rate), while after LFC shrinkage the LRT statistic under the null piles up a bit near 0 (p-values piling up at 1).

    The Wald allow for easy testing of |beta| > theta, for theta > 0.

    The standard Wald pipeline has LFC shrinkage, for which we have to estimate the MLE betas to estimate the prior, so I'm not sure if it's even faster one way or the other.

    Leave a comment:


  • dpryan
    replied
    Michael: Quick (largely) unrelated question for you. Given an experiment with design ~condition, where condition just has two levels, is there much of a benefit to using a Wald test over an LRT other than the ease of computation? My understanding had always been that the Wald test approximated an LRT and was often preferred since you don't have to fit the alternate model.

    Leave a comment:


  • Michael Love
    replied
    Q1:

    In my previous response I gave the answer to this one:

    results(dds) gives the LRT p-values and adjusted p-values (note that the LFC column is only for one of the comparisons, see ?results for more information).
    When you print 'res', it says:

    Code:
    LRT p-value: '~ condition' vs '~ 1'
    So the p-values are for the test of the counts depending on all levels of condition (control, shRNA_A, shRNA_B), while the LFC shown in this table is only for one of the fold changes. Please read the help for ?results which has a paragraph explaining this.


    Q2:

    The likelihood ratio test compares the explanatory power of two models.

    See http://en.wikipedia.org/wiki/Likelihood-ratio_test for background on this statistical test.

    the full model (~ condition) includes the intercept and the information in 'condition' (note that in R formula, ~ condition is equivalent to ~ 1 + condition)

    the reduced model (~ 1) is just an intercept term

    resB is a comparison of one level vs the control, whereas the LRT tests all levels of condition.

    Q3:

    Which method you use depends on the biological question you want to ask.

    Use the LRT if you want to describe the set of genes changing from any level of condition.

    Use the Wald tests if you want to describe the set of genes from a single pair of levels (e.g. shRNA_B vs control).

    Leave a comment:


  • Wei-HD
    replied
    Thanks a lot Devon and Michael!

    Like you recommended I run all the samples together to get better variance estimates. I tried to my design as ~ condition+type. I had one error"the model matrix is not full rank, i.e. one or more variables in the design formula are linear combinations of the others", I guess this is a wrong design.

    Here is my code:

    > countdata = as.matrix(read.table("data.txt", header=T, row.names=1))
    > head (countdata)
    untreated1 untreated2 untreated3 treatedA1 treatedA2 treatedA3 treatedB1 treatedB2 treatedB3
    ENSMUSG00000000001 1233 1091 1259 1234 1118 1082 1454 1086 1143
    ENSMUSG00000000003 0 0 0 0 0 0 0 0 0
    ENSMUSG00000000028 76 68 61 75 81 82 93 74 88
    ENSMUSG00000000031 2 3 5 2 2 2 12 5 2
    ENSMUSG00000000037 24 31 38 27 26 14 47 43 45
    ENSMUSG00000000049 6 7 10 11 19 16 20 11 15
    > coldata= read.table ("design.txt", header = T, row.names=1)
    >
    > coldata
    condition type
    untreated1 control untreated
    untreated2 control untreated
    untreated3 control untreated
    treatedA1 shRNA_A treated
    treatedA2 shRNA_A treated
    treatedA3 shRNA_A treated
    treatedB1 shRNA_B treated
    treatedB2 shRNA_B treated
    treatedB3 shRNA_B treated
    > dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ condition + type)
    Error in DESeqDataSet(se, design = design, ignoreRank) :
    the model matrix is not full rank, i.e. one or more variables in the design formula are linear combinations of the others



    Then I do it like Michael's code, I delete the type column in coldata file.

    > coldata
    condition
    untreated1 control
    untreated2 control
    untreated3 control
    treatedA1 shRNA_A
    treatedA2 shRNA_A
    treatedA3 shRNA_A
    treatedB1 shRNA_B
    treatedB2 shRNA_B
    treatedB3 shRNA_B

    > dds <- DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ condition)
    > design(dds) = ~ condition
    > dds = DESeq(dds, test="LRT", reduced = ~ 1)
    estimating size factors
    estimating dispersions
    gene-wise dispersion estimates
    mean-dispersion relationship
    final dispersion estimates
    fitting model and testing
    > res <- results(dds)
    > head(res)
    log2 fold change: condition shRNA B vs control
    LRT p-value: '~ condition' vs '~ 1'
    DataFrame with 6 rows and 6 columns
    baseMean log2FoldChange lfcSE stat pvalue
    <numeric> <numeric> <numeric> <numeric> <numeric>
    ENSMUSG00000000001 1183.648139 -0.1000936 0.09353996 1.191909 0.55103647
    ENSMUSG00000000003 0.000000 NA NA NA NA
    ENSMUSG00000000028 77.301980 0.1806468 0.21423655 1.095024 0.57838715
    ENSMUSG00000000031 3.688155 0.7430438 0.92477711 2.323728 0.31290243
    ENSMUSG00000000037 32.345099 0.4137095 0.32571884 6.944625 0.03104515
    ENSMUSG00000000049 12.603642 0.8593232 0.53167437 3.999843 0.13534587
    padj
    <numeric>
    ENSMUSG00000000001 0.9509148
    ENSMUSG00000000003 NA
    ENSMUSG00000000028 0.9573557
    ENSMUSG00000000031 NA
    ENSMUSG00000000037 0.3663522
    ENSMUSG00000000049 NA

    ######question1: res only include one comparison "condition shRNA B vs control", how can I get the result "condition shRNA A vs control"? (sorry I checked the ?results, but did not manage to get it)
    ######question2: LRT p-value: '~ condition' vs '~ 1', I do not quite understand '~ 1'?

    when I do two comparisons individually, then this is nbinomWaldTest test:

    dds = DESeq(dds)
    resA <- results(dds, contrast=c("condition","shRNA_A","control"))
    resB <- results(dds, contrast=c("condition","shRNA_B","control"))

    The result resB is slightly different from the res (with LRT test) in both number of genes and fold change number.
    ######question3: which method shall I use?

    Thanks again for your time and help

    Leave a comment:


  • Michael Love
    replied
    Yes, as Devon states, if you want to test (control vs shRNA_A) and (control vs shRNA_B) at once you can use the LRT,

    with a condition vector like so: c("control", ..., "shRNA_A", ..., "shRNA_B", ... )

    you can use:

    Code:
    design(dds) = ~ condition
    dds = DESeq(dds, test="LRT", reduced = ~ 1)
    results(dds) gives the LRT p-values and adjusted p-values (note that the LFC column is only for one of the comparisons, see ?results for more information).

    If you want to make the two comparisons individually, it's best to run all the samples together and produce results tables like so:

    Code:
    dds = DESeq(dds)
    results(dds, contrast=c("condition","shRNA_A","control"))
    results(dds, contrast=c("condition","shRNA_B","control"))

    Leave a comment:


  • dpryan
    replied
    I'd personally go with the combined design (i.e., the last one you showed). Especially since you only have 3 samples per group, you'll likely get a bit better variance estimates with that. If you wanted to do a classic 1-way anova like test, then you'd just use nbinomLRT() and compare that against a design of ~1.

    Leave a comment:


  • Wei-HD
    started a topic Deseq2 design

    Deseq2 design

    Dear all,

    I have one question about the design of the condition: I am doing gene silencing screening with 2 different shRNA (targeted to the same gene). Before I do the RNA-seq, I know both of these two shRNA bring my gene of interest down 50% compare with control (bench data), and I want to see the differential expression genes in the knock down condition. So I have 9 samples in total with 3 biological replicates per condition:

    untreated1 control
    untreated2 control
    untreated3 control
    treatedA1 shRNA_A
    treatedA2 shRNA_A
    treatedA3 shRNA_A
    treatedB1 shRNA_B
    treatedB2 shRNA_B
    treatedB3 shRNA_B

    First, I did the Deseq2 analysis with two groups comparison: control vs shRNA_A and control vs shRNA_B.
    design as follows:
    condition
    untreated1 control
    untreated2 control
    untreated3 control
    treatedA1 shRNA_A
    treatedA2 shRNA_A
    treatedA3 shRNA_A

    condition
    untreated1 control
    untreated2 control
    untreated3 control
    treatedA1 shRNA_B
    treatedA2 shRNA_B
    treatedA3 shRNA_B


    From the two dataset, I can see my gene goes down 50% (consistent with my bench data), but there are different number of regulated genes (with padj < 0.1) in the two dataset:
    control vs shRNA_A 167 genes
    control vs shRNA_B 431 genes

    there are 50 common regulated genes in the two lists, but the non-overlapped genes have similar expression trend in the two shRNA condition (just because of the padj, those genes are not choose in the list).

    Is my analysis design correct? or shall I do another type of design like this?

    condition type
    untreated1 control untreated
    untreated2 control untreated
    untreated3 control untreated
    treatedA1 shRNA_A treated
    treatedA2 shRNA_A treated
    treatedA3 shRNA_A treated
    treatedB1 shRNA_B treated
    treatedB2 shRNA_B treated
    treatedB3 shRNA_B treated

    ideally I would like to see the genes in two shRNA treated condition together (like anova test), but I do not know how to design it?

    Thanks in advance for any advice or comments!

    Best,
    Wei

Latest Articles

Collapse

  • seqadmin
    Recent Developments in Metagenomics
    by seqadmin





    Metagenomics has improved the way researchers study microorganisms across diverse environments. Historically, studying microorganisms relied on culturing them in the lab, a method that limits the investigation of many species since most are unculturable1. Metagenomics overcomes these issues by allowing the study of microorganisms regardless of their ability to be cultured or the environments they inhabit. Over time, the field has evolved, especially with the advent...
    09-23-2024, 06:35 AM
  • seqadmin
    Understanding Genetic Influence on Infectious Disease
    by seqadmin




    During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.

    Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...
    09-09-2024, 10:59 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 04:51 AM
0 responses
8 views
0 likes
Last Post seqadmin  
Started by seqadmin, 10-01-2024, 07:10 AM
0 responses
13 views
0 likes
Last Post seqadmin  
Started by seqadmin, 09-30-2024, 08:33 AM
0 responses
18 views
0 likes
Last Post seqadmin  
Started by seqadmin, 09-26-2024, 12:57 PM
0 responses
16 views
0 likes
Last Post seqadmin  
Working...
X