  • RockChalkJayhawk
    replied
    Originally posted by Simon Anders
    Let's say you have four different conditions, each with two replicates: A1, A2, B1, B2, C1, C2, D1, D2.

    Then, you can use DESeq as it is to do contrasts such as 'A vs B', 'A vs C', 'B vs D', etc.

    What does not yet work is interaction contrasts. So, let's say, you have the conditions wt, A, B, AB; where wt is the wild-type, A and B are some treatments or mutations, and AB is the combination of both. You now might either be interested in simple contrasts, as above, which is fine. Or you want to check for interactions, i.e., you say that the effect of A is given by the difference dA between A and wt (dA = A - wt) and similar for B (dB = B - wt) and AB (dAB = AB - wt), and you now wonder whether dAB = dA + dB. To check this, you need an interaction contrast, and this is something that I am currently working on, but I might still need a few weeks.

    For now, you can variance-stabilize the data (function 'getVarianceStabilizedData') and then use an ordinary linear model or the 'limma' package to check for interaction. That works fine, but the power is reduced.

    Just using the raw count data, scaled by the library sizes, is not a good idea if you intend to feed it to a linear model. Ordinary least squares regression requires homoscedasticity, and count data is heteroscedastic. The point of a variance-stabilizing transformation is precisely to remedy that.
    Whoops! I just saw it in your vignette. That's what I need since I'm actually doing a time-course. It'll just make it easier to generate some figures.


  • Simon Anders
    replied
    Let's say you have four different conditions, each with two replicates: A1, A2, B1, B2, C1, C2, D1, D2.

    Then, you can use DESeq as it is to do contrasts such as 'A vs B', 'A vs C', 'B vs D', etc.

    What does not yet work is interaction contrasts. So, let's say, you have the conditions wt, A, B, AB; where wt is the wild-type, A and B are some treatments or mutations, and AB is the combination of both. You now might either be interested in simple contrasts, as above, which is fine. Or you want to check for interactions, i.e., you say that the effect of A is given by the difference dA between A and wt (dA = A - wt) and similar for B (dB = B - wt) and AB (dAB = AB - wt), and you now wonder whether dAB = dA + dB. To check this, you need an interaction contrast, and this is something that I am currently working on, but I might still need a few weeks.

    For now, you can variance-stabilize the data (function 'getVarianceStabilizedData') and then use an ordinary linear model or the 'limma' package to check for interaction. That works fine, but the power is reduced.
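To make the interaction check concrete, here is a minimal Python sketch for a single gene. It assumes the values have already been variance-stabilized and fits the saturated ordinary linear model (one mean per condition) by hand. The function name `interaction_contrast` and all numbers are hypothetical; this is not DESeq or limma code.

```python
from math import sqrt
from statistics import mean

def interaction_contrast(wt, a, b, ab):
    """Estimate dAB - (dA + dB) for one gene from per-condition replicate values."""
    groups = [wt, a, b, ab]
    means = [mean(g) for g in groups]
    # dAB - dA - dB = (AB - wt) - (A - wt) - (B - wt) = AB - A - B + wt
    estimate = means[3] - means[1] - means[2] + means[0]
    # Residual degrees of freedom: observations minus one fitted mean per condition.
    df = sum(len(g) for g in groups) - len(groups)
    if df <= 0:
        raise ValueError("need replicates in at least one condition")
    # Pooled residual variance around the condition means.
    rss = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    s2 = rss / df
    # Contrast weights are all +1 or -1, so Var(estimate) = s2 * sum(1 / n_g).
    se = sqrt(s2 * sum(1.0 / len(g) for g in groups))
    t = estimate / se if se > 0 else float("nan")
    return estimate, se, t, df

# Hypothetical variance-stabilized values for one gene, two replicates each.
additive = interaction_contrast([5.0, 5.2], [6.0, 6.2], [7.0, 7.2], [8.0, 8.2])
synergy = interaction_contrast([5.0, 5.2], [6.0, 6.2], [7.0, 7.2], [9.0, 9.2])
print("additive:", additive)
print("synergy: ", synergy)
```

An estimate near zero is consistent with dAB = dA + dB. The t statistic would be compared against a t distribution with the returned degrees of freedom; limma additionally moderates the variance across genes, which this sketch does not attempt.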

    Just using the raw count data, scaled by the library sizes, is not a good idea if you intend to feed it to a linear model. Ordinary least squares regression requires homoscedasticity, and count data is heteroscedastic. The point of a variance-stabilizing transformation is precisely to remedy that.
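The heteroscedasticity point can be illustrated with a small simulation. The sketch below assumes pure Poisson noise and uses the textbook square-root transformation as a stand-in; it is not the transformation that 'getVarianceStabilizedData' applies, and the expression levels, seed, and helper name `poisson_sample` are hypothetical.

```python
import math
import random
from statistics import variance

def poisson_sample(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

rng = random.Random(1)  # fixed seed so the run is repeatable
summary = {}
for lam in (10, 100):  # hypothetical low and high expression levels
    counts = [poisson_sample(rng, lam) for _ in range(2000)]
    roots = [math.sqrt(c) for c in counts]
    # Store (variance of raw counts, variance of square-root counts).
    summary[lam] = (variance(counts), variance(roots))
    print(f"mean {lam}: var(counts) = {summary[lam][0]:.1f}, "
          f"var(sqrt counts) = {summary[lam][1]:.3f}")
```

The variance of the raw counts grows with the mean, while the variance of the square-rooted counts stays near 1/4 at both levels, which is the property a least-squares fit relies on.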


  • RockChalkJayhawk
    replied
    Multi-sample comparison

    Simon,

    Let's say I want to compare 4 samples, each with 2 replicates (an ANOVA, for example). As of now, DESeq doesn't have this functionality, correct?

    Is there a way I can export the entire matrix of normalized values rather than making the 1x1 comparison? For example, would it be correct to multiply all of the values in T1b by 0.5587394, or T2 by 1.5823096, etc.? (These represent the column names and sizeFactors from the DESeq vignette.)
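For readers with the same question: as far as I understand the DESeq approach, counts are brought to a common scale by dividing each column by its size factor, not by multiplying. The Python sketch below shows median-of-ratios size factors and that division on made-up counts. The function names, sample names, and numbers are hypothetical, and this is not the DESeq code itself.

```python
import math
from statistics import median

def size_factors(counts):
    """Median-of-ratios size factors; counts maps sample name -> list of per-gene counts."""
    samples = list(counts)
    n_genes = len(counts[samples[0]])
    # Keep only genes with a nonzero count in every sample (log of zero is undefined).
    usable = [g for g in range(n_genes) if all(counts[s][g] > 0 for s in samples)]
    # Log of the per-gene geometric mean across samples.
    log_geo = {g: sum(math.log(counts[s][g]) for s in samples) / len(samples)
               for g in usable}
    # Size factor = median ratio of a sample's counts to the geometric means.
    return {s: math.exp(median(math.log(counts[s][g]) - log_geo[g] for g in usable))
            for s in samples}

def normalize(counts, factors):
    """Divide each sample's counts by its size factor."""
    return {s: [c / factors[s] for c in counts[s]] for s in counts}

# Hypothetical counts: sample "S2" was sequenced exactly twice as deeply as "S1".
raw = {"S1": [10, 200, 35, 0, 80], "S2": [20, 400, 70, 3, 160]}
factors = size_factors(raw)
normalized = normalize(raw, factors)
print(factors)
print(normalized)
```

Values scaled this way are handy for figures, but they remain heteroscedastic, so they are not a good input for an ordinary linear model.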


  • Simon Anders
    replied
    If you followed the example in the vignette, you have come across the line

    resSig <- res[ res$padj < .1, ]

    This is where we select the genes that we want to consider significant. res$padj is the Benjamini-Hochberg-adjusted p value, and selecting all genes with padj below 0.1 controls the FDR at 10%. To get 1%, just put .01 here.


  • gen2prot
    replied
    Hi Simon,

    How can I set the FDR at 1% instead of 10%? Is there a way of doing this through the nbinomTest function? I am not an R programmer, so I do not know.

    Thanks
    Abhijit


  • Simon Anders
    replied
    Along the same lines: a smaller FDR cutoff is more stringent.

    A false discovery rate (FDR) of 10% means that (in expectation) 10% of your hits are false positives. To get FDR control at 10%, you adjust the raw p values for multiple testing with an FDR-controlling method such as the one by Benjamini and Hochberg or the one by Storey, and then take all those genes with an adjusted p value below the desired FDR. DESeq does this for you, using the Benjamini-Hochberg method.
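For anyone curious what the adjustment does, here is a minimal Python sketch of the Benjamini-Hochberg step-up adjustment followed by two cutoffs. The p values are made up and the function name `bh_adjust` is hypothetical; in practice DESeq reports the adjusted values for you in the padj column.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down, enforcing monotonicity with a running minimum.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

raw_p = [0.0001, 0.004, 0.03, 0.2]  # hypothetical raw p values for four genes
padj = bh_adjust(raw_p)
hits_10 = [i for i, q in enumerate(padj) if q < 0.10]  # FDR controlled at 10%
hits_01 = [i for i, q in enumerate(padj) if q < 0.01]  # FDR controlled at 1%
print(padj)
print(hits_10, hits_01)  # [0, 1, 2] [0, 1]
```

The genes called at 1% are a subset of those called at 10%, which is the sense in which the smaller cutoff is the more stringent one.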

    Controlling at FDR 10% is a common, but, of course, arbitrary choice. It is up to you to decide how stringent you want to be.

    The concept of FDR was introduced by Benjamini and Hochberg in 1995 (J Roy Stat Soc B 57 289). Since then, there has been a lot of research on how best to adjust p values to allow for FDR control, and how well this works in case of correlation.


  • gen2prot
    replied
    Hi Simon,

    Thank you for your help. I got DESeq to work. However, I am trying to figure out a good value to use for the FDR cutoff. This may be a very stupid question, but I'll ask anyway. Is an FDR of 10% more stringent than an FDR of 1%? I am thinking in terms of p-values, where smaller numbers mean higher stringency. Should I imagine FDR along the same lines or the opposite?

    Thanks
    Abhijit


  • lpachter
    replied
    Dear Wolfgang,

    Thank you for pointing out my typo, namely that "measurement" should have been "estimate".


  • Wolfgang Huber
    replied
    Originally posted by lpachter
    I'd like to clarify one point here:

    ...for a measurement to be useful (whatever units it is in) it is important to know its variance. ....
    While we're at it...: there is also no such thing as "the variance of a measurement". There is the sample variance of a set of measurements; and there is the variance of a statistical ensemble (physicists' language) or distribution in a probabilistic model (mathematicians' language), whose "true" value we often do not know, but which we can aim to estimate from data.

    The choice of ensemble matters: repeated measurements on the same sample using the same machine will have smaller variance than repeated measurements using different machines, or different biological replicates, e.g. different cell passages.
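A small simulation can make the point about ensembles tangible. The Python sketch below assumes that technical replicates of one sample are Poisson around a fixed expression level, and that biological replicates add gamma-distributed variation in that level. The model choice, the mean of 50, the coefficient of variation of 0.3, and the helper name `poisson_sample` are all hypothetical.

```python
import math
import random
from statistics import variance

def poisson_sample(rng, lam):
    """Draw one Poisson(lam) count with Knuth's multiplication method (fine for moderate lam)."""
    limit = math.exp(-lam)
    k = 0
    p = rng.random()
    while p > limit:
        k += 1
        p *= rng.random()
    return k

rng = random.Random(7)  # fixed seed so the run is repeatable
MEAN, CV, N = 50.0, 0.3, 2000  # hypothetical expression level, biological CV, sample size

# Ensemble 1: repeated measurements of the same sample (shot noise only).
technical = [poisson_sample(rng, MEAN) for _ in range(N)]

# Ensemble 2: each biological replicate has its own expression level,
# drawn from a gamma distribution with mean MEAN and coefficient of variation CV.
shape = 1.0 / CV ** 2
scale = MEAN / shape
biological = [poisson_sample(rng, rng.gammavariate(shape, scale)) for _ in range(N)]

var_technical = variance(technical)
var_biological = variance(biological)
print(f"technical: {var_technical:.1f}, biological: {var_biological:.1f}")
```

Both ensembles have the same mean, yet the estimated variance depends strongly on which one the replicates were drawn from.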


  • Simon Anders
    replied
    Hi Abhijit

    Have you followed the instructions on http://www-huber.embl.de/users/anders/DESeq/ ?

    If so, please send me the output from the installation. The real error message should be hidden in there.

    Cheers
    Simon


  • gen2prot
    replied
    Hi Simon,

    I am trying to install DESeq on Mac OS X Leopard. I get the error message "In getDependencies(pkgs, dependencies, available, lib) : package ‘DESeq’ is not available". Any clue why this is happening? I found that DESeq needs Biobase to be installed. I have done that, and yet I get this error. Any ideas?

    Thanks
    Abhijit


  • Boel
    replied
    Originally posted by Cole Trapnell
    One more thing: when cuffdiff reports the results of differential testing, it's being performed on the actual counts (after log transformation) for the reasons that Simon suggests.
    Is there a way to output the actual counts?


  • steven
    replied
    Originally posted by lpachter
    There is no such thing as "FPKM data" (or for that matter "RPKM data"). The only data in an RNA-Seq experiment is the read counts.
    I heard from people working on alignment that "the only data is the read sequences" and from people working on base calling that "the only data is the raw images". At least there is one consensus: that certainly makes a lot of data.


  • RockChalkJayhawk
    replied
    Originally posted by Cole Trapnell
    One more thing: when cuffdiff reports the results of differential testing, it's being performed on the actual counts (after log transformation) for the reasons that Simon suggests.
    Cole,

    Do you think it would be a good idea to use quantile normalization on tag counts before running cuffdiff, or is this unnecessary given that it doesn't actually use RPKM values (which I didn't know, but am glad to hear)?


  • Cole Trapnell
    replied
    One more thing: when cuffdiff reports the results of differential testing, it's being performed on the actual counts (after log transformation) for the reasons that Simon suggests.
