Originally posted by FJwlf:
Hi everyone!
Just a silly question... why is it a subtraction (-) and not a division (/) that is used to calculate the rLogFC?
res <- data.frame(
  assay(rld),
  avgLogExpr = ( assay(rld)[,2] + assay(rld)[,1] ) / 2,
  rLogFC = assay(rld)[,2] / assay(rld)[,1] )
Thank you!!
-
Originally posted by Simon Anders:
To be honest, we couldn't yet be bothered to explain how to analyse such data in DESeq2. It's tricky to write up because too many people will misinterpret whatever I write as if it were actually possible to conduct a meaningful statistical analysis when comparing just two samples.
So, if you promise to not use any such comparisons for actual science, here is how you do it:
Start as above:
Code:
library(DESeq2)
library(pasilla)
data("pasillaGenes")
countData <- counts(pasillaGenes)
countData <- countData[, c("treated1fb", "untreated1fb")]
colData <- pData(pasillaGenes)[c("treated1fb", "untreated1fb"), c("condition", "type")]
dds <- DESeqDataSetFromMatrix(
  countData = countData,
  colData = colData,
  design = ~ condition)
Code:
rld <- rlogTransformation( dds )
Code:
res <- data.frame(
  assay(rld),
  avgLogExpr = ( assay(rld)[,2] + assay(rld)[,1] ) / 2,
  rLogFC = assay(rld)[,2] - assay(rld)[,1] )
Code:
> head( res[ order(res$rLogFC), ] )
            treated1fb untreated1fb avgLogExpr    rLogFC
FBgn0011260   7.830359     6.627326   7.228842 -1.203033
FBgn0001226  10.128636     8.929985   9.529311 -1.198652
FBgn0034718   8.503006     7.315640   7.909323 -1.187366
FBgn0003501   7.927864     6.743974   7.335919 -1.183889
FBgn0033635  11.126300     9.973979  10.550139 -1.152321
FBgn0033367  13.411814    12.269436  12.840625 -1.142378
The advantage of this procedure is that it does not produce any p values (which would be misleading anyway).
Just a silly question... why is it a subtraction (-) and not a division (/) that is used to calculate the rLogFC?
res <- data.frame(
  assay(rld),
  avgLogExpr = ( assay(rld)[,2] + assay(rld)[,1] ) / 2,
  rLogFC = assay(rld)[,2] / assay(rld)[,1] )
Thank you!!
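A short note on this question, offered as a sketch rather than an authoritative answer: rlog values are on a log2 scale, and a difference of logs is the log of the ratio, so subtracting already gives the log fold change. The base-R snippet below uses the first row of the table quoted above (FBgn0011260); the variable names are only for illustration.

```r
# rlog values are on a log2 scale, so a log fold change is a difference.
# Values copied from the first row of the quoted table (FBgn0011260).
treated   <- 7.830359
untreated <- 6.627326

rLogFC <- untreated - treated   # difference of logs = log2 of the ratio
ratio  <- 2^rLogFC              # back on the linear scale: untreated / treated

# Dividing the two log values is not a fold change on either scale.
divided <- untreated / treated

print(c(rLogFC = rLogFC, ratio = ratio, divided = divided))
```

Dividing would only be the right operation on untransformed (linear-scale) values, and the result would then still need a log2 to be comparable with rLogFC.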
-
Hi,
Can anyone advise me whether it's okay to normalize my data set before mapping my reads to the genome?
Thanks
-
Hi
I want to use the DESeq package to compare a control (3 biological replicates) with a treatment (1 biological replicate).
In DESeq I therefore used the following code, and got 266 genes with padj < 0.05:
table <- read.delim("test.txt")
row.names(table) <- table$Feature_ID
count_table <- table[, -1]
conds <- c("ctrl", "ctrl", "ctrl", "treatment")
cds <- newCountDataSet(count_table, conds)
cds <- estimateSizeFactors(cds)
cds <- estimateDispersions(cds, method="blind", sharingMode="fit-only")
results <- nbinomTest(cds, "ctrl", "treatment")
In DESeq2 I used the following commands, but got > 10000 genes with padj < 0.05:
table <- read.delim("test.txt")
row.names(table) <- table$Feature_ID
count_table <- table[, -1]
colData <- DataFrame(condition=factor(c("ctrl", "ctrl", "ctrl", "treatment")))
dds <- DESeqDataSetFromMatrix(count_table, colData, formula(~ condition))
results <- DESeq(dds, minReplicatesForReplace=Inf)
So I probably need to add extra parameters to the DESeq2 analysis, but for now I can't figure out which ones.
Thank you for helping
Wannes
Last edited by tompoes; 12-02-2015, 12:31 PM.
-
Originally posted by whataBamBam:
Yeah the first part is what I meant - well kind of. Yes a lower power test makes it more difficult to observe significant results - but we observe them. So the test had enough power to detect the differences it detected but there could be other differences it did not detect because it did not have enough power. This is what I mean by saying it's conservative.
The next part I'm less sure about... but I think this paradox, Lindley's paradox, would only apply if there were a very large number of replicates? Which we aren't ever likely to see.
-
Originally posted by rskr:
If the hypothesis test was significant, this would indicate that there isn't a problem with power, since a lower powered test should make it more difficult to get significant results, if the test was actually answering the question you were asking. Though in theory this does seem a little confusing, because getting a hypothesis to fit thousands of samples should be harder than just a few samples, however with thousands of samples the means in the population can be known very accurately, so even trivial differences like color between two placebos can be significant.
http://en.wikipedia.org/wiki/Lindley's_paradox
The next part I'm less sure about... but I think this paradox, Lindley's paradox, would only apply if there were a very large number of replicates? Which we aren't ever likely to see.
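To make the large-sample point concrete, here is a rough base-R sketch. The numbers are hypothetical: a fixed, trivial difference in means (`delta`) with a known standard deviation (`sigma`), tested with a two-sample z-test at increasing sample sizes. It only illustrates how the p-value shrinks with n and is not meant as a model of RNA-seq data.

```r
# Hypothetical setting: an observed difference in means that stays tiny
# while the number of samples per group grows.
delta <- 0.01                  # assumed trivial difference in means
sigma <- 1                     # assumed known standard deviation
n     <- c(10, 1000, 1e5, 1e7) # samples per group

# Standard error of a difference of two means is sigma * sqrt(2 / n).
z <- delta / (sigma * sqrt(2 / n))
p <- 2 * pnorm(-abs(z))        # two-sided p-value

print(data.frame(n = n, z = z, p = p))
```

With a handful of replicates the same difference is nowhere near significant, which fits the remark that this situation only arises with a very large number of replicates.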
-
Originally posted by whataBamBam:
Great. Actually my original interpretation (before I posted this) was correct then. That the p values are perfectly valid (in fact conservative) and the problem of no replicates is actually low statistical power.
So basically you are saying that you have less statistical power because you have overestimated the variance. And if you see significant differences DESPITE this low statistical power then go for it.
To be fair, it says in the vignette (or the paper, I can't remember which) that there is simply low statistical power if you have no replicates.
-
Originally posted by Michael Love:
The section of the original DESeq paper might shed some light:
"Working without replicates
DESeq allows analysis of experiments with no biological replicates in one or even both of the conditions. While one may not want to draw strong conclusions from such an analysis, it may still be useful for exploration and hypothesis generation. If replicates are available only for one of the conditions, one might choose to assume that the variance-mean dependence estimated from the data for that condition holds as well for the unreplicated one. If neither condition has replicates, one can still perform an analysis based on the assumption that for most genes, there is no true differential expression, and that a valid mean-variance relationship can be estimated from treating the two samples as if they were replicates. A minority of differentially abundant genes will act as outliers; however, they will not have a severe impact on the gamma-family GLM fit, as the gamma distribution for low values of the shape parameter has a heavy right-hand tail. Some overestimation of the variance may be expected, which will make that approach conservative."
So basically you are saying that you have less statistical power because you have overestimated the variance. And if you see significant differences DESPITE this low statistical power then go for it.
To be fair, it says in the vignette (or the paper, I can't remember which) that there is simply low statistical power if you have no replicates.
-
Hello,
I'm deviating a bit from the discussion, but my question fits this topic well.
I am analyzing RNA-Seq data from 2 conditions with 7 replicates per condition. I performed the analysis with DESeq2, and that part is fine.
In one condition, two samples come from the same leukemia patient. At the second time point, the patient is in relapse.
I would like to find out what changed between these 2 time points. Of course, it can be due to additional somatic mutations. I want to study the differences at the RNA level as well.
I am thinking about performing a DE analysis between these 2 "conditions" (remission, relapse). Do you think it is meaningful to perform this analysis, knowing there is only one patient?
How do you usually handle such cases? It is very patient-dependent I guess, and I don't know if we could see general changes when using small groups of patients in remission and relapse.
-
I wouldn't say that it is suspect to return low p-values. The point of using most software specifically designed for microarray or sequencing data, rather than just running linear or generalized linear models row-by-row, is that information can be pooled across genes.
For example, if you observe two random Normal variables and want to know if they have different means, you're out of luck. But if I provide you the population variance then you can compute a probability of seeing such a difference. Here we have an in-between case: we can learn something about the variance over all genes, but are not provided the population variance for a given gene. The question is how much we can learn about the variance of a single gene by looking over all genes. Are most of the genes not differentially expressed, such that one can learn about the mean-variance relationship over the majority of genes? The paragraph I quote above points out that the behavior will be conservative, in overestimating the variance of all genes by including some number of differentially expressed genes.
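As a rough illustration of the two-Normal-variables example (a sketch with made-up numbers, not what DESeq does internally): with one observation per condition and a variance supplied from elsewhere, a z-test gives a p-value, and inflating that variance pushes the p-value up, which is the sense in which overestimating the variance is conservative. The names `x1`, `x2` and `sigma` are assumptions for the example.

```r
# One observation per condition (hypothetical values on a log scale).
x1 <- 10.2
x2 <- 8.9

# Standard deviation assumed known, e.g. learned by pooling across genes.
sigma <- 0.5

# A difference of two independent Normals with sd sigma has sd sigma * sqrt(2).
z <- (x1 - x2) / (sigma * sqrt(2))
p <- 2 * pnorm(-abs(z))          # two-sided p-value

# Same data, but with the standard deviation overestimated by a factor of 2.
z_inflated <- (x1 - x2) / (2 * sigma * sqrt(2))
p_inflated <- 2 * pnorm(-abs(z_inflated))

print(c(p = p, p_inflated = p_inflated))
```

Without `sigma` there is nothing to divide by, which is the "out of luck" case; the in-between case described above amounts to estimating it from the other genes.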
-
Originally posted by whataBamBam:
There are loads of threads on here about DESeq and experiments with no replicates. And Simon Anders has been very patient coming on here and repeatedly explaining why they are a bad idea.
So I have come along after an experiment has been done to have a second look at the data (some bioinformatics already done by the sequencing provider) and... yes... they didn't have any replicates!
The bioinformatics guys that came before used DESeq and they DID find differentially expressed transcripts. Even with adjusted p-values.
So just to be clear - I should be telling these guys they can't trust the p-values anyway since they have no replicates and to just look at fold change? We're going for just taking the top 200 by absolute fold change (for now) and confirming with qPCR.
-
The section of the original DESeq paper might shed some light:
"Working without replicates
DESeq allows analysis of experiments with no biological replicates in one or even both of the conditions. While one may not want to draw strong conclusions from such an analysis, it may still be useful for exploration and hypothesis generation. If replicates are available only for one of the conditions, one might choose to assume that the variance-mean dependence estimated from the data for that condition holds as well for the unreplicated one. If neither condition has replicates, one can still perform an analysis based on the assumption that for most genes, there is no true differential expression, and that a valid mean-variance relationship can be estimated from treating the two samples as if they were replicates. A minority of differentially abundant genes will act as outliers; however, they will not have a severe impact on the gamma-family GLM fit, as the gamma distribution for low values of the shape parameter has a heavy right-hand tail. Some overestimation of the variance may be expected, which will make that approach conservative."
-
There are loads of threads on here about DESeq and experiments with no replicates. And Simon Anders has been very patient coming on here and repeatedly explaining why they are a bad idea.
So I have come along after an experiment has been done to have a second look at the data (some bioinformatics already done by the sequencing provider) and... yes... they didn't have any replicates!
The bioinformatics guys that came before used DESeq and they DID find differentially expressed transcripts. Even with adjusted p-values.
So just to be clear - I should be telling these guys they can't trust the p-values anyway since they have no replicates and to just look at fold change? We're going for just taking the top 200 by absolute fold change (for now) and confirming with qPCR.
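For reference, picking the top genes by absolute fold change could look roughly like the base-R sketch below. The table `res`, its column names and its values are made up for illustration; with real output you would substitute your own log fold change column and set `n_top` to 200.

```r
# Toy table standing in for real results; gene names and values are made up.
res <- data.frame(
  gene   = paste0("gene", 1:6),
  log2FC = c(0.3, -2.1, 1.7, -0.2, 3.4, -1.0)
)

n_top <- 3  # the post uses 200; 3 keeps the toy example readable

# Rank by the size of the change, ignoring its direction.
ranked <- res[order(abs(res$log2FC), decreasing = TRUE), ]
top    <- head(ranked, n_top)

print(top)
```

Ranking on the log scale treats up- and down-regulation symmetrically, which a ranking on raw ratios would not.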