When you run an edgeR analysis comparing two groups (mock vs. treat), you get a list of genes with p-values for differential expression and a logFC for each gene. Assuming that edgeR calculates a mean expression value for the samples within each group in order to compute this logFC:
(1) How can I see what value edgeR used for each gene in each group?
(2) How is this mean expression value calculated? (Is it the average of the CPM values of the samples in each group?)
-
I meant as extra columns for edgeR, not as a "leading gene thing". Stuff like a normalized mean and variability for each group. I've taken to moderating edgeR's results by removing results where the within-group variability is high, for example. (I think people are interpreting the logCPM value in that context... because they are expecting outputs that summarize group-wise summary information and don't see it.)
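For anyone wanting that kind of group-wise summary, here is a minimal sketch in base R. It is not what edgeR uses internally for logFC (that comes from the fitted GLM), and the counts, gene names and group labels are made up. It also uses raw library sizes, whereas edgeR's cpm() on a DGEList also applies the normalization factors.

```r
# Toy count matrix: 3 genes x 4 libraries (made-up numbers).
counts <- matrix(c(10, 20, 30, 40,
                   0, 5, 0, 50,
                   100, 110, 90, 95),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("g1", "g2", "g3"),
                                 c("m1", "m2", "t1", "t2")))
group <- factor(c("mock", "mock", "treat", "treat"))

# Counts per million using raw library sizes (no TMM factors here).
lib.size <- colSums(counts)
cpm <- t(t(counts) / lib.size) * 1e6

# Per-group mean and standard deviation of CPM for each gene.
group.mean <- sapply(levels(group), function(g)
  rowMeans(cpm[, group == g, drop = FALSE]))
group.sd <- sapply(levels(group), function(g)
  apply(cpm[, group == g, drop = FALSE], 1, sd))

# One row per gene, columns mean.mock, mean.treat, sd.mock, sd.treat.
summary.table <- data.frame(mean = group.mean, sd = group.sd)
```

A table like this can be bound next to the edgeR results by gene name, which makes it easy to spot genes where one group is all zeros or where a single outlier drives the result.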
-
Originally posted by earonesty: It *might* be informative to show a summary of original expression values and variability per group, just so you can see if edgeR's being silly and might need some tweaking (like when expression values are zero, or when there are outliers, or when you forgot to use tagwise dispersion, or when you used it and would like to know what value edgeR used, etc.).
-
Originally posted by Gordon Smyth: But there's nothing in the documentation that says so.
Since this is your suggestion, the onus is on you to suggest a reason why this would be more informative. For my part, I don't even know how you would define "the actual A" for a general contrast. For example, suppose you have contrast=c(-0.4,0.5,0,-0.1). How would you define A, and what would it tell you?
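To make the example contrast concrete, here is a small arithmetic sketch in base R. The per-group log2 expression levels are hypothetical numbers chosen for illustration, not output from any edgeR fit.

```r
# Hypothetical fitted log2 expression levels of one gene in four groups.
beta <- c(g1 = 5.0, g2 = 6.0, g3 = 4.0, g4 = 7.0)

# The contrast from the post above.
contrast <- c(-0.4, 0.5, 0, -0.1)

# The weights sum to zero (up to floating point), so the contrast is a
# difference between weighted combinations of groups.
weights.sum <- sum(contrast)

# The contrast value is the weighted sum of the group levels: a log fold change.
contrast.value <- sum(contrast * beta)
```

The contrast yields a single well-defined difference, but the weights alone do not single out one "average expression" to pair with it, which is the difficulty raised above.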
-
Originally posted by danielfortin86: Thanks! I was interpreting the documentation as meaning that the average was calculated for specific factor level combinations / contrasts.
Wouldn't plotting the actual "A" for a specific contrast be more informative? Presumably that can be extracted by multiplying the matrix with the particular coefficients.
What's the rationale for plotting the experiment-wide "A" rather than the condition-specific one?
My question also applies to dispersion: what about condition-specific dispersion? Any help you could provide would be greatly appreciated!
-
Thanks! I was interpreting the documentation as meaning that the average was calculated for specific factor level combinations / contrasts. Wouldn't plotting the actual "A" for a specific contrast be more informative? Presumably that can be extracted by multiplying the matrix with the particular coefficients. What's the rationale for plotting the experiment-wide "A" rather than the condition-specific one? My question also applies to dispersion: what about condition-specific dispersion? Any help you could provide would be greatly appreciated!
D
-
This is not a bug; rather, you are misunderstanding what logCPM represents.
The help page for glmLRT says that logCPM is "the average log2-counts-per-million". The average is taken over all libraries in your dataset y, and hence is always the same regardless of the contrast being tested.
logCPM is intended as a measure of the overall expression level of the transcript, and is displayed by functions such as plotBCV().
BTW, it is not a simple average. The logCPM is computed using the edgeR function mglmOneGroup(), taking into account the estimated dispersions and the library sizes.
Gordon
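A quick way to check this behaviour yourself is sketched below. It assumes a recent edgeR release, where glmLRT() takes the fit as its first argument (unlike the older call style in the question), and it uses simulated counts with made-up group labels rather than real data.

```r
library(edgeR)

# Simulated counts: 100 genes x 6 libraries, three groups of two (all made up).
set.seed(1)
counts <- matrix(rnbinom(600, mu = 50, size = 5), nrow = 100)
group <- factor(rep(c("A", "B", "C"), each = 2))
design <- model.matrix(~ 0 + group)
colnames(design) <- levels(group)

y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)
fit <- glmFit(y, design)

# Two different contrasts tested on the same fit.
lrt.BvsA <- glmLRT(fit, contrast = c(-1, 1, 0))
lrt.CvsA <- glmLRT(fit, contrast = c(-1, 0, 1))

# logCPM is computed over all libraries, so it matches across contrasts;
# logFC is contrast-specific, so it generally does not.
same.logCPM <- isTRUE(all.equal(lrt.BvsA$table$logCPM, lrt.CvsA$table$logCPM))
same.logFC <- isTRUE(all.equal(lrt.BvsA$table$logFC, lrt.CvsA$table$logFC))
```

Here same.logCPM should come out TRUE while same.logFC should not. edgeR also exports aveLogCPM() if you want the average log2-CPM without running a test.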
-
EdgeR LogCPM Bug?
When I output the logCPM values for my genes, they are identical across contrasts. However, the logFC values seem to be calculated correctly for each contrast. Is this a bug, or am I misunderstanding what the logCPM returned by glmLRT is?
Code:
library(edgeR)
# Read in the quantification data
data <- read.table('File.txt', header = TRUE, row.names = 1)
# Keep genes with CPM > 2 in at least 3 samples (min(summary(3)) evaluates to 3)
keep <- rowSums(cpm(data) > 2) >= min(summary(3))
data <- data[keep,]
y <- DGEList(counts = data)
y <- calcNormFactors(y)
# NOTE: `design` is not defined anywhere in the posted snippet
rownames(design) <- colnames(y)
# Estimate common, trended and tagwise dispersions
y <- estimateGLMCommonDisp(y, design, verbose = TRUE)
y <- estimateGLMTrendedDisp(y, design)
y <- estimateGLMTagwiseDisp(y, design)
# Fit the linear model to the design matrix
fit <- glmFit(y, design)
# Make Contrast Matrix
my.contrasts <- makeContrasts(.......  # Different contrasts
                              levels = design)
# Loop to calculate the contrasts
for (i in 1:dim(my.contrasts)[2]) {
  # Create the log ratios
  # (glmLRT(y, fit, ...) is the call style of older edgeR releases;
  #  recent releases take the fit as the first argument)
  lrt <- glmLRT(y, fit, contrast = my.contrasts[,i])
  # Second column of the results table
  tempLR <- lrt$table[2]
  if (exists("LogCPM")) {
    LogCPM <- cbind(LogCPM, tempLR)
  } else {
    LogCPM <- tempLR
  }
}
# rows of LogCPM are identical?!?