**dpryan** · 09-14-2013, 12:06 PM

Well, it makes sense to just use Fisher's exact test, which is based on the hypergeometric distribution but more directly does what you apparently want to do. So something like:

Code:

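# Assumes rows are COGs and columns are samples (first CSV column = row names)
d <- as.matrix(read.csv("help.csv", row.names=1))
totals <- colSums(d)  # total counts per sample
pvals <- rep(NA_real_, nrow(d))
for(i in seq_len(nrow(d))) {
    # 2x2 table: this COG vs. all other COGs, in column 1 vs. column 2
    pvals[i] <- fisher.test(matrix(c(d[i,1], totals[1]-d[i,1], d[i,2], totals[2]-d[i,2]), nrow=2))$p.value
}
padj <- p.adjust(pvals)  # default adjustment method is "holm"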

The adjusted p-values for OSP_8.100.Spring.Plain vs. CH_1.Crater.Hills are then in the padj vector (the order is the same as the rows in your csv file). Loops in R can be slow, so with more data you could use "apply" instead; it is at least more compact, though the speed gain may be small since fisher.test itself does most of the work. For the other comparisons, just change the column indices in the "pvals[i] <- ..." line appropriately.
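As a rough sketch of the "apply" variant mentioned above: the count matrix below is made up to stand in for help.csv (the row names, column names and numbers are all assumptions), and `fisher_row` is just a hypothetical helper name.

```r
# Hypothetical counts standing in for help.csv:
# rows are COG categories, columns are the two samples being compared.
d <- matrix(c(120,  95,
               30,  60,
              200, 210),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("COG_A", "COG_B", "COG_C"),
                            c("sample1", "sample2")))
totals <- colSums(d)  # total counts per sample

# One Fisher's exact test per row: this COG vs. all other COGs, by sample.
fisher_row <- function(x) {
  tab <- matrix(c(x[1], totals[1] - x[1],
                  x[2], totals[2] - x[2]), nrow = 2)
  fisher.test(tab)$p.value
}

pvals <- apply(d, 1, fisher_row)  # named by the row names of d
padj  <- p.adjust(pvals)          # default method is "holm"
print(padj)
```

Because `apply` keeps the row names, the adjusted p-values come back labelled by COG, which saves matching them to the csv order by hand.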

**JackieBadger** · 09-14-2013, 02:17 PM

http://www.biomedcentral.com/1756-0500/3/10

**dpryan** · 09-16-2013, 12:14 AM

Actually, the more I think about it, the more I wonder if this is really the correct solution for you. Basically, using Fisher's exact test as is will work fine if the counts for COG A don't affect those for COG B. However, you could come up with an example where just one COG was actually different and the others were the same, yet the observed counts for every COG would differ because each is compared against the total counts. This is actually why DESeq and the other packages oriented toward differential expression analysis of RNAseq experiments use library size normalizations other than the simple total count. I suspect that you might actually have to go more along those routes, but it's difficult to say without knowing more about exactly how these counts were created. You might just reply to Simon Anders' question in the other thread, since I suspect he had similar thoughts.
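To make that concern concrete, here is a toy sketch with entirely made-up numbers. It assumes only COG_A truly differs between the samples and that both samples are sequenced to the same fixed depth, with no sampling noise; none of this comes from the actual data in this thread.

```r
# Made-up "true" abundances: only COG_A differs between the two samples.
true_abund <- cbind(
  sample1 = c(COG_A = 1000, COG_B = 1000, COG_C = 1000, COG_D = 1000),
  sample2 = c(COG_A = 5000, COG_B = 1000, COG_C = 1000, COG_D = 1000)
)

# Assumption: both samples are sequenced to the same depth, so the observed
# counts are the true proportions scaled to a fixed total.
depth <- 4000
obs <- round(prop.table(true_abund, margin = 2) * depth)
totals <- colSums(obs)

# Same per-COG Fisher's exact test as before, using the total counts.
pvals <- apply(obs, 1, function(x) {
  fisher.test(matrix(c(x[1], totals[1] - x[1],
                       x[2], totals[2] - x[2]), nrow = 2))$p.value
})
padj <- p.adjust(pvals)

print(obs)
print(padj < 0.05)
```

In this toy case every COG comes out with a small adjusted p-value, not just COG_A: the unchanged COGs get fewer of the fixed number of reads in sample2 simply because COG_A takes up more of them.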

**SDPA_Pet** · 09-16-2013, 07:14 AM

Originally posted by dpryan View Post

Actually, the more I think about it, the more I wonder if this is really the correct solution for you. Basically, using Fisher's exact test as is will work fine if the counts for COG A don't affect those for COG B. However, you could come up with an example where just one COG was actually different and the others were the same, yet the observed counts for every COG would differ because each is compared against the total counts. This is actually why DESeq and the other packages oriented toward differential expression analysis of RNAseq experiments use library size normalizations other than the simple total count. I suspect that you might actually have to go more along those routes, but it's difficult to say without knowing more about exactly how these counts were created. You might just reply to Simon Anders' question in the other thread, since I suspect he had similar thoughts.

Hi. By "COG B", do you mean a COG category? I just gave you an example. I will actually do this for all COG functions; it is a big data set with thousands of functions. I used COG categories here instead.

**dpryan** · 09-16-2013, 07:25 AM

Originally posted by SDPA_Pet View Post

Hi. By "COG B", do you mean a COG category? I just gave you an example. I will actually do this for all COG functions; it is a big data set with thousands of functions. I used COG categories here instead.

It doesn't really matter whether these are categories or functions; my concern would hold either way.


Find representative genes
