**dariober** · 09-02-2013, 12:54 PM

Originally posted by puggie

Dear Forum Members,
I have been analyzing correlations between ENCODE data and my own data. Specifically, I have been looking at overlapping (or intersecting) coordinates between data sets to assess colocalization. Now I would like to add some statistics to the analysis. The data are basically the number of colocalized features in my data compared with the number of colocalized features generated by random iterations.

So, e.g., I could have something like 900 that colocalize and 300 that do not in my data, and then by random iterations 200 colocalize while 700 don't.

900 200

300 700

Is it appropriate to apply Fisher's exact test, or should I opt for something different? I have approx. 60 such four-value tables for which I need to determine statistical significance.

I appreciate any comments on this.

Hi, rather than applying the Fisher test, I would perform a (large) number of simulations to produce a distribution of the number of features that colocalize by chance. Then I would see where my observed value falls on this null distribution. If it falls towards the tails, then my observation is unlikely to be due to chance.
I think the tricky problem of this approach is to produce realistic null distributions given the genome under study.

I think this paper and the associated R package GenometriCorr might be useful http://www.ncbi.nlm.nih.gov/pubmed/22693437.

Just some thoughts...
Dario
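
The simulation approach described above can be sketched roughly as follows. This is a minimal illustration, not GenometriCorr's method: the function names, the toy genome length, and the toy intervals are all made up for the example. Features are re-placed uniformly along a single linear genome, which is exactly the kind of simplistic null that may not be realistic for a real genome.

```python
import random

def count_overlaps(features, targets):
    """Count features (start, end) that overlap at least one target interval.
    Intervals are treated as half-open: [start, end)."""
    return sum(
        any(fs < te and ts < fe for ts, te in targets)
        for fs, fe in features
    )

def shuffle_features(features, genome_length, rng):
    """Re-place each feature at a uniformly random start, keeping its length.
    ASSUMPTION: a uniform null over one linear genome (hypothetical, simplistic)."""
    shuffled = []
    for fs, fe in features:
        length = fe - fs
        start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((start, start + length))
    return shuffled

def empirical_p_value(observed, null_counts):
    """One-sided empirical p-value; the +1 terms keep it from being exactly zero."""
    extreme = sum(1 for c in null_counts if c >= observed)
    return (extreme + 1) / (len(null_counts) + 1)

rng = random.Random(0)
genome_length = 100_000                                            # toy value
targets = [(s, s + 500) for s in range(0, genome_length, 5_000)]   # toy annotation
features = [(s + 100, s + 300) for s, _ in targets]                # toy features placed on targets

observed = count_overlaps(features, targets)
null_counts = [
    count_overlaps(shuffle_features(features, genome_length, rng), targets)
    for _ in range(1000)
]
p = empirical_p_value(observed, null_counts)
print(f"observed={observed}, null mean={sum(null_counts) / len(null_counts):.2f}, p={p:.4f}")
```

The whole distribution of simulated counts is kept, not just its average, so the observed count can be placed on it directly. With 1000 simulations the smallest p-value this can report is about 0.001.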

**puggie** · 09-02-2013, 01:47 PM

Thanks for your reply, I will look into that. I should also mention that the table I showed above is erroneous.

It should be:
900 200
300 1000

So, e.g., I have 1200 features, of which 900 colocalize while 300 don't. Then, over 100 random iterations (the computer picking random features), I get on average 200 that colocalize by chance while 1000 don't.

The numbers are just examples; the point was to show that the feature sets used in the random simulations are the same size as the original data.
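
For comparison, here is roughly what the Fisher route would look like on the corrected table. It is a sketch using only the Python standard library. `fisher_exact_greater` is a name invented for this example, and it computes the one-sided p-value from the hypergeometric distribution in log space. One caveat: the second column is an average over 100 random iterations, not an independent sample, so the test treats it as a single observed outcome and ignores the spread between iterations.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]]:
    probability that the top-left cell is >= a with all margins fixed."""
    row1, col1, total = a + b, a + c, a + b + c + d
    upper = min(row1, col1)
    log_denominator = log_comb(total, col1)
    log_terms = [
        log_comb(row1, x) + log_comb(total - row1, col1 - x) - log_denominator
        for x in range(a, upper + 1)
    ]
    # Sum in log space (log-sum-exp) so tiny terms do not underflow early.
    peak = max(log_terms)
    return exp(peak) * sum(exp(t - peak) for t in log_terms)

# Corrected example table from the post above:
#   rows: colocalize / do not colocalize; columns: own data / random (averaged)
p_value = fisher_exact_greater(900, 200, 300, 1000)
print(f"one-sided Fisher p-value: {p_value:.3e}")
```

With counts this large the p-value comes out astronomically small, which says more about the sample size than about whether the random model is a sensible null.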

**rskr** · 09-03-2013, 06:37 AM

Originally posted by dariober

Hi, rather than applying the Fisher test, I would perform a (large) number of simulations to produce a distribution of the number of features that colocalize by chance. Then I would see where my observed value falls on this null distribution. If it falls towards the tails, then my observation is unlikely to be due to chance.
I think the tricky problem of this approach is to produce realistic null distributions given the genome under study.

I think this paper and the associated R package GenometriCorr might be useful http://www.ncbi.nlm.nih.gov/pubmed/22693437.

Just some thoughts...
Dario

That is invalid, since you have to pick a distribution to sample from "randomly", e.g. uniform, normal, Poisson, etc.

**dariober** · 09-03-2013, 06:50 AM

Originally posted by rskr

That is invalid, since you have to pick a distribution to sample from "randomly", e.g. uniform, normal, Poisson, etc.

Hi, I agree. When I said that one has to produce a realistic null distribution, I was referring to this problem. At most you can say that the observed data don't come from a random uniform, normal, or whatever distribution you sampled from. Does that make sense? (I'd like to hear more opinions on the question puggie posted.)
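
To make the point about the choice of null concrete, here is a toy sketch in which the same observed count is compared against two different random models. Everything in it is hypothetical: the genome length, the target intervals, the observed count of 12, and the "restricted" region standing in for something like a mappable or gene-rich part of the genome. Only the direction of the effect matters.

```python
import random

def overlaps_any(start, end, targets):
    """True if the half-open interval [start, end) overlaps any target."""
    return any(start < t_end and t_start < end for t_start, t_end in targets)

def simulate_null(n_features, length, region, targets, n_sim, rng):
    """Null overlap counts when feature starts are uniform within region (lo, hi)."""
    lo, hi = region
    counts = []
    for _ in range(n_sim):
        hits = 0
        for _ in range(n_features):
            start = rng.randrange(lo, hi - length + 1)
            hits += overlaps_any(start, start + length, targets)
        counts.append(hits)
    return counts

def empirical_p_value(observed, null_counts):
    """One-sided empirical p-value with a +1 correction."""
    extreme = sum(1 for c in null_counts if c >= observed)
    return (extreme + 1) / (len(null_counts) + 1)

rng = random.Random(1)
genome = (0, 100_000)        # toy genome
restricted = (0, 20_000)     # hypothetical region where features can plausibly occur
targets = [(s, s + 500) for s in range(0, 20_000, 2_000)]  # toy targets, all in that region
observed = 12                # hypothetical: 12 of 40 features overlap a target

null_uniform = simulate_null(40, 200, genome, targets, 1000, rng)
null_restricted = simulate_null(40, 200, restricted, targets, 1000, rng)

p_uniform = empirical_p_value(observed, null_uniform)
p_restricted = empirical_p_value(observed, null_restricted)
print(f"uniform over genome:  p = {p_uniform:.4f}")
print(f"restricted to region: p = {p_restricted:.4f}")
```

The same observation looks highly significant against the genome-wide uniform model and unremarkable against the restricted one, so a small p-value only rejects the particular random model that was simulated.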

Proper statistical test for high-throughput data correlations?

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News