
**dariober** · 09-02-2013, 12:54 PM

Originally posted by puggie:

Dear Forum Members,
I have been analyzing correlations between ENCODE data and my own data. Specifically, I have been looking at overlapping (or intersecting) coordinates between data sets to assess colocalization. Now I would like to add some statistics to the analysis. The data is basically the number of colocalized features in my data compared with the number of colocalized features generated by random iterations.

So, e.g., I could have something like this: 900 colocalize and 300 do not in my data, and then by random iterations 200 colocalize while 700 don't.

900 200

300 700

Is it appropriate to apply Fisher's exact test here, or should I opt for something different? I have approx. 60 such four-value tables for which I need to determine statistical significance.

I appreciate any comments on this.

Hi- Rather than applying the Fisher test, I would perform a (large) number of simulations to produce a distribution of the number of features that colocalize by chance. Then I would see where my observed value falls on this null distribution. If it falls in the tails, then my observation is unlikely to be due to chance.
I think the tricky part of this approach is producing realistic null distributions given the genome under study.

I think this paper and the associated R package GenometriCorr might be useful http://www.ncbi.nlm.nih.gov/pubmed/22693437.

Just some thoughts...
Dario
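The simulation approach described above can be sketched roughly as follows. This is a minimal illustration, not GenometriCorr's method: the toy genome, the interval lists, and the helper names (`count_overlaps`, `shuffle_features`, `empirical_p_value`) are all made up for the example, and features are placed uniformly at random, which is exactly the kind of naive null that may not be realistic for a real genome.

```python
import random

def count_overlaps(features, reference):
    """Count features (start, end) that overlap at least one reference interval."""
    return sum(
        any(f_start < r_end and r_start < f_end for r_start, r_end in reference)
        for f_start, f_end in features
    )

def shuffle_features(features, genome_length, rng):
    """Place each feature at a uniformly random position, keeping its length."""
    shuffled = []
    for start, end in features:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled

def empirical_p_value(observed, null_counts):
    """One-sided p-value: share of null counts >= observed, with a +1 correction."""
    extreme = sum(1 for count in null_counts if count >= observed)
    return (extreme + 1) / (len(null_counts) + 1)

rng = random.Random(0)  # fixed seed so the sketch is reproducible
genome_length = 100_000  # hypothetical toy genome

# Hypothetical reference annotation: one 500 bp interval every 5 kb.
reference = [(s, s + 500) for s in range(0, genome_length, 5_000)]
# Hypothetical features, each placed inside a reference interval.
features = [(s + 100, s + 300) for s in range(0, genome_length, 5_000)]

observed = count_overlaps(features, reference)
null_counts = [
    count_overlaps(shuffle_features(features, genome_length, rng), reference)
    for _ in range(1000)
]
p = empirical_p_value(observed, null_counts)
print(observed, p)
```

The +1 in the numerator and denominator keeps the estimate from ever being exactly zero, so the smallest reportable p-value is bounded by the number of simulations.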

**puggie** · 09-02-2013, 01:47 PM

Thanks for your reply, I will look into that. I should also mention that the table I showed above is erroneous.

It should be:
900 200
300 1000

So, e.g., I have 1200 features: 900 colocalize while 300 don't. Then by 100 random iterations (computer picking random features) I get 200 (averaged) that colocalize by chance while 1000 don't.

The numbers are just examples; the point was to show that the feature sets for the random simulations are the same size as the original data.
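For reference, here is a rough sketch of what Fisher's exact test would compute on the corrected example table, using only the Python standard library. The function names are invented for the example. Note the assumption baked in: the test treats all four cells as counts of independent observations, whereas the second column here is an average over 100 random iterations, so the resulting p-value should be read as illustrative only.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_one_sided(a, b, c, d):
    """P(top-left cell >= a) with all margins fixed (hypergeometric upper tail)."""
    row1, col1, n = a + b, a + c, a + b + c + d
    log_denominator = log_comb(n, col1)
    p = 0.0
    for x in range(a, min(row1, col1) + 1):
        # x items in the top-left cell, col1 - x in the bottom-left cell
        log_p = log_comb(row1, x) + log_comb(n - row1, col1 - x) - log_denominator
        p += exp(log_p)
    return min(p, 1.0)

# Corrected example table from the post:
#                 observed   random
# colocalize         900       200
# do not             300      1000
a, b, c, d = 900, 200, 300, 1000

odds_ratio = (a * d) / (b * c)
p_value = fisher_one_sided(a, b, c, d)
print(odds_ratio, p_value)
```

Working in log space avoids overflow in the binomial coefficients for counts of this size.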

**rskr** · 09-03-2013, 06:37 AM

Originally posted by dariober:

Hi- Rather than applying the Fisher test, I would perform a (large) number of simulations to produce a distribution of the number of features that colocalize by chance. Then I would see where my observed value falls on this null distribution. If it falls in the tails, then my observation is unlikely to be due to chance.
I think the tricky part of this approach is producing realistic null distributions given the genome under study.

I think this paper and the associated R package GenometriCorr might be useful http://www.ncbi.nlm.nih.gov/pubmed/22693437.

Just some thoughts...
Dario

That is invalid, since you have to pick a distribution to sample from "randomly", e.g. uniform, normal, Poisson, etc.

**dariober** · 09-03-2013, 06:50 AM

Originally posted by rskr:

That is invalid, since you have to pick a distribution to sample from "randomly", e.g. uniform, normal, Poisson, etc.

Hi- I agree; when I said that one has to produce a realistic null distribution, I was referring to this problem. At most you can say that the observed data doesn't come from the uniform, normal, or whatever other distribution you sampled from. Does that make sense? (I'd like to hear more opinions on the question puggie posted.)
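To make the dependence on the chosen null concrete, here is a small sketch in which the same observed count is compared against two different simulated nulls. Everything in it is hypothetical: each feature is assumed to colocalize independently with a fixed probability, 200/1200 for a "uniform placement" null (taken from puggie's example numbers) and an invented 0.75 for a null in which features are confined to regions where colocalization is common.

```python
import random

def simulate_null(n_features, p_overlap, n_iter, rng):
    """Simulate colocalization counts if each feature overlaps with probability p_overlap."""
    return [
        sum(rng.random() < p_overlap for _ in range(n_features))
        for _ in range(n_iter)
    ]

def empirical_p_value(observed, null_counts):
    """One-sided p-value: share of null counts >= observed, with a +1 correction."""
    extreme = sum(1 for count in null_counts if count >= observed)
    return (extreme + 1) / (len(null_counts) + 1)

rng = random.Random(1)  # fixed seed so the sketch is reproducible
observed, n_features = 900, 1200  # example numbers from the thread

# Null A: uniform placement, using the example rate of 200 out of 1200.
uniform_null = simulate_null(n_features, 200 / 1200, 500, rng)
# Null B: hypothetical constrained placement with an invented rate of 0.75.
constrained_null = simulate_null(n_features, 0.75, 500, rng)

p_uniform = empirical_p_value(observed, uniform_null)
p_constrained = empirical_p_value(observed, constrained_null)
print(p_uniform, p_constrained)
```

Under the first null the observed count looks extreme, while under the second it is entirely ordinary, so a small p-value only rejects the particular null that was simulated.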


Proper statistical test for high-throughput data correlations?
