  • puggie
    Member
    • Nov 2011
    • 52

    Proper statistical test for high-throughput data correlations?

    Dear Forum Members,
    I have been analyzing correlations between ENCODE data and my own data. Specifically, I have been looking at overlapping (or intersecting) coordinates between data sets to assess colocalization. Now I would like to add some statistics to the analysis. The data are basically the number of colocalized features in my data compared with the number of colocalized features generated by random iterations.

    So, e.g., I could have something like this: in my data 900 colocalize and 300 do not, and then by random iterations 200 colocalize while 700 don't.

                         my data   random
    colocalize             900      200
    do not colocalize      300      700

    Is it appropriate to apply Fisher's exact test here, or should I opt for something different? I have approx. 60 such four-value tables for which I need to determine statistical significance.
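
    For reference, a one-sided Fisher's exact test on a single table like the one above can be sketched in plain Python. This is only an illustration: the function name `fisher_exact_greater` and the cell layout (a, b on the top row; c, d on the bottom row) are assumptions, and whether the test is appropriate when one column comes from random iterations is exactly the open question here.

```python
from math import exp, lgamma


def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns P(top-left cell >= a) under the hypergeometric distribution
    with all row and column totals held fixed.
    """
    row1 = a + b          # total of the top row
    col1 = a + c          # total of the left column
    n = a + b + c + d     # grand total
    log_denom = log_comb(n, col1)
    p = 0.0
    # Sum the probabilities of every table at least as extreme as the observed one.
    for x in range(a, min(row1, col1) + 1):
        p += exp(log_comb(row1, x) + log_comb(n - row1, col1 - x) - log_denom)
    return min(p, 1.0)


# Example table from this post (hypothetical numbers, as stated above).
p_value = fisher_exact_greater(900, 200, 300, 700)
print(f"one-sided p-value: {p_value:.3g}")
```

    If SciPy is available, `scipy.stats.fisher_exact` computes the same kind of test; with approx. 60 tables the function above would simply be called once per table.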

    I would appreciate any comments on this.
  • dariober
    Senior Member
    • May 2010
    • 311

    #2
    Hi- Rather than applying the Fisher test, I would perform a (large) number of simulations to produce a distribution of the number of features that colocalize by chance. Then I would see where my observed number falls on this null distribution. If it falls in the tails, then my observation is unlikely to be due to chance.
    I think the tricky problem of this approach is to produce realistic null distributions given the genome under study.
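
    A minimal sketch of this idea, using only the Python standard library. It assumes a single linear "genome" of known length, half-open (start, end) intervals, and a null model that places each feature uniformly at random while keeping its length. That uniform null is precisely the kind of assumption that may not be realistic for a real genome, and all function names here are made up for illustration.

```python
import random


def count_overlaps(features, annotations):
    """Count features that overlap at least one annotation interval.

    Intervals are half-open (start, end) tuples on one coordinate axis.
    """
    return sum(
        any(f_start < a_end and a_start < f_end for a_start, a_end in annotations)
        for f_start, f_end in features
    )


def shuffle_uniform(features, genome_length, rng):
    """Move each feature to a uniform random position, keeping its length."""
    shuffled = []
    for start, end in features:
        length = end - start
        new_start = rng.randrange(0, genome_length - length + 1)
        shuffled.append((new_start, new_start + length))
    return shuffled


def empirical_p_value(features, annotations, genome_length, n_iter=1000, seed=0):
    """Compare the observed overlap count with a simulated null distribution.

    Returns (observed count, list of null counts, one-sided empirical p-value).
    """
    rng = random.Random(seed)
    observed = count_overlaps(features, annotations)
    null_counts = [
        count_overlaps(shuffle_uniform(features, genome_length, rng), annotations)
        for _ in range(n_iter)
    ]
    # Add 1 to numerator and denominator so the estimate is never exactly zero.
    as_extreme = sum(count >= observed for count in null_counts)
    p_value = (as_extreme + 1) / (n_iter + 1)
    return observed, null_counts, p_value


# Toy data (hypothetical): three features sitting inside one annotated region.
annotations = [(0, 100)]
features = [(10, 20), (50, 60), (90, 95)]
observed, null_counts, p_value = empirical_p_value(
    features, annotations, genome_length=100_000, n_iter=200
)
print(observed, p_value)
```

    Keeping the whole list of null counts, rather than only their average, is what lets the observed value be placed on the null distribution.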

    I think this paper and the associated R package GenometriCorr might be useful http://www.ncbi.nlm.nih.gov/pubmed/22693437.

    Just some thoughts...
    Dario


    • puggie
      Member
      • Nov 2011
      • 52

      #3
      Thanks for your reply, I will look into that. I should also mention that the table I showed above is erroneous.

      It should be like
      900 200
      300 1000

      So, e.g., I have 1200 features: 900 colocalize while 300 don't. Then by 100 random iterations (the computer picking random features) I get 200 (averaged) that colocalize by chance while 1000 don't.

      The numbers are just examples; the point is that the random simulations use the same number of features as the original data.


      • rskr
        Senior Member
        • Oct 2010
        • 249

        #4
        Replying to dariober (#2):
        That is invalid, since you have to pick a distribution to sample from "randomly", i.e. uniform, normal, Poisson, etc.


        • dariober
          Senior Member
          • May 2010
          • 311

          #5
          Hi- I agree. When I said that the tricky part is producing a realistic null distribution, I was referring to this problem. At most you can say that the observed data don't come from a random uniform, normal, or whatever distribution you sampled from. Does that make sense? (I'd like to hear more opinions about the question puggie posted.)
