
  • Proper statistical test for high-throughput data correlations?

    Dear Forum Members,
    I have been analyzing correlations between ENCODE data and my own data. Specifically, I have been looking at overlapping (or intersecting) coordinates between data sets to assess colocalization. Now I would like to add some statistics to the analysis. The data are basically the number of colocalized features in my data compared with the number of colocalized features generated by random iterations.

    So, for example, I could have something like this: from my data, 900 features colocalize and 300 do not; by random iterations, 200 colocalize while 700 don't.

    900 200

    300 700

    Is this strong enough to apply Fisher's exact test, or should I opt for something different? I have approx. 60 such four-value tables for which I need to determine statistical significance.

    I appreciate any comments on this.

  • #2
    Hi, rather than applying the Fisher test, I would perform a (large) number of simulations to produce a distribution of the number of features that colocalize by chance. Then I would see where my observed value falls on this null distribution. If it falls in the tails, then my observation is unlikely to be due to chance.
    I think the tricky part of this approach is producing realistic null distributions given the genome under study.
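
A minimal sketch of the simulation idea above, in Python with only the standard library. Everything specific here is an assumption for illustration: the function names, the toy genome length, the evenly spaced reference intervals, the fixed feature length, and above all the uniform placement of random features, which is exactly the kind of null model that may not be realistic for a real genome.

```python
import random
from bisect import bisect_right


def count_overlaps(features, ref_starts, ref_ends):
    """Count features (start, end) that overlap any reference interval.

    Reference intervals are closed, sorted by start, and non-overlapping.
    """
    hits = 0
    for start, end in features:
        # Last reference interval that starts at or before the feature's end.
        i = bisect_right(ref_starts, end) - 1
        if i >= 0 and ref_ends[i] >= start:
            hits += 1
    return hits


def simulate_null(n_features, feature_len, genome_len,
                  ref_starts, ref_ends, n_iter, seed=0):
    """Colocalization counts for features placed uniformly at random (assumed null)."""
    rng = random.Random(seed)
    counts = []
    for _ in range(n_iter):
        features = []
        for _ in range(n_features):
            start = rng.randrange(0, genome_len - feature_len)
            features.append((start, start + feature_len))
        counts.append(count_overlaps(features, ref_starts, ref_ends))
    return counts


def empirical_p_value(observed, null_counts):
    """One-sided p-value: share of null counts >= observed, with a +1 correction."""
    extreme = sum(1 for count in null_counts if count >= observed)
    return (extreme + 1) / (len(null_counts) + 1)


# Toy setup (all values hypothetical): a 1 Mb genome with a 1 kb
# reference interval every 10 kb, and 1200 random features of 100 bp.
GENOME_LEN = 1_000_000
ref_starts = list(range(0, GENOME_LEN, 10_000))
ref_ends = [s + 1_000 for s in ref_starts]

null_counts = simulate_null(1200, 100, GENOME_LEN, ref_starts, ref_ends, n_iter=200)
observed = 900  # example count of colocalized features
p_value = empirical_p_value(observed, null_counts)
print(f"null mean = {sum(null_counts) / len(null_counts):.1f}, p = {p_value:.4f}")
```

With the +1 correction, the smallest p-value obtainable is 1 / (n_iter + 1), so more iterations are needed to resolve very small p-values.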

    I think this paper and the associated R package GenometriCorr might be useful: http://www.ncbi.nlm.nih.gov/pubmed/22693437

    Just some thoughts...
    Dario


    • #3
      Thanks for your reply, I will look into that. I should also mention that the table I showed above is erroneous.

      It should be:
      900 200
      300 1000

      So, for example, I have 1200 features: 900 colocalize while 300 don't. Then, from 100 random iterations (the computer picking random features), I get on average 200 that colocalize by chance while 1000 don't.

      The numbers are just examples; the point was to show that the feature sets for the random simulations are the same size as the original data.
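
For comparison, this is how a one-sided Fisher's exact test on the corrected table could be computed with the Python standard library alone. It is only a sketch: the function name is made up for illustration, the row/column layout is my reading of the table, and it treats the averaged random counts as if they were a single independent sample, which is an assumption that the simulation setup does not obviously satisfy.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Returns the probability, with all margins held fixed, of seeing a
    top-left count at least as large as `a` (hypergeometric upper tail).
    """
    row1 = a + b          # total of the first row
    col1 = a + c          # total of the first column
    n = a + b + c + d     # grand total
    upper = min(row1, col1)
    # Exact integer arithmetic; the only rounding happens in the final division.
    numerator = sum(
        comb(col1, x) * comb(n - col1, row1 - x)
        for x in range(a, upper + 1)
    )
    return numerator / comb(n, row1)


# Assumed layout: rows = colocalize / do not, columns = observed / random.
odds_ratio = (900 * 1000) / (200 * 300)
p_value = fisher_exact_greater(900, 200, 300, 1000)
print(f"odds ratio = {odds_ratio:.1f}, one-sided p = {p_value:.3g}")
```

With counts this large, almost any difference comes out as extremely significant, so the p-value says little about whether the random model itself is appropriate.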


      • #4
        Originally posted by dariober:
        Hi, rather than applying the Fisher test, I would perform a (large) number of simulations to produce a distribution of the number of features that colocalize by chance. Then I would see where my observed value falls on this null distribution. If it falls in the tails, then my observation is unlikely to be due to chance.
        I think the tricky part of this approach is producing realistic null distributions given the genome under study.

        That is invalid, since you have to pick a distribution to sample from "randomly", i.e. uniform, normal, Poisson, etc.


        • #5
          Hi, I agree. When I said that one has to pick a realistic null distribution, I was referring to this problem. At most you can say that the observed data don't come from a random uniform, normal, or whatever distribution was simulated. Does that make sense? (I'd like to hear more opinions about the question puggie posted.)
