  • Find representative genes

    Hi, I still don't know how to find over-represented genes using phyper (which someone recommended) or another R package. I have attached a CSV file as an example. Could anyone write an R script for me using my dataset?

    In my dataset, the first row contains my sample names. The first column is the COG category ID. The numbers are gene counts.

    Thank you.

  • #2
    Well, it makes sense to just use Fisher's exact test, which is based on the hypergeometric distribution but more directly does what you apparently want to do. So something like:
    Code:
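    # Rows are COG categories, columns are samples
    d <- as.matrix(read.csv("help.csv", row.names=1))
    # Total gene count per sample
    totals <- colSums(d)
    pvals <- rep(NA_real_, nrow(d))
    for(i in seq_len(nrow(d))) {
        # 2x2 table: in/not in COG i for sample 1 vs. sample 2
        pvals[i] <- fisher.test(matrix(c(d[i,1], totals[1]-d[i,1], d[i,2], totals[2]-d[i,2]), nrow=2))$p.value
    }
    # Multiple-testing correction (Holm by default)
    padj <- p.adjust(pvals)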
    The adjusted p-values for OSP_8.100.Spring.Plain vs. CH_1.Crater.Hills are then in the padj vector (the order is the same as in your csv file). Note that p.adjust uses the Holm method unless you pass a different "method". In R, loops are actually pretty slow, so you could use "apply" instead to make things faster if you have more data. For the other comparisons, just change the column indices in the "pvals[i] <- ..." line appropriately.
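As a rough sketch of the "apply" suggestion and of how to switch between comparisons, the loop can be wrapped in a small helper. The counts below are made up to stand in for help.csv (the attachment's values are not shown in the thread), and `compare_columns` is a hypothetical name, not a function from any package. The last lines also tie this back to the original question: for a one-sided test, the Fisher p-value should match the upper tail from phyper.

```r
# Hypothetical counts standing in for help.csv
# (rows = COG categories, columns = samples).
d <- matrix(
    c(120,  80,
       40,  95,
      300, 310,
       15,  60),
    ncol = 2, byrow = TRUE,
    dimnames = list(c("C", "E", "G", "J"),
                    c("OSP_8.100.Spring.Plain", "CH_1.Crater.Hills"))
)

# Hypothetical helper: Fisher's exact test for every row (COG),
# comparing two chosen columns (samples) against their column totals.
compare_columns <- function(d, col1, col2, alternative = "two.sided") {
    totals <- colSums(d)
    apply(d, 1, function(row) {
        tab <- matrix(c(row[col1], totals[col1] - row[col1],
                        row[col2], totals[col2] - row[col2]), nrow = 2)
        fisher.test(tab, alternative = alternative)$p.value
    })
}

# Two-sided p-values for column 1 vs. column 2, then Holm adjustment.
pvals <- compare_columns(d, 1, 2)
padj <- p.adjust(pvals)
print(padj)

# One-sided version (over-representation in column 1) two ways:
# via fisher.test and directly via the hypergeometric upper tail.
totals <- colSums(d)
p_fisher <- compare_columns(d, 1, 2, alternative = "greater")
p_phyper <- phyper(d[, 1] - 1, totals[1], totals[2], rowSums(d),
                   lower.tail = FALSE)
print(cbind(p_fisher, p_phyper))
```

With more than two samples, calling `compare_columns(d, 1, 3)` and so on would cover the other comparisons; column names work in place of indices as well.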


    • #3
      Background: Gene-list annotations are critical for researchers to explore the complex relationships between genes and functionalities. Currently, the annotations of a gene list are usually summarized by a table or a barplot. As such, potentially biologically important complexities such as one gene belonging to multiple annotation categories are difficult to extract. We have devised explicit and efficient visualization methods that provide intuitive methods for interrogating the intrinsic connections between biological categories and genes.

      Findings: We have constructed a data model and now present two novel methods in a Bioconductor package, "GeneAnswers", to simultaneously visualize genes, concepts (a.k.a. annotation categories), and concept-gene connections (a.k.a. annotations): the "Concept-and-Gene Network" and the "Concept-and-Gene Cross Tabulation". These methods have been tested and validated with microarray-derived gene lists.

      Conclusions: These new visualization methods can effectively present annotations using Gene Ontology, Disease Ontology, or any other user-defined gene annotations that have been pre-associated with an organism's genome by human curation, automated pipelines, or a combination of the two. The gene-annotation data model and associated methods are available in the Bioconductor package called "GeneAnswers" described in this publication.


      • #4
        Actually, the more I think about it, the more I wonder whether this is really the correct solution for you. Basically, using Fisher's exact test as is will work fine if the counts for COG A don't affect those for COG B. However, you could come up with an example where just one COG actually differs and the others are the same, yet every COG appears different because the test compares each count against the total counts. This is actually why DESeq and the other packages oriented toward differential expression analysis of RNA-seq experiments use different library size normalizations. I suspect that you might have to go more along those routes, but it's difficult to say without knowing more about exactly how these counts were created. You might just reply to Simon Anders's question in the other thread, since I suspect he had similar thoughts.
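To make that concern concrete, here is a toy illustration with invented counts in which only COG "A" differs between two samples. The per-COG Fisher tests against column totals flag all four COGs, whereas a hand-rolled median-of-ratios size factor (the general idea behind DESeq-style library size normalization, written out here rather than calling the package) leaves B, C and D unchanged. The sample and COG names are hypothetical, and this is only a sketch of the normalization step, not a full differential analysis.

```r
# Hypothetical counts: only COG "A" truly differs between the two samples.
d <- cbind(sample1 = c(A = 100,  B = 100, C = 100, D = 100),
           sample2 = c(A = 1000, B = 100, C = 100, D = 100))

# Per-COG Fisher's exact test against the column totals.
totals <- colSums(d)
pvals <- apply(d, 1, function(row) {
    fisher.test(matrix(c(row[1], totals[1] - row[1],
                         row[2], totals[2] - row[2]), nrow = 2))$p.value
})
padj <- p.adjust(pvals)
# B, C and D get small adjusted p-values although their counts are identical,
# because the large count for A inflates the total for sample2.
print(padj)

# Median-of-ratios size factors, computed by hand:
# 1. geometric mean of each COG across samples
# 2. ratio of each count to that geometric mean
# 3. per-sample median of those ratios
# (Zero counts would need special handling; there are none here.)
geo_means <- exp(rowMeans(log(d)))
size_factors <- apply(d / geo_means, 2, median)
print(size_factors)

# Counts scaled by size factor: only A still differs between samples.
normalized <- sweep(d, 2, size_factors, "/")
print(normalized)
```

Because the median ignores the single outlying ratio from COG A, both size factors come out equal, which is the behaviour that dividing by total counts lacks.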


        • #5
          Originally posted by dpryan
          Actually, the more I think about it, the more I wonder whether this is really the correct solution for you. Basically, using Fisher's exact test as is will work fine if the counts for COG A don't affect those for COG B. However, you could come up with an example where just one COG actually differs and the others are the same, yet every COG appears different because the test compares each count against the total counts. This is actually why DESeq and the other packages oriented toward differential expression analysis of RNA-seq experiments use different library size normalizations. I suspect that you might have to go more along those routes, but it's difficult to say without knowing more about exactly how these counts were created. You might just reply to Simon Anders's question in the other thread, since I suspect he had similar thoughts.
          Hi. By "COG B", do you mean a COG category? I only gave you an example. I will actually be analyzing all COG functions, which is a large dataset with thousands of functions. I used COG categories here instead.


          • #6
            Originally posted by SDPA_Pet
            Hi. By "COG B", do you mean a COG category? I only gave you an example. I will actually be analyzing all COG functions, which is a large dataset with thousands of functions. I used COG categories here instead.
            It doesn't really matter whether these are categories or functions; my concern would hold either way.
