
  • Find representative genes

    Hi, I still don't know how to find over-represented genes using phyper (someone recommended this function) or another R package. I have attached a CSV file as an example. Could anyone write an R script for my dataset?

    In my dataset, the first row contains my sample names and the first column contains the COG category IDs. The numbers are gene counts.

    Thank you.

  • #2
    Well, it makes sense to just use Fisher's exact test, which is based on the hypergeometric distribution but more directly does what you apparently want to do. So something like:
    Code:
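    # Rows are COG categories, columns are samples
    d <- as.matrix(read.csv("help.csv", row.names=1))
    totals <- colSums(d)  # total gene count per sample
    pvals <- rep(NA_real_, nrow(d))
    for(i in seq_len(nrow(d))) {
        # 2x2 table: this COG vs. all other COGs, sample 1 vs. sample 2
        pvals[i] <- fisher.test(matrix(c(d[i,1], totals[1]-d[i,1], d[i,2], totals[2]-d[i,2]), nrow=2))$p.value
    }
    padj <- p.adjust(pvals)  # Holm correction by default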
    The adjusted p-values for OSP_8.100.Spring.Plain vs. CH_1.Crater.Hills are then in the padj vector (in the same order as the rows of your csv file). Explicit loops in R can be slow, so if you have more data you could use "apply" instead, which is more compact and may be somewhat faster. For the other comparisons, just change the column indices in the "pvals[i] <- ..." line appropriately.
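    As a rough sketch of the "apply" idea and of running the other comparisons, the loop can be wrapped in a small function that takes the two columns to compare. The toy matrix, the sample names and the helper name compare_samples are all invented for illustration (they stand in for help.csv). The last lines show how this relates to phyper: the one-sided Fisher p-value for a single row is a hypergeometric tail probability.

```r
# Hypothetical toy counts standing in for help.csv:
# rows = COG categories, columns = samples (all values invented).
d <- matrix(c(120,  80,  95,
               40,  70,  45,
              200, 150, 210,
               15,  60,  20),
            nrow = 4, byrow = TRUE,
            dimnames = list(c("C", "E", "G", "J"),
                            c("sampleA", "sampleB", "sampleC")))

# compare_samples() is a hypothetical helper: Fisher's exact test per row
# for columns j and k, followed by p.adjust() (Holm by default).
compare_samples <- function(d, j, k) {
  totals <- colSums(d)
  pvals <- sapply(seq_len(nrow(d)), function(i) {
    tab <- matrix(c(d[i, j], totals[j] - d[i, j],
                    d[i, k], totals[k] - d[i, k]), nrow = 2)
    fisher.test(tab)$p.value
  })
  setNames(p.adjust(pvals), rownames(d))
}

padj_AB <- compare_samples(d, "sampleA", "sampleB")
padj_AC <- compare_samples(d, "sampleA", "sampleC")

# Link to phyper: is COG "E" over-represented in sampleB relative to sampleA?
# The one-sided Fisher p-value equals the upper hypergeometric tail.
tab <- matrix(c(d["E", "sampleB"], sum(d[, "sampleB"]) - d["E", "sampleB"],
                d["E", "sampleA"], sum(d[, "sampleA"]) - d["E", "sampleA"]),
              nrow = 2)
p_fisher <- fisher.test(tab, alternative = "greater")$p.value
p_hyper  <- phyper(tab[1, 1] - 1, sum(tab[, 1]), sum(tab[, 2]), sum(tab[1, ]),
                   lower.tail = FALSE)
```

    Note that each call adjusts only for the COGs within one comparison; if you run many sample pairs you may want to adjust across all of them together.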



    • #3
      Background: Gene-list annotations are critical for researchers to explore the complex relationships between genes and functionalities. Currently, the annotations of a gene list are usually summarized by a table or a barplot. As such, potentially biologically important complexities such as one gene belonging to multiple annotation categories are difficult to extract. We have devised explicit and efficient visualization methods that provide intuitive methods for interrogating the intrinsic connections between biological categories and genes.

      Findings: We have constructed a data model and now present two novel methods in a Bioconductor package, "GeneAnswers", to simultaneously visualize genes, concepts (a.k.a. annotation categories), and concept-gene connections (a.k.a. annotations): the "Concept-and-Gene Network" and the "Concept-and-Gene Cross Tabulation". These methods have been tested and validated with microarray-derived gene lists.

      Conclusions: These new visualization methods can effectively present annotations using Gene Ontology, Disease Ontology, or any other user-defined gene annotations that have been pre-associated with an organism's genome by human curation, automated pipelines, or a combination of the two. The gene-annotation data model and associated methods are available in the Bioconductor package called "GeneAnswers" described in this publication.



      • #4
        Actually, the more I think about it, the more I wonder if this is really the correct solution for you. Basically, using Fisher's exact test as is will work fine if the counts for COG A don't affect those for COG B. You could come up with an example where just one COG was actually different and the others were the same; however, because the test uses the total counts, this could make the observed proportions for every COG look different. This is actually why DESeq and the other packages oriented toward differential expression analysis of RNA-seq experiments use different library size normalizations. I suspect that you might actually have to go more along those routes, but it's difficult to say without knowing more about exactly how these counts were created. You might just reply to Simon Anders' question in the other thread, since I suspect he had similar thoughts.
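        To make that concern concrete, here is a minimal sketch with invented counts (the COG labels and sample names are hypothetical). Only COG "A" differs in raw counts between the two samples, yet because each 2x2 table is built from the column totals, the test flags every COG.

```r
# Invented counts: only COG "A" truly differs (10x higher in sample2);
# B, C and D have identical raw counts in both samples.
d <- matrix(c(100, 1000,
              100,  100,
              100,  100,
              100,  100),
            ncol = 2, byrow = TRUE,
            dimnames = list(c("A", "B", "C", "D"), c("sample1", "sample2")))
totals <- colSums(d)

# Same per-row Fisher's exact test as in the earlier post
pvals <- sapply(seq_len(nrow(d)), function(i) {
  fisher.test(matrix(c(d[i, 1], totals[1] - d[i, 1],
                       d[i, 2], totals[2] - d[i, 2]), nrow = 2))$p.value
})
padj <- setNames(p.adjust(pvals), rownames(d))

# Proportions of each sample's total: B, C and D drop from 25% to about 7.7%
# purely because A inflates the sample2 total.
props <- sweep(d, 2, totals, "/")

print(padj)
print(round(props, 3))
```

        All four adjusted p-values come out small here, even though three of the COGs have unchanged counts. Whether that is a problem depends on whether you care about proportions or absolute abundances.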



        • #5
          Originally posted by dpryan
          Actually, the more I think about it, the more I wonder if this is really the correct solution for you. Basically, using Fisher's exact test as is will work fine if the counts for COG A don't affect those for COG B. You could come up with an example where just one COG was actually different and the others were the same; however, because the test uses the total counts, this could make the observed proportions for every COG look different. This is actually why DESeq and the other packages oriented toward differential expression analysis of RNA-seq experiments use different library size normalizations. I suspect that you might actually have to go more along those routes, but it's difficult to say without knowing more about exactly how these counts were created. You might just reply to Simon Anders' question in the other thread, since I suspect he had similar thoughts.
          Hi. Do you mean COG B = a COG category? I just gave you an example. Actually, I will do this for all COG functions. It is a big dataset with thousands of functions. I used COG categories here instead.



          • #6
            Originally posted by SDPA_Pet
            Hi. Do you mean COG B = a COG category? I just gave you an example. Actually, I will do this for all COG functions. It is a big dataset with thousands of functions. I used COG categories here instead.
            It doesn't really matter if these are categories or functions; my concern would hold either way.
