  • SDPA_Pet
    Senior Member
    • Apr 2013
    • 222

    Which R package can do this

    Hi, I have a dataset listing gene names and the number of hits for each gene. I would like to find over-represented genes in R.

    I was wondering which R package can do this.

    Can anyone recommend some good R packages for analyzing and plotting metagenomics data (best for microbes).

    Thanks.
  • JackieBadger
    Senior Member
    • Mar 2009
    • 385

    #2
    Bioconductor should do the trick http://www.bioconductor.org


    • dariober
      Senior Member
      • May 2010
      • 311

      #3
      Originally posted by SDPA_Pet
      Hi, I have a dataset listing gene names and the number of hits for each gene. I would like to find over-represented genes in R.

      I was wondering which R package can do this.
      I think what you want is to apply the hypergeometric test, which in R is implemented in the function phyper

      Code:
      phyper(q, m, n, k, ...)
      
      x, q    vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls.
      m       the number of white balls in the urn.
      n       the number of black balls in the urn.
      k       the number of balls drawn from the urn.
      ...
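
As a minimal sketch of how the urn picture maps onto an over-representation question: the reads that hit the gene of interest play the role of the white balls, and the reads in the sample are the balls drawn. All counts and variable names below are made up for illustration and are not taken from the thread's data.

```r
# Hypothetical counts, for illustration only (not from the thread's data).
hits_in_sample   <- 30     # reads in the sample that hit the gene of interest
sample_total     <- 1000   # all reads in the sample (balls drawn, k)
hits_in_all      <- 100    # reads in the whole background that hit the gene (white balls, m)
background_total <- 10000  # all reads in the background, sample included (m + n)

# phyper(q, ...) returns P(X <= q) by default. For over-representation we
# want P(X >= observed), so pass observed - 1 and ask for the upper tail.
p_over <- phyper(hits_in_sample - 1,
                 m = hits_in_all,
                 n = background_total - hits_in_all,
                 k = sample_total,
                 lower.tail = FALSE)
print(p_over)
```

The `- 1` matters: without it the observed count itself is left out of the tail and the p-value comes out too small.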


      • SDPA_Pet
        Senior Member
        • Apr 2013
        • 222

        #4
        Originally posted by JackieBadger
        Bioconductor should do the trick http://www.bioconductor.org
        Hi, I checked Bioconductor, which includes lots of packages. Can you tell me which one can help me find over-represented genes?

        Also, which package can help me draw a heat map?

        Thank you.


        • mikep
          Member
          • Feb 2011
          • 45

          #5
          Originally posted by SDPA_Pet
          Hi, I checked Bioconductor, which includes lots of packages. Can you tell me which one can help me find over-represented genes?

          Also, which package can help me draw a heat map?

          Thank you.
          For all those people who find it more convenient to bother you with their question rather than to Google it for themselves.


          However, you're probably better off with MEV


          • Simon Anders
            Senior Member
            • Feb 2010
            • 995

            #6
            Maybe, if you explained more about what kind of data you have, you might get more helpful responses. "Metagenomics data" could be anything from a bunch of FASTQ files to a list of species.


            • SDPA_Pet
              Senior Member
              • Apr 2013
              • 222

              #7
              Hi Simon,

              Sorry about the confusion. I have a table generated from metagenomic data. For each sample, I have two columns: one with the gene name and the other with the number of hits. In total, I have 10 samples. I would like to find out which genes are over-represented.

              That is it.


              • Blahah404
                Member
                • Dec 2011
                • 48

                #8
                In that case, dariober's suggestion of the hypergeometric test is appropriate.


                • SDPA_Pet
                  Senior Member
                  • Apr 2013
                  • 222

                  #9
                  Hi Blahah, I am a newbie. If I want to do a hypergeometric test, which package should I use? Can you give me the R package name? That's all I want to know.

                  Someone told me to use Bioconductor, but it includes hundreds of packages.


                  • Blahah404
                    Member
                    • Dec 2011
                    • 48

                    #10
                    @SDPA_Pet read @dariober's post above... he tells you the R function is phyper. You don't need a package - it's in the R base installation.


                    • SDPA_Pet
                      Senior Member
                      • Apr 2013
                      • 222

                      #11
                      OK, Thanks.


                      • SDPA_Pet
                        Senior Member
                        • Apr 2013
                        • 222

                        #12
                        BTW, at what functional level should I do the analysis? I can use the COG function level (the lowest) or COG categories (the highest).

                        If I use the lowest level, there will be thousands of functional genes.


                        • SDPA_Pet
                          Senior Member
                          • Apr 2013
                          • 222

                          #13
                          Hi, I still don't know how to find over-represented genes with phyper (the function someone recommended) or another R package. I attached a CSV file as an example. Can anyone write an R script for me using my dataset?

                          In my dataset, the first row contains my sample names. The first column is the COG category ID. The numbers are gene counts.

                          Thank you.
                          Attached Files


                          • Simon Anders
                            Senior Member
                            • Feb 2010
                            • 995

                            #14
                            Most people here seemed to have jumped to the conclusion that you want to do an enrichment test, and there, in fact, the hypergeometric test (also known as Fisher's exact test) is the customary thing to do, usually with the R function 'fisher.test', which internally calls 'phyper'.

                            I really don't see how this applies here. Please explain your setting again: By number of "hits" in your table, you mean the number of sequencing reads that mapped to this gene, right?

                            Now, what do you mean by "overrepresented"? Are you looking for genes which appear more often in one kind of samples than in the other? (E.g.: You have 5 samples from shallow water, 5 from deep water: Which genes differ in their abundance between these two types?)

                            What kind of samples are we talking about?
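
The link between 'fisher.test' and 'phyper' mentioned in this post can be checked directly. The following is a small sketch on a hypothetical 2x2 table (the counts are invented for illustration): the one-sided Fisher's exact test and the upper tail of the hypergeometric distribution should give the same p-value.

```r
# Hypothetical 2x2 table, for illustration only.
#               in sample   rest
# hits gene         30        70
# other genes      970      8930
tab <- matrix(c(30, 970, 70, 8930), nrow = 2,
              dimnames = list(gene = c("hit", "other"),
                              set  = c("sample", "rest")))

# One-sided Fisher's exact test: is the gene enriched in the sample?
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

# Same question as an urn problem: white balls are reads hitting the gene,
# and the balls drawn are the reads in the sample.
p_phyper <- phyper(tab["hit", "sample"] - 1,
                   m = sum(tab["hit", ]),     # all reads hitting the gene
                   n = sum(tab["other", ]),   # all remaining reads
                   k = sum(tab[, "sample"]),  # reads in the sample
                   lower.tail = FALSE)

print(c(fisher = p_fisher, phyper = p_phyper))
```

Note that the equivalence holds for the one-sided alternative; the default two-sided 'fisher.test' p-value is computed differently.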


                            • SDPA_Pet
                              Senior Member
                              • Apr 2013
                              • 222

                              #15
                              Hi Simon,

                              I am sorry I didn't explain it clearly.

                              By the number of "hits" in my table, I mean the number of sequencing reads that mapped to this gene (you are right).
                              In the file that I attached, I am interested in the 2nd column (OSP_8 100 Spring Plain). I want to compare the 2nd column to the 3rd and 4th columns.

                              "Over-represented": I want to find which genes in the sample OSP_8 100 Spring Plain are more abundant than (or different from) in the other 2 samples.

                              Do you know how to write the code?
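
One possible way to approach this comparison, sketched under several assumptions: the table below is toy data invented for illustration (the attached CSV is not reproduced here), the column names `sample_B` and `sample_C` are hypothetical, and the two comparison samples are simply pooled. Each COG category gets its own 2x2 Fisher's exact test, followed by a multiple-testing correction. This treats reads as independent draws and ignores sample-to-sample variability, so with only one sample per condition small p-values should be read with caution.

```r
# Toy counts, invented for illustration; rows are COG categories,
# columns are samples. Only the name OSP_8 echoes the thread.
counts <- data.frame(
  OSP_8    = c(120, 40, 15, 300),
  sample_B = c(100, 45, 60, 310),
  sample_C = c(110, 38, 55, 290),
  row.names = c("C", "E", "G", "J")
)

target <- counts$OSP_8
others <- counts$sample_B + counts$sample_C  # pool the two comparison samples

# One 2x2 test per category: reads in this category vs. all other reads,
# in the target sample vs. the pooled comparison samples.
p_values <- sapply(seq_len(nrow(counts)), function(i) {
  tab <- matrix(c(target[i], sum(target) - target[i],
                  others[i], sum(others) - others[i]),
                nrow = 2)
  fisher.test(tab, alternative = "two.sided")$p.value
})

results <- data.frame(
  category = rownames(counts),
  p_value  = p_values,
  p_adj    = p.adjust(p_values, method = "BH")  # Benjamini-Hochberg correction
)
print(results[order(results$p_adj), ])
```

Using `alternative = "greater"` instead would test only for categories that are more abundant in the target sample, rather than different in either direction.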
