
  • Which R package can do this

    Hi, I have a dataset listing gene names and the number of hits for each gene. I would like to find over-represented genes in R.

    I was wondering which R package can do this.

    Can anyone recommend some good R packages for analyzing and plotting metagenomics data (best for microbes).

    Thanks.

  • #2
    Bioconductor should do the trick http://www.bioconductor.org



    • #3
      Originally posted by SDPA_Pet
      Hi, I have a dataset listing gene names and the number of hits for each gene. I would like to find over-represented genes in R.

      I was wondering which R package can do this.
      I think what you want is the hypergeometric test, which in R is implemented in the function phyper:

      Code:
      phyper(q, m, n, k, ...)
      
      x, q 	vector of quantiles representing the number of white balls drawn without replacement from an urn which contains both black and white balls.
      m 	the number of white balls in the urn.
      n 	the number of black balls in the urn.
      k 	the number of balls drawn from the urn.
      ...
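
To make the urn picture concrete for a gene list, here is a minimal sketch. It assumes that "white balls" are the genes annotated to one category and that the "draw" is your hit list; all numbers are made up for illustration and do not come from this thread.

```r
# Made-up numbers for illustration; none of them come from the thread.
m <- 50    # white balls: genes in the universe annotated to the category
n <- 950   # black balls: genes in the universe not annotated to it
k <- 100   # balls drawn: genes in the hit list
x <- 12    # white balls drawn: hit-list genes annotated to the category

# P(X >= x): with lower.tail = FALSE, phyper() returns P(X > q),
# so pass q = x - 1 to include x itself in the tail.
p_over <- phyper(x - 1, m, n, k, lower.tail = FALSE)
print(p_over)
```

A small p_over suggests the category shows up in the hit list more often than expected by chance. Whether that urn model fits a particular dataset is a separate question.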



      • #4
        Originally posted by JackieBadger
        Bioconductor should do the trick http://www.bioconductor.org
        Hi, I checked Bioconductor, which includes lots of packages. Can you tell me which one can help me find over-represented genes?

        Also, which package can help me draw a heat map?

        Thank you.



        • #5
          Originally posted by SDPA_Pet
          Hi, I checked Bioconductor, which includes lots of packages. Can you tell me which one can help me find over-represented genes?

          Also, which package can help me draw a heat map?

          Thank you.
          For all those people who find it more convenient to bother you with their question rather than to Google it for themselves.


          However, you're probably better off with MEV
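
On the heat map question, one option that needs no extra package is the heatmap() function in base R. The sketch below is only an illustration: the count matrix, the category labels, and the sample names are invented, and converting counts to per-sample proportions first is an assumption about what you would want to plot.

```r
# Invented counts: rows are functional categories, columns are samples.
counts <- matrix(
  c(120, 300,  80, 150, 60,
    100, 310,  85, 140, 20,
    110, 290,  90, 160, 25),
  nrow = 5,
  dimnames = list(c("C", "E", "G", "J", "L"),
                  c("sample_A", "sample_B", "sample_C"))
)

# Convert counts to per-sample proportions so sequencing depth
# does not dominate the colours.
props <- sweep(counts, 2, colSums(counts), "/")

# heatmap() clusters rows and columns and, with scale = "row",
# centres and scales each category across samples before plotting.
hm <- heatmap(props, scale = "row", margins = c(8, 6))
```

The returned list (hm here) holds the row and column order chosen by the clustering, which can be handy for labelling.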



          • #6
            Maybe, if you explained more about what kind of data you have, you might get more helpful responses. "Metagenomics data" could be anything from a bunch of FASTQ files to a list of species.



            • #7
              Hi Simon,

              Sorry about the confusion. I have a table generated from metagenomic data. For each sample, I have two columns: one with the gene name and the other with the number of hits. In total, I have 10 samples. I would like to find out which genes are over-represented.

              That is it.



              • #8
                In that case, dariober's suggestion of the hypergeometric test is appropriate.



                • #9
                  Hi Blahah, I am a newbie. If I want to do a hypergeometric test, which package should I use? Can you give me the R package name? That's all I want to know.

                  Someone just told me to use Bioconductor, but it includes hundreds of packages.



                  • #10
                    @SDPA_Pet read @dariober's post above... he tells you the R function is phyper. You don't need a package - it's in the R base installation.
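
A quick way to confirm this in your own R session (a small sketch; the variable name pkg is just for illustration):

```r
# phyper is defined in the 'stats' package, which ships with R
# and is attached by default, so no install.packages() is needed.
pkg <- environmentName(environment(phyper))
print(pkg)

# In an interactive session, the help page is available with:
# ?phyper
```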



                    • #11
                      OK, Thanks.



                      • #12
                        BTW, at what functional level should I do the analysis? I can use the COG function level (the lowest) or COG categories (the highest).

                        If I use the lowest level, there will be thousands of functional genes.



                        • #13
                          Hi, I still don't know how to find over-represented genes with phyper (the function recommended above) or another R package. I attached a CSV file as an example. Can anyone write an R script for me using my dataset?

                          In my dataset, the first row contains my sample names. The first column is the COG category ID. The numbers are gene counts.

                          Thank you.


                          • #14
                            Most people here seem to have jumped to the conclusion that you want to do an enrichment test, and there, in fact, the hypergeometric test (also known as Fisher's exact test) is the customary thing to do, usually with the R function 'fisher.test', which internally calls 'phyper'.
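
For readers who want to see that the two functions agree, here is a minimal sketch with made-up numbers (the universe size, category size, and hit counts are all hypothetical). It compares the upper tail from phyper with the one-sided p-value from fisher.test on the matching 2x2 table.

```r
# Hypothetical enrichment setting; none of these numbers come from the thread.
N <- 1000  # genes in the universe
K <- 50    # genes in the universe that belong to the category
k <- 100   # genes in the hit list
x <- 12    # hit-list genes that belong to the category

# Hypergeometric upper tail, P(X >= x).
p_phyper <- phyper(x - 1, K, N - K, k, lower.tail = FALSE)

# The same question as a 2x2 table:
# rows = in hit list / not in hit list, columns = in category / not in category.
tab <- matrix(c(x,     K - x,
                k - x, N - K - k + x),
              nrow = 2,
              dimnames = list(hit_list = c("yes", "no"),
                              category = c("yes", "no")))

# One-sided Fisher's exact test for over-representation.
p_fisher <- fisher.test(tab, alternative = "greater")$p.value

print(c(phyper = p_phyper, fisher = p_fisher))
```

Note that the default two-sided fisher.test gives a different p-value; the match is with alternative = "greater".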

                            I really don't see how this applies here. Please explain your setting again: By number of "hits" in your table, you mean the number of sequencing reads that mapped to this gene, right?

                            Now, what do you mean by "overrepresented"? Are you looking for genes which appear more often in one kind of sample than in the other? (E.g.: You have 5 samples from shallow water, 5 from deep water: Which genes differ in their abundance between these two types?)

                            What kind of samples are we talking about?



                            • #15
                              Hi Simon,

                              I am sorry I didn't explain it clearly.

                              By the number of "hits" in my table, I mean the number of sequencing reads that mapped to this gene (you are right).
                              In the file that I attached, I am interested in the 2nd column (OSP_8 100 Spring Plain). I want to compare the 2nd column to the 3rd and 4th columns.

                              "Over-represented": I want to find which genes in the sample OSP_8 100 Spring Plain are more abundant than (or different from) the other 2 samples.

                              Do you know how to write the code?
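
One possible reading of this request, sketched under loud assumptions: the data frame below is invented and only mimics the layout described (first column COG category, then one count column per sample), the column names OSP_8, sample_B and sample_C are placeholders, and pooling the other two samples is a simplification. With a single sample per group, a test like this only compares proportions of reads and says nothing about biological variability, which is the concern raised earlier in the thread.

```r
# Invented counts standing in for the attached CSV (not the real data).
counts <- data.frame(
  cog_category = c("C", "E", "G", "J", "L"),
  OSP_8    = c(120, 300, 80, 150, 60),
  sample_B = c(100, 310, 85, 140, 20),
  sample_C = c(110, 290, 90, 160, 25)
)

target <- counts$OSP_8                       # sample of interest
others <- counts$sample_B + counts$sample_C  # the two comparison samples, pooled

# For each category, build a 2x2 table (this category vs. all other categories,
# target vs. pooled others) and run a one-sided Fisher's exact test.
p_values <- vapply(seq_along(target), function(i) {
  tab <- matrix(c(target[i], sum(target) - target[i],
                  others[i], sum(others) - others[i]),
                nrow = 2)
  fisher.test(tab, alternative = "greater")$p.value
}, numeric(1))

# Adjust for testing many categories (Benjamini-Hochberg).
result <- data.frame(
  cog_category = counts$cog_category,
  p_value      = p_values,
  p_adjusted   = p.adjust(p_values, method = "BH")
)
print(result[order(result$p_adjusted), ])
```

With real data you would replace the invented data frame with something like read.csv() on your own file. If you have replicates per condition, a count-based differential abundance method is likely a better fit than this per-category test.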
