Announcement

Collapse
No announcement yet.

Some cummeRbund questions

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Some cummeRbund questions

    I worked my way through the cummeRbund manual and so far I'm really impressed with it. However, I have a few questions as I'm new to R and I couldn't find the solution to these in the manual:

    1). After finding similarly expressed genes using:

    mySimilar<-findSimilar(cuff,"PINK1",n=20)

    how do I write the expression data (with statistics) for these genes to a file?

    2). How can I produce a heatmap of all differentially expressed genes?

    3). I have a list of gene IDs in a text file - how can I produce a heatmap of these?

  • #2
    I managed to create a heatmap for all differentially expressed genes (2 above) using a post by Loyal elsewhere in this forum:

    cuff <- readCufflinks()

    #Retrive significant gene IDs (XLOC) with a pre-specified alpha
    diffGeneIDs <- getSig(cuff,level="genes",alpha=0.05)

    #Use returned identifiers to create a CuffGeneSet object with all relevant info for given genes
    diffGenes<-getGenes(cuff,diffGeneIDs)

    h<-csHeatmap(diffGenes,cluster='both')

    However, I'm still struggling with writing similarly expressed genes to a text file and to import a list of gene IDs from a file. I would appreciate if someone could put me in the right direction. Thanks!

    Comment


    • #3
      Originally posted by rvaerle View Post
      I worked my way through the cummeRbund manual and so far I'm really impressed with it. However, I have a few questions as I'm new to R and I couldn't find the solution to these in the manual:

      1). After finding similarly expressed genes using:

      mySimilar<-findSimilar(cuff,"PINK1",n=20)

      how do I write the expression data (with statistics) for these genes to a file?

      2). How can I produce a heatmap of all differentially expressed genes?

      3). I have a list of gene IDs in a text file - how can I produce a heatmap of these?
      Hi rvaerle,
      Heres how you can do this with cummeRbund:

      1)
      Code:
      >write.table(diffData(mySimilar),"mySimilar.diff")
      >write.table(fpkm(mySimilar),"mySimilar.fpkm")
      >write.table(features(mySimilar),"mySimilar.features")
      2) It seems you got this one working already

      3)
      >
      Code:
      myIDs<-read.table("Ids.txt")
      myGenes<-getGenes(cuff,myIDs)
      heat<-csHeatmap(myGenes)
      Please let me know if you need any additional help!

      Cheers,
      Loyal

      Comment


      • #4
        Many thanks for this, Loyal! It looks so easy after seeing the code... Also, thanks for this great package!

        BW
        Ronny

        Comment


        • #5
          Hello, this thread has been extremely useful to me already (as another R/bioinformatics in general novice). I am also using the mySimilar function, and have found some interesting results. But I have the problem that it gives me mostly results that were deemed not significant by cuffdiff (in general this set of genes are very lowly expressed, and any patterns detected are likely to be noise, i guess) . If I increase the number of loci to return, in addition to more of these noisy genes, I also get more genes that might be "real" hits.
          So my question is if there is any way to exclude the non-significant genes in the first place? Or can anyone suggest an efficient way to sort through the noise?
          Thanks,
          Jeremy

          Comment


          • #6
            Originally posted by waspboyz View Post
            Hello, this thread has been extremely useful to me already (as another R/bioinformatics in general novice). I am also using the mySimilar function, and have found some interesting results. But I have the problem that it gives me mostly results that were deemed not significant by cuffdiff (in general this set of genes are very lowly expressed, and any patterns detected are likely to be noise, i guess) . If I increase the number of loci to return, in addition to more of these noisy genes, I also get more genes that might be "real" hits.
            So my question is if there is any way to exclude the non-significant genes in the first place? Or can anyone suggest an efficient way to sort through the noise?
            Thanks,
            Jeremy
            Hi waspboyz,
            FindSimilar and getSig, are essentially asking two very different questions of the data. Since different restrictions and filters can be applied to both questions, the best way to find the set of genes that you are interested in my be to generate a list of significant genes, generate a second list of similar genes, and then use intersect() to find the list of genes common to both lists.

            Cheers,
            Loyal

            Comment


            • #7
              error in cummeRbund

              Hi, I found the post is very helpful. I am struggling with cummeRbund now. I tried some codes listed here, and am confronted with some errors.

              > cuff_data <- readCufflinks('diff_out')
              > csDensity(genes(cuff_data))
              Error in dat$fpkm + pseudocount : non-numeric argument to binary operator

              > diffGeneIDs <- getSig(cuff_data, level="genes", alpha=0.05)
              > diffGenes <- getGenes(cuff_data, diffGeneIDs)
              Error in sqliteExecStatement(conn, statement, ...) :
              RS-DBI driver: (RS_SQLite_exec: could not execute1: cannot start a transaction within a transaction)

              > sessionInfo()
              R version 2.15.0 (2012-03-30)
              Platform: x86_64-apple-darwin9.8.0/x86_64 (64-bit)

              locale:
              [1] C

              attached base packages:
              [1] stats graphics grDevices utils datasets methods base

              other attached packages:
              [1] cummeRbund_1.2.0 reshape2_1.2.1 ggplot2_0.9.1 RSQLite_0.11.1
              [5] DBI_0.2-5 BiocInstaller_1.4.6

              loaded via a namespace (and not attached):
              [1] MASS_7.3-18 RColorBrewer_1.0-5 colorspace_1.1-1 dichromat_1.2-4
              [5] digest_0.5.2 grid_2.15.0 labeling_0.1 memoise_0.1
              [9] munsell_0.3 plyr_1.7.1 proto_0.3-9.2 scales_0.2.1
              [13] stringr_0.6 tools_2.15.0

              Do someone has any idea how to get rid of the error?
              Thanks
              Li

              Comment


              • #8
                Hi all,

                I am getting the same error as Li when trying to run csDensity with cummeRbund. Any suggestions?

                > epi <- readCufflinks( )
                > epi
                CuffSet instance with:
                4 samples
                25078 genes
                42661 isoforms
                30261 TSS
                24612 CDS
                150468 promoters
                181566 splicing
                121338 relCDS
                dens <- csDensity(genes(epi))
                Error in dat$fpkm + pseudocount : non-numeric argument to binary operator

                Thanks so much

                Comment


                • #9
                  Version info...

                  Hi Wangli and RULearner,
                  Can you provide me with version information for both cuffdiff and cummeRbund and I can try to help. Also, if either of you wants to email me your cuffData.db files so that I can reproduce the error, I can try to see what's going on.

                  Cheers,
                  Loyal

                  Comment


                  • #10
                    Reading in a table of Gene names for cummeRbund analysis

                    I want examine the CuffDiff/cummeRbund analyses for a list of genes that other RNAseq software has flagged as significant for my data.

                    I made both plain text and csv tables of the gene names
                    say,

                    Fcrls
                    Ndst4
                    Prokr2
                    Grm2
                    Snap25

                    Then I use read.table to enter the names to R/cummeRbund as suggested by Loyal

                    >
                    Code:
                    myIDs<-read.table("Ids.txt")
                    myGenes<-getGenes(cuff,myIDs)
                    heat<-csHeatmap(myGenes)
                    The heat map is never produced and there are no errors.

                    When I then attempt to examine the CuffGeneSet "myGenes"
                    it contains only a single gene (the first one in the list)

                    myGeneFeatureNames<-featureNames(myGenes)

                    the object gives only the XLOC and gene_short_name for the first entry of the list

                    if I

                    print(myIDs)

                    I see every entry. So read.table works ( so does read.csv)

                    I can however use this syntax

                    myGeneID<-myIDs[[i,1]]
                    myGene<-getGene(myGeneID)

                    to get every gene one at a time and I can make isoform plots for example.

                    What more can I do to get a list of Gene names entered as a CuffGeneSet
                    ?

                    Comment


                    • #11
                      Hi all,

                      I am having the same problem as described in the last pot: I can see all my genes doing:

                      print(myIDs)

                      Then if I produce the heatmap I have the heatmap only of the first gene.

                      Did anyone of you find a solution already? Thanks!

                      Federica

                      Comment


                      • #12
                        Hi Starr and fed,

                        MyIDs must be a vector of gene ids. If you are reading these in with read.table, than myIDs is most likely a data.frame. The following can confirm this:

                        is.vector(myIDs)

                        If this is FALSE, then getGenes will not read the list of identifiers correctly, and will most likely only use the first gene. You can probably do something like the following:

                        myIDs<-as.vector(myIDs$v1)

                        To coerce the first column of the data.frame to a vector.

                        Cheers,
                        Loyal

                        Comment


                        • #13
                          Hi Loyal,

                          here what I got:

                          > is.vector(myIDs)
                          [1] FALSE
                          > myIDs<-as.vector(myIDs$v1)

                          But it isn't still a vector. Am I doing something wrong?

                          > is.vector(myIDs)
                          [1] FALSE

                          Fed

                          Comment


                          • #14
                            Actually when I do it, myIDs becomes 'empty' (NULL), and the lists disappears:

                            > myIDs<-as.vector(myIDs$v1)
                            > is.vector(myIDs)
                            [1] FALSE
                            > myIDs
                            NULL

                            Fed

                            Comment


                            • #15
                              Hi,

                              trying to get the heatmap of the significant genes:

                              > diffGeneIDs <- getSig(cuff_data,level="genes",alpha=0.0001)
                              > diffGenes<-getGenes(cuff_data,diffGeneIDs)
                              > h<-csHeatmap(diffGenes,cluster='both')
                              > h

                              I have back a heatmap where the gene name is always preceded by NA|, but my gene.diff doesn't have any NA field.

                              Fed

                              Comment

                              Working...
                              X