  • How do I get one FPKM value per gene?

    I have been running Cufflinks on a set of samples and would like to compare gene expression across samples. I am using FPKM values as the measure of gene abundance, but the cuffcompare output provides more than one FPKM value per gene (for genes that have isoforms). How do I go from two or more FPKM values per gene to a single value?

    Thanks!

  • #2
    I should have mentioned in my previous post that I have tried comparing the FPKMs reported by Cufflinks in the *genes.expr files. I was wondering whether cuffcompare is a better way to do that and, if so, how do I summarize expression per gene rather than per transcript?
    Thanks

    • #3
      You should run cuffdiff and look at the tracking files for genes. They contain the summed FPKM values of transcripts from the same gene.

      • #4
        Thanks! I will do that.

        Just out of curiosity: why do the Cufflinks output files *_genes.expr (which report gene-level coordinates and expression values) sometimes contain more than one row for the same gene? It seems that in some cases (noncoding exons?) the FPKM values of transcripts from the same gene do not get summed, even though the transcripts are assigned to that gene.

        Thanks in advance for your help.

        • #5
          Originally posted by PFS
          Thanks! I will do that.

          Just out of curiosity: why do the Cufflinks output files *_genes.expr (which report gene-level coordinates and expression values) sometimes contain more than one row for the same gene? It seems that in some cases (noncoding exons?) the FPKM values of transcripts from the same gene do not get summed, even though the transcripts are assigned to that gene.

          Thanks in advance for your help.
          This is a known bug in Cufflinks and will be fixed in the next release.

          • #6
            I have been running Cuffdiff on a set of samples using the newest available release (cuffdiff v0.8.3 (1332); 7/2/2010). In some cases the genes.fpkm_tracking file includes additional FPKM result columns, as described by PFS.
            I have two questions about this:
            Is there a prospective release date for a bug-fixed Cuffdiff version?
            Does the bug influence the subsequent differential expression/splicing analysis?
            Many thanks in advance,
            Kasimir

            • #7
              I ran into the same problem. I wonder if I could just add the two isoform values.

              Originally posted by PFS
              I have been running Cufflinks on a set of samples and would like to compare gene expression across samples. I am using FPKM values as the measure of gene abundance, but the cuffcompare output provides more than one FPKM value per gene (for genes that have isoforms). How do I go from two or more FPKM values per gene to a single value?

              Thanks!

              • #8
                update

                Is this supposed to have been fixed in Cufflinks 0.8.3? It doesn't seem fixed to me... I'm still seeing multiple FPKMs for a single gene in the _genes.expr files.

                • #9
                  I have also been getting some duplicates when examining the genes.expr file. I aligned to hg19 with TopHat and ran Cufflinks 0.9.2 with the -G option and the Ensembl 59 GTF file.

                  Any ideas?

                  See some examples here:


                  gene_id          bundle_id  chr    left       right      FPKM     FPKM_conf_lo  FPKM_conf_hi  status
                  ENSG00000143198  33524      chr1   165600097  165631033  127.41   0             298.498       FAIL
                  ENSG00000143198  33524      chr1   165614897  165617907  0        0             0             OK
                  ENSG00000162105  36862      chr11  70313960   70963623   9.58183  0             170.285       FAIL
                  ENSG00000162105  36862      chr11  70753739   70754197   0        0             0             OK
                  ENSG00000162105  36862      chr11  70798845   70798972   0        0             0             OK
                  ENSG00000165899  38298      chr12  80633119   80648905   0        0             0             OK
                  ENSG00000165899  38299      chr12  80655759   80672003   0        0             0             OK
                  ENSG00000165899  38300      chr12  80707295   80726842   0        0             0             OK
                  ENSG00000165899  38301      chr12  80730291   80772870   0        0             0             OK
                  ENSG00000211890  40491      chr14  106050068  106058270  259.752  227.422       292.082       OK
                  ENSG00000211890  40491      chr14  106055295  106056387  0        0             0             OK
                  ENSG00000249751  54186      chr5   138784244  138784863  20.4268  11.3876       29.466        OK
                  ENSG00000249751  54187      chr5   138837129  138842328  22.1737  12.7559       31.5915       OK
                  ENSG00000131508  54192      chr5   138906015  139008018  35.9963  23.7319       48.2606       OK
                  ENSG00000131508  54192      chr5   138945438  138946512  0        0             0             OK
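
For anyone who wants to check their own output for the same pattern, here is a minimal Python sketch that lists gene IDs appearing on more than one row. It assumes a whitespace-separated file with the header shown in the table above; the function names `read_rows` and `duplicated_genes` are made up for this example, and the inline sample is just a few rows copied from the table (a real run would read genes.expr from disk instead).

```python
from collections import defaultdict

# A few rows copied from the table above; a real run would read genes.expr.
SAMPLE = """\
gene_id bundle_id chr left right FPKM FPKM_conf_lo FPKM_conf_hi status
ENSG00000143198 33524 chr1 165600097 165631033 127.41 0 298.498 FAIL
ENSG00000143198 33524 chr1 165614897 165617907 0 0 0 OK
ENSG00000211890 40491 chr14 106050068 106058270 259.752 227.422 292.082 OK
ENSG00000249751 54186 chr5 138784244 138784863 20.4268 11.3876 29.466 OK
ENSG00000249751 54187 chr5 138837129 138842328 22.1737 12.7559 31.5915 OK
"""


def read_rows(text):
    """Parse whitespace-separated rows into dicts keyed by the header names."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    return [dict(zip(header, ln.split())) for ln in lines[1:]]


def duplicated_genes(rows):
    """Return {gene_id: [rows]} for gene IDs that occur on more than one row."""
    by_gene = defaultdict(list)
    for row in rows:
        by_gene[row["gene_id"]].append(row)
    return {g: rs for g, rs in by_gene.items() if len(rs) > 1}


dups = duplicated_genes(read_rows(SAMPLE))
for gene_id, rs in dups.items():
    bundles = sorted({r["bundle_id"] for r in rs})
    print(f"{gene_id}: {len(rs)} rows, bundle_id(s) {', '.join(bundles)}")
```

Printing the bundle IDs next to each duplicated gene makes it easier to tell apart the two patterns visible in the table: repeated rows within one bundle, and one gene ID split across several bundles.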

                  • #10
                    duplicate errors

                    jb2, I was getting duplicate errors too. In my case, when I later ran Cufflinks without the -G option, the output was fine. You may want to give that a try.

                    • #11
                      I ended up writing a script to sum the FPKMs for a given gene ID, which I think is right...

                      Here's my (unpolished) code (a perl script and a shell script).

                      This botches the confidence intervals, by the way.
                      Last edited by mgogol; 11-05-2010, 05:52 AM.
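
The attached scripts did not survive in this copy of the thread. As a rough illustration of the approach described (not the original Perl code), a minimal Python sketch of summing FPKM per gene ID could look like the following. The function name `sum_fpkm_by_gene` and the tuple layout are assumptions for this example; the numbers are rows from the table in post #9. Confidence bounds are left out on purpose, since adding per-row bounds does not give a valid interval for the sum.

```python
def sum_fpkm_by_gene(rows):
    """Sum FPKM over all rows sharing a gene ID.

    rows: iterable of (gene_id, fpkm, status) tuples.
    Confidence bounds are deliberately not combined here.
    """
    totals = {}
    for gene_id, fpkm, status in rows:
        entry = totals.setdefault(
            gene_id, {"fpkm": 0.0, "n_rows": 0, "statuses": set()}
        )
        entry["fpkm"] += fpkm
        entry["n_rows"] += 1
        entry["statuses"].add(status)  # keep statuses so FAIL rows stay visible
    return totals


# (gene_id, FPKM, status) taken from the table in post #9.
ROWS = [
    ("ENSG00000249751", 20.4268, "OK"),
    ("ENSG00000249751", 22.1737, "OK"),
    ("ENSG00000131508", 35.9963, "OK"),
    ("ENSG00000131508", 0.0, "OK"),
]

totals = sum_fpkm_by_gene(ROWS)
for gene_id, entry in totals.items():
    print(gene_id, round(entry["fpkm"], 4), entry["n_rows"], sorted(entry["statuses"]))
```

Keeping the row count and the set of statuses alongside the sum means a gene whose total includes a FAIL row can still be flagged later rather than silently merged.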

                      • #12
                        Originally posted by mgogol
                        This botches the confidence intervals, by the way.
                        Yeah, that is what I was worried about, because I was considering taking those into account with my data. I will take a look at your script though since it saves me the time of writing my own.

                        Hopefully Cole or others can take a look at this and let us know what the problem might be.

                        • #13
                          Cufflinks

                          I was wondering if anyone knows what the status in genes.expr and transcripts.expr (output files of Cufflinks) means? I can't find the meaning in the manual. A possible meaning is "can be one of OK (test successful), NOTEST (not enough alignments for testing), or FAIL, when an ill-conditioned covariance matrix or other numerical exception prevents testing", but this is actually the description of "test status" which is a column in the Cuffdiff output files.


                          What shall I do with genes (or transcripts) whose status is FAIL? Shall I assume that their FPKM is 0 or take the FPKM of these genes regardless of their status?


                          Cufflinks v0.9.1b was used in my experiments, but the problem of getting multiple FPKM for some genes still exists. Running Cufflinks without a GTF file seems to solve this problem, but then I don't know how to link the FPKM to the corresponding Ensembl ID. If I provide a GTF file when running Cufflinks, I'll get multiple FPKM and FAIL status for some genes.


                          What shall I do with genes that have multiple FPKM? Shall I add the FPKM together or choose only the FPKM that matches the start and end position of these genes?


                          Thank you very much for your time.
                          Last edited by yjlui; 11-11-2010, 07:53 AM.
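
To make the two options in the last question concrete, here is a small Python sketch that computes both for each gene: the sum of all rows, and the FPKM of the single row with the widest coordinate span. The widest row is only a stand-in for "the row matching the gene's start and end", because the annotated coordinates are not available here; that choice, the function name `summarize`, and the tuple layout are all assumptions for this example. It does not settle which option is correct, and it only flags FAIL rows rather than deciding what to do with them.

```python
# (gene_id, left, right, FPKM, status) copied from the table in post #9.
ROWS = [
    ("ENSG00000143198", 165600097, 165631033, 127.41, "FAIL"),
    ("ENSG00000143198", 165614897, 165617907, 0.0, "OK"),
    ("ENSG00000249751", 138784244, 138784863, 20.4268, "OK"),
    ("ENSG00000249751", 138837129, 138842328, 22.1737, "OK"),
]


def summarize(rows):
    """Per gene: summed FPKM, FPKM of the widest row, and a FAIL flag."""
    out = {}
    for gene_id, left, right, fpkm, status in rows:
        s = out.setdefault(
            gene_id, {"sum": 0.0, "widest": None, "span": -1, "any_fail": False}
        )
        s["sum"] += fpkm
        span = right - left
        if span > s["span"]:  # remember the FPKM of the longest interval so far
            s["span"] = span
            s["widest"] = fpkm
        s["any_fail"] = s["any_fail"] or status == "FAIL"
    return out


summary = summarize(ROWS)
for gene_id, s in summary.items():
    print(gene_id, round(s["sum"], 4), s["widest"], "FAIL present" if s["any_fail"] else "all OK")
```

For ENSG00000143198 the two options agree, because the extra row is nested inside the main one and has zero FPKM. For ENSG00000249751 they differ, because the gene ID is split across two non-overlapping rows that both have nonzero FPKM.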

                          • #14
                            Does someone have a small example dataset that I can run this on to find the problem?

                            • #15
                              Thanks for the prompt reply, Adam! Just emailed you a small dataset built from my SAM file.
