  • Cufflinks/Cuffdiff different FPKM values for multiple genes at one location

    I am doing my first RNA-seq analysis for Drosophila melanogaster (I normally deal with human or mouse data). It turns out there are a lot of fly genes that have identical coordinates to other genes. In other words, the same location has multiple genes assigned to it.

    If there are multiple genes that have the same exact coordinates, they should have the same FPKM values. However, running Cuffdiff using a GTF like that does not yield the same values for all genes.

    Is there a way to force Cuffdiff to assign the same values to all overlapping genes? I could not find any arguments that may do that. Is there a proper way of dealing with such situations? Do I need to optimize the GTF file? I use the one from iGenomes, which is endorsed by Cufflinks, so it seems like it should be fine.
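    A first step is to list which genes in the GTF actually share identical exon coordinates. The sketch below assumes a standard nine-column, tab-separated GTF with `gene_id "..."` in the attribute column, and groups genes whose full exon sets (chromosome, strand, start, end) are identical. The function names and the example records are hypothetical and only for illustration.

```python
from collections import defaultdict


def parse_attributes(attr_field):
    """Parse a GTF attribute column like 'gene_id "X"; transcript_id "Y";'."""
    attrs = {}
    for item in attr_field.strip().split(";"):
        item = item.strip()
        if item:
            key, _, value = item.partition(" ")
            attrs[key] = value.strip('"')
    return attrs


def genes_with_identical_exons(gtf_lines):
    """Return groups of gene_ids whose exon sets are exactly the same."""
    exons = defaultdict(set)  # gene_id -> {(chrom, strand, start, end)}
    for line in gtf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9 or fields[2] != "exon":
            continue
        gene_id = parse_attributes(fields[8]).get("gene_id")
        if gene_id is None:
            continue
        exons[gene_id].add((fields[0], fields[6], int(fields[3]), int(fields[4])))

    # Genes sharing the same frozen exon set have identical coordinates.
    groups = defaultdict(list)
    for gene_id, exon_set in exons.items():
        groups[frozenset(exon_set)].append(gene_id)
    return [sorted(genes) for genes in groups.values() if len(genes) > 1]


# Hypothetical three-record GTF: geneA and geneB occupy the same exon.
example = [
    "\t".join(["2L", "demo", "exon", "100", "200", ".", "+", ".", 'gene_id "geneA"; transcript_id "tA";']),
    "\t".join(["2L", "demo", "exon", "100", "200", ".", "+", ".", 'gene_id "geneB"; transcript_id "tB";']),
    "\t".join(["2L", "demo", "exon", "500", "600", ".", "-", ".", 'gene_id "geneC"; transcript_id "tC";']),
]
print(genes_with_identical_exons(example))  # [['geneA', 'geneB']]
```

    Comparing whole exon sets rather than only the outer gene span is a deliberately strict choice; genes that merely overlap would need a looser test.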

  • #2
    Hi,

    The FPKM value for a gene is the sum of the FPKM values of its transcripts, so I would expect the differing values to come from differences in the transcripts and in the reads/fragments covering them.

    To test this, I would look at the transcript-level FPKM values for each of the genes that share the same locus.
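    To make the summation concrete, here is a minimal sketch assuming you already have per-transcript FPKM values and a transcript-to-gene mapping (both hypothetical here, not read from any Cufflinks file). Two genes at the same locus end up with different gene-level FPKMs whenever their transcripts were assigned different abundances.

```python
from collections import defaultdict


def gene_fpkm(transcript_fpkm, transcript_to_gene):
    """Gene-level FPKM as the sum of that gene's transcript-level FPKMs."""
    totals = defaultdict(float)
    for tx, fpkm in transcript_fpkm.items():
        totals[transcript_to_gene[tx]] += fpkm
    return dict(totals)


# Hypothetical values: geneA and geneB share a locus but have different transcripts.
tx_fpkm = {"tA1": 3.0, "tA2": 1.0, "tB1": 6.0}
tx_to_gene = {"tA1": "geneA", "tA2": "geneA", "tB1": "geneB"}
print(gene_fpkm(tx_fpkm, tx_to_gene))  # {'geneA': 4.0, 'geneB': 6.0}
```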



    • #3
      In reply to davidblaney (#2):
      The FPKM values for multiple transcripts with identical coordinates are very different. Additionally, they can even be called significantly different between two samples, and in opposite directions.
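      One way to see why this can happen: if two transcripts have identical structure, every fragment is equally compatible with both, so the data cannot tell them apart. The toy expectation-maximization (EM) sketch below is a simplified illustration of that non-identifiability, not Cufflinks' actual model (which also accounts for fragment length, bias, and more). With identical compatibility rows, EM stays wherever it starts, and both splits have exactly the same likelihood.

```python
import math


def em_abundances(compat, theta, n_iter=100):
    """Toy EM for transcript abundances.

    compat[f][t] is P(fragment f | transcript t); theta holds initial abundances.
    """
    theta = list(theta)
    for _ in range(n_iter):
        expected = [0.0] * len(theta)
        for probs in compat:
            # E-step: share each fragment among compatible transcripts.
            weights = [th * p for th, p in zip(theta, probs)]
            z = sum(weights)
            for t, w in enumerate(weights):
                expected[t] += w / z
        # M-step: new abundances are the expected fragment shares.
        theta = [e / len(compat) for e in expected]
    return theta


def log_likelihood(compat, theta):
    return sum(math.log(sum(th * p for th, p in zip(theta, probs))) for probs in compat)


# Ten fragments, each equally compatible with two structurally identical transcripts.
compat = [[0.01, 0.01]] * 10
split_a = em_abundances(compat, [0.5, 0.5])
split_b = em_abundances(compat, [0.9, 0.1])
print(split_a, split_b)  # two different "answers" for the same data
print(log_likelihood(compat, split_a), log_likelihood(compat, split_b))  # equal
```

      Because the likelihood is flat along this direction, any reported split between such transcripts (and any test comparing them) reflects the estimator's behavior rather than the reads.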



      • #4
        I stopped using cufflinks/cuffdiff 3 months ago as the latest version was producing implausible results. I would recommend using tophat2 + htseq-count + edgeR (or DESeq). I based my workflow on this nice tutorial: http://www-huber.embl.de/pub/pdf/nprot.2013.099.pdf
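        When moving to the htseq-count step of that workflow, note that its output lists per-gene counts followed by summary counters whose names start with "__" (for example `__no_feature` and `__ambiguous`). For a genome with many overlapping genes, the `__ambiguous` line is worth checking. A minimal sketch, assuming the two-column tab-separated output; the function name and numbers are hypothetical:

```python
def parse_htseq_count(lines):
    """Split htseq-count output into gene counts and "__" summary counters."""
    genes, special = {}, {}
    for line in lines:
        line = line.strip()
        if not line:
            continue
        name, count = line.split("\t")
        # Summary counters (e.g. __no_feature, __ambiguous) are not genes.
        target = special if name.startswith("__") else genes
        target[name] = int(count)
    return genes, special


# Hypothetical output for three genes plus two summary counters.
output = ["geneA\t0", "geneB\t0", "geneC\t57", "__no_feature\t12", "__ambiguous\t40"]
genes, special = parse_htseq_count(output)
print(genes)    # {'geneA': 0, 'geneB': 0, 'geneC': 57}
print(special)  # {'__no_feature': 12, '__ambiguous': 40}
```

        Only the gene counts belong in the matrix passed to edgeR or DESeq; the summary counters are for quality checks.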



        • #5
          I agree with feralBiologist, but I would switch to featureCounts instead of HTSeq-count for performance reasons (it can be run multithreaded and does not require a re-sorted SAM file).



          • #6
            In reply to rboettcher (#5):
            Thanks for this suggestion. Have you checked whether HTSeq-count and featureCounts produce the same results? HTSeq-count supports several counting modes. Basically, the main issue is how to count reads that hit overlapping genes. I chose HTSeq-count because it is written by the author of DESeq, and in the above paper it has also been "approved" by the authors of edgeR. If featureCounts produces the same results, then the switch would be painless, but if there are differences, then you need to look into the details.

            EDIT: I realised that featureCounts is written by the authors of edgeR, so it should be straightforward to substitute it for HTSeq-count. Thanks again to rboettcher.
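            The overlapping-genes issue is exactly what the counting mode decides. Below is a sketch of the rule HTSeq-count documents for its default "union" mode: take the union of features overlapped by every aligned base of the read; an empty union goes to `__no_feature`, more than one feature goes to `__ambiguous`, and exactly one feature is counted. The input representation (one set of feature names per aligned position) is an assumption made for illustration. Under this rule, a read on two genes with identical coordinates is always ambiguous, so both genes get zero counts from that locus.

```python
def union_mode_assign(feature_sets_per_position):
    """Assign a read following the 'union' rule.

    feature_sets_per_position: one set of overlapping feature names per aligned base.
    """
    union = set().union(*feature_sets_per_position)
    if not union:
        return "__no_feature"
    if len(union) > 1:
        return "__ambiguous"
    return next(iter(union))


# A read on two genes with identical coordinates is ambiguous:
print(union_mode_assign([{"geneA", "geneB"}, {"geneA", "geneB"}]))  # __ambiguous
# A read partly in geneC and partly intergenic still counts for geneC:
print(union_mode_assign([{"geneC"}, set()]))  # geneC
```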
            Last edited by feralBiologist; 11-08-2013, 01:56 AM.



            • #7
              In reply to feralBiologist (#6):
              I suggest having a look at their manuscript on arXiv, where the authors made such a comparison. In my experience, both tools produce similar results; see http://arxiv.org/abs/1305.3347

              EDIT: another nice feature is that featureCounts outputs gene length, so computing RPKM is straightforward.
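              With the count and `Length` columns from the featureCounts table, RPKM is count × 10^9 / (length × library size). The sketch below assumes the library size is the sum of assigned counts; using total mapped reads instead is an equally common choice and changes the scale. The gene names and numbers are hypothetical. In R, edgeR's `rpkm()` function takes gene lengths for the same purpose.

```python
def rpkm(counts, lengths):
    """RPKM per gene: reads per kilobase of gene per million reads.

    counts: {gene: read count}; lengths: {gene: length in bp}.
    Library size is assumed to be the sum of the given counts.
    """
    total = sum(counts.values())
    return {g: counts[g] * 1e9 / (lengths[g] * total) for g in counts}


# Hypothetical counts and lengths for two genes.
counts = {"g1": 500, "g2": 1500}
lengths = {"g1": 1000, "g2": 3000}
print(rpkm(counts, lengths))  # {'g1': 250000.0, 'g2': 250000.0}
```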



              • #8
                I am going to try these new methods out, thanks.
