  • Unusually high FPKM values from Cufflinks

    Hi,

    I have been working with Cufflinks v. 1.1.0 on mRNA-Seq data from some older (2008) Illumina runs with 36 bp reads. The only options I specified were -I 5000 and -b <refSeqFasta>; no reference GFF was given. In the resulting transcripts.gtf, several thousand transcripts have unusually high FPKM values, ranging from the thousands to over a million (e.g., FPKM=83456.4, 5571.5, 1017907.8).

    Some previous posts suggested short read length and the reference FASTA as possible culprits, but removing the -b option does not help. It is not a BAM-specific problem either, since SAM input gives similar results. I also tried the newer v. 1.3.0, and it gives similar values. I'm not sure whether it is consistently the short transcripts that are being inflated.

    Strangely, the older v. 0.9.3 gives reasonable FPKM values (455.4 for the transcript that had 83456.4 before), which I'd like to trust since they roughly match values I calculated manually.
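For reference, a "manual" FPKM calculation usually means the textbook formula with no fragment-length correction: fragments divided by (transcript length in kb times total mapped fragments in millions). The minimal Python sketch below assumes exactly that; the function name and example numbers are hypothetical and not taken from this data.

```python
def naive_fpkm(fragment_count: int, transcript_length_bp: int, total_mapped_fragments: int) -> float:
    """Textbook FPKM with no fragment-length (effective length) correction.

    FPKM = fragments / (transcript length in kb * total mapped fragments in millions)
    """
    length_kb = transcript_length_bp / 1_000
    mapped_millions = total_mapped_fragments / 1_000_000
    return fragment_count / (length_kb * mapped_millions)


if __name__ == "__main__":
    # Hypothetical example: 500 fragments on a 2 kb transcript, 10 million mapped fragments.
    print(naive_fpkm(500, 2000, 10_000_000))  # 25.0
```

Because this formula uses the raw transcript length, a short transcript gets no extra boost, which is one reason hand calculations can disagree with an effective-length-corrected estimate.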

    Why are the newer versions of Cufflinks inflating FPKM values by several orders of magnitude? Has anyone found a solution to this problem? Can I still use the newer versions without this FPKM inflation?

    Thanks
    Last edited by flobpf; 02-13-2012, 01:21 PM.

  • #2
    High FPKM for small transcripts

    It does appear that small Cufflinks transcripts tend to have significantly inflated FPKMs. Is anyone else seeing this?

    Last edited by flobpf; 02-13-2012, 02:34 PM.

    • #3
      Hi,

      Which mode did you run Cufflinks in: with a reference annotation, in RABT mode, or de novo?

      • #4
        Originally posted by Nicolas
        Hi,

        Which mode did you use Cufflinks with? with a reference file, in RABT mode, or de-novo?
        Hi Nicolas,

        I used something close to RABT mode with single-end reads. The reads were first mapped to the reference genome with TopHat, and Cufflinks was run on the resulting accepted_hits.bam file. However, no reference GTF was specified.
        Last edited by flobpf; 02-15-2012, 08:08 AM. Reason: Not exactly RABT, not exactly de novo

        • #5
          Yes, we see this as well, and so do other groups I have spoken to. It is fairly consistent from run to run.

          • #6
            A note about small transcripts and high FPKM: the reason you're seeing this is that fragments mapping to a very small transcript must be short (no longer than the transcript itself), so they often come from the tail of the library's fragment length distribution. If you plot a histogram of the lengths of the library fragments, the mean is usually around 200-250 bp (depending on the protocol, and excluding adapters). Most fragments aren't much longer or much shorter than that; i.e., the variance is small. However, a small fraction of fragments are very short (100 bp or even less) or quite long (500-600 bp). Because these are rare, Cufflinks reasons that for a small transcript to have generated them, it must be very abundant. In fact, it probably generated many more fragments, most of which didn't survive the size selection steps during library construction. So we "upscale" the FPKM to account for this effect. You can read about this correction in the supplement of the Cufflinks paper. The reason for the change between 0.9.3 and 1.1.0 is that there were some problems in the implementation of the correction in 0.9.3, which we fixed in later versions.
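The fragment-length effect described in this post can be illustrated with an "effective length" calculation: each possible fragment length i contributes (transcript length - i + 1) start positions, weighted by its probability, and fragments longer than the transcript contribute nothing. The Python sketch below is only an approximation of that idea, not Cufflinks' actual code; the normal fragment-length model (mean 225 bp, sd 40 bp), the function names, and the counts are all assumptions chosen for illustration.

```python
import math


def normal_fragment_pmf(mean: float, sd: float, max_len: int) -> list[float]:
    """Discretized normal fragment-length distribution (an assumed model).

    Index i holds P(fragment length == i) for i in 1..max_len; index 0 is unused (0.0).
    """
    weights = [0.0] + [math.exp(-0.5 * ((i - mean) / sd) ** 2) for i in range(1, max_len + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def effective_length(transcript_len: int, pmf: list[float]) -> float:
    """Expected number of start positions for a fragment drawn from pmf.

    Fragments longer than the transcript cannot come from it, so a short
    transcript only "sees" the rare short tail of the distribution.
    """
    upper = min(transcript_len, len(pmf) - 1)
    return sum(pmf[i] * (transcript_len - i + 1) for i in range(1, upper + 1))


def corrected_fpkm(count: int, transcript_len: int, total_mapped: int, pmf: list[float]) -> float:
    """FPKM using the effective length (in kb) instead of the raw transcript length."""
    eff_kb = effective_length(transcript_len, pmf) / 1_000
    mapped_millions = total_mapped / 1_000_000
    return count / (eff_kb * mapped_millions)


if __name__ == "__main__":
    # Assumed library: mean fragment length 225 bp, sd 40 bp (illustrative only).
    pmf = normal_fragment_pmf(mean=225, sd=40, max_len=1000)
    for length in (2000, 300, 150):
        eff = effective_length(length, pmf)
        fpkm = corrected_fpkm(100, length, 10_000_000, pmf)
        print(f"{length:>5} bp: effective length {eff:9.2f} bp, FPKM {fpkm:12.1f}")
```

With the same 100 fragments, the 2 kb transcript keeps an effective length close to 2000 - 225 + 1 bp, while the 150 bp transcript's effective length collapses to well under 1 bp, so its FPKM is inflated by orders of magnitude. That is the same qualitative pattern reported in this thread.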

            While the correction is (in our opinion) a good thing to do, the bottom line is that standard RNA-Seq is not the right assay for measuring small RNA expression, because size selection introduces a lot of error and variability into the sampling of fragments from these species. I'm actually considering adding another status flag (similar to HIDATA, FAIL, etc.) to warn users when their library fragments are too long for reliable quantification of a particular transcript.
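The proposed warning flag might work roughly like this: estimate what fraction of library fragments could fit inside a transcript and flag the transcript when that fraction is tiny. The Python sketch below is purely hypothetical. The flag name "TOOSHORT", the 5% threshold, and the normal fragment-length model are assumptions made for illustration, not an existing Cufflinks status value or behavior.

```python
import math


def fraction_fitting(transcript_len: float, mean: float, sd: float) -> float:
    """P(fragment length <= transcript_len) under an assumed normal fragment-length model."""
    z = (transcript_len - mean) / sd
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def length_status(transcript_len: float, mean: float, sd: float, min_fraction: float = 0.05) -> str:
    """Return a hypothetical status string for one transcript.

    "TOOSHORT" (hypothetical flag name) means fewer than `min_fraction` of the
    library's fragments could fit inside the transcript, so its FPKM would rest
    on the rare short tail of the fragment-length distribution.
    """
    if fraction_fitting(transcript_len, mean, sd) < min_fraction:
        return "TOOSHORT"
    return "OK"


if __name__ == "__main__":
    # Assumed library: mean fragment length 225 bp, sd 40 bp.
    for length in (100, 180, 225, 2000):
        print(length, length_status(length, mean=225, sd=40))
```

A real implementation would more likely use the empirical fragment-length distribution learned from the data rather than a fixed normal model, but the thresholding idea would be the same.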
