  • Rachel Hillmer
    Junior Member
    • Jun 2012
    • 8

    Why does quartile normalization inflate my FPKM values by ~4 orders of magnitude?

    Hello,

    When I run cufflinks with quartile normalization, the FPKM values it gives me are about 4 orders of magnitude higher than without quartile normalization.
    This makes absolutely no sense to me. Is anyone else having this problem?

    Also, there is a strange comment in the cufflinks manual:
    "If requested, Cufflinks/Cuffdiff will use the number of reads mapping to the upper-quartile locus in place of the "map mass" (M) when calculating FPKM."

    Shouldn't this be the number of reads NOT mapping to the upper quartile? My understanding is that bad behavior -- titrating out the bulk of the reads because of a few highly overrepresented sequences in one sample -- can be corrected for by IGNORING the upper quartile.

    I'd love some answers.

    ~Rachel
  • john_nl
    Member
    • Feb 2012
    • 13

    #2
    Originally posted by Rachel Hillmer
    Hello,

    Shouldn't this be the number of reads NOT mapping to the upper quartile? My understanding is that bad behavior -- titrating out the bulk of the reads because of a few highly overrepresented sequences in one sample -- can be corrected for by IGNORING the upper quartile.

    ~Rachel
    Glad I'm not the only one who thinks this. I'm sure there is an explanation, but at the moment it does not seem intuitive to me.


    • glados
      Member
      • Mar 2012
      • 59

      #3
      Wondering this as well.


      • jk1124
        Member
        • Oct 2012
        • 17

        #4
        I am also confused by the explanation for upper quartile normalization provided by the Cufflinks page (i.e. adjusting for highly overexpressed genes), and would appreciate any insight on that, but the paper the authors reference (Bullard et al. 2010, BMC Bioinformatics) makes more sense, I think.

        Basically, the upper quartile normalization gets rid of any long tail on the distribution of read counts, which occurs due to the "preponderance of zero and low-count genes." So it seems that using this kind of normalization gets rid of any sequencing noise.

        It makes sense that an FPKM would be inflated with upper quartile normalization then, because you are basically dividing by a smaller denominator (upper quartile < total reads).
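To make the denominator argument concrete, here is a minimal sketch using the standard FPKM definition (fragments per kilobase of transcript per million mapped fragments). All numbers are hypothetical, and `upper_quartile_count` simply stands in for whatever value replaces the map mass; this is an illustration of the arithmetic, not Cufflinks' actual code.

```python
def fpkm(fragments, length_bp, denominator):
    """FPKM = 1e9 * fragments / (denominator * transcript length in bp)."""
    return 1e9 * fragments / (denominator * length_bp)


# Hypothetical values, chosen only for illustration.
total_mapped = 20_000_000      # "map mass" M: all mapped fragments
upper_quartile_count = 2_000   # assumed fragment count at the upper-quartile locus
gene_fragments = 500           # fragments assigned to one gene
gene_length_bp = 2_000         # length of that gene's transcript

fpkm_total = fpkm(gene_fragments, gene_length_bp, total_mapped)
fpkm_uq = fpkm(gene_fragments, gene_length_bp, upper_quartile_count)

print(f"FPKM with total map mass:      {fpkm_total}")
print(f"FPKM with upper-quartile count: {fpkm_uq}")
```

The same gene gets a much larger value in the second case purely because the denominator is smaller. How large the gap is in real data depends on the library and on any rescaling the software applies afterwards.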

        Please let me know if this is a plausible reasoning, since I am new to this.


        • HESmith
          Senior Member
          • Oct 2009
          • 512

          #5
          jk1124,

          Your reasoning is not flawed, but (unless I'm missing something) the only way to increase FPKM by four orders of magnitude would be if the upper quartile read count constitutes only 1/10000 of the total read count. That seems unlikely.

          Also, the distribution tail of the data would not include zero-count genes.
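The 1/10000 figure above follows directly if the upper-quartile value simply replaces the map mass in the FPKM formula. A short derivation, assuming the standard FPKM definition with $C$ fragments on a transcript of length $L$ bp, map mass $M$, and upper-quartile value $Q$ (the symbols $C$, $L$ and $Q$ are introduced here for illustration; only $M$ appears in the manual's wording):

```latex
\mathrm{FPKM}_M = \frac{10^{9}\, C}{M \, L}, \qquad
\mathrm{FPKM}_Q = \frac{10^{9}\, C}{Q \, L}, \qquad
\frac{\mathrm{FPKM}_Q}{\mathrm{FPKM}_M} = \frac{M}{Q}
```

So a $10^{4}$-fold increase corresponds to $Q = M / 10^{4}$, provided no further rescaling is applied.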


          • Richard Barker
            Member
            • Apr 2012
            • 47

            #6
            So would you recommend that we normalize our data by the upper quartile of the number of fragments mapping to individual loci when running cufflinks? Or should one just omit this option?


            • john_nl
              Member
              • Feb 2012
              • 13

              #7
              The Upper Quartile normalisation method essentially just uses the count value at the 75th percentile as the denominator.
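As a rough sketch of what "the count value at the 75th percentile" means, the snippet below computes the upper quartile of a toy set of per-locus counts with Python's standard library. The counts are made up, and both the choice to drop zero-count loci and the interpolation method are assumptions for this example, not a statement of what Cufflinks does internally.

```python
import statistics


def upper_quartile(counts, drop_zeros=True):
    """Return the 75th percentile (Q3) of per-locus fragment counts."""
    values = sorted(c for c in counts if c > 0 or not drop_zeros)
    # quantiles(..., n=4) returns the three cut points [Q1, median, Q3].
    return statistics.quantiles(values, n=4, method="inclusive")[2]


# Hypothetical per-locus counts: mostly low, one highly expressed locus.
counts = [0, 0, 10, 20, 30, 40, 1000]
boosted = [0, 0, 10, 20, 30, 40, 100000]  # same, with the top locus much higher

print(sum(counts), upper_quartile(counts))
print(sum(boosted), upper_quartile(boosted))
```

In this toy case the total changes a lot when the top locus grows, while the upper quartile stays the same, which is the property that makes it attractive as a denominator when a few loci dominate one sample.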

              Also, for people thinking about normalization methods I would recommend this article:

              A comprehensive evaluation of normalization methods for Illumina high-throughput RNA sequencing data analysis. (2012) Brief Bioinform

