  • int11ap1
    Member
    • Jan 2014
    • 16

    t-test FPKM values

    I have two sets of genes, and I'd like to draw a boxplot and run a t-test to find out whether their expression levels differ significantly.

    However, my t-test p-value changes when using log10(FPKM+1) values or just FPKM values. Why? What should I choose?

    Thanks.
  • ffinkernagel
    Senior Member
    • Oct 2009
    • 110

    #2
    A t-test depends on the effect size, and that obviously changes if you log-transform the values.
    The general rule is to test on the data you measure; in this case, that would be the un-logged reads per million.

    Either way: you should not be testing on the FPKM values because, in short, you lose the information about the number of reads actually behind each value. More reads mean a better estimate.

    Consider using a testing method specifically for RNAseq data such as DESeq.
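
    To illustrate why the p-value moves with the transformation, here is a minimal sketch that computes Welch's t statistic on raw values and on log10(FPKM + 1) values. The FPKM numbers and the helper name `welch_t` are made up for illustration; this only shows that the statistic changes and does not make the t-test appropriate for FPKM data.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two independent samples (unequal variances)."""
    var_a = statistics.variance(a)
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error


# Hypothetical FPKM values for two transcript sets (illustration only).
set_a = [0.5, 1.2, 3.4, 8.0, 15.0, 250.0]
set_b = [0.1, 0.4, 0.9, 2.0, 4.5, 6.0]

# The same values after the log10(FPKM + 1) transformation.
log_a = [math.log10(x + 1) for x in set_a]
log_b = [math.log10(x + 1) for x in set_b]

t_raw = welch_t(set_a, set_b)
t_log = welch_t(log_a, log_b)

print(f"t on raw FPKM:        {t_raw:.3f}")
print(f"t on log10(FPKM + 1): {t_log:.3f}")
```

    The single large value in the first set inflates the variance on the raw scale, so the statistic is smaller there than on the log scale. The t-test compares means relative to spread, and a log transformation changes both.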

    Comment

    • jwfoley
      Senior Member
      • Jun 2009
      • 183

      #3
      FPKM is just an intuitive transformation of fragment counts and is not suitable to be used in statistics.

      Fortunately, the software package that probably gave you the FPKM values, Cufflinks, also includes a program called cuffdiff that will do the test you want in a statistically rigorous way, based on modeling the actual fragment counts. Use that instead; don't try to use statistical tests that are unsuited for your data type on data that are unsuited for statistics.

      Comment

      • int11ap1
        Member
        • Jan 2014
        • 16

        #4
        I do not need RNA-seq-specific normalization for what I want here. Both sets of genes (actually, I have transcripts) come from the same RNA-seq dataset (the same fasta). One set is made up of coding transcripts and the other is made up of putative lncRNAs. I just want to know which set of transcripts is more highly expressed.

        What is your final conclusion?
        Last edited by int11ap1; 07-17-2014, 11:14 AM.

        Comment

        • jwfoley
          Senior Member
          • Jun 2009
          • 183

          #5
          My final conclusion is the same as before: you should use a valid hypothesis test on the count data, like cuffdiff, DESeq2, or edgeR, all of which are quite rigorous, commonly used, and well documented. Do not use an invalid hypothesis test on FPKMs. FPKM is a crude normalization and cannot be used in a meaningful statistical test. Asking us again is not going to change the way numbers work.

          Comment

          • int11ap1
            Member
            • Jan 2014
            • 16

            #6
            But the methods you mention (edgeR and DESeq) are for normalization between different samples or RNA-seq datasets...

            Comment

            • jwfoley
              Senior Member
              • Jun 2009
              • 183

              #7
              No, you have it backwards: those methods are all for statistical hypothesis testing, and FPKM is a (crude, statistically inappropriate) normalization for comparing different samples.

              Comment

              • int11ap1
                Member
                • Jan 2014
                • 16

                #8
                I do not follow you; sorry for asking again.

                For example, I have 1000 FPKM values (from 1 RNA-seq sample) for 1000 transcripts. If I want to compare the first 500 transcripts with the second 500 (to see which set is more highly expressed), do I need to use edgeR or DESeq? What for?

                Comment

                • jwfoley
                  Senior Member
                  • Jun 2009
                  • 183

                  #9
                  Ah, I see: you're comparing some genes with other genes in the same experiment, not the same gene across different experiments.

                  You can use FPKM values for this if you use a distribution-free test like Mann-Whitney-Wilcoxon, but that won't be very powerful. Otherwise you could use a more effective normalization like the variance-stabilizing transformation or regularized log in DESeq2 and then use a regular t-test.
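
                  As a rough sketch of the distribution-free option, the following computes a two-sided Mann-Whitney U test with the normal approximation (tie-corrected, no continuity correction) using only the Python standard library. The two transcript sets are simulated from lognormal distributions purely as stand-ins for FPKM values, and the function names are hypothetical. Because the test uses only ranks, it gives the same result on FPKM and on log10(FPKM + 1).

```python
import math
import random
from statistics import NormalDist


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # mean of 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def mann_whitney_u(a, b):
    """Two-sided Mann-Whitney U test via the normal approximation."""
    n1, n2 = len(a), len(b)
    n = n1 + n2
    combined = list(a) + list(b)
    ranks = average_ranks(combined)
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2

    # Tie correction for the variance of U.
    counts = {}
    for value in combined:
        counts[value] = counts.get(value, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))

    z = (u1 - n1 * n2 / 2) / math.sqrt(var_u)
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return u1, p_value


# Simulated stand-ins for FPKM values (not real data).
rng = random.Random(0)
coding = [rng.lognormvariate(2.0, 1.0) for _ in range(200)]
lncrna = [rng.lognormvariate(1.0, 1.0) for _ in range(200)]

u_raw, p_raw = mann_whitney_u(coding, lncrna)
u_log, p_log = mann_whitney_u(
    [math.log10(x + 1) for x in coding],
    [math.log10(x + 1) for x in lncrna],
)

print(f"raw FPKM:        U = {u_raw:.1f}, p = {p_raw:.3g}")
print(f"log10(FPKM + 1): U = {u_log:.1f}, p = {p_log:.3g}")
```

                  The normal approximation is only reasonable for fairly large groups; for small ones an exact test from a statistics package would be the safer choice.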

                  Comment

                  • int11ap1
                    Member
                    • Jan 2014
                    • 16

                    #10
                    That's it, thanks!
                    Why not apply the t-test directly? Where can I learn about it?

                    Comment

                    • jwfoley
                      Senior Member
                      • Jun 2009
                      • 183

                      #11
                      The t-test assumes the populations are normally distributed. FPKMs are not. http://en.wikipedia.org/wiki/Student's_t-test

                      A log transformation may seem to help but it is still inappropriate because it fails to account for the heteroskedastic mean-variance dependency of read counts. DOI: 10.1111/j.2041-210X.2010.00021.x
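
                      A small simulation can make the mean-variance point concrete. This sketch assumes idealized Poisson counts (real RNA-seq counts are typically more dispersed than that), and the sampler and the chosen means are illustrative only. It compares the variance of the raw counts and of log10(count + 1) at different expression levels.

```python
import math
import random
import statistics


def poisson_sample(lam, rng):
    """Draw one Poisson(lam) variate using Knuth's multiplication method."""
    threshold = math.exp(-lam)
    k, p = 0, 1.0
    while True:
        p *= rng.random()
        if p <= threshold:
            return k
        k += 1


rng = random.Random(42)
results = {}
for mean in (2, 20, 200):
    counts = [poisson_sample(mean, rng) for _ in range(2000)]
    logged = [math.log10(c + 1) for c in counts]
    results[mean] = (statistics.variance(counts), statistics.variance(logged))

for mean, (var_raw, var_log) in results.items():
    print(f"mean {mean:>3}: var(count) = {var_raw:8.2f}, "
          f"var(log10(count + 1)) = {var_log:.5f}")
```

                      Under this assumption the raw variance grows with the mean, while after the log transformation the variance shrinks as the mean grows. Either way the variance still depends on the expression level, so the spread is not comparable between lowly and highly expressed transcripts.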

                      Comment

                      • int11ap1
                        Member
                        • Jan 2014
                        • 16

                        #12
                        But the arithmetic mean of my FPKM values will be normally distributed according to the central limit theorem. In large samples such as mine, t.test for skewed distributions should be fine: http://stats.stackexchange.com/quest...ormal-when-n50

                        Comment

                        • jwfoley
                          Senior Member
                          • Jun 2009
                          • 183

                          #13
                          Okay, you could do a normality test to verify that the t-test assumptions are met, but it would be more straightforward and rigorous to just use a better normalization.

                          Comment
