I think that Cuffdiff (from Cufflinks) might have a normalization issue!
I'm using the TopHat/Cufflinks/Cuffdiff pipeline to analyze RNA-seq data from a WT and a knockout (KO) mouse, both done in duplicate.
But when I make a scatterplot of WT vs KO using the FPKM values from Cuffdiff, there is a general shift towards WT being more highly expressed than KO, as seen here:
I've plotted each replicate combination separately to be consistent with the plots below. The red line is the 1:1 ratio (where the bulk of the values are expected to lie) and the blue line is the smoothed trend fitted with geom_smooth() from ggplot2 (a generalized additive model with integrated smoothness estimation).
This pattern probably originates from the raw read counts, since the shift is also visible in the raw data, as shown here:
The shift is less pronounced in the raw data, though.
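To put a number on the shift rather than judging it by eye, one option is the median log2(KO/WT) ratio over genes expressed in both samples. The following is only a minimal sketch in Python with made-up values: the function names, the `min_value` cutoff and the median-ratio rescaling are assumptions for illustration, and this is neither what Cuffdiff does internally nor edgeR's calcNormFactors().

```python
import math
from statistics import median


def median_log2_ratio(wt, ko, min_value=1.0):
    """Median of log2(KO / WT) over genes at or above min_value in both samples."""
    ratios = [
        math.log2(k / w)
        for w, k in zip(wt, ko)
        if w >= min_value and k >= min_value
    ]
    if not ratios:
        raise ValueError("no genes pass the expression filter")
    return median(ratios)


def rescale_to_reference(wt, ko, min_value=1.0):
    """Scale KO by a single factor so that its median log2 ratio to WT becomes 0."""
    shift = median_log2_ratio(wt, ko, min_value)
    factor = 2.0 ** (-shift)
    return [k * factor for k in ko]


# Made-up values, not taken from the data set described in this post.
wt_values = [10.0, 20.0, 40.0, 80.0, 0.0]
ko_values = [5.0, 10.0, 20.0, 40.0, 3.0]

shift = median_log2_ratio(wt_values, ko_values)
ko_rescaled = rescale_to_reference(wt_values, ko_values)
print("median log2(KO/WT):", shift)
print("after rescaling:", median_log2_ratio(wt_values, ko_rescaled))
```

A value near 0 means the bulk of genes sit on the 1:1 line; a clearly negative value corresponds to the WT-higher shift described above. Computing it on the Cuffdiff FPKMs, the raw counts and the edgeR-normalized counts would allow the three plots to be compared on the same scale.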
I think it is a normalization problem because when I take the raw read counts and normalize them using edgeR (calcNormFactors()), the shift is much smaller, as seen here:
I'm using Cufflinks v2.0.2 and the iGenomes files as references.
I get these results regardless of how I run Cuffdiff. I've tried the following parameter combinations:
only mandatory parameters
-N
-N -M myGTF
-N -M myGTF -b
-N -M myGTF -b -u
I always get the same results. Furthermore, I have had a similar problem with another mouse data set created with a completely different experimental protocol.
Does anyone have good suggestions for how to make Cuffdiff work?