I'm using DESeq to identify differentially expressed genes in a next-generation sequencing dataset (DESeq: http://www.ncbi.nlm.nih.gov/pmc/articles/PMC3218662/). In my experiment, DESeq's normalization of read counts may not perform as well as anticipated. DESeq uses a 'size factor' to bring the count values of all samples onto a common scale. The size factor for a given library is the median, over all genes, of the ratio of each gene's observed count in that library to the geometric mean of that gene's counts across all samples.
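To make the size-factor definition concrete, here is a minimal NumPy sketch of the median-of-ratios calculation. It assumes a genes × samples matrix of raw counts and skips genes whose geometric mean is zero (i.e. a zero count in any sample), as DESeq's estimator does. The function name is hypothetical and this is an illustration, not DESeq's own code.

```python
import numpy as np


def size_factors_median_of_ratios(counts):
    """Median-of-ratios size factors for a genes x samples matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)  # log(0) -> -inf
    # Log of each gene's geometric mean across all samples.
    log_geomeans = log_counts.mean(axis=1)
    # Genes with a zero in any sample have geometric mean 0 and are skipped.
    keep = np.isfinite(log_geomeans)
    # Per-sample median of log(count / geometric mean), back-transformed.
    log_ratios = log_counts[keep] - log_geomeans[keep][:, None]
    return np.exp(np.median(log_ratios, axis=0))


counts = np.array([[10, 20], [20, 40], [30, 60], [0, 5]])
sf = size_factors_median_of_ratios(counts)
normalized = counts / sf  # each column divided by its size factor
print(sf)  # approximately [0.707, 1.414]
```

Because the second sample has exactly twice the counts of the first for every usable gene, its size factor is twice as large, and dividing by the factors puts both columns on the same scale.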
For my dataset, upper-quartile scaling (linearly scaling each library by the 75th percentile of its counts, since my data often have low read counts) may perform better. In my specific case it may even be possible to define a set of genes as an internal scaling standard.
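A minimal sketch of the two alternatives, under stated assumptions. The upper-quartile factor is taken as the 75th percentile of each sample's counts after dropping genes with zero counts in every sample (a common convention; tools differ in the exact filtering), then rescaled to geometric mean 1 so it sits on the same scale as median-of-ratios size factors (my own choice). The internal-standard variant is simply median-of-ratios restricted to a hypothetical list of control-gene row indices, `control_idx`. Function names are illustrative only.

```python
import numpy as np


def size_factors_upper_quartile(counts):
    """Upper-quartile scaling factors for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    # Drop genes with zero counts in every sample (convention varies between tools).
    expressed = counts.sum(axis=1) > 0
    # 75th percentile of each sample's counts over the remaining genes.
    uq = np.percentile(counts[expressed], 75, axis=0)
    # Assumption: rescale so the factors have geometric mean 1.
    return uq / np.exp(np.mean(np.log(uq)))


def size_factors_control_genes(counts, control_idx):
    """Median-of-ratios size factors computed only on hypothetical control genes."""
    ctrl = np.asarray(counts, dtype=float)[control_idx]
    with np.errstate(divide="ignore"):
        log_ctrl = np.log(ctrl)
    log_geomeans = log_ctrl.mean(axis=1)
    keep = np.isfinite(log_geomeans)  # skip control genes with a zero count
    log_ratios = log_ctrl[keep] - log_geomeans[keep][:, None]
    return np.exp(np.median(log_ratios, axis=0))


counts = np.array([[10, 10], [20, 20], [5, 50], [0, 0]])
f_uq = size_factors_upper_quartile(counts)
f_ctrl = size_factors_control_genes(counts, [0, 1])  # both factors equal 1
```

The control-gene version differs from the standard estimator only in which rows enter the median, so a strongly changing gene (row 2 here) no longer influences the scaling.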
Can I use an alternative method for data normalization and still have DESeq's statistical test for differential gene expression work correctly?