"Outliers in expression data are usually harder to deal with. The accepted remedy by the GTEx consortium is the transformation of the measurements for each gene into normally distributed [values] while preserving relative rankings. The target distribution may be the standard normal distribution or the normal distribution [with] the mean and spread of the original measurements. Here is the code for such [a] transformation:"
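# gene holds the expression data in slices; each slice is a genes x samples matrix
for( sl in 1:length(gene) ) {
    mat = gene[[sl]];
    # Rank the samples within each gene (row); ties share their average rank
    mat = t(apply(mat, 1, rank, ties.method = "average"));
    # Scale ranks into (0, 1) and map them to standard normal quantiles;
    # ncol(mat) is the number of samples in the slice
    mat = qnorm(mat / (ncol(mat)+1));
    gene[[sl]] = mat;
}
rm(sl, mat);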
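For anyone who wants to try the same idea outside the program, here is a minimal sketch on a plain R matrix (genes in rows, samples in columns). The function name inverse_normal_rows and the keep_moments argument are my own illustrative names, not part of the program. The keep_moments branch is one possible reading of the second option in the quote, rescaling each gene back to its original mean and standard deviation.

```r
# Rank-based inverse normal transform for each row (gene) of a plain matrix.
# inverse_normal_rows and keep_moments are hypothetical names for illustration.
inverse_normal_rows <- function(expr, keep_moments = FALSE) {
  stopifnot(is.matrix(expr), ncol(expr) >= 2)
  n <- ncol(expr)
  # Rank samples within each gene; apply() returns samples x genes, so transpose
  ranks <- t(apply(expr, 1, rank, ties.method = "average"))
  # Map ranks into (0, 1), then onto standard normal quantiles
  z <- qnorm(ranks / (n + 1))
  if (keep_moments) {
    # Assumed interpretation: give each gene its original mean and sd back
    mu <- rowMeans(expr)
    sigma <- apply(expr, 1, sd)
    z_sd <- apply(z, 1, sd)
    z_sd[z_sd == 0] <- 1  # constant genes: avoid dividing by zero
    z <- (z - rowMeans(z)) / z_sd * sigma + mu
  }
  dimnames(z) <- dimnames(expr)
  z
}

# Usage on simulated, skewed data (values are made up)
set.seed(1)
expr <- matrix(rexp(5 * 20), nrow = 5,
               dimnames = list(paste0("gene", 1:5), NULL))
z_std  <- inverse_normal_rows(expr)                       # standard normal target
z_orig <- inverse_normal_rows(expr, keep_moments = TRUE)  # original mean and sd
```

Either way the relative ranking of the samples within each gene is unchanged, so only the shape of the distribution is altered.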
I used my normalized DESeq count data as input, then used the program to transform the expression values of each gene to a normal distribution. Comparing a few genes before and after transformation, the transformed values certainly look normal.
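A rough sketch of how such a before/after comparison could be done for a single gene, using simulated counts rather than my real data (the negative binomial parameters are arbitrary assumptions). Note that the transformed values are normal quantiles by construction, so this mainly confirms the transformation ran as intended.

```r
# Simulated skewed expression for one gene; values are made up for illustration
set.seed(42)
before <- rnbinom(60, mu = 200, size = 2)

# Rank-based inverse normal transform of this single gene
after <- qnorm(rank(before, ties.method = "average") / (length(before) + 1))

# Shapiro-Wilk test: small p-values indicate departure from normality
p_before <- shapiro.test(before)$p.value
p_after  <- shapiro.test(after)$p.value
print(c(before = p_before, after = p_after))
```

For a visual check, qqnorm(after) followed by qqline(after) draws a normal Q-Q plot of the transformed values.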

The program claims to have been used successfully to identify eQTLs in RNAseq data. Whether this approach gives biologically relevant, informative, or even correct eQTL results for my data is yet to be determined. Has anyone tried other transformations of RNAseq data for eQTL analysis?