I ran targeted-sequencing experiments on a variant library, and I want to test for differential relative abundance of variants between different nucleic acid inputs (e.g., total RNA versus gDNA, or gDNA from FACS-sorted populations versus input).
Using limma-trend with empirical Bayes moderation and BH correction, I noticed a correlation between the log fold change (log FC) and the mean abundance. The input data are log frequencies for each variant, where the frequency is variant count / wild-type count. The image shows a particularly strong case of this correlation, between a sorted population and the input population.
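As a minimal sketch of the quantities involved (not the actual analysis, which was run in limma), the log frequencies, log FC, and mean abundance could be computed roughly as below. The counts, the pseudocount of 0.5, the base-2 logarithm, and all function names are illustrative assumptions, not values from my data.

```python
import math

def log_freq(variant_counts, wt_count, pseudocount=0.5):
    """log2(variant count / wild-type count) per variant.

    The pseudocount is an assumption added to avoid log(0).
    """
    return [math.log2((c + pseudocount) / (wt_count + pseudocount))
            for c in variant_counts]

def ma_values(log_a, log_b):
    """Per-variant mean abundance and log FC of sample b versus sample a."""
    mean = [(a + b) / 2 for a, b in zip(log_a, log_b)]
    lfc = [b - a for a, b in zip(log_a, log_b)]
    return mean, lfc

def pearson(x, y):
    """Plain Pearson correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

# Hypothetical toy counts: rare variants enriched, abundant ones depleted.
input_counts = [10, 50, 200, 1000, 5000]
sorted_counts = [30, 100, 300, 1100, 4000]
input_wt = 100000
sorted_wt = 100000

log_input = log_freq(input_counts, input_wt)
log_sorted = log_freq(sorted_counts, sorted_wt)
mean_abundance, log_fc = ma_values(log_input, log_sorted)

# A value far from 0 indicates an abundance-dependent trend in log FC.
r = pearson(mean_abundance, log_fc)
print(f"correlation between mean abundance and log FC: {r:.3f}")
```

A correlation far from zero in such a check is the kind of abundance-dependent trend I am describing; it says nothing by itself about whether the trend is technical or biological.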
I am wondering whether limma tolerates such a strong bias. The results show that some variants called significant at an FDR < 0.01 lie close to the expectation of a linear model fit.
I know quantile normalization is commonly used for this, but I do not want to mask any true global biological variation.