1. Actually, libraries with higher size factors are given more weight in the test (i.e., when calculating the p value). This is because we compare the sums of the *unnormalized* counts with what one should expect according to the size factors. (For details, please see the fine print in our paper.) For the fold change, we simply calculate the ratio of the averages of the *normalized* counts, which is straightforward. However, you may have a point that it would be more consistent to weight the sum according to size factors or fitted variances.

2. I don't think that this is the cause of the artifacts you see. However, as you are worried about the size factors, you may want to double-check that they are good estimates. Make an MA plot, i.e., a log-log plot of means versus ratios of the normalized counts between all pairs of samples, and check whether the bulk of the genes is centered around zero log fold change.

3. Controlling the false discovery rate at 0.01 sounds extremely stringent to me. Remember that controlling the FDR at x% means that your hit list can be expected to contain at most x% false positives. It is common to cut adjusted p values at 5% or 10%, because this is a reasonable FDR that one can usually live with.
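The MA check in point 2 can be sketched in a few lines. This is a minimal illustration on simulated Poisson counts (all data here are hypothetical), using the median-of-ratios size-factor estimator as a stand-in; it computes the M and A values for one pair of samples and checks that the bulk of genes sits near zero log fold change, which is what you would look for in the plot.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: two replicate libraries with the same underlying
# means, but library B sequenced at twice the depth of library A.
mu = rng.lognormal(mean=3.0, sigma=1.5, size=5000)
counts_a = rng.poisson(mu)
counts_b = rng.poisson(2.0 * mu)

# Median-of-ratios size factors (sketch): geometric mean per gene,
# then the median of count/geomean within each library.
with np.errstate(divide="ignore"):
    log_counts = np.log(np.vstack([counts_a, counts_b]).astype(float))
ok = np.all(np.isfinite(log_counts), axis=0)  # genes observed in both libraries
log_geomean = log_counts[:, ok].mean(axis=0)
size_factors = np.exp(np.median(log_counts[:, ok] - log_geomean, axis=1))

norm_a = counts_a / size_factors[0]
norm_b = counts_b / size_factors[1]

# MA values for the pair: A = mean log2 normalized count, M = log2 ratio.
keep = (norm_a > 0) & (norm_b > 0)
A = 0.5 * (np.log2(norm_a[keep]) + np.log2(norm_b[keep]))
M = np.log2(norm_b[keep]) - np.log2(norm_a[keep])

# If the size factors are good estimates, the bulk of genes is
# centered around M = 0 (plot A on the x-axis, M on the y-axis).
print("median log2 fold change:", round(float(np.median(M)), 2))
```

With well-estimated size factors the median of `M` lands near zero even though library B has twice the raw counts; a systematic offset in the plot would indicate a normalization problem.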
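To make the FDR interpretation in point 3 concrete, here is a small worked example with made-up p values, using the Benjamini-Hochberg step-up procedure (one common way to obtain adjusted p values); cutting the adjusted values at 5% versus 1% shows how much the stricter threshold shrinks the hit list.

```python
import numpy as np

def bh_adjust(pvals):
    """Benjamini-Hochberg adjusted p values (step-up procedure)."""
    p = np.asarray(pvals, dtype=float)
    n = p.size
    order = np.argsort(p)
    ranked = p[order] * n / np.arange(1, n + 1)
    # Enforce monotonicity from the largest p value downwards.
    adj = np.minimum.accumulate(ranked[::-1])[::-1]
    out = np.empty(n)
    out[order] = np.minimum(adj, 1.0)
    return out

# Eight hypothetical raw p values.
pvals = [0.0001, 0.0004, 0.0019, 0.0095, 0.02, 0.2, 0.5, 0.9]
padj = bh_adjust(pvals)

# Genes passing each adjusted-p cutoff: among the 5%-FDR hits,
# roughly 5% are expected to be false positives.
print(int(sum(padj < 0.05)), "hits at FDR 5%;", int(sum(padj < 0.01)), "at FDR 1%")
# → 5 hits at FDR 5%; 3 at FDR 1%
```

Dropping the cutoff from 5% to 1% here loses two of the five hits, while the expected number of false positives in the 5% list is only about 0.25 genes, which is why the looser threshold is usually easy to live with.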

## Comment