Hello,
I am working with an RNA-Seq dataset whose samples have varying numbers of reads. I am wondering how this will affect differential expression analysis, and what difference in read depth between samples is generally considered acceptable.
Four groups of samples were barcoded and run on a single flow cell. Across the entire dataset, the difference between the largest and smallest sample read counts is about 5-fold (size factors range from 0.34 to 3.2). Within each group the number of reads is similar (for the most part), but differences exist between the groups that we intend to compare. I'll use our first comparison as an example: Group1 has ~4 million reads per sample, whereas Group2 has >7 million reads per sample. The total number of genes detected also differs between the two groups.
To assess expression changes I used DESeq2, but I am wondering whether normalizing with size factors is enough to account for this. Suppose GeneA was not detected in Group1 as a consequence of the small number of reads, but is lowly expressed in Group2. This gene would be identified as DE, although we don't know whether that is actually the case.
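To make the GeneA concern concrete, here is a rough sketch in Python (it is not part of my DESeq2 analysis). It assumes a hypothetical abundance of 0.5 reads per million for GeneA in both groups, i.e. no true difference, and pure Poisson sampling with no biological variation. The function names and the abundance value are my own illustrative choices.

```python
import math

def expected_count(reads_per_million, library_size):
    """Expected raw count for a gene, given its abundance and the library size."""
    return reads_per_million * library_size / 1e6

def prob_zero(reads_per_million, library_size):
    """Poisson probability of observing zero reads (ignores biological variation)."""
    return math.exp(-expected_count(reads_per_million, library_size))

# Hypothetical abundance for GeneA: identical in both groups, so no true DE.
gene_a_rpm = 0.5

for label, depth in [("Group1", 4_000_000), ("Group2", 7_000_000)]:
    mu = expected_count(gene_a_rpm, depth)
    p0 = prob_zero(gene_a_rpm, depth)
    print(f"{label}: expected count = {mu:.1f}, P(count == 0) = {p0:.3f}")
```

Under these assumptions a single Group1 sample would show zero reads for GeneA roughly 13.5% of the time, versus about 3% for a Group2 sample, even though the underlying abundance is the same. My question is whether the size factors and the DESeq2 model handle this kind of depth-driven dropout adequately.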