Hi all,
I'm stuck on something that seems like it should be simple, and I'm hoping someone can help me find my way out.
I've produced transcriptomes for 10 genotypes across 3 environments (mostly 2 biological reps per G+E combination, a few with more reps). I've used TopHat2 to map reads and HTSeq to generate counts, and I've run DESeq2 with this design: design=~Batch+Genotype+Env+Genotype:Env.
Standard analyses are working fine, but I'm getting hung up on calculating the per-gene variance in expression level among my genotypes within each environment. I'd prefer to work with DESeq2 output rather than raw counts, so that technical variability is appropriately controlled for when estimating expression levels, but I'm puzzled about how to extract this information from the results data frame. Specifically, if I use a design that includes the interaction term and then extract the coefficient for a particular genotype and environment (results(dds_GxE, name="Genotype1.Env1")), I believe this gives me a result relative to my control genotype (Genotype0) in the matched environment (Env1). What I want is the result relative to my control genotype in a single environment, but on a scale that is shared across all environments, so that I can compare the among-genotype variances in the 3 environments on the same scale. Do I need to create a dummy variable that combines genotype and environment, and then just calculate the variance on the subset of samples that share one environment?
Thanks for any insight you can offer.
A