Hi,
I have an RNA-seq experiment where I would like to identify genes that are differentially expressed between two species and alleles that are differentially expressed in hybrids.
I have six libraries from two parental lines (A1, A2, A3, B1, B2, B3), for which I have calculated read counts per gene.
I also have six libraries from two reciprocal hybrids (AxB1, AxB2, AxB3, BxA1, BxA2, BxA3), for which I have calculated read counts per allele. So for the hybrids I have twelve columns of read counts: AxB1 (A_allele), AxB1 (B_allele), AxB2 (A_allele), AxB2 (B_allele)…
I would now like to call DE genes/alleles.
For the parents, I will use DESeq/edgeR to identify genes DE in A vs B. This should be straightforward.
For the hybrids, I’m less sure how to test for DE of the A allele vs the B allele. Can I use DESeq/edgeR, treating the A_allele counts and B_allele counts as individual libraries?
My first thought was to use DESeq/edgeR because this is count data, and the variance relative to the mean will presumably decrease as the mean increases. However, looking in the literature, it appears that most people use an alternative:
Gregg et al. Science -> chi-squared test
McManus et al. Genome Res -> Fisher's exact test
Rozowsky et al. Molecular Systems Biology (AlleleSeq) -> binomial test
Pandey et al. Mol Ecology Resources (Allim) -> G-tests, ANOVA
Bell et al. Genome Biology and Evolution -> binomial test
Zhang et al. PNAS -> chi-squared test
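To make the comparison concrete, this is roughly what I understand the binomial approach to be for a single gene in a single hybrid library. It is only a minimal sketch in plain Python: the function names and the 60/40 counts are made up for illustration, and the null proportion of 0.5 is my assumption, not something taken from the papers above.

```python
from math import exp, lgamma, log


def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials (computed in log space)."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1 - p))


def allele_binomial_p(a_count, b_count, p_null=0.5):
    """Two-sided exact binomial p-value for A vs B allele reads of one gene.

    p_null is the expected fraction of A reads under no allelic imbalance
    (assumed to be 0.5 here; it must lie strictly between 0 and 1).
    """
    n = a_count + b_count
    if n == 0:
        return 1.0  # no reads, so no evidence either way
    pmf = [binomial_pmf(k, n, p_null) for k in range(n + 1)]
    observed = pmf[a_count]
    # Sum every outcome that is no more likely than the observed one.
    p_value = sum(p for p in pmf if p <= observed * (1 + 1e-7))
    return min(1.0, p_value)


# Hypothetical counts for one gene in one hybrid library.
print(allele_binomial_p(60, 40))
```

As far as I can tell, a test like this only models sampling noise within one library and does not use the variation between my three replicates, which is part of why I was considering DESeq/edgeR.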
Secondly, if I can use DESeq/edgeR, would it be a good idea (or even possible) to manually adjust library size based on the total number of reads aligning to each allele? In the hybrids, I get slightly more reads aligning to A than B (51% vs 49%). Assuming for the moment that this is a biological effect rather than a mapping artifact, if I normalize each allele's counts to its own library size, I would presumably slightly overestimate the B allele relative to the A allele.
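Here is the arithmetic behind that worry, as a small sketch with made-up numbers. The totals of 5.1M and 4.9M reads and the gene with 100 reads per allele are hypothetical, and counts per million is just the simplest normalization I could use to illustrate the point, not what DESeq/edgeR do internally.

```python
def counts_per_million(count, library_size):
    """Scale a raw read count to counts per million reads in the library."""
    return count / library_size * 1_000_000


# Hypothetical hybrid library: 51% of allele-assigned reads map to A, 49% to B.
a_total, b_total = 5_100_000, 4_900_000

# Hypothetical gene with perfectly balanced allelic expression.
a_gene, b_gene = 100, 100

# Option 1: treat each allele as its own library with its own size.
a_cpm = counts_per_million(a_gene, a_total)
b_cpm = counts_per_million(b_gene, b_total)
ratio_separate = b_cpm / a_cpm  # equals a_total / b_total, about 1.04

# Option 2: give both alleles the same size (total reads in the sample).
shared_total = a_total + b_total
ratio_shared = (counts_per_million(b_gene, shared_total)
                / counts_per_million(a_gene, shared_total))  # stays at 1.0

print(f"B/A with separate library sizes: {ratio_separate:.4f}")
print(f"B/A with a shared library size:  {ratio_shared:.4f}")
```

So with separate library sizes, a gene with equal reads from both alleles would look about 4% higher for B, whereas a shared per-sample size leaves it balanced.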
Any help much appreciated!