Hey guys,
I have the following scenario. We performed MACE experiments (like RNA-seq but you produce only one read per transcript) for three developmental stages of tomato pollen (3 biological replicates for each stage). On the one hand I produced RPM values for all genes by normalizing simply for the differing library size. On the other I performed an analysis with DESeq2. When comparing the RPM values of stage 1 and stage 3 I see for some genes nearly identical RPM values, whereas according to DESeq2 these genes are differentially expressed. In total I have around 16,000 (of around 19,000) differentially expressed genes when performing DESeq2 analysis with LRT test on all three stages
When looking at the venn diagram for identified genes between the 3 stages I see the following:
only_stage1: 3891
only_stage2: 275
only_stage3: 178
stage1_stage2: 3846
stage1_stage3: 88
stage2_stage3: 415
all_3_stages: 10387
My problem is that I do not know whether I should trust the RPM values or DESeq2 because DESeq2 has for normalization the assumption that: "normalization is working under the assumption that most of the genes are not DE and only a small subset of genes are". But during the development of pollen there are drastic physiological and morphological changes going along with strong changes in expressed genes (as you can see from the venn diagram).
Or is DESeq2 fine and I have strong carry-over effects within the RPM values by genes exclusively expressed in one of the stages.
I further checked one of our housekeeping genes from qRT-PCRs. This gene has for the RPM values a log2FC between stage3 and 1 of ~-2.3 and for DESeq2 a log2FC of -0.89 (here I made a Wald test between stage3 and 1 and it is DE). So for the housekeeping gene DESeq2 is performing slightly better.
I really hope someone has any suggestions for me. I'm really desperate.
Greetings
Mario
I have the following scenario. We performed MACE experiments (like RNA-seq but you produce only one read per transcript) for three developmental stages of tomato pollen (3 biological replicates for each stage). On the one hand I produced RPM values for all genes by normalizing simply for the differing library size. On the other I performed an analysis with DESeq2. When comparing the RPM values of stage 1 and stage 3 I see for some genes nearly identical RPM values, whereas according to DESeq2 these genes are differentially expressed. In total I have around 16,000 (of around 19,000) differentially expressed genes when performing DESeq2 analysis with LRT test on all three stages
When looking at the venn diagram for identified genes between the 3 stages I see the following:
only_stage1: 3891
only_stage2: 275
only_stage3: 178
stage1_stage2: 3846
stage1_stage3: 88
stage2_stage3: 415
all_3_stages: 10387
My problem is that I do not know whether I should trust the RPM values or DESeq2 because DESeq2 has for normalization the assumption that: "normalization is working under the assumption that most of the genes are not DE and only a small subset of genes are". But during the development of pollen there are drastic physiological and morphological changes going along with strong changes in expressed genes (as you can see from the venn diagram).
Or is DESeq2 fine and I have strong carry-over effects within the RPM values by genes exclusively expressed in one of the stages.
I further checked one of our housekeeping genes from qRT-PCRs. This gene has for the RPM values a log2FC between stage3 and 1 of ~-2.3 and for DESeq2 a log2FC of -0.89 (here I made a Wald test between stage3 and 1 and it is DE). So for the housekeeping gene DESeq2 is performing slightly better.
I really hope someone has any suggestions for me. I'm really desperate.
Greetings
Mario