I have 5 time points with 2 biological replicates each (collected and prepared on separate days following exactly the same protocol) of bacteria undergoing starvation-induced development. I've analyzed the data with both CLC Genomics Workbench and TopHat-Cufflinks-Cuffdiff (yes, I realize I probably only need Bowtie for bacteria, but I figured that searching for nonexistent splice junctions would only cost computational time and shouldn't change the results).
My problem is this: a number of genes that I know are differentially regulated (previously published, and validated by me with qPCR) go up many-fold (one example goes from 50 RPKM to about 4000), but both programs report them as not statistically significant because of high variability between replicates.
Instead, the genes reported as statistically significant are expressed at very low levels and show less variability, but not a very high fold up- (or down-) regulation (for example, from 20 to 2 RPKM). These seem less likely to be biologically interesting.
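To make the issue concrete, here is a toy Python sketch with made-up replicate values (not real data): one gene whose mean goes from 50 to 4000 RPKM but whose replicates disagree strongly, and one gene that goes from 20 to 2 RPKM with nearly identical replicates. It computes a log2 fold change and a Welch t statistic on log2(RPKM + 1). This only illustrates why replicate spread dominates the test statistic; it is not the test that CLC or Cuffdiff actually runs.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t statistic for two small samples (mean of b minus mean of a)."""
    va = statistics.variance(a) / len(a)
    vb = statistics.variance(b) / len(b)
    return (statistics.mean(b) - statistics.mean(a)) / math.sqrt(va + vb)


def log2_values(xs, pseudocount=1.0):
    """log2-transform expression values with a pseudocount to avoid log(0)."""
    return [math.log2(x + pseudocount) for x in xs]


# Hypothetical replicate RPKM values, made up for illustration only.
genes = {
    "high_fold_variable": ([40, 60], [1000, 7000]),  # mean 50 -> 4000
    "low_fold_consistent": ([19, 21], [1.9, 2.1]),   # mean 20 -> 2
}

results = {}
for name, (before, after) in genes.items():
    lfc = math.log2(statistics.mean(after) / statistics.mean(before))
    t = welch_t(log2_values(before), log2_values(after))
    results[name] = (lfc, t)
    print(f"{name}: log2 fold change {lfc:.2f}, Welch t {t:.2f}")
```

The first gene has the larger fold change but the smaller |t|, because its two "after" replicates differ about sevenfold. With two replicates per group, the Welch degrees of freedom fall between 1 and 2, so even a sizeable t translates into a weak p-value.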
So my question is: can I get anything statistically valid out of this data, or, with this much variation, am I just out of luck? I'm sure I could cherry-pick genes for future work, but that seems like a waste of data.
If I try DESeq, will I just have the same problem in a different form, or might its different way of analyzing the data change how the statistics come out?
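DESeq works on raw read counts (not RPKM) and pools information across genes to estimate how variance depends on mean expression, rather than trusting each gene's two-replicate variance on its own. The Python sketch below shows that general idea of borrowing strength across genes using a simpler, different technique: limma-style variance moderation, where each gene's variance is pulled toward a shared prior. The expression values, the median-based prior, and `prior_df = 4.0` are all hypothetical choices for illustration; this is not DESeq's algorithm.

```python
import statistics


def pooled_variance(a, b):
    """Pooled within-group sample variance and its residual degrees of freedom."""
    df = len(a) + len(b) - 2
    ss = (len(a) - 1) * statistics.variance(a) + (len(b) - 1) * statistics.variance(b)
    return ss / df, df


def moderated_variance(s2, df, prior_s2, prior_df):
    """limma-style shrinkage: a df-weighted average of the gene's own
    variance and a prior variance shared across all genes."""
    return (prior_df * prior_s2 + df * s2) / (prior_df + df)


# Hypothetical log2 expression values, two replicates per condition (made up).
genes = {
    "noisy_gene": ([5.4, 5.9], [10.0, 12.8]),
    "tight_gene": ([4.32, 4.46], [1.54, 1.63]),
    "typical_gene": ([7.0, 7.3], [7.1, 7.5]),
}

raw = {name: pooled_variance(a, b) for name, (a, b) in genes.items()}

# Crude prior: the median variance across genes. prior_df is an arbitrary
# hypothetical value here; limma estimates both prior terms from the data.
prior_s2 = statistics.median([s2 for s2, _ in raw.values()])
prior_df = 4.0

moderated = {}
for name, (s2, df) in raw.items():
    moderated[name] = moderated_variance(s2, df, prior_s2, prior_df)
    print(f"{name}: own variance {s2:.4f} -> moderated {moderated[name]:.4f}")
```

Shrinkage pulls the noisy gene's variance down and the suspiciously tight gene's variance up, which tends to favor large changes over tiny, low-variance ones. Whether that rescues a particular gene still depends on how far apart its replicates really are.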
Thanks,
Anna