Hi All,
I've recently been assessing some RNA-Seq data from a pilot project. When we begin the actual study we'll have many more samples, so we want to use lower coverage to save money where possible. Basically, we're looking at differential expression between 2 conditions. However, when I randomly extracted 1/3 of the reads from each file, mapped them (using exactly the same pipeline as for the whole files) and then looked at differential expression, I found about 4 times more differentially expressed genes than with all of the data.
Any ideas why? I've been doing this using DESeq.
I also performed some clustering and found that the samples from the pilot study tend to fall into 2 fairly distinct groups. Looking at differential expression within those groups isn't viable because of the sample size (only 3 samples in each group), so DESeq doesn't detect anything as significantly differentially expressed.
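For concreteness, the random 1/3 extraction described above could look roughly like the sketch below. It is only an illustration under assumptions, not necessarily the exact script used: it assumes uncompressed FASTQ input with exactly four lines per read, and the function name `subsample_fastq` and the seed value are made up for the example. Each read is kept independently with the given probability, so the result is approximately, not exactly, 1/3 of the reads.

```python
import random
from itertools import islice
from typing import Iterable, Iterator, List


def subsample_fastq(lines: Iterable[str], fraction: float, seed: int) -> Iterator[List[str]]:
    """Yield each 4-line FASTQ record with probability `fraction`.

    Hypothetical helper for illustration; assumes 4 lines per record.
    """
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must be between 0 and 1")
    rng = random.Random(seed)  # fixed seed makes the subsample reproducible
    it = iter(lines)
    while True:
        record = list(islice(it, 4))  # header, sequence, '+', qualities
        if not record:
            break
        if len(record) != 4:
            raise ValueError("truncated FASTQ record")
        # One random draw per record, so the same seed keeps the same
        # record positions in any file with the same number of records.
        if rng.random() < fraction:
            yield record


if __name__ == "__main__":
    # Tiny in-memory example: 9 fake reads, keep roughly a third of them.
    demo = []
    for i in range(9):
        demo += [f"@read{i}\n", "ACGT\n", "+\n", "IIII\n"]
    for rec in subsample_fastq(demo, fraction=1 / 3, seed=1):
        print("".join(rec), end="")
```

Because there is one random draw per record, running this with the same seed on both mate files of a paired-end sample should keep the same read pairs, provided the two files list the reads in the same order.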