I'm currently trying to learn how to use the other DE-analysis pipelines that I don't already use (DESeq2), and I have started with limma and baySeq. I'm having some problems, though...
First, baySeq. I'm reading the vignette from Bioconductor, and while I can get some results using the example data, I can't do that with my own data. This is a small piece of my counts table:
Following the procedure in the vignette, my code looks like this:
... and that's as far as I come. The resulting plot looks really weird, not a MA-plot at all: I only have six points! Obviously this has something to do with my data or the way I handle it, and I assume that the six points has something to do with the 3 + 3 replicates. I did try the rest of the vignette standard procedure as well, but the Volcano plot looks equally weird and the lowest FDR.DE I get is around 85%, so whatever is wrong must be wrong even before that. Can anybody see what I've screwed up?
Secondly, limma: this has gone much better, and I have a finished script that works just fine. The only question I have is about the counts table to use, and pertains to this line in the vignette: "The limma-voom method assumes that rows with zero or very low counts have been removed." Does that mean that I remove genes where ALL the samples have 0 count, or do I remove genes that have ANY samples with 0 count? (related to this: is this needed for baySeq, or do I use the counts table as-is?)
First, baySeq. I'm reading the vignette from Bioconductor, and while I can get some results using the example data, I can't do that with my own data. This is a small piece of my counts table:
Code:
hct_a hct_b hct_c rko_a rko_b rko_c ENSG00000XXXXXX 3376 3469 3917 2704 1713 1847 ENSG0000XXXXXXX 0 0 0 0 0 0 ENSG00000XXXXXX 5299 4700 4974 3212 2503 3340 ENSG00000XXXXXX 892 1130 1386 262 360 362 ENSG0000XXXXXXX 1453 568 1610 1482 514 908 ENSG0000XXXXXXX 0 0 0 0 0 0 ... (20,000 ish genes)
Code:
replicates = c(rep("hct", 3), rep("rko", 3))
groups = list(NDE=c(rep(1, 6)), DE=c(rep(1, 3), rep(2, 3)))
CD = new("countData", data = merged, replicates = replicates, groups = groups)
libsizes(CD) = getLibsizes(CD)
plotMA.CD(CD, samplesA="hct", samplesB="rko", col=c(rep("red", 100), rep("black", 900)))
Secondly, limma: this has gone much better, and I have a finished script that works just fine. The only question I have is about the counts table to use, and pertains to this line in the vignette: "The limma-voom method assumes that rows with zero or very low counts have been removed." Does that mean that I remove genes where ALL the samples have 0 count, or do I remove genes that have ANY samples with 0 count? (related to this: is this needed for baySeq, or do I use the counts table as-is?)
Comment