Hi,
I'm tried to use the WGCNA bioconductor package on my RNAseq data - it was originally integer FPKM counts from the tuxedo pipeline, so I log2(x+1) transformed it (as was reccomended on the WGCNA bioconductor page), because it prefers to deal with numeric counts rather than integer counts..
But I keep runnning up against the problem...
Has anyone else come up against this problem/know how to solve it?
My data looks like this
X12Y30 X10B70 X10G87 X36W62 X23Y70 X2UNA X12Y47 X10R99 X10G17 X35Y44 X36W35 X23Y59 X2B82 X12Y51
5.5 5.1 4.9 5.4 6.4 5.3 4.9 4.6 5.4 5.4 5.3 4.1 5.4 4.2
5.9 5.4 4.9 4.7 5.9 5.8 4.9 4.8 5.3 4.2 5.0 5.0 4.8 2.3
0.0 2.8 0.0 0.0 0.0 0.0 0.0 3.7 4.1 0.0 2.3 2.8 0.0 3.0
0.0 3.5 0.0 0.0 0.0 0.0 2.3 1.6 5.0 0.0 1.6 0.0 0.0 2.6
5.6 5.6 5.8 5.2 5.6 6.1 5.6 5.3 6.3 4.2 5.3 5.4 6.0 5.2
5.8 5.6 5.6 4.2 5.8 6.0 5.2 4.8 6.3 5.5 4.9 5.8 5.6 4.5
9.5 9.6 10.0 9.9 9.5 10.1 10.3 10.1 10.4 10.8 10.3 11.0 10.4 10.0
9.5 9.7 10.1 9.8 9.5 10.2 10.4 10.1 10.4 10.8 10.3 11.0 10.5 10.1
Thanks!
I'm tried to use the WGCNA bioconductor package on my RNAseq data - it was originally integer FPKM counts from the tuxedo pipeline, so I log2(x+1) transformed it (as was reccomended on the WGCNA bioconductor page), because it prefers to deal with numeric counts rather than integer counts..
But I keep runnning up against the problem...
Code:
> setwd("/home/user/edgeR") > library(WGCNA) Loading required package: dynamicTreeCut Loading required package: fastcluster Attaching package: ‘fastcluster’ The following object is masked from ‘package:stats’: hclust Creating a generic function for ‘nchar’ from package ‘base’ in package ‘S4Vectors’ Loading required package: DBI ========================================================================== * * Package WGCNA 1.49 loaded. * * Important note: It appears that your system supports multi-threading, * but it is not enabled within WGCNA in R. * To allow multi-threading within WGCNA with all available cores, use * * allowWGCNAThreads() * * within R. Use disableWGCNAThreads() to disable threading if necessary. * Alternatively, set the following environment variable on your system: * * ALLOW_WGCNA_THREADS=<number_of_processors> * * for example * * ALLOW_WGCNA_THREADS=8 * * To set the environment variable in linux bash shell, type * * export ALLOW_WGCNA_THREADS=8 * * before running R. Other operating systems or shells will * have a similar command to achieve the same aim. * ========================================================================== Attaching package: ‘WGCNA’ The following object is masked from ‘package:stats’: cor > options(stringsAsFactors = FALSE) > alphaData = read.csv("data.csv") > nSets = 1 > multiExpr = vector(mode = "list", length = nSets) > multiExpr[[1]] = list(data = as.data.frame(t(alphaData[-c(1:8)]))); > names(multiExpr[[1]]$data) = alphaData$substanceBXH; > rownames(multiExpr[[1]]$data) = names(alphaData)[-c(1:8)]; > exprSize = checkSets(multiExpr) > exprSize $nSets [1] 1 $nGenes [1] 20162 $nSamples [1] 19 $structureOK [1] TRUE > gsg = goodSamplesGenesMS(multiExpr, verbose = 3); Flagging genes and samples with too many missing values... ..step 1 ..bad gene count: 0, bad sample counts: 0 > gsg$allOK [1] TRUE > sampleTrees = list() > for (set in 1:nSets) + { + sampleTrees[[set]] = hclust(dist(multiExpr[[set]]$data), method = "average") + } > pdf(file = "Plots/SampleClustering.pdf", width = 12, height = 12); Error in pdf(file = "Plots/SampleClustering.pdf", width = 12, height = 12) : cannot open file 'Plots/SampleClustering.pdf' > par(mfrow=c(2,1)) > par(mar = c(0, 4, 2, 0)) > for (set in 1:nSets) + plot(sampleTrees[[set]], main = paste("Sample clustering on all genes in", setLabels[set]), + xlab="", sub="", cex = 0.7); Error in paste("Sample clustering on all genes in", setLabels[set]) : object 'setLabels' not found > dev.off() null device 1 > save(multiExpr,setLabels, shortLabels, + file = "Consensus-dataInput.RData") Error in save(multiExpr, setLabels, shortLabels, file = "Consensus-dataInput.RData") : objects ‘setLabels’, ‘shortLabels’ not found > enableWGCNAThreads() Allowing parallel execution with up to 7 working processes. > lnames = load(file = "Consensus-dataInput.RData"); > lnames [1] "multiExpr" "setLabels" "shortLabels" > nSets = checkSets(multiExpr)$nSets > powers = c(seq(4,10,by=1), seq(12,20, by=2)); > powerTables = vector(mode = "list", length = nSets); > for (set in 1:nSets) + powerTables[[set]] = list(data = pickSoftThreshold(multiExpr[[set]]$data, powerVector=powers, + verbose = 2)[[2]]); pickSoftThreshold: will use block size 2218. pickSoftThreshold: calculating connectivity for given powers... ..working on genes 1 through 2218 of 20162 ..working on genes 2219 through 4436 of 20162 ..working on genes 4437 through 6654 of 20162 ..working on genes 6655 through 8872 of 20162 ..working on genes 8873 through 11090 of 20162 ..working on genes 11091 through 13308 of 20162 ..working on genes 13309 through 15526 of 20162 ..working on genes 15527 through 17744 of 20162 ..working on genes 17745 through 19962 of 20162 ..working on genes 19963 through 20162 of 20162 Power SFT.R.sq slope truncated.R.sq mean.k. median.k. max.k. 1 4 0.237 -0.734 0.728 946.0 935.0 2330 2 5 0.336 -0.843 0.733 627.0 612.0 1670 3 6 0.407 -1.020 0.754 438.0 419.0 1290 4 7 0.487 -1.110 0.814 318.0 296.0 1020 5 8 0.561 -1.190 0.879 239.0 215.0 840 6 9 0.620 -1.280 0.925 184.0 160.0 709 7 10 0.666 -1.390 0.953 145.0 121.0 618 8 12 0.730 -1.560 0.979 94.5 71.8 485 9 14 0.779 -1.680 0.987 65.1 45.1 395 10 16 0.820 -1.760 0.992 46.8 29.4 331 11 18 0.837 -1.810 0.985 34.8 19.9 282 12 20 0.854 -1.830 0.981 26.6 13.9 243 > collectGarbage(); > colors = c("black", "red") > plotCols = c(2,5,6,7) > colNames = c("Scale Free Topology Model Fit", "Mean connectivity", "Median connectivity", + "Max connectivity"); > ylim = matrix(NA, nrow = 2, ncol = 4); > for (set in 1:nSets) + { + for (col in 1:length(plotCols)){ + ylim[1, col] = min(ylim[1, col], powerTables[[set]]$data[, plotCols[col]], na.rm = TRUE); + ylim[2, col] = max(ylim[2, col], powerTables[[set]]$data[, plotCols[col]], na.rm = TRUE); + } + } > sizeGrWindow(8, 6) > par(mfcol = c(2,2)); > par(mar = c(4.2, 4.2 , 2.2, 0.5)) > cex1 = 0.7; > for (col in 1:length(plotCols)) for (set in 1:nSets) + { + if (set==1) + { + plot(powerTables[[set]]$data[,1], -sign(powerTables[[set]]$data[,3])*powerTables[[set]]$data[,2], + xlab="Soft Threshold (power)",ylab=colNames[col],type="n", ylim = ylim[, col], + main = colNames[col]); + addGrid(); + } + if (col==1) + { + text(powerTables[[set]]$data[,1], -sign(powerTables[[set]]$data[,3])*powerTables[[set]]$data[,2], + labels=powers,cex=cex1,col=colors[set]); + } else + text(powerTables[[set]]$data[,1], powerTables[[set]]$data[,plotCols[col]], + labels=powers,cex=cex1,col=colors[set]); + if (col==1) + { + legend("bottomright", legend = setLabels, col = colors, pch = 20) ; + } else + legend("topright", legend = setLabels, col = colors, pch = 20) ; + } > net = blockwiseConsensusModules( + multiExpr, power = 6, minModuleSize = 30, deepSplit = 2, + pamRespectsDendro = FALSE, + mergeCutHeight = 0.25, numericLabels = TRUE, + minKMEtoStay = 0, + saveTOMs = TRUE, verbose = 5) Calculating consensus modules and module eigengenes block-wise from all genes Calculating topological overlaps block-wise from all genes Flagging genes and samples with too many missing values... ..step 1 ..bad gene count: 0, bad sample counts: 0 ....pre-clustering genes to determine blocks.. Consensus projective K-means: ..using 1008 centers. ..consensus k-means clustering.. ..iteration 1 ..proposing to move 19900 genes..move accepted. ..iteration 2 ..proposing to move 8627 genes..move accepted. ..iteration 3 ..proposing to move 4279 genes..move accepted. ..iteration 4 ..proposing to move 2669 genes..move accepted. ..iteration 5 ..proposing to move 1879 genes..move accepted. ..iteration 6 ..proposing to move 1328 genes..move accepted. ..iteration 7 ..proposing to move 910 genes..move accepted. ..iteration 8 ..proposing to move 633 genes..move accepted. ..iteration 9 ..proposing to move 429 genes..move accepted. ..iteration 10 ..proposing to move 241 genes..move accepted. ..iteration 11 ..proposing to move 141 genes..move accepted. ..iteration 12 ..proposing to move 70 genes..move accepted. ..iteration 13 ..proposing to move 28 genes..move accepted. ..iteration 14 ..proposing to move 9 genes..move accepted. ..iteration 15 ..proposing to move 4 genes..move accepted. ..iteration 16 Could not find a suitable move to improve the clustering. ..merging smaller clusters... ..Working on block 1 . ....Working on set 1 Error: REAL() can only be applied to a 'numeric', not a 'integer'
My data looks like this
X12Y30 X10B70 X10G87 X36W62 X23Y70 X2UNA X12Y47 X10R99 X10G17 X35Y44 X36W35 X23Y59 X2B82 X12Y51
5.5 5.1 4.9 5.4 6.4 5.3 4.9 4.6 5.4 5.4 5.3 4.1 5.4 4.2
5.9 5.4 4.9 4.7 5.9 5.8 4.9 4.8 5.3 4.2 5.0 5.0 4.8 2.3
0.0 2.8 0.0 0.0 0.0 0.0 0.0 3.7 4.1 0.0 2.3 2.8 0.0 3.0
0.0 3.5 0.0 0.0 0.0 0.0 2.3 1.6 5.0 0.0 1.6 0.0 0.0 2.6
5.6 5.6 5.8 5.2 5.6 6.1 5.6 5.3 6.3 4.2 5.3 5.4 6.0 5.2
5.8 5.6 5.6 4.2 5.8 6.0 5.2 4.8 6.3 5.5 4.9 5.8 5.6 4.5
9.5 9.6 10.0 9.9 9.5 10.1 10.3 10.1 10.4 10.8 10.3 11.0 10.4 10.0
9.5 9.7 10.1 9.8 9.5 10.2 10.4 10.1 10.4 10.8 10.3 11.0 10.5 10.1
Thanks!
Comment