Dear SEQanswer community,
I run edgeR analysis for 8 samples (4-leukemia patients, 4-healthy donors)
I want to perform normalization, estimate mean-variance relationship and test differential expression of the genes.
Before I prepared input data in HTSeq software. So I got 8 separate txt files, each with 2 columns: 1st - for gene name, 2nd - for read counts. The path to the folder is the following: /home/olha/test_dataset/
Here is my script:
However, edgeR does not generate any files (plots, tables, etc.). I run my analysis in LinuxMint shell using Perl.
I saw, that after first line instead of ">" sign, the edgeR produce "+" sign.
How I can cope with this issue?
Thank You very much for help!
Olha
I run edgeR analysis for 8 samples (4-leukemia patients, 4-healthy donors)
I want to perform normalization, estimate mean-variance relationship and test differential expression of the genes.
Before I prepared input data in HTSeq software. So I got 8 separate txt files, each with 2 columns: 1st - for gene name, 2nd - for read counts. The path to the folder is the following: /home/olha/test_dataset/
Here is my script:
Code:
> library(edgeR) Loading required package: limma > samples <- matrix(c("A15","blood","ART_LOS","ART_blood", "A17","blood","ART_LOS","ART_blood", "A18","blood","ART_LOS","ART_blood", "A19","blood","ART_LOS","ART_blood", "H11","blood","AI_control","AI_blood", "H12","blood","AI_control","AI_blood", "H13","blood","AI_control","AI_blood", "H15","blood","AI_control","AI_blood"), nrow = 8, ncol = 4, byrow = TRUE, dimnames = list(c(1:8), c("library_name","organ","condition","group_LOS_vs_control"))) + samples <- as.data.frame (samples, row.names = NULL, optional = FALSE, stringAsFactors = default.stringAsFactors()) + counts <- readDGE(samples$library_name, path = "/home/olha/test_dataset/", columns=c(1,2), group = samples$group_LOS_vs_control, header = FALSE) + colnames(counts) <- samples$library_name + dim(counts) + counts$samples + noint <- rownames(counts) %in% c('__no_feature','__ambiguous','__too_low_aQual','__not_aligned','__alignment_not_unique') + cpms <- cpm(counts) + keep <- rowSums (cpms > 1) >= 4 & !noint + counts <- counts[keep,] + counts <- DGEList(counts=counts,group = samples$group_ALL_vs_control) + counts <- calcNormFactors(counts) + counts$samples + counts + pdf(file = 'HCB_ALL.pdf', width = 9, height = 6) + plotMDS(counts, labels = c('A15','A17','A18','A19','H11','H12','H13','H15'), xlab = 'Dimension 1', ylab = 'Dimension 2', asp = 6/9, cex = 0.8, main = 'Multidimentional scaling plot of blood') + par(cex.axis =0.6, cex.lab = 0.6, cex.main = 1) + dev.off() + counts <- estimateCommonDisp(counts, verbos = TRUE) + counts <- estimateTagwiseDisp(counts) + pdf(file = 'ALL.pdf', width = 7, height = 7) + plotBCV(counts, main = 'Biological coefficient of variation vs. mean log CPM of brain')+ + par(cex.axis =0.6, cex.lab = 0.6, cex.main = 1) + dev.off() + pdf(file = 'Mean_variance_relationships_blood_simple.pdf', width = 7, height = 7) plotMeanVar(counts, show.tagwise.vars =TRUE, NBline= TRUE, main = 'Mean_variance relationships of blood') par(cex.axis =0.6, cex.lab = 0.6, cex.main = 1) dev.off() + de <- exactTest(counts) + tt <- topTags(de, n=nrow(counts), sort.by = 'logFC') head(tt$table) + nc <- cpm(counts, normalized.lib.sizes = TRUE) rn <- rownames(tt$table) head (nc[rn, order(samples$condition)], 4) + deg <- rn[tt$table$FDR <.01] pdf(file = 'differentail_genes_blood_simple.pdf') plotSmear(counts, de.tags = deg, main = 'Log fold change of expression level in blood: ART_LOS vs. AI') abline(h=c(-1, 1), col='blue') par(cex.axis =0.6, cex.lab = 0.6, cex.main = 1) dev.off()+ + + summary(de <- decideTestsDGE(de)) + pdf (file = 'test.pdf') detags <- rownames(counts)[as.logical(de)] plotSmear(counts, de.tags=detags) abline(h=c(-1, 1), col='blue') dev.off() + write.csv(tt$table, file = 'differential_genes_blood_simple.txt')
I saw, that after first line instead of ">" sign, the edgeR produce "+" sign.
How I can cope with this issue?
Thank You very much for help!
Olha