Hi all,
I am new to RNA-seq, and my experience with R is limited. I am having trouble preparing the read count matrix to import into edgeR. I used coverageBed to convert the 6 BAM files (2 conditions, 3 replicates each) into coverage files (an example of one of these .cov files, test.txt, is attached).
I need to sum the read counts of exons that belong to the same gene. For example, in the attached file the gene_id "FusR_00001" has 2 exons (lines 3 and 5), so their read counts in column j have to be added: 43 + 5 = 48.
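The per-gene summing described above can be sketched with tapply(). This is a minimal sketch on a hypothetical three-row table, assuming GTF-style attribute strings in column 9 (e.g. gene_id "FusR_00001";). The second gene, its count and the exon_number field are made up for illustration. It also pulls gene_id out by its key instead of by its position in the ";"-separated list, as an alternative to the field 5 / field 4 fallback used in the script.

```r
# Hypothetical exon rows: GTF-style attribute strings (column 9) and
# per-exon read counts (column 10). FusR_00001's two exons (43 and 5 reads)
# follow the example in the post; FusR_00002 and exon_number are made up.
attrs  <- c('gene_id "FusR_00001"; exon_number "1";',
            'gene_id "FusR_00002"; exon_number "1";',
            'gene_id "FusR_00001"; exon_number "2";')
counts <- c(43L, 0L, 5L)

# Extract gene_id by its key rather than by its position in the ";" list
gids <- sub('.*gene_id "([^"]+)".*', "\\1", attrs)

# Sum the exon counts within each gene (one value per unique gene ID)
gene_counts <- tapply(counts, gids, sum)
print(gene_counts)
```

On a real coverage file, attrs would be tt_e[, 9] and counts tt_e[, 10]; tapply() then does the same job as the inner for loop over unique(gids). Note that sub() returns a string unchanged when it has no gene_id key, so such rows would need checking separately.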
I am trying to run the following R code, but when I execute it line by line in RStudio I get this error:
Error in data.frame(NA_integer_, NA_integer_, NA_integer_, NA_integer_, : row names contain missing values
It appears as soon as I run line 28 of my script (the dat_tab <- as.data.frame(...) step), before I get to the following line:
colnames(dat_tab) <- unlist((strsplit(fnames,"/")))[seq(1,41,2)]
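A likely cause of this error can be reproduced with two tiny hypothetical count vectors, in which gene g3 appears only in the second file. In R, indexing a named vector with a name it does not contain returns NA and also gives that element an NA name. as.data.frame() can then pick up those names as row names, which data.frame() rejects. The gene names and counts here are made up. Only the as.data.frame(lapply(...)) pattern is taken from the script.

```r
# Hypothetical per-file gene counts: gene "g3" is absent from the first file
count_list <- list(c(g1 = 10L, g2 = 0L),
                   c(g1 = 7L, g2 = 2L, g3 = 4L))
un_names <- unique(unlist(lapply(count_list, names)))

# Indexing by a name the vector lacks returns NA *and* an NA name
aligned_first <- count_list[[1]][un_names]
print(aligned_first)

# The same as.data.frame(lapply(...)) pattern as in the script; the NA name
# ends up among the row names, so data.frame() stops with an error
res <- tryCatch(
  as.data.frame(lapply(count_list, function(x) x[un_names])),
  error = function(e) conditionMessage(e)
)
print(res)
```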
### R code ###
# list the coverage files in the working directory
fnames <- system("ls *.cov", intern = TRUE)
count_list <- list()
for (i in seq_along(fnames)) {
  print(i)
  tt <- read.table(fnames[i], sep = "\t", as.is = TRUE)
  # keep exon rows only (column 3 holds the feature type)
  tt_e <- tt[tt[, 3] == "exon", ]
  # gene ID = 5th ";"-separated field of column 9, falling back to the 4th
  gids <- apply(tt_e, 1, function(x) { strsplit(x[9], ";")[[1]][5] })
  gids2 <- apply(tt_e, 1, function(x) { strsplit(x[9], ";")[[1]][4] })
  gids[is.na(gids)] <- gids2[is.na(gids)]
  # sum the read counts (column 10) of all exons of each gene
  counts <- c()
  for (j in unique(gids)) {
    counts <- c(counts, sum(tt_e[gids == j, 10]))
  }
  names(counts) <- unique(gids)
  count_list[[i]] <- counts
}
# all gene IDs seen in any file
un_names <- unique(unlist(lapply(count_list, names)))
# one column per file, rows aligned on un_names
dat_tab <- as.data.frame(lapply(count_list, function(x) {
  x[un_names]
}))
### Error in data.frame(NA_integer_, NA_integer_, NA_integer_, NA_integer_, : row names contain missing values ###
colnames(dat_tab) <- unlist((strsplit(fnames, "/")))[seq(1, 41, 2)]
dat_sum <- cbind(rowSums(dat_tab[, 1:2]), rowSums(dat_tab[, 3:4]),
                 rowSums(dat_tab[, 5:6]), rowSums(dat_tab[, 7:8]),
                 rowSums(dat_tab[, 9:10]), rowSums(dat_tab[, 11:12]),
                 rowSums(dat_tab[, 13:14]), rowSums(dat_tab[, 15:16]),
                 rowSums(dat_tab[, 17:18]), dat_tab[, 19:21])
colnames(dat_sum) <- colnames(dat_tab)[c(seq(1, 18, 2), 19:21)]
dat_sum <- as.matrix(dat_sum)
### end ###
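One possible way around the error, sketched under the assumption that a gene absent from a file should count as 0 reads there: align each file on un_names, drop the names, replace NA with 0, and set the row names explicitly. The fnames and count_list below are hypothetical stand-ins for the script's objects. The column names are taken from the file names minus ".cov" instead of the strsplit()/seq(1, 41, 2) indexing, which appears to expect 21 paths of the form dir/file rather than 6 plain .cov names.

```r
# Hypothetical inputs standing in for the script's fnames and count_list
fnames <- c("ctrl_1.cov", "treat_1.cov")
count_list <- list(c(g1 = 10L, g2 = 0L),
                   c(g1 = 7L, g2 = 2L, g3 = 4L))
un_names <- unique(unlist(lapply(count_list, names)))

# Align every file on the full gene list; unname() stops NA names from
# becoming row names, and genes absent from a file are filled with 0
count_mat <- sapply(count_list, function(x) {
  v <- unname(x[un_names])
  v[is.na(v)] <- 0L
  v
})
rownames(count_mat) <- un_names
# one column per file, named after the file without its ".cov" suffix
colnames(count_mat) <- sub("\\.cov$", "", basename(fnames))
print(count_mat)
```

sapply() returns an integer matrix with one row per gene and one column per file, the shape that edgeR's DGEList(counts = ...) accepts. Filling with 0 is an assumption: if all six files were produced from the same annotation, they would normally list the same genes, so any missing genes are worth investigating before filling them in.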
Since I am new to R, I am struggling to find where the problem lies. I suspect that rows with 0 values in column j are causing this error. Any help getting past this issue would be much appreciated.
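Zero counts on their own should not trigger this error, because 0 is an ordinary value in a data frame; the message is about missing (NA) row names. A quick diagnostic, sketched here on hypothetical vectors, is to check whether un_names itself contains NA (a gene-ID extraction that failed) and how many genes each file lacks compared with un_names.

```r
# Hypothetical per-file gene counts, including genuine zero counts
count_list <- list(c(g1 = 0L, g2 = 5L),
                   c(g1 = 3L, g2 = 0L, g3 = 4L))
un_names <- unique(unlist(lapply(count_list, names)))

# Zeros alone are fine: same gene names in both vectors, so this builds
ok <- as.data.frame(list(c(g1 = 0L, g2 = 5L), c(g1 = 3L, g2 = 0L)))

# Diagnostics: any NA gene IDs, and how many genes each file is missing
print(anyNA(un_names))
missing_per_file <- sapply(count_list, function(x) sum(!un_names %in% names(x)))
print(missing_per_file)
```

On the real data, a TRUE from anyNA(un_names) or a non-zero entry in missing_per_file would point at the cause rather than the zero counts.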
Thanks in advance