  • antoza
    Member
    • Aug 2013
    • 18

    RNA-seq: formatting the read count matrix file before edgeR

    Hi all,

    I am new to RNA-seq and to R, where I have only some experience. I am having trouble preparing the read count matrix file for import into edgeR. I used coverageBed to convert the 6 BAM files (2 conditions, 3 replicates each) to their respective coverage files (an example of one of these .cov files is attached as test.txt).

    I need to sum the read counts that belong to exons of the same gene. For example, in the attached file, the 2 exons of gene_id "FusR_00001" (lines 3 and 5) have read counts of 43 and 5 in column j, so the total for that gene should be 43 + 5 = 48.
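The per-gene summation described above can be sketched in base R as follows. This is a minimal illustration, not the poster's code: the toy `exons` data frame, its `attr` and `count` column names, and the exact attribute layout (`gene_id "..."`) are assumptions, since test.txt itself is not shown.

```r
# Toy stand-in for the exon rows of one coverage file (hypothetical values,
# except the 43 and 5 quoted in the post for FusR_00001).
exons <- data.frame(
  attr  = c('gene_id "FusR_00001"; transcript_id "FusR_00001.t1";',
            'gene_id "FusR_00002"; transcript_id "FusR_00002.t1";',
            'gene_id "FusR_00001"; transcript_id "FusR_00001.t1";'),
  count = c(43L, 7L, 5L),
  stringsAsFactors = FALSE
)

# Extract the gene ID by pattern instead of by its position in the
# ";"-separated attribute string (assumes the GTF-style gene_id "..." form).
gene_id <- sub('.*gene_id "([^"]+)".*', "\\1", exons$attr)

# rowsum() adds up the counts of all rows that share a gene ID and returns
# a one-column matrix; [, 1] turns it into a named vector.
gene_counts <- rowsum(exons$count, group = gene_id)[, 1]

gene_counts[["FusR_00001"]]  # 43 + 5
```

Matching on the `gene_id` pattern avoids depending on which field of column 9 holds the ID, which can differ between rows.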

    I am trying to run the following R code, but when I execute it line by line in RStudio, this error appears:

    Error in data.frame(NA_integer_, NA_integer_, NA_integer_, NA_integer_, : row names contain missing values

    It comes after I run line 28 of the script, just before I run the following line:

    colnames(dat_tab) <- unlist((strsplit(fnames,"/")))[seq(1,41,2)]

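    ### R code ###
    # List the coverage files in the working directory
    fnames <- system("ls *.cov", intern = TRUE)
    count_list <- list()
    for (i in seq_along(fnames)) {
      print(i)
      tt <- read.table(fnames[i], sep = "\t", as.is = TRUE)
      # Keep only the exon rows
      tt_e <- tt[tt[, 3] == "exon", ]
      # Gene ID: 5th ";"-separated field of column 9, falling back to the 4th
      gids <- apply(tt_e, 1, function(x) strsplit(x[9], ";")[[1]][5])
      gids2 <- apply(tt_e, 1, function(x) strsplit(x[9], ";")[[1]][4])
      gids[is.na(gids)] <- gids2[is.na(gids)]
      # Sum the counts in column 10 over all exons of each gene
      counts <- c()
      for (j in unique(gids)) {
        counts <- c(counts, sum(tt_e[gids == j, 10]))
      }
      names(counts) <- unique(gids)
      count_list[[i]] <- counts
    }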

    # Union of gene IDs across all files
    un_names <- unique(unlist(lapply(count_list, names)))

    dat_tab <- as.data.frame(lapply(count_list, function(x) {
      x[un_names]
    }))

    ### Error in data.frame(NA_integer_, NA_integer_, NA_integer_, NA_integer_, : row names contain missing values ###

    colnames(dat_tab) <- unlist((strsplit(fnames, "/")))[seq(1, 41, 2)]

    dat_sum <- cbind(rowSums(dat_tab[, 1:2]), rowSums(dat_tab[, 3:4]), rowSums(dat_tab[, 5:6]), rowSums(dat_tab[, 7:8]), rowSums(dat_tab[, 9:10]), rowSums(dat_tab[, 11:12]), rowSums(dat_tab[, 13:14]), rowSums(dat_tab[, 15:16]), rowSums(dat_tab[, 17:18]), dat_tab[, 19:21])

    colnames(dat_sum) <- colnames(dat_tab)[c(seq(1, 18, 2), 19:21)]

    dat_sum <- as.matrix(dat_sum)

    ### end ###

    Since I am new to R, I am struggling to find where the problem lies. I suspect that rows with 0 values in column j are causing this error. Any help in getting past this issue would be much appreciated.

    Thanks in advance
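One plausible source of the "row names contain missing values" error, sketched here under assumptions rather than confirmed against the poster's files: indexing a named vector by a name it does not contain returns NA with an NA name, and data.frame() then tries to use those names as row names. The call printed in the error, data.frame(NA_integer_, ...), would also fit the case where the gene ID extraction returned NA for every row, so that each per-file vector collapsed to a single NA. The sample and gene names below are hypothetical.

```r
# Two toy per-sample count vectors; neither holds every gene.
count_list <- list(
  sample1 = c(geneA = 48L, geneB = 7L),
  sample2 = c(geneA = 30L, geneC = 2L)
)

un_names <- unique(unlist(lapply(count_list, names)))
un_names <- un_names[!is.na(un_names)]  # guard against failed gene ID parsing

# Indexing by an absent name yields NA values *and* NA names.
bad_names <- names(count_list$sample2[un_names])

# Build the matrix without relying on the names that indexing produces:
# fill absent genes with 0 and set the row names explicitly.
count_mat <- vapply(count_list, function(x) {
  v <- x[un_names]
  v[is.na(v)] <- 0L
  unname(v)
}, integer(length(un_names)))
rownames(count_mat) <- un_names

count_mat
```

Treating a gene that is absent from a file as a zero count is itself an assumption; it is only reasonable if every file was counted against the same annotation.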
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    There are two ways to go about this. First, you could convert your pseudo-GTF file (test.txt is just a GTF file with an extra column) into an actual GTF by moving the counts into the attributes in column 9. You'd then read that into R as a GRanges object, split() it by gene_id, and sapply a function over the result to return the column sums of the appropriate metadata columns.

    Second, and more simply, you could just delete this file and use either htseq-count or featureCounts, and then not have to deal with this. I should also add that coverageBed is not likely to produce 100% correct counts for the purposes of RNA-seq.

    Make your life easier and go with the second option.
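For the second option, a rough sketch using featureCounts through its R interface in the Bioconductor package Rsubread, with the result handed to edgeR. The wrapper `count_genes`, the file names in the usage comment, and the choice of "exon" and "gene_id" as feature and attribute types are assumptions for illustration; strandedness and paired-end settings are left at their defaults and would need to match the actual library.

```r
# Hypothetical helper: count reads per gene straight from BAM files and
# wrap the resulting matrix in an edgeR DGEList.
count_genes <- function(bam_files, gtf_file, group) {
  stopifnot(length(bam_files) == length(group))
  if (!requireNamespace("Rsubread", quietly = TRUE) ||
      !requireNamespace("edgeR", quietly = TRUE)) {
    stop("This sketch needs the Bioconductor packages Rsubread and edgeR.")
  }
  fc <- Rsubread::featureCounts(
    files               = bam_files,
    annot.ext           = gtf_file,   # GTF annotation supplied by the user
    isGTFAnnotationFile = TRUE,
    GTF.featureType     = "exon",     # count reads overlapping exons ...
    GTF.attrType        = "gene_id"   # ... and summarise them per gene
  )
  # fc$counts is a genes x samples integer matrix
  edgeR::DGEList(counts = fc$counts, group = group)
}

# Hypothetical usage for 2 conditions x 3 replicates:
# bams <- c("c1_r1.bam", "c1_r2.bam", "c1_r3.bam",
#           "c2_r1.bam", "c2_r2.bam", "c2_r3.bam")
# dge  <- count_genes(bams, "annotation.gtf",
#                     factor(rep(c("cond1", "cond2"), each = 3)))
```

Because the counts are already summarised per gene, no per-exon summing or manual merging of files is needed afterwards.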


    • antoza
      Member
      • Aug 2013
      • 18

      #3
      Dear dpryan,

      Many thanks for your reply, issue solved!!
