I have an R script that reads the lines of a .sam file produced by mapping. I want to parse each line into its fields so that they are easier to manipulate, in order to create the wig files I need or to calculate cov3 and cov5.
Can you please help me make this script run faster? How can I parse the lines of a huge .sam file into a data frame (or a list) more quickly? Here is my script:
gc()
rm(list=ls())
exptPath <- "/home/dimitris/INDEX3PerfectUnique31cov5.sam"
lines <- readLines(exptPath)
pos = lines
pos
chrom = lines
chrom
pos = ""
chrom = ""
nn = length(lines)
nn
# parse lines of sam file into strings(this part is very very slow)
rr = strsplit(lines,"\t", fixed = TRUE)
rr
trr = do.call(rbind.data.frame, rr)
pos = as.numeric(as.character(trr[8:nn,4]))
# for cov3
#pos = pos+25
#pos
chrom = trr[8:nn,3]
pos = as.numeric(pos)
pos
tab1 = table(chrom,pos, exclude="")
tab1
ftab1 = as.data.frame(tab1)
ftab1 = subset(ftab1, ftab1[3] != 0)
ftab1 = subset(ftab1, ftab1[1] != "<NA>")
oftab1 = ftab1[ order(ftab1[,1]), ]
final.ftab1 = oftab1[,2:3]
write.table(final.ftab1, "ind3_cov5_wig.txt", row.names=FALSE, sep=" ", quote=FALSE)
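One possible direction, offered as a rough sketch rather than a tested drop-in replacement: the script only uses columns 3 (chromosome) and 4 (position), so base R's `scan()` can read just those two fields and discard the rest of each line, which avoids both `strsplit` and `do.call(rbind.data.frame, ...)`. The counts can then be computed by sorting and measuring runs, instead of building a full chromosome-by-position `table`. The helper names `count_header_lines` and `sam_position_counts` are hypothetical, and the sketch assumes that header lines start with "@" and all sit at the top of the file, and that unmapped reads carry "*" in column 3.

```r
# Sketch only. Assumptions not taken from the original script:
#  - header lines start with "@" and are all at the top of the file
#  - unmapped reads have "*" in column 3 and should be dropped

# Count leading header lines so scan() can skip them
# (the original script hard-codes this by starting at row 8).
count_header_lines <- function(path, chunk = 1000L) {
  con <- file(path, "r")
  on.exit(close(con))
  n <- 0L
  repeat {
    x <- readLines(con, n = chunk)
    if (length(x) == 0L) break
    is_header <- startsWith(x, "@")
    if (!all(is_header)) {
      n <- n + which(!is_header)[1L] - 1L
      break
    }
    n <- n + length(x)
  }
  n
}

# Return one row per distinct (chrom, pos) pair with its read count.
sam_position_counts <- function(path) {
  # NULL entries in 'what' are read but not stored; flush = TRUE drops
  # everything after the 4th field of each line.
  fields <- scan(path, what = list(NULL, NULL, "", 0L),
                 sep = "\t", quote = "", comment.char = "",
                 skip = count_header_lines(path),
                 flush = TRUE, quiet = TRUE)
  chrom <- fields[[3]]
  pos <- fields[[4]]

  keep <- chrom != "*"
  chrom <- chrom[keep]
  pos <- pos[keep]
  n <- length(pos)
  if (n == 0L) {
    return(data.frame(chrom = character(), pos = integer(),
                      count = integer(), stringsAsFactors = FALSE))
  }

  # Sort, then mark where a new (chrom, pos) pair begins.
  o <- order(chrom, pos)
  chrom <- chrom[o]
  pos <- pos[o]
  new_run <- c(TRUE, chrom[-1L] != chrom[-n] | pos[-1L] != pos[-n])
  starts <- which(new_run)

  data.frame(chrom = chrom[starts], pos = pos[starts],
             count = diff(c(starts, n + 1L)), stringsAsFactors = FALSE)
}

# Possible usage, mirroring the output of the original script:
# counts <- sam_position_counts(exptPath)
# write.table(counts[, c("pos", "count")], "ind3_cov5_wig.txt",
#             row.names = FALSE, sep = " ", quote = FALSE)
```

For cov3, the `pos + 25` shift from the commented-out line could be applied to `pos` before sorting. Printing large objects such as `lines`, `rr` or `tab1` to the console is itself slow, so dropping those bare variable lines may also help.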