Does anybody know of a string compression technique (in R or Perl)?
In particular, we would like to compress tags of length 35 and above
so that we can store them in a data structure for further processing.
Note also that we are talking about ~10 million tags to process.
I have an implementation in R that converts a tag to a numeric value,
but it gives an overflow error when handling tags longer than 30 characters.
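One likely reason for the failure, inferred from how R stores numbers rather than from the exact error message: the codes are doubles, whose 53-bit significand holds integers exactly only up to 2^53. At 2 bits per base that is 26 bases (4^26 = 2^52). From 27 bases on, neighbouring codes can round to the same value, so distinct tags may silently collide even before any error appears. A quick check in R:

```r
# Largest tag length whose base-4 code stays exact in a double:
# 53 significand bits, 2 bits per base.
max_exact_bases <- .Machine$double.digits %/% 2
max_exact_bases          # 26

# At 26 bases, consecutive codes are still distinguishable...
(4^26 + 1) == 4^26       # FALSE
# ...at 27 bases they are not: the +1 is rounded away.
(4^27 + 1) == 4^27       # TRUE
```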
Code:
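tagsequence2tagnum <- function (tags, length)
{
    # Split every tag into single lowercase characters.
    new.tags <- tolower(unlist(strsplit(as.character(tags), "")))
    # Any character other than a, c, g, t, s, y, b, k is treated as "n".
    new.tags[!(new.tags %in% c("a", "c", "g", "t", "s", "y", "b", "k"))] <- "n"
    # Map bases to base-4 digits (n -> 0, s/y/b -> 1, k -> 2);
    # one column per tag.
    new.tags <- matrix(as.numeric(chartr("acgtnsybk", "012301112", new.tags)),
                       nrow = length)
    # Read each column as a base-4 number, offset by 1.
    colSums(new.tags * 4^((length - 1):0)) + 1
}
tagnum2tagsequence <- function (tags, length)
{
    # Extract the base-4 digits of (code - 1), most significant first;
    # one row per tag after transposing.
    new.tags <- t(matrix((rep(tags - 1, each = length) %/% 4^((length - 1):0)) %% 4,
                         nrow = length))
    # Join the digits of each tag and map them back to bases.
    new.tags <- apply(new.tags, 1, paste, collapse = "")
    chartr("0123", "acgt", new.tags)
}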
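A different approach that avoids large numbers altogether is to pack each base into 2 bits of a raw vector with base R's packBits() and rawToBits(). This is a minimal sketch, not a tested production solution: the names pack_tags, unpack_tags, bases and base_code are hypothetical, all tags in one call are assumed to have the same length, and it keeps the same lossy mapping as tagsequence2tagnum (ambiguity codes collapse, and "n" or unknown characters decode as "a"). A 35-base tag needs 70 bits, i.e. 9 bytes, so 10 million tags would fit in a 9 x 10,000,000 raw matrix of about 90 MB (ignoring object overhead). Because strsplit() on all tags at once is itself memory-hungry, calling pack_tags on batches is probably wiser.

```r
# Hypothetical helpers: store each tag in 2 bits per base instead of one
# character per base. All tags passed in one call must have the same length.
bases     <- c("a", "c", "g", "t", "n", "s", "y", "b", "k")
base_code <- c(0L, 1L, 2L, 3L, 0L, 1L, 1L, 1L, 2L)  # same as chartr("acgtnsybk", "012301112")

pack_tags <- function(tags) {
  tags <- tolower(as.character(tags))
  len  <- nchar(tags[1])
  stopifnot(all(nchar(tags) == len))
  # One base-4 code per character; unknown characters count as "n" (code 0).
  idx <- match(unlist(strsplit(tags, "")), bases)
  idx[is.na(idx)] <- 5L
  codes <- matrix(base_code[idx], nrow = len)         # one column per tag
  # Low bits of all bases, then high bits, then zero padding to whole bytes.
  nbytes <- ceiling(2 * len / 8)
  bits <- rbind(codes %% 2L, codes %/% 2L,
                matrix(0L, nrow = nbytes * 8 - 2 * len, ncol = ncol(codes)))
  # Every 8 consecutive bits become one byte; one column of bytes per tag.
  matrix(packBits(as.vector(bits), type = "raw"), nrow = nbytes)
}

unpack_tags <- function(packed, len) {
  # Expand every byte back into 8 bits; one column per tag.
  bits  <- matrix(as.integer(rawToBits(as.vector(packed))), ncol = ncol(packed))
  codes <- bits[seq_len(len), , drop = FALSE] +
           2L * bits[len + seq_len(len), , drop = FALSE]
  chars <- matrix(c("a", "c", "g", "t")[as.vector(codes) + 1L], nrow = len)
  apply(chars, 2, paste, collapse = "")
}

# Usage: two 35-base tags become a 9 x 2 raw matrix.
tags   <- c(substr(strrep("acgt", 9), 1, 35), substr(strrep("tgca", 9), 1, 35))
packed <- pack_tags(tags)
dim(packed)                                  # 9 2
identical(unpack_tags(packed, 35), tags)     # TRUE
```

Since every tag occupies the same number of bytes, the packed tags can be subset by column like any other matrix, and the length has to be kept alongside them for decoding.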