Dear all,
I am a first-time DESeq2 user, and I am already stuck at importing my dataset.
My RNA-seq data went through the HISAT2 - StringTie pipeline, and I created a gene counts file using the Python script provided with StringTie.
As far as I can tell, my gene count dataset looks fine, except that something weird is going on with negative values, and I have no idea what.
I am trying to import the data into DESeq2 with the DESeqDataSetFromMatrix function.
Here's a step-by-step version of what I have done so far:
# Import data file that contains gene counts
countdata <- as.matrix(read_excel("DEseqcounts.xlsx"),header=TRUE)
# take row names from the first column
rownames(countdata) <- countdata[ , 1]
# first column is now duplicated, so remove
countdata <- countdata[,-1]
# Import data file that contains phenotype data in columns
coldata=as.matrix(read_excel("coldata.xlsx"),header=TRUE)
# take row names from the first column
rownames(coldata) <- coldata[ , 1]
# first column is now duplicated, so remove
coldata <- coldata[,-1]
(I have visually checked that the files are imported correctly, and I can't seem to find anything that looks wrong)
I would like to run DESeqDataSetFromMatrix as follows:
DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ treatment, tidy = FALSE, ignoreRank = FALSE)
which returns this error message:
Error in DESeqDataSet(se, design = design, ignoreRank) : some values in assay are negative
Indeed, there seem to be values in my "countdata" object that are somehow classified as negative:
countdata["" < 0] prints a long vector (1280373 entries omitted from the printout), whose entries look like this:
[1] " 0" " 0" " 0" " 0" " 5" " 0" " 26" " 104" " 10" " 24"
[11] " 22" " 3" " 22" " 0" " 226" " 0" " 152" " 2" " 153" " 178"
[21] " 0" " 2" " 427" " 153" " 0" " 475" " 0" " 0" " 16" " 101"
[31] " 78" " 26" " 71" " 372" " 35" " 17" " 108" " 100" " 43" " 0"
I have no idea where this comes from. I couldn't find any negative, empty or NA cells in my count data file, nor are there any spaces in the cells.
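One thing that may be worth checking is how the values are stored rather than what they are. The quoted entries above are strings, not numbers, and as.matrix() on a table that mixes a text column (the gene IDs) with numeric columns returns a character matrix with space-padded numbers. A comparison such as `x < 0` on a character matrix is then a string comparison. The sketch below reproduces this on a toy data frame; df, gene_id, sample1 and sample2 are made-up names for illustration, not the contents of DEseqcounts.xlsx, and the same would be expected for a table read with read_excel.

```r
# Toy table standing in for the imported spreadsheet (hypothetical values).
df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  sample1 = c(0, 26, 104),
  sample2 = c(5, 0, 226),
  stringsAsFactors = FALSE
)

# Converting the whole table at once: the text column forces every cell
# to character, and numeric columns are formatted with padding spaces.
m_chr <- as.matrix(df)
rownames(m_chr) <- m_chr[, 1]
m_chr <- m_chr[, -1]
print(is.character(m_chr))  # check the storage type
print(m_chr)                # values appear quoted, e.g. "  0"

# Alternative: convert only the count columns, so the matrix stays numeric,
# and take the row names from the ID column separately.
counts <- as.matrix(df[, -1])
rownames(counts) <- df$gene_id
storage.mode(counts) <- "integer"  # counts stored as integers
print(is.numeric(counts))
print(any(counts < 0))
```

If the real countdata object turns out to be a character matrix as well, building it from the count columns only (as in the second half of the sketch) is one possible way around the error; coldata could be kept as a data frame instead of a matrix for the same reason.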
Does anyone have a solution, or an idea on what went wrong?
Any help is highly appreciated,
Thanks so much!