  • DESeq2 - some values in assay are negative

    Dear all,

    I am a first-time DESeq2 user, and I am already stuck importing my dataset.

    My RNA-seq data went through the HISAT2 - StringTie pipeline, and I created a gene counts file using the Python script provided with StringTie.

    As far as I can tell, my gene count data set looks fine, except that some values are somehow being treated as negative, and I have no idea why.

    I am trying to import the data into DESeq2 with the DESeqDataSetFromMatrix function.

    Here's a step-by-step version of what I have done so far:

    # Import data file that contains gene counts
    countdata <- as.matrix(read_excel("DEseqcounts.xlsx"),header=TRUE)
    # take row names from the first column
    rownames(countdata) <- countdata[ , 1]
    # first column is now duplicated, so remove
    countdata <- countdata[,-1]

    # Import data file that contains phenotype data in columns
    coldata=as.matrix(read_excel("coldata.xlsx"),header=TRUE)
    # take row names from the first column
    rownames(coldata) <- coldata[ , 1]
    # first column is now duplicated, so remove
    coldata <- coldata[,-1]

    (I have visually checked that the files are imported correctly, and I can't seem to find anything that looks wrong)

    I would like to run the DESeqDataSetFromMatrix as follows:

    DESeqDataSetFromMatrix(countData = countdata, colData = coldata, design = ~ treatment, tidy = FALSE, ignoreRank = FALSE)

    which returns this error message:
    Error in DESeqDataSet(se, design = design, ignoreRank) : some values in assay are negative

    Indeed, there seem to be values in my "countdata" object that are somehow classified as negative:

    countdata["" < 0] omitted 1280373 entries, which look like this:

    [1] " 0" " 0" " 0" " 0" " 5" " 0" " 26" " 104" " 10" " 24"
    [11] " 22" " 3" " 22" " 0" " 226" " 0" " 152" " 2" " 153" " 178"
    [21] " 0" " 2" " 427" " 153" " 0" " 475" " 0" " 0" " 16" " 101"
    [31] " 78" " 26" " 71" " 372" " 35" " 17" " 108" " 100" " 43" " 0"

    I have no idea where that comes from. I couldn't find any negative, empty or NA cells in my count data file, nor are there any spaces in the cells.

    Does anyone have a solution, or an idea on what went wrong?

    Any help is highly appreciated,

    Thanks so much!

  • #2
    It looks like you have an extra space in front of all of your numbers and that's screwing everything up. Fix how the values are imported and ensure they're actually numbers and not strings.
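
One plausible source of those leading spaces, sketched in base R with a made-up three-gene table (the names `df`, `bad` and `good` are illustrative, not from the original post): when a table that still contains a text column such as the gene IDs is passed to as.matrix(), every column is coerced to character, and the numeric columns are padded to a common width on the way.

```r
# Toy stand-in for the imported table: one text column plus two count columns.
df <- data.frame(
  gene_id = c("geneA", "geneB", "geneC"),
  sample1 = c(0, 26, 104),
  sample2 = c(5, 0, 226),
  stringsAsFactors = FALSE
)

# Coercing the whole table yields a character matrix; the counts are
# formatted to a common width, which introduces leading spaces.
bad <- as.matrix(df)
print(bad[, "sample1"])
print(mode(bad))

# Dropping the ID column before the conversion keeps the counts numeric.
good <- as.matrix(df[, -1])
rownames(good) <- df$gene_id
print(mode(good))
```

If this is what happened here, the spaces would not be visible in the spreadsheet itself, only in the matrix built from it.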


    • #3
      I'm not so familiar with the StringTie pipeline, but I recommend avoiding Excel for most NGS-related analyses (see Zeeberg et al. 2004: Mistaken Identifiers: Gene name errors can be introduced inadvertently when using Excel in bioinformatics).

      Can you use the python script to get simple csv/tsv output?
      [Update]
      The prepDE.py script produces csv files. Import these directly into R; any selection and computation you've done with Excel can be done there as well.
      Last edited by Michael.Ante; 11-29-2016, 12:15 AM.
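
A minimal sketch of reading a count table straight from CSV with base R. It assumes the first column holds the gene IDs and every other column holds counts; the tiny file written here is a made-up stand-in for the prepDE.py output, and `csv_path` and `counts` are illustrative names.

```r
# Write a tiny CSV to a temporary file so the sketch runs on its own;
# in practice this would be the file produced by prepDE.py.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "gene_id,sample1,sample2",
  "geneA,0,5",
  "geneB,26,0",
  "geneC,104,226"
), csv_path)

# row.names = 1 moves the first column into the row names, so only the
# count columns remain and as.matrix() returns a numeric matrix.
counts <- as.matrix(read.csv(csv_path, row.names = 1, check.names = FALSE))

str(counts)
```

Because the ID column never becomes part of the matrix, there is no text column left to force the counts into strings.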


      • #4
        I have double-checked, and there is no extra space in any of my cells;
        that is actually the reason I later saved this file as Excel.

        The Python script gives me the gene counts in csv format. I have of course tried that too, and it gives the same error.

        Using the same file in edgeR for example works without issues.


        • #5
          Try as a first solution:
          countdata <- as.matrix(read_excel("DEseqcounts.xlsx"),header=TRUE, row.names=1)

          Then check:
          summary(is.numeric(countdata[,1]))

          Maybe there are some empty lines at the end that cause R to read the columns as factors rather than numbers. You can check this with tail(countdata).
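
A few base R checks along these lines, sketched on a made-up character matrix with one deliberately unparseable cell ("n/a"); `parsed` and `bad_cells` are illustrative names. Note that a matrix holds a single type, so text in any one column shows up as a character matrix rather than as factors.

```r
# Toy character matrix mimicking the problem, with one deliberately bad cell.
countdata <- matrix(
  c(" 0", " 26", "104", " 5", "n/a", "226"),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"), c("sample1", "sample2"))
)

# A matrix has a single storage type, so one call covers every column.
print(storage.mode(countdata))

# Look at the last rows for trailing blank or malformed lines.
print(tail(countdata))

# Locate cells that cannot be parsed as numbers (they become NA).
parsed <- suppressWarnings(as.numeric(countdata))
bad_cells <- which(is.na(parsed) & !is.na(countdata))
print(countdata[bad_cells])
```

If `bad_cells` comes back empty, the values are all parseable and the only problem is the storage type.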


          • #6
            The class of countdata[,1] is "character"

            summary(is.numeric(countdata[,1]))
            Mode FALSE NA's
            logical 1 0

            class(countdata[,1])
            [1] "character"

            That should be the issue I guess?

            Thanks for your help!
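
A possible fix, sketched under assumptions: the toy matrix, the sample names and the control/treated groups are all made up, and the DESeq2 call only runs if the package is installed. With a character matrix, a test such as `countdata < 0` compares strings rather than numbers, which is plausibly why the check for negative values fires. Converting the storage mode parses the strings (leading spaces included) while keeping the row and column names.

```r
# Toy character matrix standing in for the imported countdata.
countdata <- matrix(
  c(" 0", " 26", "104", " 5", " 0", "226",
    " 3", " 22", "153", " 2", " 16", "101"),
  nrow = 3,
  dimnames = list(c("geneA", "geneB", "geneC"),
                  c("s1", "s2", "s3", "s4"))
)

# Convert in place; parsing tolerates the leading spaces, and the
# dimensions and dimnames are preserved.
storage.mode(countdata) <- "double"
stopifnot(is.numeric(countdata), !anyNA(countdata), all(countdata >= 0))

# Sample table as a data frame with a factor column (hypothetical groups);
# its row names match the column names of the count matrix.
coldata <- data.frame(
  treatment = factor(c("control", "control", "treated", "treated")),
  row.names = colnames(countdata)
)

# Only attempt the import if DESeq2 is installed.
if (requireNamespace("DESeq2", quietly = TRUE)) {
  dds <- DESeq2::DESeqDataSetFromMatrix(
    countData = countdata,
    colData = coldata,
    design = ~ treatment
  )
  print(dim(dds))
}
```

Keeping coldata as a data frame rather than a character matrix also lets the design variable stay a factor.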
