  • DESeq: pvalues are NA

    Hello,

    When running DESeq, we get some genes with p-values that are NA (See below). They're probably supposed to be zero or really close to zero. How can we fix this?

    happygrad:~/work/kristen/DEseqTroubleshooting> head -n 1 DE_genes_Reps6_allres.txt
    id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj adjLog2FoldChange
    happygrad:~/work/kristen/DEseqTroubleshooting> grep NA DE_genes_Reps6_allres.txt
    NM_001012669 13245.90643904 1967.80257160171 24524.0103064784 12.4626375940331 3.63953752770609 NA NA 3.63886339029247
    NM_001024569 1861.17241176578 0.626705575521935 3721.71811795603 5938.54317453056 12.5358933415034 NA NA 11.1601875099376
    NM_001034039 5972.12221805081 11639.9873990581 304.257037043527 0.0261389490050605 -5.25765505536172 NA NA -5.2530450700836

    We're running R version 2.15.0
    Platform = x86_64-pc-linux-gnu (64 bit)

    These are the commands we're using:
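    library(DESeq)                             # provides the functions used below
    cds <- newCountDataSet(countsTable, cond)  # countsTable: raw counts; cond: condition labels
    cds <- estimateSizeFactors(cds)            # normalize for sequencing depth
    cds <- estimateDispersions(cds)            # fit the variance-mean relation
    res <- nbinomTest(cds, "1", "2")           # test condition "1" against condition "2"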

    Thanks,
    Danielle

  • #2
    Hi Danielle

    So, only 3 genes are affected? Is anything different about the counts for these three genes compared to all the others? Maybe post an excerpt from the count table and from fData(cds).

    Simon
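
One way to pull out such an excerpt is sketched below. It uses base R only; the `res` data frame and `countsTable` matrix are small hypothetical stand-ins (invented gene ids and counts), not the posters' data, and the real objects would come from the commands quoted in the first post.

```r
# Hypothetical stand-ins for the nbinomTest() result and the count table.
res <- data.frame(
  id   = c("gene1", "gene2", "gene3", "gene4"),
  pval = c(0.20, NaN, 1e-8, NaN),
  stringsAsFactors = FALSE
)
countsTable <- matrix(
  c(   12,    15,    30,    28,
     1968,  1970, 24520, 24528,
       40,    38,   400,   410,
    11640, 11638,   304,   305),
  nrow = 4, byrow = TRUE,
  dimnames = list(res$id, c("A1", "A2", "B1", "B2"))
)

# is.na() is TRUE for both NA and NaN, so it catches either kind of missing p-value.
affected <- res$id[is.na(res$pval)]

# Excerpt of the count table restricted to the affected genes.
affected_counts <- countsTable[affected, , drop = FALSE]
print(affected_counts)
```

The same row names could be used to index `fData(cds)` on the real data set, assuming its row names are the gene ids.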


    • #3
      Hi Simon,
      I am the grad student working on this project. There are actually between 18 and 91 entries returning a p-value of NA, depending on which data set we are working with. I have assembled a tarball with the script we are using and a mixed subset of entries, some of which give a p-value of NA and some of which are normal. Session information and input and output files are included as well. Take a look and let me know what you think, or if you need any more info from me.

      Thanks,
      Kristen
      Last edited by kristenbeck527; 07-02-2012, 01:07 PM.


      • #4
        Hi Kristen

        I've had a look at the data now. (Sorry that it took a while.)

        This does not look like RNA-Seq data. You only have 94 genes; this is only barely enough to fit a variance-mean relation.

        Furthermore, your data is sub-Poissonian: the variance is smaller than the mean for more than half of the genes. This is impossible even with technical replicates. You must have done something wrong when obtaining your counts.

        I'd say DESeq was justified in giving strange results here. You will need to tell me more about how the data was obtained before I can give further advice.
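
The variance-versus-mean check described in this post can be sketched in a few lines of base R. The count matrix below is a hypothetical toy example (three invented genes, four replicates of a single condition); on real data the comparison would normally be made within one condition and on size-factor-normalized counts.

```r
# Toy count matrix (hypothetical values): rows are genes, columns are replicates.
counts <- matrix(
  c(100, 101,  99, 100,   # nearly identical replicates: variance far below the mean
     10,  25,   3,  40,   # overdispersed: variance above the mean
    500, 510, 495, 505),  # variance below the mean
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("rep", 1:4))
)

gene_mean <- rowMeans(counts)
gene_var  <- apply(counts, 1, var)

# Under a Poisson model the variance equals the mean; flag genes that fall below it.
sub_poisson <- gene_var < gene_mean
frac_sub_poisson <- mean(sub_poisson)

print(data.frame(mean = gene_mean, var = gene_var, sub_poisson))
```

Here `frac_sub_poisson` is the share of genes whose sample variance is below the sample mean, which is the quantity the post refers to.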


        • #5
          Dear Simon,

          Thank you for looking at our data. Apologies, we thought it would be easier for you if we gave you a piece of the data set showing the problem, rather than a full data set. Kristen can send a full data set.

          Your detective work is spot-on: these data were obtained by "simulating" technical replicates based on a Poisson distribution, so some of the genes will have a variance smaller than the mean.

          The NA values are troubling because these are genes with VERY different abundances, so it should be a slam-dunk for DESeq to detect them as differentially expressed.

          Thanks,
          Danielle
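
A minimal sketch of this kind of simulation, in base R, is shown below. The number of genes, the number of replicates, the range of expression levels, and the seed are all assumptions chosen for illustration; the thread does not say how the posters generated their replicates.

```r
set.seed(1)  # arbitrary seed so the sketch is reproducible

n_genes <- 1000
n_reps  <- 3
true_mean <- runif(n_genes, min = 10, max = 10000)  # hypothetical expression levels

# Each row is one gene; each column is one simulated technical replicate.
# The matrix is filled column by column, so every column reuses true_mean in order.
sim <- matrix(rpois(n_genes * n_reps, lambda = rep(true_mean, n_reps)),
              nrow = n_genes)

sample_mean <- rowMeans(sim)
sample_var  <- apply(sim, 1, var)

# Share of genes whose sample variance is below the sample mean.
frac_below <- mean(sample_var < sample_mean)
print(frac_below)
```

With only a few replicates, the sample variance of pure Poisson counts is a noisy, right-skewed estimate, so it lands below the sample mean for a large share of genes even though the true variance equals the mean.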


          • #6
            The p-values were not "NA" but "NaN", which usually results from a division of zero by zero. Here, it happened because floating-point machine precision turns out to be insufficient when extremely low dispersions are combined with large means. I guess this can be fixed, but it may not be worth the effort, because dispersions that are essentially zero do not happen in practice. (Even between technical replicates, you will usually find dispersion values above 10^-6.)
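
The difference between NA and NaN, and the way a zero-by-zero division can arise from limited precision, can be illustrated in base R. This is a generic sketch and not DESeq's actual computation; the value `exp(-800)` is simply an arbitrary quantity too small to represent in double precision.

```r
# NaN arises from 0/0; NA marks a missing value. is.na() is TRUE for both,
# so is.nan() is needed to tell them apart.
x <- c(0.01, NA, 0 / 0)
print(is.na(x))   # flags both the NA and the NaN entry
print(is.nan(x))  # flags only the NaN entry

# Two quantities too small for double precision both underflow to 0,
# so their ratio becomes NaN even though the true ratio is 1.
a <- exp(-800)
b <- exp(-800)
ratio_naive <- a / b

# Working on the log scale avoids the underflow.
ratio_log <- exp((-800) - (-800))
print(c(naive = ratio_naive, log_scale = ratio_log))
```

Keeping intermediate probabilities on the log scale is a common way to sidestep this kind of underflow, though whether it applies to the code path in question is not something the thread establishes.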
