  • DESeq: pvalues are NA

    Hello,

    When running DESeq, we get some genes with p-values that are NA (see below). They should probably be zero or very close to zero. How can we fix this?

    happygrad:~/work/kristen/DEseqTroubleshooting> head -n 1 DE_genes_Reps6_allres.txt
    id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj adjLog2FoldChange
    happygrad:~/work/kristen/DEseqTroubleshooting> grep NA DE_genes_Reps6_allres.txt
    NM_001012669 13245.90643904 1967.80257160171 24524.0103064784 12.4626375940331 3.63953752770609 NA NA 3.63886339029247
    NM_001024569 1861.17241176578 0.626705575521935 3721.71811795603 5938.54317453056 12.5358933415034 NA NA 11.1601875099376
    NM_001034039 5972.12221805081 11639.9873990581 304.257037043527 0.0261389490050605 -5.25765505536172 NA NA -5.2530450700836
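A plain `grep NA` matches the string anywhere on the line, so it would also pick up any gene whose ID happens to contain "NA". As a rough sketch, the affected rows can instead be selected on the `pval` column itself. This assumes the file is whitespace-separated with a header like the one shown above; the function name, the abbreviated values, and the `GENE_NA_1` row are made up for illustration.

```python
import io


def rows_with_na_pvalue(lines):
    """Yield (gene_id, row) for rows whose pval column is NA or NaN."""
    header = None
    for line in lines:
        fields = line.split()
        if not fields:
            continue  # skip blank lines
        if header is None:
            header = fields  # first non-blank line holds the column names
            continue
        row = dict(zip(header, fields))
        if row["pval"] in ("NA", "NaN"):
            yield row["id"], row


# Tiny stand-in for DE_genes_Reps6_allres.txt (values abbreviated;
# GENE_NA_1 is a hypothetical ID that a plain `grep NA` would also match).
example = io.StringIO(
    "id baseMean baseMeanA baseMeanB foldChange log2FoldChange pval padj\n"
    "NM_001012669 13245.9 1967.8 24524.0 12.46 3.64 NA NA\n"
    "GENE_NA_1 100.0 90.0 110.0 1.22 0.29 0.41 0.77\n"
)

flagged = [gene_id for gene_id, _ in rows_with_na_pvalue(example)]
print(flagged)  # ['NM_001012669']
```

For a real file, pass an open file handle in place of `example`.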

    We're running R version 2.15.0
    Platform = x86_64-pc-linux-gnu (64 bit)

    These are the commands we're using:
    cds <- newCountDataSet(countsTable, cond)
    cds <- estimateSizeFactors(cds)
    cds <- estimateDispersions(cds)
    res <- nbinomTest(cds, "1", "2")

    Thanks,
    Danielle

  • #2
    Hi Danielle

    So, only 3 genes are affected? Is anything different about the counts for these three genes compared to all the others? Maybe post an excerpt from the count table and from fData(cds).

    Simon

    • #3
      Hi Simon,
      I am the grad student working on this project. There are actually between 18 and 91 entries returning a p-value of NA, depending on which data set we are working with. I have assembled a tarball with the script we are using and a mixed subset of entries, some of which give a p-value of NA and some of which are normal. Session information and the input and output files are included as well. Take a look and let me know what you think, or if you need any more info from me.

      Thanks,
      Kristen
      Last edited by kristenbeck527; 07-02-2012, 01:07 PM.

      • #4
        Hi Kristen

        I've had a look at the data now. (Sorry that it took a while.)

        This does not look like RNA-Seq data. You only have 94 genes; this is only barely enough to fit a variance-mean relation.

        Furthermore, your data is sub-Poissonian: the variance is smaller than the mean for more than half of the genes. This is impossible even with technical replicates. You must have done something wrong when obtaining your counts.

        I'd say DESeq was justified in giving strange results here. You will need to tell me more about how the data was obtained before I can give further advice.

        • #5
          Dear Simon,

          Thank you for looking at our data. Apologies, we thought it would be easier for you if we gave you a piece of the data set showing the problem, rather than a full data set. Kristen can send a full data set.

          Your detective work is spot-on, because these data were obtained by "simulating" technical replicates based on a Poisson distribution, so some of the genes will have a variance smaller than the mean.
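To illustrate the point about simulated technical replicates, here is a minimal sketch that draws a few Poisson counts per gene and reports how often the sample variance falls below the sample mean. The number of genes, the number of replicates, and the Poisson mean are arbitrary choices for illustration and are not taken from the actual data set; the sampler uses Knuth's multiplication method, which is only suitable for moderate means.

```python
import math
import random
import statistics


def poisson_sample(lam, rng):
    """Draw one Poisson variate with Knuth's method (moderate lam only)."""
    threshold = math.exp(-lam)
    k, product = 0, rng.random()
    while product > threshold:
        k += 1
        product *= rng.random()
    return k


def fraction_underdispersed(n_genes, n_reps, lam, seed=0):
    """Fraction of simulated genes whose sample variance is below the mean."""
    rng = random.Random(seed)
    below = 0
    for _ in range(n_genes):
        counts = [poisson_sample(lam, rng) for _ in range(n_reps)]
        if statistics.variance(counts) < statistics.mean(counts):
            below += 1
    return below / n_genes


# Hypothetical settings: 2000 genes, 3 replicates each, Poisson mean 50.
frac = fraction_underdispersed(n_genes=2000, n_reps=3, lam=50.0)
print(f"{frac:.2f} of simulated genes have sample variance below the mean")
```

With only a handful of replicates, the sample variance is a noisy estimate, so a large share of purely Poisson genes will look under-dispersed by chance.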

          The NA values are troubling because these are genes with VERY different abundances, so it should be a slam-dunk for DESeq to detect them as differentially expressed.

          Thanks,
          Danielle

          • #6
            The p values were not "NA" but "NaN", which usually results from a division of zero by zero. Here, it happened because the floating point machine precision turns out to be insufficient in case of extremely low dispersions combined with large means. I guess this can be fixed, but it's maybe not worth the effort because essentially zero dispersions do not happen in practice. (Even between technical replicates, you will usually find dispersion values above 10^-6.)
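The kind of failure described here can be sketched in a few lines. This is not DESeq's actual computation, just an illustration under the assumption that a near-zero dispersion behaves like a Poisson model: for a large mean, probabilities far from the mean underflow to exactly 0.0 in double precision, and a ratio of two such values becomes 0/0. All function names and numbers below are made up for the example. Working with log-probabilities avoids the problem.

```python
import math


def log_poisson_pmf(k, lam):
    """Log of the Poisson probability mass at k, computed without exp()."""
    return k * math.log(lam) - lam - math.lgamma(k + 1)


def naive_ratio(log_a, log_b):
    """Exponentiate first, then divide, as a direct implementation might."""
    a, b = math.exp(log_a), math.exp(log_b)
    if b == 0.0:
        # IEEE 754 (and R) give NaN for 0/0 and Inf for x/0; Python raises
        # ZeroDivisionError instead, so the IEEE result is returned explicitly.
        return math.nan if a == 0.0 else math.inf
    return a / b


def log_space_ratio(log_a, log_b):
    """Subtract the logs first, so the underflow never happens."""
    return math.exp(log_a - log_b)


# Hypothetical numbers: a large mean and two counts far below it.
lam = 20000.0
log_a = log_poisson_pmf(1900, lam)
log_b = log_poisson_pmf(2000, lam)

naive = naive_ratio(log_a, log_b)
stable = log_space_ratio(log_a, log_b)
print(naive)   # nan, because both probabilities underflowed to 0.0
print(stable)  # a tiny but finite positive number
```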
