  • idyll_ty
    Junior Member
    • Nov 2011
    • 5

    DESeq: "NA" generated in the resulting differentially expressed genes

    I am using DESeq to analyze my RNA-seq data. However, I found a bunch of "NA" entries in my list of differentially expressed genes (see the table below). The number of these "NA" rows differs between comparisons.

          id      baseMean     baseMeanA  baseMeanB    foldChange  log2FoldChange  pval      padj      resVarA  resVarB
    NA    NA      NA           NA         NA           NA          NA              NA        NA        NA       NA
    NA.1  NA      NA           NA         NA           NA          NA              NA        NA        NA       NA
    616   GPR128  187.5648803  0          234.4561004  Inf         Inf             1.19E-15  1.16E-12  0        19.90527498


    Has anyone else experienced this before? What could be the problem? Thanks.
  • marcowanger
    Senior Member
    • Dec 2008
    • 273

    #2
    Did you check that the same set of gene IDs used for read counting is identical for every sample (genes with no reads: 0)?
    Marco


    • idyll_ty
      Junior Member
      • Nov 2011
      • 5

      #3
      Yes, the gene names are consistent.

      I found the problem. In my input read-count data, some genes have no reads mapped at all, and those genes cause the NA values in the results.


      • marcowanger
        Senior Member
        • Dec 2008
        • 273

        #4
        Specifically, do you mean there are no reads mapped at all?

        Or do you mean that most samples within one group have no reads at all, and thus show "0"?

        Visually, suppose there are 2 samples per group:

        [group1] sample 1: 0, sample 2: 0
        [group2] sample 1: 1, sample 2: 2

        Is that the case?
        Marco
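        The Inf values in the GPR128 row of the first post's table follow from plain floating-point arithmetic. Assuming the fold change is the ratio of the two group means (baseMeanB / baseMeanA, which is consistent with that row), a group mean of zero gives an infinite ratio. The sketch below applies this to Marco's hypothetical counts and ignores size-factor normalization for simplicity; it is an illustration, not DESeq's own code.

```r
# Hypothetical raw counts for one gene, 2 samples per group (Marco's example).
# Size-factor normalization is ignored here for simplicity.
group1 <- c(0, 0)
group2 <- c(1, 2)

meanA <- mean(group1)   # 0
meanB <- mean(group2)   # 1.5

# Assumed definition: ratio of group means. Dividing by zero gives Inf in R.
foldChange <- meanB / meanA
log2FoldChange <- log2(foldChange)

# If both groups are all zero, the ratio is 0/0, which R evaluates to NaN.
allZeroRatio <- mean(c(0, 0)) / mean(c(0, 0))

print(c(foldChange = foldChange, log2FoldChange = log2FoldChange))
print(allZeroRatio)
```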


        • marcowanger
          Senior Member
          • Dec 2008
          • 273

          #5
          idyll_ty, have you checked your data?
          Marco


          • Simon Anders
            Senior Member
            • Feb 2010
            • 995

            #6
            Typically, such entries appear in R when subsetting with a conditional expression that contains or evaluates to NA. Please post your full R code (and the output of sessionInfo()) so that we can have a look.
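            The effect described above can be reproduced in base R with a made-up results table. The column names id, pval and padj mirror the nbinomTest output shown in the first post, but the values are invented. Each NA in the logical row index yields one all-NA row, and R names these rows NA, NA.1, and so on, which matches the row names in the table from the first post.

```r
# Made-up stand-in for an nbinomTest() results table; values are invented.
res <- data.frame(
  id   = c("geneA", "geneB", "geneC", "geneD", "geneE"),
  pval = c(0.001, NA, 0.40, NA, 0.0001),
  padj = c(0.004, NA, 0.53, NA, 0.0004),
  stringsAsFactors = FALSE
)

# The comparison is NA wherever padj is NA.
keep <- res$padj < 0.05
print(keep)

# FALSE rows are dropped, TRUE rows are kept, and each NA in the index
# produces an all-NA row, named "NA", "NA.1", ... to keep row names unique.
resSig <- res[keep, ]
print(resSig)
print(rownames(resSig))
```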


            • fatakias
              Member
              • Apr 2011
              • 11

              #7
              Dear DESeq experts,

              Apologies for continuing this thread - I have an identical problem.

              I am trying to identify differentially expressed genes (DEGs) in a known dataset, and I want to understand why over 80% of the entries in the DEG list I obtain from the counts table with DESeq_1.8.2 are 'NA'. I have seen similar queries in the forum, but I believe I am using the latest release of DESeq, not a development version. If this issue is fixed in an updated version, please let us know, and how to load that library in R. Thanks.
              Best,
              sarosh


              My workflow has two steps:


              Part_1- extract dataset (pasilla dataset)
              Part_2- use DESeq library calls to identify DEGs and show the 'NA' values.

              This dataset (countstable.txt) has 14,470 entries with count information, of which ~2,500 have a count of 0 in all replicates.


              Code:
              ################################################
              #
              #Part_1- extract a dataset
              #
              
              rm(list = ls());
              
              #require(DESeq);
              require(pasilla);
              
              data("pasillaGenes");
              
              head(counts(pasillaGenes));
              
              #save_data to view and contrast
              write.table(counts(pasillaGenes), file="countstable.txt", quote=FALSE, sep="  ", row.names=TRUE);

              ################################################

              #edit countstable.txt - remove header
              #count the number of entries with all counts 0
              # (use grep command ..)
              #start R again


              Code:
              ################################################
              ################################################
              #
              #Part_2
              
              require(DESeq);
              require(pasilla);
              
              countsTable <-read.table("countstable.txt", header=TRUE, stringsAsFactors=TRUE)
              rownames( countsTable ) <- countsTable$gene
              countsTable <- countsTable[,-1]
              conds=c("U","U","U","U","T","T","T");
              
              cds <- newCountDataSet( countsTable, conds);
              cds <-estimateSizeFactors(cds);
              
              #normcds <- counts( cds, normalized=TRUE );
              #write.table(normcds, file="normalized.countstable.txt", quote=FALSE, sep="\t", row.names=TRUE);
              
              cds <- estimateDispersions( cds, sharingMode="fit-only" );
              res <- nbinomTest(cds, "U","T");
              
              resSig <- res[ res$padj < 0.05,];
              resSig <- resSig[ order(resSig$pval), ];
              write.table(resSig, file="DEGsig_list.txt", quote=FALSE, sep="\t", row.names=FALSE);
              
              #############################################
              The final list of DEGs consists largely of NA entries.
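              The number of entries with a count of 0 in every sample can also be counted directly in R rather than with grep on the exported file. A minimal sketch on a small made-up matrix (rows are genes, columns are samples; countMatrix is a hypothetical name). With the real data, the same expression could be applied to counts(pasillaGenes), as used in the Part_1 code.

```r
# Hypothetical count matrix: rows are genes, columns are samples.
countMatrix <- matrix(
  c(0, 0, 0,
    5, 3, 8,
    0, 0, 0,
    0, 2, 1),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4), c("U1", "U2", "T1"))
)

# Counts are non-negative, so a row sums to 0 exactly when every sample is 0.
nAllZero <- sum(rowSums(countMatrix) == 0)
print(nAllZero)
```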


              • chadn737
                Senior Member
                • Jan 2009
                • 392

                #8
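                When you build resSig, rows that fail the cutoff are dropped, but every gene whose padj is NA (for example, a gene with no reads in any sample, which cannot be tested) comes back as a row filled with NA, because subsetting a data frame with an NA in the logical index returns an all-NA row. The presence of NA in these lines is not a problem.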

                To get rid of them, just use the na.omit function:

                Code:
                resSig<-na.omit(resSig)
                This will omit all lines that have an NA, leaving you only those lines with differentially expressed genes.
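                A minimal sketch comparing na.omit() with which() on a made-up results table. One caveat: na.omit() drops a row if any column contains NA, not only padj. The NA in resVarA for geneC below is hypothetical, included only to show that difference. which() converts the logical vector to row indices and skips NA, so only the padj condition decides which rows are kept; writing the condition as !is.na(res$padj) & res$padj < 0.05 has the same effect.

```r
# Made-up results table; the NA in resVarA for geneC is hypothetical.
res <- data.frame(
  id      = c("geneA", "geneB", "geneC", "geneD"),
  padj    = c(0.004, NA, 0.01, 0.53),
  resVarA = c(1.2, NA, NA, 0.8),
  stringsAsFactors = FALSE
)

# na.omit() removes rows with an NA in ANY column, so geneC is lost too.
viaNaOmit <- na.omit(res[res$padj < 0.05, ])

# which() keeps only indices where the condition is TRUE (NA is skipped).
viaWhich <- res[which(res$padj < 0.05), ]

print(viaNaOmit$id)   # "geneA"
print(viaWhich$id)    # "geneA" "geneC"
```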


                • sdriscoll
                  I like code
                  • Sep 2009
                  • 436

                  #9
                  I usually trim out genes with zero counts (across all samples) before calling newCountDataSet, like so:

                  Code:
                  mycounts <- mycounts[rowSums(mycounts) > 0,]
                  /* Shawn Driscoll, Gene Expression Laboratory, Pfaff
                  Salk Institute for Biological Studies, La Jolla, CA, USA */
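                  The same filter applied to a small made-up count table (mycounts follows the name in the post; the data are invented). Genes removed here have no reads in any sample, so they carry no information for a differential test. The filter does not change how the results are subset afterwards, so writing that step NA-safely (for example with which()) is still worthwhile.

```r
# Hypothetical count table (genes x samples), e.g. as read with read.table().
mycounts <- data.frame(
  U1 = c(0, 10, 0, 4),
  U2 = c(0, 12, 0, 0),
  T1 = c(0,  3, 1, 6),
  row.names = c("gene1", "gene2", "gene3", "gene4")
)

# Keep genes with at least one read in at least one sample.
mycounts <- mycounts[rowSums(mycounts) > 0, ]
print(rownames(mycounts))   # "gene2" "gene3" "gene4"
```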
