  • ccard28
    Member
    • Jan 2012
    • 20

    DESeq cds error; help needed

    Hello,

    I am trying to do some differential expression work on my two Illumina datasets. I only have experience with tophat => cufflinks (mostly through Galaxy but some command line), so my computing skills are minimal, especially in R. I am trying to run the DESeq package in R but am having trouble creating the cds needed for analysis through DESeq.

    I made a raw count file using htseq-count and used copy and paste to put the counts in Excel with 3 columns: the genes, sample A counts, sample B counts. My initial steps in R for reading the file seem to be working, but I get an error "not an integer: missing value where TRUE/FALSE needed" when making the cds.

    Thank you in advance for any help in correcting my errors. My R work is as follows after loading the DESeq library:

    > countTable <- read.csv( "~/Desktop/mergedcounts.csv", header=TRUE, row.names=1)
    > head(countTable)
                A  B
    20ALPHA-HSD 0  0
    A1BG        0  0
    A2M         0  0
    A2ML1       0 21
    A4GNT       0  0
    AAAS        0  1
    > conds <- factor( c( "highfert", "lowfert" ) )
    > conds
    [1] highfert lowfert
    Levels: highfert lowfert
    > cds <- newCountDataSet( countTable, conds )
    Error in if (any(round(countData) != countData)) stop("The countData is not integer.") :
    missing value where TRUE/FALSE needed
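The error text can be reproduced in base R without DESeq. The sketch below uses a small made-up table (`toy` and its values are hypothetical, not the poster's file) to show that a single NA makes the integer check evaluate to NA, which `if()` cannot branch on.

```r
# Hypothetical stand-in for countTable, with one missing value in column A
toy <- data.frame(A = c(0, 0, NA), B = c(0, 21, 1),
                  row.names = c("A1BG", "A2ML1", "AAAS"))

# The comparison is NA wherever a cell is NA
cmp <- round(toy) != toy

# any() returns NA when nothing is TRUE but at least one value is NA
has_noninteger <- any(cmp)
print(has_noninteger)

# if() fails on NA; catch the error and keep its message
msg <- tryCatch(
  {
    if (has_noninteger) stop("The countData is not integer.")
    "ok"
  },
  error = function(e) conditionMessage(e)
)
print(msg)
```

So the message is about a missing value reaching the check, not necessarily about non-integer counts.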
  • Wolfgang Huber
    Senior Member
    • Aug 2009
    • 109

    #2
    Hi ccard28

    you need to make sure that your 'countTable' is a data.frame whose columns are numeric variables of storage class 'integer', and contain no NA (and no negative) values. It might be necessary to read a basic R intro to familiarize yourself with these concepts.

    To trouble-shoot, you could try (not tested):

    sapply(countTable, function(x) which(is.na(x)))

    Best wishes
    Wolfgang
    Wolfgang Huber
    EMBL
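The three conditions listed above (integer storage, no NA, no negative values) can each be checked in one line. This is a minimal sketch on a made-up table; `toy` is a hypothetical stand-in for `countTable`, and `lapply` is used instead of `sapply` only so that the result is always a list.

```r
# Hypothetical stand-in for countTable: A is stored as double, B as integer
toy <- data.frame(A = c(0, 0, NA), B = c(0L, 21L, 1L),
                  row.names = c("A1BG", "A2ML1", "AAAS"))

# Storage class of each column
col_class <- sapply(toy, class)
print(col_class)

# Row indices of missing values, one list element per column
na_rows <- lapply(toy, function(x) which(is.na(x)))
print(na_rows)

# Any negative counts? NA cells are ignored for this check
any_neg <- any(toy < 0, na.rm = TRUE)
print(any_neg)
```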

    • ccard28
      Member
      • Jan 2012
      • 20

      #3
      I feel that my data table does follow the correct parameters. From my understanding of the commands I used thus far:

      countTable <- read.csv( "~/Desktop/mergedcounts.csv", header=TRUE, row.names=1)

      This creates the data.frame that is needed. With header=TRUE, the command treats the first row as column headers; in my .csv file (exported from Excel) those are A and B for my 2 samples. row.names=1 should tell it that column one holds my row names, which in my .csv file are gene names. All of my values are read counts, which are whole, non-negative numbers with many 0s, so this should satisfy the integer requirement.

      If header=TRUE takes row 1 as the column names and row.names=1 takes the first column as the row names, that leaves me with only whole numbers and 0s, so why would I keep getting this error:
      "Error in if (any(round(countData) != countData)) stop("The countData is not integer.") : missing value where TRUE/FALSE needed" ?

      My data is printing fine in R, so the table seems to be importing correctly, but I still don't understand why the error keeps occurring. Could header or row.names not be separating the letters/gene names correctly? Could the conditions I set up with "conds <- factor( c( "highfert", "lowfert" ) )" be messing things up when trying to create the cds?

      Creating the cds seems like it should be a simple step especially with my data apparently printing correctly within the R console when checking the countTable. Without the cds working I can't do any actual analysis within DESeq.

      I tried to read up on sapply and tried your sapply command, but it did not change anything with the error, and I am not entirely sure of the basis for using sapply in this instance.

      Any other input would be very welcome.

      Thank You,
      ccard28
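A table can print correctly at the top and still contain missing values further down. As an illustration only (the CSV text below is invented, and a blank cell is just one possible cause), an empty field in a numeric column is read by `read.csv` as NA even when `header` and `row.names` are doing their job.

```r
# Invented CSV content; the last row has an empty cell for sample A
csv_text <- "gene,A,B
A1BG,0,0
A2ML1,0,21
AAAS,,1"

tab <- read.csv(text = csv_text, header = TRUE, row.names = 1)

# The first rows look fine, as head() did in the thread
print(head(tab, 2))

# Columns are still integer, but one cell is missing
str(tab)
print(any(is.na(tab)))
```

This is why inspecting the first few rows is not enough to rule out NA values in a table with thousands of rows.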

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        You might just cut to the chase and:

        Code:
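        which(is.na(countTable) | round(countTable) != countTable, arr.ind = TRUE)
        To see the row and column of each cell in countTable that's causing problems. The is.na() part is needed because which() silently skips comparisons that evaluate to NA.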

        The point of Wolfgang's sapply method was to list, for each column, the row indices that hold NA values, so you can see which cells of your table are causing problems. It won't actually change anything; it just prints the results to the screen. You could easily find out how many cells in each column are NA with:

        Code:
        colSums(is.na(countTable))
        You'll find a basic fluency in R to be extremely useful in bioinformatics.
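Once it is known that NA values exist, it helps to see the affected genes by name rather than by index. The following is a small sketch using `complete.cases` on made-up data (`toy` is hypothetical, not the poster's table).

```r
# Hypothetical stand-in for countTable with two incomplete rows
toy <- data.frame(A = c(0, 0, NA, 3), B = c(0, 21, 1, NA),
                  row.names = c("A1BG", "A2ML1", "AAAS", "A4GNT"))

# Keep only the rows that have at least one missing value
bad <- toy[!complete.cases(toy), , drop = FALSE]
print(bad)

# Gene names of the affected rows
bad_genes <- rownames(bad)
print(bad_genes)
```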

        • ccard28
          Member
          • Jan 2012
          • 20

          #5
          Thank you both very much for your input. I was able to interpret the sapply function that you both mentioned and determine the 2 rows that had missing values that were causing problems with my cds creation. Without the sapply I never would have found them amongst the thousands of rows, much appreciated.
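The thread does not say how the two rows were repaired. One possible cleanup, shown here purely as an assumption on made-up data, is to drop incomplete rows and store the counts as integers before building the cds. Correcting the source file may well be the better fix, and `as.integer` truncates non-whole values silently, so it should only be applied to counts already known to be whole numbers.

```r
# Hypothetical stand-in for countTable with two incomplete rows
toy <- data.frame(A = c(0, 0, NA, 3), B = c(0, 21, 1, NA),
                  row.names = c("A1BG", "A2ML1", "AAAS", "A4GNT"))

# Drop rows with any missing value (an assumption, not the poster's stated fix)
clean <- toy[complete.cases(toy), , drop = FALSE]

# Convert every column to integer storage, keeping the data.frame shape
clean[] <- lapply(clean, as.integer)

# Re-run the checks that the cds constructor cares about
ok <- !any(is.na(clean)) && all(vapply(clean, is.integer, logical(1)))
print(ok)

# The call from the thread would follow here (requires DESeq, not run):
# cds <- newCountDataSet(clean, conds)
```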
