  • jgibbons1
    Senior Member
    • Oct 2009
    • 135

    Contingency tests in R, Error with large numbers

    I have been struggling to figure out how to fix this error, and I thought why not try the seqanswers community. I am fairly new to R though, so please forgive me if this is a fairly easy solution.

    I am trying to perform multiple Fisher's Exact tests or Pearson's Chi-squared contingency tests from a data matrix in which each row holds the data for one independent test.

    My data is formatted as such:

    AAA 75533 4756922556 88210 6715122129
    BBB 14869 4756983220 16384 6715193955
    CCC 7230 4756990859 8559 6715201780
    DDD 18332 4756979757 23336 6715187003
    EEE 14733 4756983356 16826 6715193513
    FFF 2918 4756995171 3433 6715206906
    GGG 3726 4756994363 4038 6715206301
    HHH 6196 4756991893 7011 6715203328
    III 7925 4756990164 9130 6715201209
    JJJ 1434 4756996655 1602 6715208737
    Where the 1st column is the identifier, the 2nd column = observations 1, the 3rd column = background counts 1, the 4th column = observations 2 and the 5th column = background counts 2.

    I am loading my data like this:

    > data=read.table("My.File", header=FALSE)
    And I am looping through each row to perform a test like this:

    > pvalues=c("pvalue")
    > for(i in 1:10){
    + datamatrix=matrix(c(as.integer(data[i,2:5])),nrow=2)
    + fisherresult=fisher.test(datamatrix)
    + pvalues=cbind(pvalues,fisherresult[1])
    + }
    Here is the error I am getting:

    Error in fisher.test(datamatrix) :
    all entries of 'x' must be nonnegative and finite
    In addition: Warning messages:
    1: In matrix(c(as.integer(data[i, 2:5])), nrow = 2) :
    NAs introduced by coercion
    2: In matrix(c(as.integer(data[i, 2:5])), nrow = 2) :
    NAs introduced by coercion
    When I replace the large number in the 3rd and 5th column with smaller numbers, the statistical calculation works fine.

    Any ideas? Any help would be GREATLY appreciated!
  • mastal
    Senior Member
    • Mar 2009
    • 666

    #2

    Unless your columns are in the wrong order, in the data sample you've shown,
    the background counts are way higher than the observed counts.

    You could also try posting this on the R/Bioconductor mailing list:

    • jgibbons1
      Senior Member
      • Oct 2009
      • 135

      #3
      Hi mastal,
      Yes, that is correct. The background frequencies are much larger than the observed frequencies.

      Thanks for your suggestion. I will try posting to the R/Bioconductor mailing list.

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
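        This is because your counts are larger than the biggest value R's 32-bit integer type can hold, 2^31 - 1 (about 2.1*10^9). If you look at help(as.integer), you'll find that it doesn't support numbers over about 2*10^9, so as.integer() turns them into NA.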
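
The limit can be checked directly in an R session. This is a minimal sketch using one of the background counts from the original post; the variable names (int_max, background, as_int) are illustrative and not from the thread.

```r
# Largest value R's built-in integer type can represent (2^31 - 1).
int_max <- .Machine$integer.max

# One of the background counts from the original post. R stores a bare
# numeric literal like this as a double, not as an integer.
background <- 4756922556

# Coercing it to integer overflows, which R reports as NA plus a warning
# (the warning is suppressed here to keep the output quiet).
as_int <- suppressWarnings(as.integer(background))

print(int_max)
print(background > int_max)  # the count does not fit in an integer
print(is.na(as_int))         # this NA is what fisher.test() then rejects

# Doubles represent whole numbers exactly up to 2^53, so the count itself
# survives intact as long as it is never forced through as.integer().
print(background < 2^53)
```

So the value is not lost when the file is read; it only becomes NA at the as.integer() call inside the loop.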

        • jgibbons1
          Senior Member
          • Oct 2009
          • 135

          #5
          Thanks dpryan,
          I just ran into this answer myself from the following post:



          Hmmm...is there a way to change this, I wonder?

          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            Not to my knowledge, though I'm looking further into this in case I ever run into it myself (I follow the Bioconductor email list too, so hopefully someone will reply with a good solution). There's some limited support in the int64 package, but I think you would otherwise have to recompile R and change the default size of int with a compiler switch (you can run into similar problems if you try to do svd on large datasets, since matrix indexing is still 32-bit on some levels).

            • jgibbons1
              Senior Member
              • Oct 2009
              • 135

              #7
              I appreciate your insight.

              This is pretty frustrating. Unfortunately, I am really at a standstill until I can figure out how to generate these p-values. I have about a hundred million tests to perform and was going to break down the jobs into batches of about 100,00 tests.

              Outside of R, do you happen to know of any other solutions? For example, I ran a couple tests in JMP, which worked fine (and thus apparently has a larger integer limit).

              • dpryan
                Devon Ryan
                • Jul 2011
                • 3478

                #8
                Not if you want anything close to user friendly, at least. There are open source algorithms (available from netlib, which is what R actually uses) that you can more easily recompile to make int a 64-bit integer by default. You can then write a "simple" wrapper program to parse your dataset and run through the statistics. I've had to do this with other functions in R that are limited by the 32-bit issue (in my case it wasn't how big the numbers were, but that I was dealing with matrices that were too big to be used in the underlying BLAS algorithms). You probably have to compile BLAS in the same fashion, depending on how the algorithm works (there's a link to the algorithm if you type help(fisher.test) in R). If you're familiar with programming and compilation, this is pretty doable, but I expect it can become really daunting if not. You also need to then do a couple of spot-checks just to make sure that nothing is getting screwed up in the process (since JMP seems to work, I guess you could use that).

                If no one else comes up with something better and your programming knowledge isn't sufficient for this method, you can shoot me a PM and I can (hopefully) walk you through how to go about this via email (I assume that this would be off-topic for this forum).

                I hope that R will transition to 64-bit integers at some point, but that won't be a quick process.
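
For the 2x2 case specifically, there may be a way to avoid recompiling anything. The one-sided Fisher exact p-value is a hypergeometric tail probability, and R's phyper() takes its arguments as doubles. The sketch below is an assumption-laden illustration, not a drop-in replacement for fisher.test(): the helper name fisher_greater_p is hypothetical, it covers only the one-sided "greater" alternative, and it is checked against fisher.test() on a small table before being applied to the first row of the posted data.

```r
# Hypothetical helper: one-sided ("greater") Fisher exact p-value for a
# 2x2 table, computed from the hypergeometric upper tail in doubles.
fisher_greater_p <- function(x) {
  m <- sum(x[, 1])   # total of column 1
  n <- sum(x[, 2])   # total of column 2
  k <- sum(x[1, ])   # total of row 1
  # P(X >= x[1, 1]) for X ~ Hypergeometric(m, n, k)
  phyper(x[1, 1] - 1, m, n, k, lower.tail = FALSE)
}

# Spot-check on a small table where fisher.test() itself still works.
small <- matrix(c(3, 1, 1, 3), nrow = 2)
p_small <- fisher_greater_p(small)
p_ref <- fisher.test(small, alternative = "greater")$p.value
print(all.equal(p_small, p_ref))

# First row of the posted data, laid out as in the original loop:
# column 1 = (observations 1, background 1), column 2 = (observations 2, background 2).
big <- matrix(c(75533, 4756922556, 88210, 6715122129), nrow = 2)
p_big <- fisher_greater_p(big)
print(p_big)
```

A two-sided p-value needs more care (fisher.test() sums all tables at least as extreme as the observed one), so that case is deliberately left out of this sketch.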

                • jgibbons1
                  Senior Member
                  • Oct 2009
                  • 135

                  #9
                  Thanks again for your help. Unfortunately, I am not much of a programmer. What I think I'll do is recruit a collaborator from my institution to see if they can help come up with a solution.

                  I will update this thread once we've come up with a solution.

                  Many thanks again.
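
For later readers, one possible route that stays within stock R, assuming the Pearson's chi-squared test mentioned in the original post is an acceptable substitute: chisq.test() appears to do its arithmetic on doubles, so the counts can stay numeric instead of going through as.integer(). This is a sketch rather than a verified solution. The first three rows of the posted data are embedded inline in place of "My.File", and the names counts and chisq_p are illustrative.

```r
# First three rows of the posted data, read with the count columns forced
# to numeric (double) so nothing is coerced to 32-bit integer.
counts <- read.table(text = "
AAA 75533 4756922556 88210 6715122129
BBB 14869 4756983220 16384 6715193955
CCC 7230 4756990859 8559 6715201780
", header = FALSE, colClasses = c("character", rep("numeric", 4)))

# One chi-squared test per row; each row becomes a 2x2 matrix laid out
# the same way as in the original loop (filled column by column).
chisq_p <- vapply(seq_len(nrow(counts)), function(i) {
  m <- matrix(unlist(counts[i, 2:5], use.names = FALSE), nrow = 2)
  chisq.test(m)$p.value
}, numeric(1))

# Label each p-value with its row identifier.
names(chisq_p) <- counts[[1]]
print(chisq_p)
```

Collecting the results with vapply() into a named numeric vector also avoids growing pvalues with cbind() on every iteration, which matters when the number of tests is large.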
