Unconfigured Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • vplagnol
    Member
    • Sep 2008
    • 13

    plotDEXseq failure- all log2fold values set to NA or NaN

    Dear all,

    I am having issues with my DEXSeq analysis. The symptom appears with plotDEXSeq but I think this is a deeper issue with the underlying computation. The error message I am getting is with plotDEXSeq is:

    Error in plot.window(xlim = c(0, 1), ylim = c(0, max(matr))) :
    need finite 'ylim' values

    The issue is that in my Exon Count set object the log2fold values are all either NA or NaN:
    head(featureData(DexSeqExons.loc)@data$log2fold)
    [1] NaN NaN NaN NaN NaN NaN

    I suspect, but I am not sure, that the following warning points to the problem:
    In fitDispersionFunction(DexSeqExons.loc) :
    Negative intercept value in the dispersion function, it will be set to 0. Check fit diagnostics plot section from the vignette.

    Indeed I see:
    > DexSeqExons.loc@dispFitCoefs
    (Intercept) I(1/means[good])
    0.0000000 0.9973345

    which does not seem right. I can somehow get around the error by changing the function fitDispersionFunction and set
    coefs[1] <- 0.001
    instead of
    coefs[1] <- 0
    when the estimated coefficient is negative. But tinkering with the underlying numerical computations hardly seems like a good thing to do.


    I had never seen such an issue before, and I had been working on the same dataset. It is not clear to me what I may have changed that caused this odd behaviour but I cannot find a way to fix it. There could be something wrong with my gff file which I changed recently but I don't think it is very likely.

    Has anyone experienced a similar issue before? Any tip to fix the problem would be very welcome,

    Thanks,

    Vincent
    Last edited by vplagnol; 01-17-2013, 05:19 PM.
  • Simon Anders
    Senior Member
    • Feb 2010
    • 995

    #2
    Can you post the plot of dispersion versus mean. Use, e.g.

    Code:
    plot( 
       rowMeans( counts( ecs, normalized=TRUE ) ), 
       fData(ecs)$dispBeforeSharing, 
       log="xy", pch=19, cex=.3, col="#00000040" )

    Comment

    • vplagnol
      Member
      • Sep 2008
      • 13

      #3
      Thank you for your quick reply Simon.

      So if I don't use my "fudge", the coefficients are:
      (Intercept) I(1/means[good])
      0.0000000 0.9973345

      I attach my mean vs dispersion plot, which hopefully will make sense to you.
      Note that this is only for a small subset of 300 exons which I use for debugging but the same exact thing happens for the full genome-wide set.
      Attached Files

      Comment

      • Simon Anders
        Senior Member
        • Feb 2010
        • 995

        #4
        This plot seems perfectly fine to me. Is the red line now done with the original or the fudged coefficients? The intecept (y value of the red line where x=exp(0)=1) looks positive and correct. In any case, it seem to trace the point well, so it should be correct to continue the calculation with this fit.

        By the way, you have called the estimateLog2FoldChanges function, have you? Otherwise, it would be no wonder that you don't have fold change estimates.

        Comment

        • vplagnol
          Member
          • Sep 2008
          • 13

          #5
          Argh sorry... this plot was in fact using my fudge. Just tired I suppose. Now here is the problematic plot, I think you will see the issue.

          And here are the messed up coefficients for the fit:
          > DexSeqExons.loc@dispFitCoefs
          (Intercept) I(1/means[good])
          -0.0001751386 0.9973344545

          Basically my understanding is that the first iteration results in a negative intercept. If I set it back to a positive value and let the algorithm iterate, I get the reasonable graph you saw above. But as it stands the fitting gives up right away as soon as such a negative intercept occurs.
          Attached Files

          Comment

          • Simon Anders
            Senior Member
            • Feb 2010
            • 995

            #6
            It's annoying that these negative intercepts happen, and we need to figure out how to get rid of them. Your fudge, however, seems to do the trick for your data, because with it, the fit looks reasonable.

            After you have modified the dispersion coefficients, use the following code to recalculate the dispersion values from it:

            Code:
            fData(ecs)$dispFitted <- ecs@dispFitCoefs[1] + 
               ecs@dispFitCoefs[2] / colMeans( t(counts(ecs)) / sizeFactors(ecs) )
            fData(ecs)$dispersion <- pmin( pmax( fData(ecs)$dispBeforeSharing, 
               fData(ecs)$dispFitted, na.rm = TRUE), 1e+08)
            (These are the last two lines from 'fitDispersionFunction'.)

            Comment

            • vplagnol
              Member
              • Sep 2008
              • 13

              #7
              OK thanks, this is what I needed to hear.

              For what it's worth my solution is not very elegant but because I remove the "break" that occurs after the negative coefficients, the fit continues with more iterations and I do not think I need these extra 2 lines. In my situation the first iteration gives negative values, but resetting the starting coefficients after that seems to converge toward proper parameter values after that. This solves that dataset a least.

              See below my modification to your fitDispersion function, again no claim to have a fix, just a rough idea:

              if (coefs[1] < 0) {
              coefs[1] <- 0.001
              warning("Negative intercept value in the dispersion function, it will be set to 0. Check fit diagnostics plot section from the vignette.")
              #break
              }

              Comment

              Latest Articles

              Collapse

              • GATTACAT
                Reply to Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                by GATTACAT
                Love this - good data definitely starts from good input, and poor input can only give relatively poor data. I particularly like the mention of Nanodrop/absorbance based methods for quantification. It's such a toss up if you'll get an accurate reading or what amounts to a randomly generated number, and a lot of library/sequencing related issues can be traced back to poor quant.
                07-01-2026, 11:43 AM
              • SEQadmin2
                Nine Things a Sample Prep Scientist Thinks About Before Sequencing
                by SEQadmin2


                I’m not a sequencing expert. I’m a purification scientist who uses NGS to evaluate workflows my group develops. With this perspective, we think about the sample first and the NGS workflow second. The sequencer is an exceptionally honest reporter, but it can only report on what you give it, so whether you get clean, interpretable data from an NGS workflow is largely determined before you begin.

                Here are nine questions we think about, in roughly the order they matter, before...
                06-18-2026, 07:11 AM

              ad_right_rmr

              Collapse

              News

              Collapse

              Topics Statistics Last Post
              Started by SEQadmin2, 07-02-2026, 11:08 AM
              0 responses
              11 views
              0 reactions
              Last Post SEQadmin2  
              Started by SEQadmin2, 06-30-2026, 05:37 AM
              0 responses
              14 views
              0 reactions
              Last Post SEQadmin2  
              Started by SEQadmin2, 06-26-2026, 11:10 AM
              0 responses
              20 views
              0 reactions
              Last Post SEQadmin2  
              Started by SEQadmin2, 06-17-2026, 06:09 AM
              0 responses
              54 views
              0 reactions
              Last Post SEQadmin2  
              Working...