  • ECHo
    Member
    • Jan 2010
    • 17

    edgeR

    I've read about and used the DEGseq R package.
    edgeR seems to complement the DEGseq package.

    While working through the edgeR manual, I loaded some data into a DGEList.
    The counts are displayed, but the library size is not calculated automatically (lib.size is NA).

    Does anyone know why?
    My R version is 2.12.0.

    Thank you.
  • ECHo
    Member
    • Jan 2010
    • 17

    #2
    haha
    I've found out the reason...
    Some data include missing values.
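
A minimal sketch of how such missing values could be spotted before building the DGEList, using base R only. The matrix `counts` and its values are made up for illustration; the only assumption about edgeR is that library sizes are taken from the column totals, as the "Calculating library sizes from column totals" message later in this thread suggests.

```r
# Hypothetical toy count matrix (genes x samples) with one missing value.
counts <- matrix(c(10, 0, 25,
                   12, NA, 30),
                 nrow = 3,
                 dimnames = list(c("geneA", "geneB", "geneC"),
                                 c("s1", "s2")))

# A column total is NA as soon as one entry in that column is NA,
# which would explain a lib.size of NA.
col_totals <- colSums(counts)
print(col_totals)

# Find the genes (rows) that contain at least one missing value.
bad_genes <- rownames(counts)[rowSums(is.na(counts)) > 0]
print(bad_genes)

# One option is to drop those rows; fixing the input file is usually better.
clean <- counts[rowSums(is.na(counts)) == 0, , drop = FALSE]
print(colSums(clean))
```

Dropping rows is only a stopgap here: it is worth finding out why the values are missing in the input in the first place.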


    • ECHo
      Member
      • Jan 2010
      • 17

      #3
      Well, I still have one question.
      To plot the MDS, I'd like to use the following command:
      plotMDS.dge(d, xlim=c(-2,1));
      where d is a DGE object.

      However, R always reports the following:
      Error in if (mx < tol) { : missing value where TRUE/FALSE needed
      Error during wrapup: cannot open the connection

      Has anyone else run into this?
      How can I solve the problem?


      Thanks!


      • colindaven
        Senior Member
        • Oct 2008
        • 417

        #4
        Hmm, this worked fine for me in the last few weeks despite the fact I'm not an edgeR expert.

        Perhaps you still have a problem with missing values. Try taking a small, high-quality subset of your data and retrying with that.


        • carmeyeii
          Senior Member
          • Mar 2011
          • 137

          #5
          Hi all,

          I am having a similar problem and was wondering if anyone has come across this before:

          I get the message "Error in if (mx < tol) { : missing value where TRUE/FALSE needed" when I run the command estimateCommonDisp(y).


          > mutant_control= x[,c(1,5,9,6,12,14)]
          > group <- factor(c(1,1,1,2,2,2))
          > y <- DGEList(counts=mutant_control, group=group)
          Calculating library sizes from column totals.
          > y <- estimateCommonDisp(y)
          Error in if (mx < tol) { : missing value where TRUE/FALSE needed

          > head(mutant_control)
                                27         31         35        32         38         40
          128up          100.85404   94.66619   87.78034  101.9768   91.39150   85.91481
          14-3-3epsilon 9061.95160 9391.45480 9106.62168 9604.3740 9952.53064 9667.63616
          14-3-3zeta    7959.80739 8169.34580 8478.59387 8434.7244 7926.26723 8587.06141
          140up           19.50291   22.34962   14.74578   15.2824   19.61044   14.21309
          18w             88.16118  113.38107   97.86222  115.4046  120.79999  125.11319
          26-29-p        288.60969  274.10267  262.37095  275.9005  283.34272  296.14799

          > tail(mutant_control)
                          27         31          35          32           38          40
          zip    2317.423662 2690.28298 2746.989546 2960.364282 2897.5624980 2985.039414
          zormin  324.178816  270.25428  350.734099  337.749747  370.9414048  304.788741
          zpg       0.000000    0.00000    0.000000    0.000000    0.0000000    0.000000
          zuc       3.015593    1.21086    1.031125    5.638336    2.6222360    4.258978
          zwilch   30.068800   28.26996   25.578376   27.846503   25.5089142   27.275533
          zye       1.292230    0.00000    0.000000    1.486839    0.8172129    0.000000

          Could it be because of the 0.0000 values?

          Thanks a lot,

          Carmen


          • carmeyeii
            Senior Member
            • Mar 2011
            • 137

            #6
            So I think I figured it out: the function expects integers, not real numbers. If you just round your counts matrix, everything runs smoothly.

            Cheers!
            Carmen


            • Simon Anders
              Senior Member
              • Feb 2010
              • 995

              #7
              Yes, it runs smoothly but it won't give you correct results. There is a reason that edgeR and DESeq want integer values, namely that you are supposed to supply a table which, for each gene and each sample, tells the number of reads that map to the gene.

              How can 2317.423662 reads map to gene 'zip'?
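
A minimal sketch of a sanity check one could run before handing a table to a count-based tool, in base R. The helper name `is_count_matrix` and the `raw_counts` values are hypothetical; the `estimates` values are taken from columns 27 and 31 of the table posted in #5.

```r
# Hypothetical helper: TRUE only if every entry is a non-negative whole
# number and nothing is missing.
is_count_matrix <- function(m, tol = 1e-8) {
  !anyNA(m) && all(m >= 0) && all(abs(m - round(m)) < tol)
}

# Made-up raw read counts (genes x samples).
raw_counts <- matrix(c(101, 9062, 0,
                       95, 9391, 0), nrow = 3)

# Values from post #5 (128up, 14-3-3epsilon, zpg; samples 27 and 31).
estimates <- matrix(c(100.85404, 9061.95160, 0,
                      94.66619, 9391.45480, 0), nrow = 3)

print(is_count_matrix(raw_counts))  # TRUE
print(is_count_matrix(estimates))   # FALSE
```

A FALSE result here is a hint that the table holds estimated or normalised abundances rather than read counts; the check says nothing about whether rounding them would be appropriate.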


              • carmeyeii
                Senior Member
                • Mar 2011
                • 137

                #8
                Thank you Simon. I was missing something fundamental about edgeR.


                • earonesty
                  Member
                  • Mar 2011
                  • 52

                  #9
                  integers

                  edgeR expects integers, but many programs use estimation functions to improve transcript counts, i.e. they produce non-integers. So you need to round.


                  • Simon Anders
                    Senior Member
                    • Feb 2010
                    • 995

                    #10
                    Sigh.

                    No, you should not round. If you do not have integer counts, your input is not suitable for these tools. This is why they insist that you give them integer counts.

                    Of course, you can trick them into using your unsuitable data by rounding, but then you will not get a reliable result. Please only use statistical methods off-label if you know what you are doing.


                    • lshen
                      Member
                      • Jan 2008
                      • 30

                      #11
                      I compared the HTSeq-derived counts with the rounded counts from cuffdiff v 2.1.1 (released last week).


                      I ran a two-group edgeR analysis with 3 replicates in the control group and 4 replicates in the case group.


                      DEGs at FDR 0.05:

                      HTseq derived counts: 475

                      rounded counts from cuffdiff v 2.1.1: 441

                      Overlap: 398.

                      In addition, 439 of the 475 HTSeq DEGs have FDR <= 0.1 in the results from the rounded cuffdiff v 2.1.1 counts.

                      So, maybe using rounded counts data is acceptable in final results even though not strictly following edgeR assumptions?
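
The agreement between the two gene lists can be put into proportions with a few lines of base R. This is only arithmetic on the numbers quoted above; the variable names are hypothetical, and the Jaccard index is an added summary that was not part of the original comparison.

```r
n_htseq    <- 475  # DEGs at FDR 0.05 from HTSeq-derived counts
n_cuffdiff <- 441  # DEGs at FDR 0.05 from rounded cuffdiff counts
n_overlap  <- 398  # DEGs found in both lists
n_relaxed  <- 439  # HTSeq DEGs with FDR <= 0.1 in the cuffdiff-based run

# Share of each list that is also in the other list.
frac_htseq_shared    <- n_overlap / n_htseq
frac_cuffdiff_shared <- n_overlap / n_cuffdiff

# Jaccard index: overlap divided by the size of the union of both lists.
jaccard <- n_overlap / (n_htseq + n_cuffdiff - n_overlap)

# Share of HTSeq DEGs recovered when the cuffdiff-based cutoff is relaxed.
frac_relaxed <- n_relaxed / n_htseq

print(round(c(htseq_shared    = frac_htseq_shared,
              cuffdiff_shared = frac_cuffdiff_shared,
              jaccard         = jaccard,
              relaxed         = frac_relaxed), 3))
```

These proportions describe how similar the two lists are; they do not show which list is closer to the truth.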


                      Checking a few replicates (attached plot, r = 0.99 using the transformation log(x)+1), some genes show very different counts in HTSeq. Many of them are very short miRNAs and are thus missed by cufflinks (counts = 0).

                      [Attached image: count.cuffdiff.vs.htseq.ensgene71.PEN.png]
                      Last edited by lshen; 04-18-2013, 11:56 AM. Reason: Enhancement content


                      • Simon Anders
                        Senior Member
                        • Feb 2010
                        • 995

                        #12
                        Sure, if the values you obtain by rounding the output of cufflinks happen to be close to the correct values, there is a good chance that the result won't be that different, either.

                        But why would you do that when it is no more difficult to get the correct values in the first place?

                        This willingness to amass many minor inaccuracies against better knowledge is common in bioinformatics, but it is still sloppy science.


                        And, with all due respect: if the instructions for a statistical method state very clearly and explicitly that the method requires a certain kind of data as input, advise against using the method on other data, and even give a clear reason, founded on statistical theory, for that -- are you really so confident in your knowledge of advanced statistics that you think you know better?


                        • lshen
                          Member
                          • Jan 2008
                          • 30

                          #13
                          I provide bioinformatics analysis services, and I have had people talk about using cufflinks counts directly. So I wanted to check this myself, in addition to telling them about the assumptions you have emphasized many times.

                          I have used pipelines of htseq-count and edgeR/DESeq, and we trusted this combination more than FPKM-based results. But it relies on known gene annotations, whereas cufflinks can do de novo predictions. So I use cufflinks for its non-expression tests (promoter, splicing) and a count-based method for expression analysis.


                          • Simon Anders
                            Senior Member
                            • Feb 2010
                            • 995

                            #14
                            Sorry for the harsh tone, which was more directed at post #9.

                            I am simply getting tired of being asked the same thing over and over again -- and way too often, I meet the attitude that as soon as a program runs through without throwing an error, the result must be right, no matter what one has done before.
