
  • DEGseq VS edgeR, which one is more reliable?

    Hi there,

    I am analysing RNA-seq data and have used both the DEGseq and edgeR R packages to obtain DEGs. However, the DEG lists I get from the two packages are not much alike.

    Here is the total number of DEGs I get:
    [table not preserved]
    The total number of matched genes is about 17380 (71.40%) in dataset 1 and 16000 (65.72%) in dataset 2.
    The DEG filter thresholds are: FDR <= 0.001, |log2 FC| > 1.
    I am confused now and just want to know which one is more reasonable.
    Last edited by tianyub836; 10-09-2011, 05:04 PM.
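
A minimal Python sketch of the comparison described in this post: apply the stated thresholds (FDR <= 0.001, |log2 FC| > 1) to each package's results and count how much the two DEG lists agree. The row layout, the function names and the toy values are all assumptions made for illustration; neither DEGseq nor edgeR exports data in exactly this form.

```python
# Hypothetical result rows: (gene_id, FDR, log2 fold change). In practice these
# would be parsed from each package's exported result table.
def select_degs(rows, max_fdr=0.001, min_abs_log2fc=1.0):
    """Return the gene IDs passing FDR <= max_fdr and |log2 FC| > min_abs_log2fc."""
    return {gene for gene, fdr, log2fc in rows
            if fdr <= max_fdr and abs(log2fc) > min_abs_log2fc}


def overlap_summary(degs_a, degs_b):
    """Summarise how two DEG sets agree."""
    shared = degs_a & degs_b
    union = degs_a | degs_b
    return {
        "only_a": len(degs_a - degs_b),
        "only_b": len(degs_b - degs_a),
        "shared": len(shared),
        "jaccard": len(shared) / len(union) if union else 1.0,
    }


# Toy data, invented purely for illustration.
degseq_rows = [("g1", 1e-5, 2.3), ("g2", 5e-4, -1.4), ("g3", 2e-4, 1.1), ("g4", 0.2, 3.0)]
edger_rows = [("g1", 1e-6, 2.1), ("g2", 0.03, -1.2), ("g3", 1e-4, 0.8), ("g4", 0.5, 2.9)]

degseq_degs = select_degs(degseq_rows)
edger_degs = select_degs(edger_rows)
summary = overlap_summary(degseq_degs, edger_degs)
print(summary)
```

A summary like this only says how similar the two lists are; it does not say which list is closer to the truth.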

  • #2
    If that is supposed to be a table, enclose it in [ code ] ... [ / code ] tags to preserve the spacing.

    ...or use the [ table ] tag (syntax example in this thread: http://seqanswers.com/forums/showthread.php?t=948)


    • #3
      I think edgeR is better. I compared DEGseq, DESeq and edgeR and made Venn diagrams to find the overlap, and found that DESeq and edgeR overlap better with each other. So I think edgeR is better.

      It depends on you!
      Wishes!


      • #4
        Originally posted by Chrevan
        I think edgeR is better. I compared DEGseq, DESeq and edgeR and made Venn diagrams to find the overlap, and found that DESeq and edgeR overlap better with each other. So I think edgeR is better.

        It depends on you!
        Wishes!
        Thanks for your reply, Chrevan.

        I know that DEGseq is based on the Poisson distribution while edgeR is based on the negative binomial distribution. What I want to know is, methodology aside, which package's output is more reasonable from a common-sense point of view, if there is such a thing?


        • #5
          DESeq is basically edgeR with some improvements, so if you want common sense, that seems to be the winner. Since DESeq and edgeR use the same distribution while DEGseq uses a different one, they naturally get more similar results, and that's not a sensible way to conclude that they're better. However, both of the negative-binomial methods' authors provide good evidence that DEGseq's Poisson assumption is invalid.

          Here is the DESeq paper: http://genomebiology.com/2010/11/10/R106


          • #6
            Originally posted by jwfoley
            DESeq is basically edgeR with some improvements, so if you want common sense, that seems to be the winner. Since DESeq and edgeR use the same distribution while DEGseq uses a different one, they naturally get more similar results, and that's not a sensible way to conclude that they're better. However, both of the negative-binomial methods' authors provide good evidence that DEGseq's Poisson assumption is invalid.

            Here is the DESeq paper: http://genomebiology.com/2010/11/10/R106
            Thanks, jwfoley.

            Well, I have read the DESeq paper and the edgeR one. They use the same NB distribution model, and both claim to be suitable for identifying DEGs from RNA-seq data without any replicates, which matches my situation.

            I am looking for DEGs in plants under abiotic stress, and my samples consist of only a control group and a treated group. Both papers mentioned above suggest that a Poisson distribution model is acceptable for samples without replicates. Am I right?

            As I mentioned in the earlier table, nearly 1/3 of the matched genes output by DEGseq were DEGs. Does that make any sense?


            • #7
              No, the Poisson distribution is never appropriate, and I thought we said that quite clearly in our paper. You will always end up with loads of false positives.

              You simply cannot perform a proper analysis without replicates. The correct solution is to start over. (See also http://seqanswers.com/forums/showpos...04&postcount=2 )

              DESeq offers the possibility to perform a very conservative analysis for the no-replicates case which shows you only those genes which really "stick out". This can give you at least a few results.
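
The false-positive point can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the test that DEGseq, edgeR or DESeq actually runs: counts are drawn from a gamma-Poisson mixture (with a normal approximation for the Poisson step), both samples share the same true mean so no gene is differentially expressed, and significance is called with a simple Poisson-based z-test. All function names and parameter values are hypothetical.

```python
import math
import random


def simulate_count(mean, dispersion, rng):
    """Draw one approximate count with variance about mean + dispersion * mean**2."""
    if dispersion > 0:
        # Gamma-distributed "true" level: mean `mean`, variance dispersion * mean**2.
        lam = rng.gammavariate(1.0 / dispersion, mean * dispersion)
    else:
        lam = mean
    # Normal approximation to Poisson sampling noise (reasonable for large lam).
    return max(0, round(rng.gauss(lam, math.sqrt(lam))))


def poisson_false_positive_rate(mean, dispersion, n_genes=2000, z_cutoff=3.29, seed=1):
    """Fraction of non-DE genes called significant by a Poisson-based z-test."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_genes):
        a = simulate_count(mean, dispersion, rng)
        b = simulate_count(mean, dispersion, rng)
        if a + b == 0:
            continue
        # Under the Poisson assumption, Var(a - b) is estimated by a + b.
        z = (a - b) / math.sqrt(a + b)
        if abs(z) > z_cutoff:
            hits += 1
    return hits / n_genes


rate_poisson_data = poisson_false_positive_rate(mean=1000, dispersion=0.0)
rate_overdispersed = poisson_false_positive_rate(mean=1000, dispersion=0.1)
print(rate_poisson_data, rate_overdispersed)
```

When the data really are Poisson, the test rarely fires. With even modest extra sample-to-sample variability (a dispersion of 0.1 here, chosen arbitrarily), most of the unchanged genes are called significant.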


              • #8
                Originally posted by Simon Anders
                No, the Poisson distribution is never appropriate, and I thought we said that quite clearly in our paper. You will always end up with loads of false positives.

                You simply cannot perform a proper analysis without replicates. The correct solution is to start over. (See also http://seqanswers.com/forums/showpos...04&postcount=2 )

                DESeq offers the possibility to perform a very conservative analysis for the no-replicates case which shows you only those genes which really "stick out". This can give you at least a few results.
                Well, you just frightened me, Simon Anders.

                I did not understand what you meant by "You simply cannot perform a proper analysis without replicates. The correct solution is to start over".

                Did you mean that the data I am working on, which simply come from control and treatment samples, are meaningless?
                Last edited by tianyub836; 10-11-2011, 04:49 PM.


                • #9
                  Originally posted by tianyub836
                  Well, you just frightened me, Simon Anders.

                  I did not understand what you meant by "You simply cannot perform a proper analysis without replicates. The correct solution is to start over".

                  Did you mean that the data I am working on, which simply come from control and treatment samples, are meaningless?
                  If you have no measure of variability of your measurements, how can you make any conclusions about how reliable/reproducible the differential expression you observe is?


                  • #10
                    Originally posted by frozenlyse
                    If you have no measure of variability of your measurements, how can you make any conclusions about how reliable/reproducible the differential expression you observe is?
                    What if I assume that the variability of my measurements is negligible, or not large enough to impact my final output?

                    I mean that I am sure the technical noise has been minimized and can be ignored, and that the biological variance is not significant.


                    • #11
                      Originally posted by tianyub836
                      What if I assume that the variability of my measurements is negligible, or not large enough to impact my final output?

                      I mean that I am sure the technical noise has been minimized and can be ignored, and that the biological variance is not significant.

                      Well, then you'd be lying to yourself. Getting a list of DE genes isn't the problem (edgeR will of course still give you a table of p-values and logFC); the problem is knowing how many of those are at all trustworthy.


                      • #12
                        Originally posted by tianyub836
                        I mean that I am sure the technical noise has been minimized and can be ignored, and that the biological variance is not significant.
                        I am curious what makes you so sure of that?

                        There are, of course, some possibilities to find something in your data. You might just guess the amount of sample-to-sample variability and inject this information into the DESeq workflow. For a reasonable guess, however, you had better have performed this kind of analysis before, with replication, and still, I would not want to see something like this in a publication. You might also estimate the variance from comparing your treatment and control samples and limit your hits to genes with such extreme fold changes that they stick out even there. DESeq's "blind" dispersion estimation is meant for that. Again, such an analysis is not publication quality.
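
To make the "guess the variability" idea concrete, here is a rough Python sketch. It is not DESeq's implementation: it uses a delta-method approximation in which the log of a negative binomial count has variance of about 1/mean + dispersion, it assumes both libraries are already normalised to the same depth, and the dispersion value of 0.2 is an arbitrary guess. The function name is hypothetical.

```python
import math


def log_fold_change_z(count_a, count_b, dispersion):
    """Approximate z-score for log(count_b / count_a) under a guessed dispersion.

    Delta-method approximation: Var(log X) ~ 1/mean + dispersion for a negative
    binomial count, so the log ratio has variance 1/a + 1/b + 2 * dispersion.
    """
    log_ratio = math.log(count_b / count_a)
    variance = 1.0 / count_a + 1.0 / count_b + 2.0 * dispersion
    return log_ratio / math.sqrt(variance)


# A 2-fold change at decent coverage looks overwhelming under Poisson (dispersion 0)...
z_poisson = log_fold_change_z(500, 1000, dispersion=0.0)
# ...but unremarkable once a guessed dispersion of 0.2 is injected.
z_guess = log_fold_change_z(500, 1000, dispersion=0.2)
# Only a very large fold change (32-fold here) still sticks out.
z_extreme = log_fold_change_z(500, 16000, dispersion=0.2)
print(z_poisson, z_guess, z_extreme)
```

The result depends entirely on the guessed dispersion, which is why an analysis of this kind is hard to defend without replicates to back the guess.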


                        • #13
                          Originally posted by Simon Anders
                          I am curious what makes you so sure of that?

                          There are, of course, some possibilities to find something in your data. You might just guess the amount of sample-to-sample variability and inject this information into the DESeq workflow. For a reasonable guess, however, you had better have performed this kind of analysis before, with replication, and still, I would not want to see something like this in a publication. You might also estimate the variance from comparing your treatment and control samples and limit your hits to genes with such extreme fold changes that they stick out even there. DESeq's "blind" dispersion estimation is meant for that. Again, such an analysis is not publication quality.
                          Well, that was just an assumption of no significant impact.

                          Actually, when the samples were prepared, we collected material from multiple plants for both the control and the treatment group, which means we sent one mixed (pooled) sample per group for sequencing. We originally thought that this might reduce the impact of biological variation.

                          Does that make any sense?


                          • #14
                            Originally posted by tianyub836
                            Actually, when the samples were prepared, we collected material from multiple plants for both the control and the treatment group, which means we sent one mixed (pooled) sample per group for sequencing. We originally thought that this might reduce the impact of biological variation.

                            Does that make any sense?
                            Only a bit. If you pool N plants, then your variance goes down to 1/N (or the standard error of your expression estimates to 1/sqrt(N)) of the value for a single plant.

                            So, of course, the variance got smaller, but by pooling everything, you have lost all possibility of figuring out how small it is now.

                            What you should have done is make two or three pools for each group and add multiplexing tags to the samples so that you can put them together in one sequencing lane. Comparing the pools from the same group would have enabled you to assess the variance. Without it, you have to guess it blindly, and whatever guess you may come up with, you cannot expect anybody (especially not a reviewer of your paper) to believe that to be a good guess.
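
The pooling argument can be checked with a toy simulation in Python. The numbers are invented: each plant's expression is assumed to be normally distributed with mean 100 and standard deviation 20, and a pool is modelled as the plain average of its plants. Real pooled RNA samples also carry sequencing noise, which this sketch ignores.

```python
import random
import statistics


def simulate_pool(n_plants, rng, plant_mean=100.0, plant_sd=20.0):
    """Expression of one pool = average over n_plants individually varying plants."""
    return sum(rng.gauss(plant_mean, plant_sd) for _ in range(n_plants)) / n_plants


rng = random.Random(42)
n_plants = 10

# Variance across many hypothetical pools: close to plant_sd**2 / n_plants = 40.
many_pools = [simulate_pool(n_plants, rng) for _ in range(20000)]
pool_variance = statistics.pvariance(many_pools)

# With three pools per group the variance can at least be estimated (noisily).
three_pools = [simulate_pool(n_plants, rng) for _ in range(3)]
estimated_variance = statistics.variance(three_pools)

# With a single pool per group there is nothing to estimate it from.
one_pool = [simulate_pool(n_plants, rng)]
try:
    statistics.variance(one_pool)
    single_pool_estimable = True
except statistics.StatisticsError:
    single_pool_estimable = False

print(pool_variance, estimated_variance, single_pool_estimable)
```

Pooling does shrink the variance by about 1/N, but a single pool per group gives no way to measure what that variance is; two or three pools per group do.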


                            • #15
                              Originally posted by Simon Anders
                              Only a bit. If you pool N plants, then your variance goes down to 1/N (or the standard error of your expression estimates to 1/sqrt(N)) of the value for a single plant.

                              So, of course, the variance got smaller, but by pooling everything, you have lost all possibility of figuring out how small it is now.

                              What you should have done is make two or three pools for each group and add multiplexing tags to the samples so that you can put them together in one sequencing lane. Comparing the pools from the same group would have enabled you to assess the variance. Without it, you have to guess it blindly, and whatever guess you may come up with, you cannot expect anybody (especially not a reviewer of your paper) to believe that to be a good guess.
                              Thanks, Simon Anders.

                              I admit that it was not a perfect experimental design, and I still have many details to take care of.

                              It was wonderful to discuss this with you.
