  • Comparison of edgeR and DESeq

    I would like to draw attention to the paper by Yendrek et al. (2012):
    Because of the high variability between methods for determining differential expression of RNA-Seq data, we suggest using several bioinformatics tools, as outlined here, to ensure that a conservative list of differentially expressed genes is obtained. We also conclude that despite these analytical l …

    In it, the authors point out some limitations of each method for RNA-seq analysis. It would be interesting to get feedback from Simon as well as other users: what do you think of it, and would some minor adjustments within these tools be of use or not?

    Thanks

  • #2
    Honey,

    these authors make a lot of the fact that the two tools' resulting gene lists do not fully overlap (DESeq, with their settings, produces 11317 hits and edgeR 11995, with 10535 in common). But there is no contradiction: like any method, these, too, make type I errors (false positives) and type II errors (false negatives).
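
To put the overlap figures in perspective, here is a minimal sketch of the arithmetic in Python. It uses only the three counts quoted above (reading the shared count as 10535 genes); the function name `overlap_summary` is made up for illustration and does not come from the paper or from either package.

```python
# Gene-list sizes as quoted in the post (DESeq hits, edgeR hits, shared hits).
deseq_hits = 11317
edger_hits = 11995
shared_hits = 10535


def overlap_summary(a, b, shared):
    """Summarise how two hit lists of sizes a and b overlap."""
    union = a + b - shared
    return {
        "only_a": a - shared,            # hits reported by the first tool only
        "only_b": b - shared,            # hits reported by the second tool only
        "frac_a_shared": shared / a,     # share of the first list that is common
        "frac_b_shared": shared / b,     # share of the second list that is common
        "jaccard": shared / union,       # overlap relative to the union
    }


summary = overlap_summary(deseq_hits, edger_hits, shared_hits)
for name, value in summary.items():
    print(name, round(value, 4))
```

With these counts, about 93% of the DESeq list and about 88% of the edgeR list are shared, so the disagreement concerns a minority of each list.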

    Type I: As far as I could figure out from the paper, they put thresholds at FDR=0.05, which means that up to 5% (ca. 600) of hits are expected to be false positives. This is a fundamental property of statistical testing from noisy measurements, and no amount of gymnastics (incl. method 'overlapping') can get you around that.
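
For readers who want to see where the "ca. 600" comes from, here is a small sketch. It assumes the FDR threshold was applied via a Benjamini-Hochberg style adjustment, which is an assumption on my part and not something stated in the thread; the helper names are hypothetical.

```python
def benjamini_hochberg(pvalues, fdr=0.05):
    """Return the indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    cutoff_rank = 0
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * fdr.
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * fdr:
            cutoff_rank = rank
    # Everything up to that rank is rejected, even if an earlier rank failed.
    return sorted(order[:cutoff_rank])


def expected_false_positive_bound(n_hits, fdr=0.05):
    """Upper bound on the expected number of false positives among n_hits."""
    return fdr * n_hits


# Hit counts quoted in the thread: roughly 566 and 600 expected false positives.
print(round(expected_false_positive_bound(11317)))
print(round(expected_false_positive_bound(11995)))

# Toy p-values (invented for illustration) showing the step-up behaviour.
print(benjamini_hochberg([0.04, 0.03, 0.045, 0.049]))
```

The bound applies to each list separately, which is why intersecting two lists does not remove the expected false positives from either one.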

    Type II: None of the methods provide a good estimate of type II errors. Consider the extreme situation that all genes are differentially expressed if you just look hard enough (not completely unrealistic for a comparison that affects a systemic phenotype, growth), and that the detection is only limited by the sequencing depth and number of replicates. In this case, both results lists are fully correct.
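
The dependence on replicate number can be illustrated with a deliberately simplified sketch. It uses a two-sided two-sample z-test on a log scale with known standard deviation, which is not the negative binomial model that DESeq or edgeR fit; the effect size and standard deviation below are invented for illustration.

```python
from statistics import NormalDist


def two_sample_power(effect, sd, n_per_group, alpha=0.05):
    """Power of a two-sided two-sample z-test under a normal approximation."""
    std_normal = NormalDist()
    z_crit = std_normal.inv_cdf(1 - alpha / 2)
    # Standard error of the difference between two group means.
    se = sd * (2 / n_per_group) ** 0.5
    shift = abs(effect) / se
    # Probability that the test statistic lands in either rejection region.
    return std_normal.cdf(shift - z_crit) + std_normal.cdf(-shift - z_crit)


# Hypothetical small effect: log2 fold change of 0.2 with sd 0.5.
for n in (3, 30, 300):
    print(n, round(two_sample_power(0.2, 0.5, n), 3))
```

Under these assumptions the same small effect is missed most of the time with 3 replicates per group and detected almost always with 300, which is the sense in which detection is limited by the design and not by the biology.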

    To move forward from this, one needs to further sharpen the question (for instance by considering confidence ranges of the fold change, asking for minimal effect size, or adding a prior on the fraction of true H0), and I suppose that is the subject of active method development in various places.
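
One way to get at the fraction of true H0 is to estimate it from the p-value distribution. The sketch below uses the simple estimator #{p > lambda} / (m (1 - lambda)) with a fixed lambda, in the spirit of Storey's approach; it is an illustration only, not something either package is claimed to do here, and the p-values are synthetic.

```python
def estimate_pi0(pvalues, lam=0.5):
    """Estimate the fraction of true null hypotheses from p-values.

    Under H0 the p-values are uniform, so the density of p-values above
    `lam` reflects (mostly) the true nulls.
    """
    m = len(pvalues)
    above = sum(1 for p in pvalues if p > lam)
    return min(1.0, above / (m * (1 - lam)))


# Synthetic example: 50 strongly significant genes plus 50 null genes
# whose p-values are spread evenly over (0, 1).
alternative = [0.001] * 50
null = [(i + 0.5) / 50 for i in range(50)]
print(estimate_pi0(alternative + null))
```

In the extreme scenario where every gene is differentially expressed, such an estimate would approach zero as sequencing depth and replicate number grow.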

    The results reported in the quoted paper are based on older versions of R (2.13) and the packages. To see how much of them is relevant for current work one would need to re-do the analysis with current versions. (This is not a critique of the paper, but still of note to its users/readers.)

    Best wishes
    Wolfgang
    Last edited by Wolfgang Huber; 11-23-2012, 01:27 AM.
    Wolfgang Huber
    EMBL



    • #3
      I would like to add that this comparison was already performed when DESeq was published and has been discussed on Seqanswers as well, so I am missing the point of the mentioned paper with regard to novelty and overall information content.

      It would be much more informative if it included a general comparison of tools with different strategies (such as cufflinks, eXpress and EBSeq) on a known population of DE genes, i.e. a real benchmark to assess the performance of each tool.
      I am well aware that each tool might perform better in certain situations, but due to the sheer number of bioinformatics tools being published or released, it becomes increasingly difficult to keep track and give reasonable suggestions on which to use. Thus, the program offering the best compromise is usually what I aim for.

      Nevertheless, it should be quite obvious that different tools will produce different results, just as different data sets from the same platform do.
      Still, given my own experience with cufflinks, CLC Bio and edgeR results on the same data, as well as the discussions going on in the forums, I see that cufflinks will not be my first choice for DE analysis in the future, regardless of genes being found exclusively by edgeR.

      Best regards
