  • ymc
    Senior Member
    • Mar 2010
    • 496

    tumor purity and edgeR

    I am currently using a TopHat2-HTSeq-edgeR pipeline for my tumor-normal pair analysis. I basically followed the example in the edgeR manual.

    I didn't see anywhere in the example where I was asked to supply an estimate of tumor purity/contamination. In theory, I believe tumor purity should affect the estimate of the true gene expression level in tumor cells. So is it wrong that edgeR doesn't take that into account? Or is this effect somehow corrected for through the estimation of the BCV?

    Thanks.
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    edgeR isn't only for looking at tumor-normal pairs, so it doesn't make any assumptions about the type of experiment you're doing. If you want to account for tumor purity, you'll need to put that in your experimental model.


    • ymc
      Senior Member
      • Mar 2010
      • 496

      #3
      Thank you for your reply. What do you mean by "put that in your experimental model"? How would I do that?


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        edgeR and the other standard tools take a dataframe that describes the various experimental manipulations or confounders that you want included in a model fit. In the typical examples, the components of this dataframe are factors (genotype, treatment, etc.), but they don't have to be; you can also use continuous explanatory variables here. I believe there have been a few discussions on the Bioconductor mailing list among people needing to account for age or other continuous data in their models, so you might have a read through those. This assumes that you have some numeric estimate of purity, of course. If you just have a categorical estimate (low, medium, high, etc.), then you could use that as a factor instead.

        FYI, here's one email thread about the subject, which is convenient since it makes explicit what logFC would mean in such situations.
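
A minimal sketch of how the coefficient of a continuous covariate can be read, written in Python rather than R. The purity and CPM values are made up, and `fit_line` is a hypothetical helper. edgeR itself fits a negative binomial GLM to counts, so the ordinary least-squares fit on log2 values here is only a stand-in for illustration.

```python
import math

# Hypothetical toy data: one gene measured in five tumor samples.
# purity is a fraction in [0, 1]; cpm is counts per million for that gene.
purity = [0.2, 0.4, 0.6, 0.8, 1.0]
cpm = [8.0, 16.0, 32.0, 64.0, 128.0]


def fit_line(x, y):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Work on the log2 scale, so the slope is a log2 fold change
# per unit increase in purity.
log2_cpm = [math.log2(v) for v in cpm]
intercept, slope = fit_line(purity, log2_cpm)

print(f"intercept = {intercept:.2f}, slope (log2FC per unit purity) = {slope:.2f}")
```

With purity coded as a fraction between 0 and 1, the fitted slope here comes out at about 5 on the log2 scale, which would correspond to a 2^5 = 32-fold difference between a hypothetical 0% pure and 100% pure sample. In R, the analogous step would be to include the purity column in the formula passed to `model.matrix` (for example `~ purity`), assuming a numeric purity estimate is available for every sample.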
        Last edited by dpryan; 08-20-2013, 06:22 AM. Reason: Fix some wording to better fit the "real" terms for things


        • jparsons
          Member
          • Feb 2012
          • 62

          #5
          There are also a few papers/programs out there that attempt to estimate the purity of a 'mixture' sample numerically. I can't speak to their efficacy, as the only sample I'm interested in is a ternary system and the programs are quite limited in scope. However, the general method is sound.

          PMID: 23737925
          Last edited by jparsons; 08-20-2013, 01:55 PM. Reason: wrong paper in second ref


          • ymc
            Senior Member
            • Mar 2010
            • 496

            #6
            Could the tumor read counts be adjusted with this method?

            Let c be the fraction of the tumor sample that consists of normal cells (probably determined experimentally or by other means).

            true tumor count = (tumor_count - c*normal_count) / (1-c)

            Will this work?


            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #7
              No, because the relative contribution from the two sources to the measured count will vary by gene. Some genes may be more tumor-specific while others are more normal-specific. Because of how RNA-seq works, separating these two sources is actually quite difficult.
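
A small sketch of the correction proposed in #6, to show where it behaves and where it doesn't. The function name `deconvolve` and all counts are hypothetical. The formula is only exact under the assumption (mine, not stated in the thread) that the observed tumor count is a perfect linear mixture, (1 - c) * tumor + c * normal, with the same c for every gene and both libraries already on the same scale.

```python
def deconvolve(tumor_count, normal_count, c):
    """Correction proposed in #6: subtract a fraction c of normal signal,
    then rescale by the remaining tumor fraction (1 - c)."""
    if not 0 <= c < 1:
        raise ValueError("c must be in [0, 1)")
    return (tumor_count - c * normal_count) / (1 - c)


# Idealized gene: the observed count is exactly 0.7 * 200 + 0.3 * 100 = 170,
# so the formula gets back (approximately) the pure-tumor value of 200.
ideal = deconvolve(170, 100, 0.3)

# Gene expressed mostly in normal tissue, where the observed tumor count
# falls below c * normal_count: (10 - 0.5 * 40) / 0.5 = -20.
impossible = deconvolve(10, 40, 0.5)

print(ideal, impossible)
```

A negative "count" is one visible symptom of the problem: as soon as a gene's real mixing proportion differs from the single global c, or the counts are noisy, the subtraction no longer describes what was measured.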


              • Dario1984
                Senior Member
                • Jun 2011
                • 166

                #8
                Originally posted by dpryan:
                "No, because the relative contribution from the two sources to the measured count will vary by gene."

                Yes, many genes would not be differentially expressed between the healthy sample and the cancer sample. If you applied a scaling factor, you would be altering their fold changes away from 1 and introducing new problems. I have not seen any journal articles account for this problem in differential expression analysis. I haven't even seen anyone do a spike-in study with healthy cells and a varying proportion of a cancer cell line to convincingly demonstrate that methods such as ESTIMATE work well. I would also be interested to know what percentages purity estimation methods give on a single-cell RNA-seq dataset.
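
To put a number on the scaling-factor point, here is a toy calculation. The value of c, the counts, and the helper `fold_change` are all hypothetical, and the "correction" shown is a plain division of the tumor count by (1 - c), which is the kind of global scaling factor meant above.

```python
def fold_change(tumor_count, normal_count):
    """Plain tumor/normal ratio for a single gene."""
    return tumor_count / normal_count


c = 0.3               # hypothetical fraction of normal cells in the tumor sample
normal_count = 100.0
tumor_count = 100.0   # a gene that is not differentially expressed

raw_fc = fold_change(tumor_count, normal_count)

# Scale the tumor count up to "compensate" for contamination.
scaled_fc = fold_change(tumor_count / (1 - c), normal_count)

print(f"raw fold change = {raw_fc:.2f}, scaled fold change = {scaled_fc:.2f}")
```

The unchanged gene starts at a fold change of 1 and ends up at roughly 1.43 after scaling, so it would now look upregulated in the tumor.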

