  • log2FoldChange for a continuous covariate in DESeq2 with normalization factor matrix

    Hi,
    I'm using DESeq2 and I have a continuous covariate in the model. Also, instead of size factors, I have the Normalization Factor Matrix (NF) to account for GC content bias. I have a few questions:
    (1) Is this the correct way to access the normalized counts, now that I'm using the NF matrix?
    counts(dds, normalized=TRUE)
    (2) I'm not sure what the log2FoldChange for this covariate represents:
    (2.1) If the covariate were not continuous and I could represent it as a factor, would the log2FoldChange be equal to the log2 of the ratio of normalized means for the last two factor levels?
    (2.2) If so, then how would this translate to the continuous case? I know that it is per unit of change of the continuous covariate. But for some genes, I'm getting high log2FoldChange values, but when I plot the "read counts" as a function of the continuous covariate, I'm not observing a high change in count values as the continuous covariate increases. Why is that the case?
    (2.3) Will I get a better interpretation of the log2FoldChange value if I plot the "normalized counts" against the continuous covariate (considering that I'm using the NF matrix)?
    I'd deeply appreciate any help or comment.
    Thanks,
    Golsheed

  • #2
    1. No clue, Michael Love will likely reply with how to do this.
    2. It's the log2 fold change per unit increase in the covariate. This is typically the case for continuous covariates.
    2.1 No, it'd be different. The fold changes are shrunken, after all (not to mention the outlier detection step).
    2.2 See above.
    2.3 Perhaps; it depends on whether you need the visual or not.
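
One possible illustration of how a large per-unit log2FoldChange can coexist with a modest visible change in counts: if the covariate spans only a narrow range, the total fold change across the samples stays small. The sketch below uses base R and made-up, noiseless values. The covariate `x`, the baseline of 100, and the per-unit LFC of 5 are assumptions for illustration, not DESeq2 output.

```r
# Hypothetical covariate that spans only 0.1 units in total
x <- c(0.00, 0.02, 0.04, 0.06, 0.08, 0.10)
lfc_per_unit <- 5                    # assumed log2 fold change per unit of x
y <- 100 * 2^(lfc_per_unit * x)      # noiseless toy "counts"

# The slope of log2(y) on x recovers the per-unit LFC
slope <- coef(lm(log2(y) ~ x))[["x"]]

# Fold change actually observed between the lowest and highest sample
fold_over_range <- max(y) / min(y)

# Multiplying the covariate by 10 divides the per-unit LFC by 10,
# although the data are unchanged
x10 <- x * 10
slope_x10 <- coef(lm(log2(y) ~ x10))[["x10"]]

print(c(slope = slope, fold_over_range = fold_over_range, slope_x10 = slope_x10))
```

The per-unit value depends on the units of the covariate, so it can help to compare it with the range the covariate actually covers in the samples.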



    • #3
      1 yes. (btw the code is in DESeq2:::counts.DESeqDataSet)
      2 As Devon says, it's the fold change per unit in the covariate.

      You can get a sense of how LFCs behave in a GLM with a log link and a continuous covariate by fitting toy data with R's glm and the poisson family:

      Code:
      > y = c(200,100,50,25,12,210)
      > x = 1:6
      > glm(y ~ x, family=poisson)
      
      Call:  glm(formula = y ~ x, family = poisson)
      
      Coefficients:
      (Intercept)            x
          4.83416     -0.06883
      Note that the fitted coefficient for x (the log fold change per unit of x, on the natural log scale) is negative, even though the log fold change of the last sample over the first (210 over 200) would be positive.
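
As a small follow-up sketch on the same toy data, the natural-log coefficient can be put on the log2 scale by dividing by log(2), and compared with the endpoint ratio. The variable names here (`beta_ln`, `beta_log2`, `endpoint_lfc`) are just illustrative.

```r
# Same toy data as in the Poisson fit above
y <- c(200, 100, 50, 25, 12, 210)
x <- 1:6
fit <- glm(y ~ x, family = poisson)

beta_ln   <- coef(fit)[["x"]]    # natural-log fold change per unit of x
beta_log2 <- beta_ln / log(2)    # the same slope expressed on the log2 scale
fold_per_unit <- exp(beta_ln)    # multiplicative change in the fitted mean per unit of x

# log2 ratio of the last observation over the first, ignoring everything in between
endpoint_lfc <- log2(y[6] / y[1])

print(c(beta_ln = beta_ln, beta_log2 = beta_log2, endpoint_lfc = endpoint_lfc))
```

The slope summarizes the trend over all six points, so a single high value at the end does not by itself make it positive.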

      2.1 The log2FoldChange with betaPrior=FALSE and with minReplicatesForReplace=Inf is the log2 ratio of the mean of normalized counts. However, continuous and factor covariates are not treated the same.
      2.2 If you plot log2(counts(dds, normalized=TRUE)[idx,]) over the covariate, you should be able to see a trend which is related to the LFC.
      2.3 The simplest way to understand this relationship is by looking at the mathematics. The LFC is beta in the model formula in the vignette (see vignette Section 4.1: The DESeq2 model and Section 3.11: Sample-/gene-dependent normalization factors). Yes, you need to look at normalized counts to understand the LFC.
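
To make the role of normalized counts concrete, here is a minimal base R sketch, assuming a noiseless version of the model in which the mean is a per-sample normalization factor times 2^(beta0 + beta1 * x). All values (`x`, `nf`, `beta0`, `beta1`) are hypothetical and chosen so that the normalization factors happen to decrease as the covariate increases. Dividing by `nf` plays the role of what counts(dds, normalized=TRUE) does for one gene when a normalization factor matrix is present.

```r
# Hypothetical covariate and per-sample normalization factors for one gene
x  <- c(0.5, 1.0, 1.5, 2.0, 2.5, 3.0)
nf <- c(2.0, 1.6, 1.3, 1.0, 0.8, 0.6)   # assumed; decreasing with x

beta0 <- 6      # assumed intercept on the log2 scale
beta1 <- 1.5    # assumed log2 fold change per unit of x

raw  <- nf * 2^(beta0 + beta1 * x)      # noiseless toy "raw counts"
norm <- raw / nf                        # toy "normalized counts"

# Trend of log2 counts over the covariate, before and after normalization
slope_raw  <- coef(lm(log2(raw)  ~ x))[["x"]]
slope_norm <- coef(lm(log2(norm) ~ x))[["x"]]

print(c(slope_raw = slope_raw, slope_norm = slope_norm))
```

In this toy case the trend in the raw counts is flatter than beta1 because the normalization factors partly cancel it, while the normalized counts recover beta1. Real data add noise, dispersion estimation and possibly shrinkage, so the match would only be approximate.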



      • #4
        Thanks a lot, it's very helpful.

        Golsheed
