Batch effect for RNAseq data

  • Batch effect for RNAseq data

    Hi all

    I am pretty new to RNAseq data and am currently working with RNAseq data from the BrainSpan database (http://brainspan.org/). The data from the database contain normalized expression values and, as far as I know, they need batch effect correction. Is there a Bioconductor package or some other way to do this?

    Thanks

  • #2
    The standard tool is the sva package, with the ComBat() function.


    • #3
      Hi - I have a question related to 'manually' adjusting for batch effects in RNASeq data (by manually I mean not using the built-in batch adjustment from packages like edgeR and DESeq2, but using ComBat, gene-wise normalization, or linear modelling to adjust for batch effects).

      I realize there are a few options for removing such effects, but most methods (such as ComBat or a linear model) require normalized, roughly normally distributed expression values to begin with. So, for instance, one would use cpm() in edgeR, or the equivalent in DESeq, to obtain normalized counts in log space, which can then be batch-adjusted using the corresponding batch variable from the experimental design.
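The workflow described above (log-scale normalization followed by a linear-model batch adjustment) can be sketched roughly as follows. This is a minimal NumPy illustration, not the edgeR or limma implementation: the function names `log_cpm` and `remove_batch`, the prior count of 0.5, and the simulated data are all assumptions made for the example. It needs at least two batches, and the condition is kept in the model so that the biological signal is not regressed out along with the batch.

```python
import numpy as np

def log_cpm(counts, prior_count=0.5):
    """Simplified log2 counts-per-million for a genes x samples count matrix."""
    counts = np.asarray(counts, dtype=float)
    lib_sizes = counts.sum(axis=0)  # total counts per sample
    return np.log2((counts + prior_count) / (lib_sizes + 1.0) * 1e6)

def remove_batch(log_expr, batch, condition):
    """Regress the batch term out of log expression while keeping condition."""
    log_expr = np.asarray(log_expr, dtype=float)
    batch = np.asarray(batch)
    condition = np.asarray(condition)
    n = log_expr.shape[1]

    # Condition part of the design: intercept plus indicator columns.
    cond_levels = np.unique(condition)
    cond_cols = [(condition == lv).astype(float) for lv in cond_levels[1:]]
    X_cond = np.column_stack([np.ones(n)] + cond_cols)

    # Batch part with sum-to-zero coding, so removing it keeps the overall mean.
    batch_levels = np.unique(batch)
    B = np.column_stack([(batch == lv).astype(float) for lv in batch_levels[:-1]])
    B[batch == batch_levels[-1], :] = -1.0

    # Fit all genes at once, then subtract only the fitted batch component.
    X = np.hstack([X_cond, B])
    coef, *_ = np.linalg.lstsq(X, log_expr.T, rcond=None)
    batch_coef = coef[X_cond.shape[1]:, :]
    return log_expr - (B @ batch_coef).T

# Simulated example: 100 genes, 8 samples, artificial batch effect on 50 genes.
rng = np.random.default_rng(0)
counts = rng.poisson(lam=50, size=(100, 8))
batch = np.array([1, 1, 2, 2, 1, 1, 2, 2])
condition = np.array(["ctl"] * 4 + ["trt"] * 4)
counts[:50, batch == 2] *= 3

expr = log_cpm(counts)
adjusted = remove_batch(expr, batch, condition)
for name, m in [("before", expr), ("after", adjusted)]:
    gap = m[0, batch == 2].mean() - m[0, batch == 1].mean()
    print(name, "batch gap for gene 0:", round(float(gap), 3))
```

Values adjusted this way are suited to visualization or to a linear model on the log scale; they are no longer counts.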

      My question is this: after adjusting these normalized counts for batch effects (by any method), you cannot plug those numbers back into any differential expression function (edgeR or DESeq), as this will produce nonsensical results. At the same time, we cannot run the batch adjustment on the raw counts before normalizing them.

      How does one solve this issue? I have a pretty strong batch effect in my data that I'm struggling to remove effectively prior to differential expression testing.

      Thanks


      • #4
        In the case of SVA, you get a list containing the surrogate variables. You then just add them as covariates to your design. ComBat() itself produces a batch-adjusted expression matrix, which is more useful for something like limma.
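To illustrate what "adding them as covariates to your design" means, here is a rough NumPy sketch of a gene-wise linear model whose design matrix holds the condition of interest plus nuisance covariates (a batch indicator or surrogate variables). The function name `fit_with_covariates` and the simulated data are hypothetical, and the statistic is an ordinary least-squares t-statistic, not limma's moderated one.

```python
import numpy as np

def fit_with_covariates(log_expr, condition, covariates):
    """Gene-wise OLS of log expression on condition plus nuisance covariates.

    log_expr:   genes x samples matrix of log-scale expression
    condition:  0/1 vector, one entry per sample
    covariates: samples x k matrix (batch indicator, surrogate variables, ...)
    Returns the condition effect and its t-statistic per gene, plus residual df.
    """
    y = np.asarray(log_expr, dtype=float).T  # samples x genes
    n = y.shape[0]
    covariates = np.asarray(covariates, dtype=float).reshape(n, -1)

    # Design matrix: intercept, condition, then the nuisance covariates.
    X = np.column_stack([np.ones(n), np.asarray(condition, dtype=float), covariates])

    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    df = n - X.shape[1]
    sigma2 = (resid ** 2).sum(axis=0) / df   # residual variance per gene
    xtx_inv = np.linalg.inv(X.T @ X)
    se = np.sqrt(sigma2 * xtx_inv[1, 1])     # standard error of the condition term
    return coef[1], coef[1] / se, df

# Simulated example: 200 genes, 12 samples, batch shift on every gene,
# true condition effect only on the first 10 genes.
rng = np.random.default_rng(1)
condition = np.tile([0, 1], 6)
batch = np.repeat([0, 1], 6)
expr = rng.normal(loc=5.0, scale=0.5, size=(200, 12)) + 3.0 * batch
expr[:10] += 2.0 * condition

effect, t_stat, df = fit_with_covariates(expr, condition, batch)
print("mean estimated effect, first 10 genes:", round(float(effect[:10].mean()), 2))
print("residual degrees of freedom:", df)
```

Because the batch term stays in the model, the expression values themselves are never altered, and the degrees of freedom spent on the covariates are accounted for in the test.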


        • #5
          Originally posted by dpryan
          In the case of SVA, you get a list containing the surrogate variables. You then just add them as covariates to your design. ComBat() itself produces a batch-adjusted expression matrix, which is more useful for something like limma.

          Thanks for your reply! I actually did try adding the batch term as a covariate in the design specification in both edgeR and DESeq2, but I see very few DE genes (10-20 out of 20,000 tested), which is why I was looking to do the adjustment independently through ComBat or another method.

          My main issue is that I might have my corrected (normalized) counts from an independent batch-adjustment method, but any DE package (DESeq, edgeR or even limma's voom) requires raw counts, because it does its own internal normalization/rescaling, and feeding it adjusted values would make the results meaningless.

          I don't see an easy way around this. Is there any package or option that lets you supply already-normalized data without transforming it internally?

          Thanks, any help would be greatly appreciated.


          • #6
            Just use limma directly on the batch-adjusted, log-scale values. You don't need to run voom().
