  • shirley47162928
    Member
    • Jan 2015
    • 15

    how to do batch correction

    Hello everyone,

    I've got some raw counts for mRNA expression data. Before putting these counts into edgeR or DESeq, I was told I have to do some batch correction and normalisation. Has anyone done similar work before? What should I do, and with which package?
  • dpryan
    Devon Ryan
    • Jul 2011
    • 3478

    #2
    Depends on the nature of the batch effect. If all the samples in a batch seem to be affected similarly, then just add batch as a factor in your model design. Otherwise, have a look at the sva package in Bioconductor (pay special attention to the ComBat() function).


    • shirley47162928
      Member
      • Jan 2015
      • 15

      #3
      Originally posted by dpryan
      Depends on the nature of the batch effect. If all the samples in a batch seem to be affected similarly, then just add batch as a factor in your model design. Otherwise, have a look at the sva package in Bioconductor (pay special attention to the ComBat() function).
      So how do I make a model? Besides batch, I also want to adjust for some other covariates such as age and race.
      What I want to do is find differentially expressed genes between 2-3 treated groups. And what should I do as the next step after that?

      Many thanks!


      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        Just read the sections in the edgeR/DESeq2/etc. vignettes on multifactor designs. The simplest route is to just add batch as a factor to the data frame used to make the model matrix.

        Regarding what you do after DE analysis, that depends on your goals. Popular choices include (1) qPCR validation of some hits, (2) GO analysis, and (3) pathway analysis.
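
A minimal sketch of the approach described in #4, using only base R. The sample names, group labels and batch labels are made up for illustration; in a real analysis the resulting design matrix would be handed to edgeR or DESeq2 as described in their vignettes.

```r
# Hypothetical sample sheet: six samples, two groups, three batches.
samples <- data.frame(
  sample = paste0("S", 1:6),
  group  = factor(c("control", "treated", "control",
                    "treated", "control", "treated")),
  batch  = factor(c("B1", "B1", "B2", "B2", "B3", "B3"))
)

# Batch is added as an ordinary factor next to the group factor.
# Putting the factor of interest last is a convention: edgeR's glmLRT()
# tests the last coefficient of the design by default.
design <- model.matrix(~ batch + group, data = samples)
rownames(design) <- samples$sample

print(colnames(design))
```

Each batch other than the reference level (B1 here) gets its own indicator column, so the group coefficient is estimated with batch differences accounted for.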


        • shirley47162928
          Member
          • Jan 2015
          • 15

          #5
          Originally posted by dpryan
          Just read the sections in the edgeR/DESeq2/etc. vignettes on multifactor designs. The simplest route is to just add batch as a factor to the data frame used to make the model matrix.

          Is it this step?
          data.frame(Sample=colnames(y), FH, Seq_Batch, Age, RACE, BMI, MENSSTAT)
          design <- model.matrix(~FH+Seq_Batch+Age+RACE+BMI+MENSSTAT)

          FH is my group factor.
          Seq_Batch is the sequencing batch information. Should I categorise it into groups as "numeric", or just use the original information as "character"?
          Age, RACE, BMI and MENSSTAT are my covariates.


          • dpryan
            Devon Ryan
            • Jul 2011
            • 3478

            #6
            Yeah, if you input a character it'll normally get converted to a factor anyway (this makes things convenient).
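
To illustrate the conversion mentioned in #6, here is a small base R sketch with made-up batch labels (the variable names are hypothetical). A character vector is turned into a factor and expanded into one indicator column per non-reference level, whereas numeric batch codes are kept as a single continuous column unless they are wrapped in factor().

```r
# Hypothetical batch labels for five samples, stored three different ways.
batch_chr <- c("B1", "B2", "B3", "B2", "B1")  # character labels
batch_num <- c(1, 2, 3, 2, 1)                 # numeric codes
batch_fac <- factor(batch_num)                # numeric codes made categorical

cols_chr <- colnames(model.matrix(~ batch_chr))
cols_num <- colnames(model.matrix(~ batch_num))
cols_fac <- colnames(model.matrix(~ batch_fac))

print(cols_chr)  # one indicator column per non-reference batch
print(cols_num)  # a single column: batch treated as a continuous covariate
print(cols_fac)  # same layout as the character version
```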


            • shirley47162928
              Member
              • Jan 2015
              • 15

              #7
              Originally posted by dpryan
              Yeah, if you input a character it'll normally get converted to a factor anyway (this makes things convenient).
              Could you have a closer look at my code? Thanks.
              The last coefficient, "FH", is my group categorisation. The others are the covariates I am adjusting for in my model.

              > design <- model.matrix(~Seq_Batch+Age+RACE+BMI+MENSSTAT+FH)
              > rownames(design) <- colnames(y)
              > design
                             (Intercept) Seq_Batch Age RACE  BMI MENSSTAT FHYes
              FH0.B1.K100801           1         1  37    3 23.4        2     0
              FH1.B2.K100813           1         2  67    3 21.9        1     1
              FH0.B3.K100823           1         3  46    3 29.0        1     0
              FH1.B2.K100826           1         2  49    3 27.3        2     1
              FH1.B2.K100831           1         2  54    3 28.3        1     1
              FH0.B2.K101448           1         2  46    3 32.3        1     0
              FH1.B2.K102540           1         2  57   -1 28.0        1     1
              FH0.B2.K102654           1         2  59    3 41.6        1     0
              FH0.B5.K104200           1         5  63    3 26.5        1     0
              FH0.B5.K104238           1         5  55    3 22.1        1     0
              FH0.B2.K104239           1         2  48   -1 39.0        2     0
              FH0.B2.K104250           1         2  33    3 22.7        3     0
              FH0.B3.K104338           1         3  56   -1 34.3        1     0
              FH0.B2.K104343           1         2  44    3 33.4        1     0
              FH0.B3.K104403           1         3  46    3 34.2        1     0
              FH1.B4.K104416           1         4  37    3 32.5        1     1
              FH0.B5.K104443           1         5  38    3 20.7        2     0
              FH1.B5.K104506           1         5  59    3 25.9        1     1
              FH1.B3.K104557           1         3  41    3 33.1        2     1
              FH0.B2.K104603           1         2  63    3 31.0        1     0
              FH0.B2.K104638           1         2  78    3 22.6        1     0
              FH0.B4.K104662           1         4  64    3 29.9        1     0
              FH1.B2.K104824           1         2  57   -1 27.4        1     1
              attr(,"assign")
              [1] 0 1 2 3 4 5 6
              attr(,"contrasts")
              attr(,"contrasts")$FH
              [1] "contr.treatment"
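
Since #7 refers to the coefficient of interest by its position, a small base R sketch of looking it up by name may help. It rebuilds a reduced design from the first six samples of the output above. Two things here are assumptions rather than part of the original post: Seq_Batch is wrapped in factor() so that it is read as categorical, and the reference level of FH is called "No" (only "Yes" is visible in the output). The names pheno and fh_coef are made up for illustration.

```r
# First six samples from the printed design (Seq_Batch, Age and FH only).
pheno <- data.frame(
  Seq_Batch = factor(c(1, 2, 3, 2, 2, 2)),  # assumption: batch is categorical
  Age       = c(37, 67, 46, 49, 54, 46),
  FH        = factor(c("No", "Yes", "No", "Yes", "Yes", "No")),  # "No" is assumed
  row.names = c("FH0.B1.K100801", "FH1.B2.K100813", "FH0.B3.K100823",
                "FH1.B2.K100826", "FH1.B2.K100831", "FH0.B2.K101448")
)

design <- model.matrix(~ Seq_Batch + Age + FH, data = pheno)

# Look the group coefficient up by name instead of relying on it being last.
fh_coef <- which(colnames(design) == "FHYes")

print(colnames(design))
print(fh_coef)
```

The resulting index could then be supplied as the coef argument when testing for differential expression, for example in edgeR's glmLRT().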
