  • Problem: DESeq2 analysis with very unbalanced design

    Hello,

    I have a question about my DESeq2 analysis.
    I need to compare miRNA expression across different disease variants, but my design is not balanced. This is my data:
    Disease variant 1: 5 replicates
    Disease variant 2: 26 replicates
    Disease variant 3: 5 replicates
    Disease variant 4: 8 replicates
    Variants 1, 3, and 4 have fewer patients than variant 2 because they are rare variants of the disease, so their frequency in the population is very low (it is difficult to find replicates!).
    Can DESeq2 work well with such unbalanced samples?

    PS. Sorry if there are errors in the text, but I don't speak English very well
    Thank you very much in advance,
    Fischer

  • #2
    Unbalanced designs aren't a problem, you just have lower power with the variants containing fewer samples.
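
To put rough numbers on the "lower power" point, here is a minimal base R sketch. It assumes a plain two-group comparison of means with equal per-sample variance, where the standard error of the difference scales with sqrt(1/n1 + 1/n2). DESeq2 fits a negative binomial GLM, so this is only an intuition aid, not what DESeq2 computes. The names `variant1` to `variant4` are just labels for the group sizes given in the first post.

```r
# Group sizes from the first post
n <- c(variant1 = 5, variant2 = 26, variant3 = 5, variant4 = 8)

# All pairwise comparisons between variants
pairs <- t(combn(names(n), 2))

# Relative standard error of a difference in means: sqrt(1/n1 + 1/n2)
# (assumes equal per-sample variance; smaller means more power)
se_factor <- unname(sqrt(1 / n[pairs[, 1]] + 1 / n[pairs[, 2]]))

comparison_table <- data.frame(
  group1 = pairs[, 1],
  group2 = pairs[, 2],
  se_factor = round(se_factor, 3)
)
print(comparison_table)
```

Under these assumptions the 5 vs 5 comparison (variant 1 vs variant 3) is the least precise and the 26 vs 8 comparison (variant 2 vs variant 4) the most precise. The imbalance itself does not invalidate anything.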

    • #3
      Thank you for the reply!
      So do you think an analysis with DESeq2 is appropriate in my case?
      In practice, I would only have less accurate results for these variants, right?

      • #4
        Sure, I'd still use DESeq2 if this were my dataset.

        • #5
          Thank you again for the reply! I have another question, if you can help me again.
          Because we had a problem in our lab, some of the samples (15%) were extracted with the HiSeq and the rest with the MiSeq, so the initial miRNA frequencies differ between samples because two different instruments were used (the HiSeq frequencies are higher). Could this be a problem for the data analysis, or does DESeq2 solve it with normalization?

          • #6
            By "extracted" I assume you mean "sequenced". Were the HiSeq and MiSeq libraries prepared at the same time? If everything was prepared at the same time and with the same procedure and just sequenced on different machines then the library size normalization will take care of things. If not, then you should add a batch nuisance variable into your model.
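
For anyone wondering what "library size normalization" does here, below is a small base R sketch of the median-of-ratios idea that DESeq2's size factors are based on. The function `median_of_ratios` and the toy counts are made up for illustration. In a real analysis you would let DESeq2 do this (it also exposes `estimateSizeFactorsForMatrix` for a plain matrix).

```r
# Median-of-ratios size factors, written out in base R for illustration only.
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)   # log geometric mean of each gene
  usable <- is.finite(log_geo_means)      # drops genes with a zero count in any sample
  apply(log_counts[usable, , drop = FALSE], 2, function(sample_col) {
    exp(median(sample_col - log_geo_means[usable]))
  })
}

# Hypothetical toy data: the "HiSeq" sample is the "MiSeq" sample sequenced 10x deeper.
miseq <- c(10, 50, 200, 0, 1000)
counts <- cbind(MiSeq = miseq, HiSeq = 10 * miseq)
rownames(counts) <- paste0("miR", 1:5)

sf <- median_of_ratios(counts)
normalized <- sweep(counts, 2, sf, "/")   # divide each column by its size factor

print(sf)
print(normalized)
```

In this toy case the two size factors differ by a factor of 10 and the normalized counts come out identical, which is the situation where depth is the only difference between machines. It does nothing for differences that are not a simple per-sample scaling, which is why a batch variable may still be needed.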

            • #7
              Originally posted by dpryan
              "Were the HiSeq and MiSeq libraries prepared at the same time?"
              Yes, they were prepared at the same time and with the same kit.

              Originally posted by dpryan
              "If everything was prepared at the same time and with the same procedure and just sequenced on different machines then the library size normalization will take care of things. If not, then you should add a batch nuisance variable into your model."
              The only difference is the sequencing machine. We used both the HiSeq and the MiSeq, so some samples have a higher number of reads than others.

              • #8
                OK, in theory that should be OK. In practice, though, it's good to make a PCA plot and then see if samples start clustering by machine. If that's the case then you have a notable machine effect and can just add a variable to your model. Alternatively, you could see if svaseq finds a meaningful batch effect worthy of compensation.

                • #9
                  OK, I created a new variable that identifies HiSeq/MiSeq and refit the model with these commands ("categories" is the "disease variants" variable, "machine" is the new variable):

                  library(DESeq2)

                  # raw count matrix: one row per miRNA, one column per sample
                  countD <- as.matrix(countTable)

                  # sample table; row.names= (not rownames=) sets the row names to the sample IDs
                  colData <- data.frame(row.names=colnames(countD), condition=factor(categories), mach=factor(machine))

                  cds <- DESeqDataSetFromMatrix(countData=countD, colData=colData, design=~condition+mach)

                  dds <- DESeq(cds)

                  Is the model correct?
                  Then I made a PCA plot:

                  rld <- rlog(dds)
                  plotPCA(rld, intgroup=c("mach"))

                  This is the result (attached file: PCA plot):
                  Last edited by Fischer; 09-15-2015, 05:54 AM.
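
One thing to watch with a two-variable model like the one in post #9: as far as I know, `results()` called without arguments reports the last variable in the design formula, so with `~condition+mach` the default table would be the machine effect. A common convention is to put the nuisance variable first and ask for the comparison explicitly with `contrast`. The sketch below uses simulated counts. The level names (`variant1` to `variant4`), the sample IDs, the number of HiSeq samples, and the counts themselves are all assumptions for illustration, not the poster's data.

```r
set.seed(1)

# Hypothetical sample table mirroring the group sizes in the thread (5, 26, 5, 8)
samples <- data.frame(
  condition = factor(rep(paste0("variant", 1:4), times = c(5, 26, 5, 8))),
  mach = factor(sample(rep(c("HiSeq", "MiSeq"), times = c(7, 37)))),  # roughly 15% HiSeq
  row.names = paste0("s", 1:44)
)

# Simulated negative binomial counts with no true differences (illustration only)
n_genes <- 200
mu <- runif(n_genes, 20, 2000)
counts <- matrix(
  rnbinom(n_genes * nrow(samples), mu = mu, size = 5),
  nrow = n_genes,
  dimnames = list(paste0("miR", seq_len(n_genes)), rownames(samples))
)

if (requireNamespace("DESeq2", quietly = TRUE)) {
  # Nuisance variable first, variable of interest last
  dds <- DESeq2::DESeqDataSetFromMatrix(counts, samples, design = ~ mach + condition)
  dds <- DESeq2::DESeq(dds, quiet = TRUE)

  # Explicit comparison: variant1 (numerator) vs variant2 (denominator), adjusted for machine
  res <- DESeq2::results(dds, contrast = c("condition", "variant1", "variant2"))
  print(head(res))
}
```

Asking for each comparison by name avoids any dependence on the order of the terms in the formula.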

                  • #10
                    I guess there is a batch effect (glad I suggested you check!). You might also want to figure out what's going on with those 2 samples that are driving PC1.

                    • #11
                      Thank you so much for your suggestion!
                      Since these two samples behave strangely, in your opinion, can I remove them from the analysis? It wouldn't be a problem for the design because they are "disease variant 2" samples.

                      This is the result without these two samples, with the model design ~variant+mach (attached file):
                      Last edited by Fischer; 09-16-2015, 12:05 AM.

                      • #12
                        First, you should try to see if there's a good reason why they behave that way (and also do some hierarchical clustering). In general, though, I would say that those samples are good candidates for exclusion if they can't be otherwise explained (e.g., by having much lower coverage).
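
A quick way to run both checks mentioned above (coverage and hierarchical clustering) in base R might look like the sketch below. The data are simulated and the setup is an assumption made for illustration: ten samples, the last two with about 5% of the depth of the others. With a real dataset you would start from your own count matrix, and probably cluster on transformed, normalized values such as the rlog output.

```r
set.seed(42)

# Hypothetical data: 300 genes, 10 samples, samples s9 and s10 at ~5% depth
n_genes <- 300
mu <- runif(n_genes, 50, 1000)
depth <- c(rep(1, 8), 0.05, 0.05)
counts <- sapply(depth, function(d) rnbinom(n_genes, mu = mu * d, size = 10))
colnames(counts) <- paste0("s", 1:10)

# Check 1: total reads per sample; unusually small values point to low coverage
lib_size <- colSums(counts)
print(sort(lib_size))

# Check 2: hierarchical clustering of samples on log-scale counts
sample_dist <- dist(t(log2(counts + 1)))
hc <- hclust(sample_dist)
clusters <- cutree(hc, k = 2)
print(clusters)

# plot(hc) draws the dendrogram when run interactively
```

If the odd samples on the PCA are also the ones with the smallest library sizes, low coverage is a plausible explanation for their behavior.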

                        • #13
                          Hi DESeq2 experts,

                          I have a closely related question. My group design is as follows:
                          Control
                          A, n=4
                          B, n=8

                          KO
                          C, n=4
                          D, n=12

                          Groups A and C are untreated; B and D are treated.
                          So far so good: I used DESeq2 to compare A vs B and C vs D, and now I am looking at the differences between these comparisons (rather than directly comparing B vs D, which I am also doing, but that's not the question here).
                          As you can imagine, I get more DE genes in C vs D, as D has 50% more samples than B, while A and C have the same number of samples. But it's a lot more (A vs B: ~1000; C vs D: ~2500, so 2.5x as many, using the same FDR/log2FC cutoffs of course).

                          So my question is: is my "meta-comparison", i.e. looking at what differs between the two comparisons, actually valid? And is the 2.5-fold difference in DE genes more likely to be a result of group D having a higher n (so C vs D has more power than A vs B), or could it also be due to the experimental condition, which would be great as that would be biologically meaningful (and was of course the hypothesis)?

                          To be more precise: in my C vs D comparison I get a highly interesting group of genes, with good enrichment of one pathway, while in my A vs B comparison I don't get any of those genes, and now I'm afraid that this might be due to the design rather than biology!

                          Any suggestions would be much appreciated.

                          • #14
                            Comparing lists that were made based on p-value and fold-change thresholds is the path of last resort. Your design lends itself nicely to a factorial treatment, and those are the questions that likely make the most biological sense, so just do that instead.
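
A rough sketch of what the factorial setup could look like for the design in post #13. The group letters and sizes come from that post. The column names `genotype` and `treatment`, the sample IDs, and the simulated counts are assumptions for illustration. The interaction term asks directly whether the treatment effect differs between KO and control, which replaces the comparison of two thresholded gene lists.

```r
# Sample table: A = Control/untreated, B = Control/treated,
#               C = KO/untreated,      D = KO/treated
samples <- data.frame(
  group = rep(c("A", "B", "C", "D"), times = c(4, 8, 4, 12)),
  genotype = factor(rep(c("Control", "KO"), times = c(12, 16)),
                    levels = c("Control", "KO")),
  treatment = factor(rep(c("untreated", "treated", "untreated", "treated"),
                         times = c(4, 8, 4, 12)),
                     levels = c("untreated", "treated"))
)
rownames(samples) <- paste0("s", seq_len(nrow(samples)))

# Base R view of the model: the last column is the interaction
design_matrix <- model.matrix(~ genotype + treatment + genotype:treatment, data = samples)
print(colnames(design_matrix))

if (requireNamespace("DESeq2", quietly = TRUE)) {
  set.seed(1)
  # Simulated counts with no true effects, only to make the sketch runnable
  n_genes <- 200
  mu <- runif(n_genes, 20, 2000)
  counts <- matrix(
    rnbinom(n_genes * nrow(samples), mu = mu, size = 5),
    nrow = n_genes,
    dimnames = list(paste0("gene", seq_len(n_genes)), rownames(samples))
  )

  dds <- DESeq2::DESeqDataSetFromMatrix(
    counts, samples, design = ~ genotype + treatment + genotype:treatment
  )
  dds <- DESeq2::DESeq(dds, quiet = TRUE)

  # Check the coefficient names before asking for one by name
  print(DESeq2::resultsNames(dds))

  # Interaction: is the treatment effect different in KO than in Control?
  res_interaction <- DESeq2::results(dds, name = "genotypeKO.treatmenttreated")
  print(head(res_interaction))
}
```

The coefficient name `genotypeKO.treatmenttreated` is what I would expect DESeq2 to generate for these factor levels, but it is worth confirming against the `resultsNames(dds)` output on your own object.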

                            • #15
                              Thanks @dpryan!

                              You are probably right. I'll have a look at factorial design then.
