  • Michael Love
    Senior Member
    • Jul 2013
    • 333

    #16
    Note that the replicates are right on top of each other in the PCA plot. Are these technical or biological replicates?

    The dispersion is calculated based on variance within conditions, so the dispersion is not necessarily large though you have large differences across conditions.
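
    A minimal sketch of this point in base R, using simulated counts for a single gene rather than the data in this thread. The means, the dispersion of 0.01, and the simple method-of-moments estimator are all assumptions chosen for illustration; DESeq2's own dispersion estimation is more involved.

```r
# Simulated counts for one gene: two conditions, three replicates each.
# All numbers are made up for illustration.
set.seed(1)
alpha_true <- 0.01                 # assumed true dispersion
size <- 1 / alpha_true             # rnbinom() 'size' is 1 / dispersion
ctrl  <- rnbinom(3, mu = 100,  size = size)
treat <- rnbinom(3, mu = 1600, size = size)

# Method-of-moments dispersion for a negative binomial:
# var = mu + alpha * mu^2, so alpha = (var - mu) / mu^2 (floored at 0).
mom_dispersion <- function(x) {
  m <- mean(x)
  max((var(x) - m) / m^2, 0)
}

# Within-condition estimate: average of the two per-group estimates.
within_cond <- mean(c(mom_dispersion(ctrl), mom_dispersion(treat)))

# Naive estimate that ignores the grouping and pools all six samples.
pooled <- mom_dispersion(c(ctrl, treat))

# The difference between conditions is large regardless.
log2_fold_change <- log2(mean(treat) / mean(ctrl))
```

    Under these assumptions the within-condition estimate stays small while the fold change is large; only the pooled estimate, which mixes the condition effect into the variance, is inflated.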

    I'm not so familiar with microbial analysis. I'd guess, like others mentioned above, that you have many genes with counts for only one species. And there is not a clear group of genes which are not DE across the conditions. This makes normalization difficult, as the automatic methods within DESeq or edgeR are based on the assumption that there are enough genes that are not DE, such that robust measures like median or trimmed mean can find the center of the distribution of log ratios of samples.
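
    To make the assumption concrete, here is a rough base R sketch of a median-of-ratios scaling factor on simulated two-sample data. The function name `median_of_ratios`, the expression levels, and the fold changes are hypothetical; this is a simplified stand-in for what the packages do, not their actual code.

```r
# Median-of-ratios size factors, written out in base R.
# 'counts' is a genes x samples matrix of raw counts.
median_of_ratios <- function(counts) {
  log_counts <- log(counts)
  log_geo_means <- rowMeans(log_counts)      # log geometric mean per gene
  keep <- is.finite(log_geo_means)           # drop genes with any zero count
  apply(log_counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[keep]))
  })
}

set.seed(42)
n_genes <- 2000
base <- rgamma(n_genes, shape = 2, scale = 100) + 1   # made-up expression levels

# Case 1: sample B is sequenced 2x deeper; only 10% of genes are up 8-fold.
fc_few <- ifelse(seq_len(n_genes) <= 200, 8, 1)
few <- cbind(A = rpois(n_genes, base), B = rpois(n_genes, 2 * base * fc_few))

# Case 2: same depth difference, but 90% of genes are up 8-fold.
fc_most <- ifelse(seq_len(n_genes) <= 1800, 8, 1)
most <- cbind(A = rpois(n_genes, base), B = rpois(n_genes, 2 * base * fc_most))

sf_few  <- median_of_ratios(few)
sf_most <- median_of_ratios(most)

# Ratio of size factors B / A; the true depth ratio is 2 in both cases.
ratio_few  <- unname(sf_few["B"]  / sf_few["A"])
ratio_most <- unname(sf_most["B"] / sf_most["A"])
```

    When most genes are unchanged, the median lands among them and the estimated ratio is close to the true depth ratio. When most genes are DE, the median lands among the changed genes, so the biological difference is absorbed into the scaling factor.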

    Is there a set of genes that the biologists suspect might be equally expressed across the groups?

    • alyamahmoud
      Member
      • Nov 2013
      • 29

      #17
      Hi Michael

      These are biological replicates.

      "The dispersion is calculated based on variance within conditions, so the dispersion is not necessarily large though you have large differences across conditions."

      I am not sure I get what you mean here.

      There is a set of genes that the biologists know should be varying, and these are non-metagenomics samples: a single species per sample.

      What would you suggest?

      • alyamahmoud
        Member
        • Nov 2013
        • 29

        #18
        If I use collapseReplicates, the number of DEGs decreases massively; however, this doesn't improve the MA plot (attached).

        Any help?

        • gringer
          David Eccles (gringer)
          • May 2011
          • 845

          #19
          genome

          Just a general question related to this (based on the mapping IDs you have provided), is there anyone here on seqanswers who has successfully done a DESeq / DESeq2 run on E. coli?

          • alyamahmoud
            Member
            • Nov 2013
            • 29

            #20
            Hierarchical model

            Is there any objection to applying a hierarchical model to the normalized counts? I tried a limma analysis on the normalized counts; the MA plot is also attached.

            • Michael Love
              Senior Member
              • Jul 2013
              • 333

              #21
              collapseReplicates() is for technical replicates only. We obviously do not recommend collapsing biological replicates, as you throw away information from the experiment.
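
              As a rough illustration of what collapsing technical replicates amounts to, the base R sketch below sums the counts of runs that came from the same library. The count matrix and the sample and run names are made up; with a real DESeqDataSet you would call collapseReplicates() rather than this hand-rolled version.

```r
# Made-up counts: 4 genes, 2 biological samples, each sequenced in 2 runs.
counts <- matrix(
  c( 10, 12, 30, 28,
      0,  1,  5,  4,
    100, 90, 50, 60,
      7,  9,  7,  8),
  nrow = 4, byrow = TRUE,
  dimnames = list(paste0("gene", 1:4),
                  c("s1_run1", "s1_run2", "s2_run1", "s2_run2"))
)
sample_id <- c("s1", "s1", "s2", "s2")   # which library each run came from

# rowsum() sums rows by group, so transpose to sum columns by sample.
collapsed <- t(rowsum(t(counts), group = sample_id))
```

              Summing is reasonable for repeated sequencing of one library because the runs differ only by sampling noise. Doing the same to biological replicates would discard the replicate-to-replicate variation that the dispersion estimate depends on.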

              The "problem" of too many small p-values, or a p-value distribution with a spike at 0, means that you have many large differences across the conditions.

              Can you say more about the genes here? If you are sequencing multiple species, what is the relation of each species to the reference genome/transcriptome to which the reads were aligned?

              • alyamahmoud
                Member
                • Nov 2013
                • 29

                #22
                They are not multiple species; there is only one species (the same as the reference), but under different environmental conditions (different pH ranges, anaerobic, water vs. wt, which is aerobic).

                • alyamahmoud
                  Member
                  • Nov 2013
                  • 29

                  #23
                  Is it wrong to apply a hierarchical model? It reduces the number of significant genes drastically, but the MA plot looks better, I think?

                  • Michael Love
                    Senior Member
                    • Jul 2013
                    • 333

                    #24
                    The last plot you posted, hm_norm_counts, was not an MA plot. It was logFC ~ adjusted p value.

                    An MA plot is logFC ~ log of mean counts, or mean of log counts.
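
                    For reference, a small base R sketch of the two axes, using made-up normalized counts with one column per condition (the object and gene names are hypothetical):

```r
# Made-up normalized counts for two conditions (one column each).
norm_counts <- cbind(
  control = c(8, 64, 100, 1000),
  treated = c(32, 64, 25, 1000)
)
rownames(norm_counts) <- paste0("gene", 1:4)

log_counts <- log2(norm_counts)

# M: log fold change (y axis); A: mean of log counts (x axis).
M <- log_counts[, "treated"] - log_counts[, "control"]
A <- rowMeans(log_counts)

plot(A, M, xlab = "A: mean of log2 counts", ylab = "M: log2 fold change")
abline(h = 0, lty = 2)   # genes with no change sit on this line
```

                    With real data, DESeq2's plotMA() draws the equivalent plot from a results table, with the mean of normalized counts on a log-scaled x axis.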

                    Also, note the scale of the y axis is much larger than the previous plots.

                    Regarding the goal of reducing the number of sig genes: I don't know if you've reduced the number of significant genes for better or for worse. We can easily reduce the number of genes, either by reducing the FDR threshold or increasing the lfcThreshold argument.

                    We know for sure, from the PCA plot, that the differences between the samples are very large compared to the variation between biological replicates.

                    Could you send the dds object to me privately, so I can have a look?

                    My email is listed here:

                    maintainer("DESeq2")

                    • gringer
                      David Eccles (gringer)
                      • May 2011
                      • 845

                      #25
                      That plot also doesn't look wonderful, presumably because you've got p-val on the X axis. MA plot is usually log fold change on Y, and average log expression on the X.

                      Are you able to do a scatter plot of the raw counts for each experiment, preferably log-transformed or using the VST from DESeq/DESeq2? If you're not getting a line that distributes around y=x with those plots, it's probably not a good idea trying to shoehorn in a differential expression analysis.

                      • Michael Love
                        Senior Member
                        • Jul 2013
                        • 333

                        #26
                        Yes, David's right.

                        Here's a pairs plot of your counts on the log scale:

                        Code:
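                        library(DESeq2)  # provides counts(); dds is the DESeqDataSet from this thread
                        y <- log10(counts(dds) + 1)  # +1 avoids log10(0)
                        # density scatterplot for every pair of samples, upper triangle only
                        pairs(y, panel = function(...) smoothScatter(..., nrpoints = 0, add = TRUE), lower.panel = NULL)
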
                        For the samples other than 'water', we can see the diagonal line that would specify a log fold change of 0 between the two samples. This is the line that DESeq and edgeR use for defining a scaling factor for normalizing for sequencing depth.

                        However, for water vs others, a simple scaling factor automatically detected from the data will not work.

                        For the scatterplots of 1 vs 3 and 1 vs 10, there seems to be a faint line of genes on the diagonal. Maybe you can investigate what is special about these genes. It is possible that nearly all the genes are differentially expressed (upregulated in the treated groups), but then the experiment really should use spike-in controls for normalization.
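
                        A sketch of how spike-in controls would help, in base R on simulated data. The helper `size_factors_from_controls`, the spike-in amounts, and the 4-fold global shift are all hypothetical. In DESeq2 the corresponding route would be the `controlGenes` argument of estimateSizeFactors(), which restricts the size factor estimate to a chosen set of rows.

```r
# Median-of-ratios size factors restricted to a set of control rows
# (e.g. spike-ins). 'controls' is a logical vector, one entry per row.
size_factors_from_controls <- function(counts, controls) {
  log_counts <- log(counts[controls, , drop = FALSE])
  log_geo_means <- rowMeans(log_counts)
  keep <- is.finite(log_geo_means)          # drop controls with a zero count
  apply(log_counts[keep, , drop = FALSE], 2, function(col) {
    exp(median(col - log_geo_means[keep]))
  })
}

set.seed(7)
n_genes <- 1000
n_spike <- 50
base  <- rgamma(n_genes, shape = 2, scale = 100) + 1  # made-up expression levels
spike <- rep(500, n_spike)   # spike-ins added at the same amount per sample

# Hypothetical extreme case: every gene is up 4-fold in 'treated',
# and 'treated' is also sequenced 2x deeper.
counts <- cbind(
  water   = rpois(n_genes + n_spike, c(base, spike)),
  treated = rpois(n_genes + n_spike, 2 * c(4 * base, spike))
)
is_spike <- rep(c(FALSE, TRUE), c(n_genes, n_spike))

sf <- size_factors_from_controls(counts, is_spike)
normalized <- sweep(counts, 2, sf, "/")

# Median fold change of the real genes after normalization (true value: 4).
fc <- normalized[!is_spike, "treated"] / normalized[!is_spike, "water"]
median_fc <- median(fc[is.finite(fc)])
```

                        Because the spike-ins are unaffected by the treatment, the scaling factor reflects only sequencing depth, and the global shift in the real genes survives normalization instead of being scaled away.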

                        I wonder if the experimental protocol might have been different for the water samples?

                        Another option for analysis would be to remove the water samples and use the 'contrast' argument to just compare the treatment groups against each other.