Seqanswers Leaderboard Ad

Collapse
X
 
  • Filter
  • Time
  • Show
Clear All
new posts
  • alyamahmoud
    Member
    • Nov 2013
    • 29

    non-typical p values distribution running DESeq

    Hi All

    I have 2 reps*5 conditions (4 + control). I ran followed the nature protocol to come to differentially expressed genes between each of the 4 conditions relative to the control.

    The distribution of p values looks is attached for one of the conditions as well as the corresponding MA plot. I am getting too many differentially expressed genes (below) and the p values distribution doesn't look like expected.

    Would you please advise what I could be missing ? or is such a distribution of p values expected in some cases and why ?

    Thank you very much
    Alyaa


    Code:
    > table (res$padj < 0.1)
    FALSE  TRUE 
     3245  3578 
    
    > table (res$padj < 0.01)
    FALSE  TRUE 
     4813  2010 
    
    > table (res$padj < 0.05)
    FALSE  TRUE 
     3862  2961
    Attached Files
  • gringer
    David Eccles (gringer)
    • May 2011
    • 845

    #2
    Based on the MA plot, I'm guessing you're using DESeq2, rather than DESeq, which is good.

    However, your MA plot looks crazy. It should be distributed around the y axis (0 log2 fold change).

    I've got no idea what would do that, but it certainly indicates something screwy is going on. Are you sub-sampling genes prior to running them through DESeq2? Are you using normalised counts as input, instead of raw counts? What command did you run to produce this MA plot?

    Comment

    • alyamahmoud
      Member
      • Nov 2013
      • 29

      #3
      Hi gringer

      Thanks for the prompt reply.

      I am not subsampling, and I am using the raw counts as input to DESeq and not DESeq2.

      I ran the following to generate the MA plot:

      Code:
      cds =  newCountDataSet(counts_table, conds)
      cds <- estimateSizeFactors(cds)
      cds <- estimateDispersions(cds)
      res = nbinomTest(cds, "water", "aerobic") # one of the conditions vs control
      plotMA(res)

      Comment

      • gringer
        David Eccles (gringer)
        • May 2011
        • 845

        #4
        I am not subsampling, and I am using the raw counts as input to DESeq and not DESeq2.
        Oh, okay. Do you get the same results when using DESeq2?

        The only reason I can think of off the top of my head why increased expression in both groups would result in increased log2 fold change is if one of the groups had 0 expression for all genes. Can you show the first few lines of your results, i.e. "head(res)"? I think DESeq (v1, not v2) should report estimated/normalised expression for each group in that result.

        If that is the problem, then you may have chosen your condition names incorrectly in the nbinomTest command:
        Code:
        > conditions(cds)
        [1] "water" "aerobic" # should be something like this
        Or alternatively, the count table or condition list could be formatted incorrectly:
        Code:
        > colnames(counts_table)
        [1] "water_r1" "water_r2" "aerobic_r1" "aerobic_r2" # something like this
        > dim(counts_table)
        [1] 30215 4 # should be something like this
        > conds
        [1] "water" "water" "aerobic" "aerobic"  # should be something like this
        But please try DESeq2. It will complain a bit louder when you do things wrong, which will hopefully give you more insights into what went wrong.

        Comment

        • alyamahmoud
          Member
          • Nov 2013
          • 29

          #5
          I tried DESeq2 and the results are not very different. The p values distribution and the MA plot using DESeq2 are attached.


          Here are the formats for the counts table and design:

          Code:
          >conditions(cds)
              water_1     water_2       ph5_1       ph5_2       ph9_1       ph9_2 
                water       water         ph5         ph5         ph9         ph9 
          anaerobic_1 anaerobic_2   aerobic_1   aerobic_2 
            anaerobic   anaerobic     aerobic     aerobic 
          Levels: aerobic anaerobic ph5 ph9 water
          Code:
          > colnames(counts_table)
           [1] "water_1"     "water_2"     "ph5_1"       "ph5_2"       "ph9_1"      
           [6] "ph9_2"       "anaerobic_1" "anaerobic_2" "aerobic_1"   "aerobic_2"
          Code:
          >conds
           [1] water     water     ph5       ph5       ph9       ph9       anaerobic
           [8] anaerobic aerobic   aerobic  
          Levels: aerobic anaerobic ph5 ph9 water
          Attached Files

          Comment

          • alyamahmoud
            Member
            • Nov 2013
            • 29

            #6
            Am I formatting the conditions file wrongly:

            This is my exp_design file:

            Code:
            >exp_design
                        condition
            water_1         water
            water_2         water
            ph5_1             ph5
            ph5_2             ph5
            ph9_1             ph9
            ph9_2             ph9
            anaerobic_1 anaerobic
            anaerobic_2 anaerobic
            aerobic_1     aerobic
            aerobic_2     aerobic
            and these are the commands in DESeq and DEseq2, respectively.
            Code:
            conds = exp_design$condition
            cds =  newCountDataSet(counts_table, conds)
            Code:
            ddsFullCountTable <- DESeqDataSetFromMatrix(countData = counts_table, colData = exp_design, design = ~condition)

            Comment

            • gringer
              David Eccles (gringer)
              • May 2011
              • 845

              #7
              Can you show the first few lines of results (and/or your count table)? The rest of what you've got looks fine.

              Comment

              • alyamahmoud
                Member
                • Nov 2013
                • 29

                #8
                Code:
                > head (counts_table)
                              water_1 water_2 ph5_1 ph5_2 ph9_1 ph9_2 anaerobic_1 anaerobic_2
                FN649414.4579     500     243   133   647   141   114         222         106
                FN649414.7957      23      20    10    91    12    13          13           6
                FN649414.7767     135      50    55   321    52    43          96          53
                p948.168            5       0     0     0     0     0          66          28
                
                              aerobic_1 aerobic_2
                FN649414.4579        50       113
                FN649414.7957        16        34
                FN649414.7767        33       101
                p948.168              0         0

                Comment

                • alyamahmoud
                  Member
                  • Nov 2013
                  • 29

                  #9
                  I tried

                  Code:
                  >cdsFilt = estimateDispersions(cdsFilt, method  = "blind", sharingMode="fit-only)
                  and the MA plot is attached, does it look any better ?
                  Attached Files

                  Comment

                  • gringer
                    David Eccles (gringer)
                    • May 2011
                    • 845

                    #10
                    No. You're expecting points looking like a triangle (or diamond) shaped wedge elongated along the Y axis, centered on the Y axis. I've attached an example based on DESeq results. The DESeq2 plot should look similar, but narrows down to a point for low expression values. If you're not getting that (and all other steps check out), then it suggests that your experimental conditions aren't appropriately controlled.
                    Attached Files
                    Last edited by gringer; 06-26-2014, 09:28 PM.

                    Comment

                    • gringer
                      David Eccles (gringer)
                      • May 2011
                      • 845

                      #11
                      You wouldn't happen to be trying to do a metagenomic analysis, would you? If you have a different sample population for each condition, then that might explain the differences that you're seeing.

                      Comment

                      • alyamahmoud
                        Member
                        • Nov 2013
                        • 29

                        #12
                        I will check with the biologist who gave me the data, but I don't think they are coming from different population.

                        How reliable/ir-reliable would be the results of the DEG accordingly ?

                        Comment

                        • gringer
                          David Eccles (gringer)
                          • May 2011
                          • 845

                          #13
                          Do you have mapping percentages for your genome? If they're wildly different, that might point to sample differences that could greatly influence the results.

                          What your MA plots seem to be showing is that one of the populations is producing normalised counts for genes that are effectively zero.

                          You can look at the count input data (or DESEq-normalised data) to see if this is the case -- just looking at total counts per experiment should show odd patterns. If the differences are as much as they look from the MA plot, you won't need DESeq at all to find differences.

                          Comment

                          • Michael Love
                            Senior Member
                            • Jul 2013
                            • 333

                            #14
                            Indeed, something looks strange about this experiment. Try running the PCA steps in the vignette to see how the samples are distributed. It looks like the different conditions are very different from each other for many rows. This can make the normalization difficult without spike-in controls.

                            Can you describe the experiment in more details? Is this standard RNA-Seq data?

                            Comment

                            • alyamahmoud
                              Member
                              • Nov 2013
                              • 29

                              #15
                              Hi Michael

                              Thanks for your reply.

                              I attached the PCA plot.

                              Yes, these are very different conditions for some bacterial species, paired-end standard RNAseq, % of mapped reads >= 97%.

                              How can I handle this data without spike-in controls ? How much effect does this massive dispersion have on the final DEG ?
                              Attached Files

                              Comment

                              Latest Articles

                              Collapse

                              • seqadmin
                                New Genomics Tools and Methods Shared at AGBT 2025
                                by seqadmin


                                This year’s Advances in Genome Biology and Technology (AGBT) General Meeting commemorated the 25th anniversary of the event at its original venue on Marco Island, Florida. While this year’s event didn’t include high-profile musical performances, the industry announcements and cutting-edge research still drew the attention of leading scientists.

                                The Headliner
                                The biggest announcement was Roche stepping back into the sequencing platform market. In the years since...
                                03-03-2025, 01:39 PM
                              • seqadmin
                                Investigating the Gut Microbiome Through Diet and Spatial Biology
                                by seqadmin




                                The human gut contains trillions of microorganisms that impact digestion, immune functions, and overall health1. Despite major breakthroughs, we’re only beginning to understand the full extent of the microbiome’s influence on health and disease. Advances in next-generation sequencing and spatial biology have opened new windows into this complex environment, yet many questions remain. This article highlights two recent studies exploring how diet influences microbial...
                                02-24-2025, 06:31 AM

                              ad_right_rmr

                              Collapse

                              News

                              Collapse

                              Topics Statistics Last Post
                              Started by seqadmin, Today, 05:03 AM
                              0 responses
                              15 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, Yesterday, 07:27 AM
                              0 responses
                              12 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, 03-18-2025, 12:50 PM
                              0 responses
                              14 views
                              0 reactions
                              Last Post seqadmin  
                              Started by seqadmin, 03-03-2025, 01:15 PM
                              0 responses
                              185 views
                              0 reactions
                              Last Post seqadmin  
                              Working...