  • rndouglas
    Member
    • Jun 2012
    • 23

    What to include in my count table(s) for DESeq

    I have five experimental groups (A, B, C, D, and E; 3-6 bioreps each) that I want to compare to each other and a control (Y; 3 bioreps) using DESeq.

    When I run DESeq, is it better to use a count table that includes A, B, C, D, E, and Y and then run each comparison, or is it better to make a count table for each comparison (i.e., a table for A and Y, another for B and Y, another for A and B, etc.) and end up with ~15 different count tables?

    I ask because I end up with different lists of significantly changed loci depending on how I run the analysis and I'm not sure which is more 'correct.'
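    For reference, the ~15 figure is the number of unordered pairs among the six groups (A to E plus the control Y), C(6, 2) = 15. A short Python sketch that enumerates them (the group labels are simply the ones used in the question):

```python
from itertools import combinations

# Five experimental groups plus the control Y, as described in the question.
groups = ["A", "B", "C", "D", "E", "Y"]

# Every unordered pair of groups is one pairwise comparison (contrast).
pairs = list(combinations(groups, 2))

print(len(pairs))  # 15
print(pairs[:3])   # [('A', 'B'), ('A', 'C'), ('A', 'D')]
```

    With one combined table, each of these pairs is a contrast tested on the same table; with the split approach, each pair would get its own table.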
  • feralBiologist
    Member
    • Jun 2011
    • 61

    #2
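    It is better to have a single count table. This should increase statistical power, because DESeq then estimates the dispersion (within-group variability) from the replicates of all groups rather than only from the two groups being compared. You are likely to get a somewhat higher number of differentially expressed genes, as DESeq will be able to tease out more signal from the noise.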
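    A simplified illustration of where the extra power comes from, assuming a plain pooled within-group variance as a stand-in for DESeq's dispersion estimation (DESeq actually works with negative-binomial dispersions and fits a dispersion-mean trend across genes, which this sketch does not reproduce). Pooling over all six groups gives the variability estimate more residual degrees of freedom than the two groups of a single pairwise table. The counts below are made up.

```python
import numpy as np

def pooled_variance(groups):
    """Pool the within-group variance across several replicate groups.

    Each group contributes (n_g - 1) residual degrees of freedom, so
    pooling over more groups backs the estimate with more df.
    """
    ss = 0.0  # sum of squared deviations from each group's own mean
    df = 0    # residual degrees of freedom
    for values in groups:
        values = np.asarray(values, dtype=float)
        ss += float(np.sum((values - values.mean()) ** 2))
        df += len(values) - 1
    return ss / df, df

# Hypothetical normalized counts for a single locus (made-up numbers).
counts = {
    "A": [10, 12, 11, 13],
    "B": [20, 18, 22],
    "C": [15, 14, 16],
    "D": [9, 11, 10, 12, 8],
    "E": [30, 28, 33],
    "Y": [12, 10, 14],
}

var_all, df_all = pooled_variance(counts.values())          # whole table
var_ab, df_ab = pooled_variance([counts["A"], counts["B"]])  # A vs B table only
print(df_all, df_ab)  # 15 5
```

    The flip side is that one group with unusually high within-group variability inflates the pooled estimate used for every comparison, which may be one way a pairwise table can end up calling more loci than the combined one.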

    • rndouglas
      Member
      • Jun 2012
      • 23

      #3
      This is what I had been doing all along (running everything in one count table), but then this morning I thought I'd try A vs. B in a separate count table.

      I was surprised to find almost double the loci with padj < 0.05 compared to when I ran everything in one count table (and hence my new-found concern).

      • dpryan
        Devon Ryan
        • Jul 2011
        • 3478

        #4
        Have a look at the size factors. If one of them from the full dataset is very different from the others, that can cause this sort of result.
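        In DESeq itself the values come from `sizeFactors()` after `estimateSizeFactors()`. To see how one outlying sample shows up, here is a Python sketch of the median-of-ratios idea behind DESeq's size factors (assumed to match DESeq's default estimator in outline only). The helper `flag_outliers`, its 2-fold cutoff, and the count matrix are hypothetical.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors for a loci x samples count matrix.

    Loci with a zero in any sample are skipped: their geometric mean is
    zero, so the ratios to the pseudo-reference are undefined.
    """
    counts = np.asarray(counts, dtype=float)
    keep = np.all(counts > 0, axis=1)
    log_counts = np.log(counts[keep])
    log_ref = log_counts.mean(axis=1)              # per-locus geometric-mean reference (log scale)
    log_ratios = log_counts - log_ref[:, None]     # each sample relative to that reference
    return np.exp(np.median(log_ratios, axis=0))   # median ratio per sample

def flag_outliers(sf, fold=2.0):
    """Hypothetical check: indices of samples more than `fold`-fold from the median size factor."""
    log_dev = np.abs(np.log2(sf / np.median(sf)))
    return np.where(log_dev > np.log2(fold))[0]

# Hypothetical raw counts: 5 loci x 4 samples (made-up numbers).
# Sample 2 was sequenced ~4x deeper; the last locus has a zero and is ignored.
counts = np.array([
    [10, 12, 40, 11],
    [5, 6, 20, 5],
    [8, 9, 33, 7],
    [3, 3, 12, 4],
    [0, 2, 9, 1],
])

sf_full = size_factors(counts)            # factors from the full table
sf_pair = size_factors(counts[:, [0, 1]]) # recomputed on a two-sample subset
print(np.round(sf_full, 2))
print(np.round(sf_pair, 2))
print(flag_outliers(sf_full))  # [2]
```

    Note that the pseudo-reference is rebuilt from whichever samples are in the table, so a pairwise table gets its own set of size factors rather than a slice of the full-table ones.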

        • feralBiologist
          Member
          • Jun 2011
          • 61

          #5
          Originally posted by rndouglas
          This is what I had been doing all along (running everything in one count table), but then this morning I thought I'd try A vs. B in a separate count table. I was surprised to find almost double the loci with padj < 0.05 compared to when I ran everything in one count table (and hence my new-found concern).
          This is really strange and counter-intuitive. Do you find more loci for all the contrasts? I can imagine this being the case for a small subset of the contrasts where there happens to be less variation within the sample groups, but I find it hard to believe that you would get a much higher total number of differentially expressed features by splitting the count table. What sort of normalisation do you do? Can you give more details about your bioinformatics workflow?

          • rndouglas
            Member
            • Jun 2012
            • 23

            #6
            I just realized I'm only seeing this in my small RNA libraries (adapters removed, tRNA/rRNA removed, size-selected for 20-25 nt in sRNA Workbench).

            I map the reads using bowtie (-v 0, i.e., no mismatches allowed).

            I generate read counts with htseq-count, then build my count table(s).

            I run DESeq following section 3.1 of the vignette.

            So far, every 1v1 count table I've looked at (7 of the possible 15) has called more significantly changed loci (padj < 0.05) than I get for the exact same comparison using a count table that includes all 30 of my bioreps.

            The biggest 'jump' was from 76 to 372 loci for one comparison.
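            Both approaches start from the same htseq-count output, so the raw counts themselves are identical: a pairwise table is just a column subset of the combined one, and any difference must come from what DESeq recomputes per table (size factors, dispersion estimates). A minimal Python sketch of building the combined table once and slicing it, assuming each sample's htseq-count output is the usual two-column feature/count text with `__`-prefixed summary lines; the function names, sample labels, and counts are hypothetical.

```python
import csv
import io

def read_htseq_counts(handle):
    """Parse one htseq-count output (feature<TAB>count per line).

    Lines starting with '__' are htseq-count's summary counters
    (e.g. __no_feature, __ambiguous) and are skipped.
    """
    counts = {}
    for row in csv.reader(handle, delimiter="\t"):
        if not row or row[0].startswith("__"):
            continue
        counts[row[0]] = int(row[1])
    return counts

def build_count_table(per_sample):
    """Combine per-sample dicts into one table: feature -> counts in sample order."""
    names = list(per_sample)
    features = sorted(set().union(*per_sample.values()))
    # Files made with the same annotation list the same features; 0 is only a fallback.
    table = {f: [per_sample[n].get(f, 0) for n in names] for f in features}
    return names, table

def subset_columns(names, table, keep):
    """Pairwise table = the same rows, restricted to the chosen sample columns."""
    idx = [i for i, n in enumerate(names) if n in keep]
    return [names[i] for i in idx], {f: [row[i] for i in idx] for f, row in table.items()}

# Hypothetical htseq-count outputs for three samples (made-up numbers).
files = {
    "A_1": "locus1\t10\nlocus2\t5\n__no_feature\t100\n",
    "B_1": "locus1\t30\nlocus2\t4\n__no_feature\t90\n",
    "Y_1": "locus1\t12\nlocus2\t6\n__ambiguous\t3\n",
}
per_sample = {name: read_htseq_counts(io.StringIO(text)) for name, text in files.items()}
names, full = build_count_table(per_sample)
ab_names, ab = subset_columns(names, full, {"A_1", "B_1"})
print(full)  # {'locus1': [10, 30, 12], 'locus2': [5, 4, 6]}
print(ab)    # {'locus1': [10, 30], 'locus2': [5, 4]}
```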
