  • differential gene expression and variance issues

    I have 5 time points with 2 biological replicates each (collected and prepared on separate days following exactly the same protocol) of bacteria during starvation-induced development. I've analyzed the data using CLC Genomics Workbench and tophat-cufflinks-cuffdiff (yes, I realize I probably only need bowtie for bacteria, but I figured looking for nonexistent splice junctions would just cost computational time and shouldn't change anything).

    My problem is this: there are a number of genes that I know are differentially regulated (previously published, and validated by me by qPCR) that go up many fold (one example goes from 50 RPKM to about 4000), but both programs say they are not statistically significantly regulated because there is high variability between replicates.

    Instead, the genes that are reported as statistically significantly regulated are expressed at very low levels and show neither much variability nor a very large fold change up or down (from 20 to 2 RPKM, for example). These seem less likely to be interesting biologically.

    So my question is, am I going to be able to get anything statistically valid out of this data, or if there's a lot of variation am I just out of luck? I am sure I could just cherry-pick genes for future work, but that seems like a waste of data.

    If I try DESeq, will I just have the same problem in a different format, or might the different ways the programs analyze the data change the way statistics are calculated?

    Thanks,
    Anna

  • #2
    If you want to know whether DESeq will give you the same answer, you will just have to try.

    As for the qPCR validation: Have you only validated that the gene goes up in one replicate, or have you also validated that the variance is low by performing your qPCR on the time points of the second replicate, too?


    • #3
      I didn't do qPCR validation of the second data set, but if I do parallel analyses for each set of replicates (at least in the CLC software) I do see up-regulation of a number of known genes within each replicate set of timepoints. There is a bit of variation in timing, etc. but the genes I expect to go up do go up.


      The problem comes when I try to do statistics: the large variance in levels between the replicates makes the p-values really big for most of my "known" up-regulated genes.
      I'm considering whether I need to do some sort of paired comparison, but then I'm not sure whether I'll have to do a separate analysis for each timepoint, comparing each one to 0 hrs. And if I do that, do I have to apply an even more severe significance correction, since I'm effectively doing 4 separate tests? I wish I'd taken statistics more recently than 10 years ago.
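
On the question of a more severe correction for four comparisons against 0 hrs: one common approach is to adjust all p-values together rather than per comparison. The sketch below shows Bonferroni and Benjamini-Hochberg adjustment in plain Python. The p-values are made up for illustration and the function names are hypothetical, not taken from CLC, cuffdiff, or DESeq.

```python
def bonferroni(pvals):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values (controls the false discovery rate)."""
    m = len(pvals)
    # Indices of the p-values, from smallest to largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values for one gene at 4 timepoints vs. 0 hrs.
pvals = [0.001, 0.012, 0.03, 0.2]
bonf = bonferroni(pvals)
bh = benjamini_hochberg(pvals)
print("Bonferroni:", bonf)
print("Benjamini-Hochberg:", bh)
```

Bonferroni is the stricter of the two. In a real analysis the list would hold the p-values for every gene in every comparison, not just four values.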

      On a partly unrelated note, the more I look through my data, the more I feel like cufflinks/cuffdiff is just not ideal for bacterial genomes. I feel like it doesn't deal well with the whole "many genes are in operons" issue. Has anyone else had experience with this and did you find something better?

      And are there any programs that don't lump sense and antisense transcripts when counting reads mapping to a particular genomic region (also a somewhat bacteria-specific problem, I think)?
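
To make the sense/antisense question concrete, here is a minimal sketch of strand-aware counting in Python. Everything in it is hypothetical: the gene coordinates, the reads, and the assumption that the library is stranded so that a read's strand matches its transcript's strand (for some protocols the strand is reversed). It is not how any particular program does it.

```python
from collections import Counter


def count_by_strand(reads, genes):
    """Count reads per gene, keeping sense and antisense reads apart.

    reads: list of (position, strand) tuples
    genes: list of (name, start, end, strand) tuples, coordinates inclusive
    """
    counts = {name: Counter() for name, _start, _end, _strand in genes}
    for pos, read_strand in reads:
        for name, start, end, gene_strand in genes:
            if start <= pos <= end:
                kind = "sense" if read_strand == gene_strand else "antisense"
                counts[name][kind] += 1
    return counts


# Hypothetical annotation: two genes on opposite strands.
genes = [("geneA", 100, 500, "+"), ("geneB", 600, 900, "-")]
# Hypothetical reads, each reduced to a single position and a strand.
reads = [(150, "+"), (200, "+"), (300, "-"), (650, "-"), (700, "+"), (950, "+")]

counts = count_by_strand(reads, genes)
for name, c in counts.items():
    print(name, "sense:", c["sense"], "antisense:", c["antisense"])
```

The read at position 950 falls outside both genes and is not counted. A counter that ignored strand would report 3 reads for geneA and 2 for geneB and hide the antisense signal.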


      • #4
        Yes, when a paired analysis is warranted, it can have much more power than a naive one. Then, you have to use a tool like DESeq, because cuffdiff does not offer functionality for designs that go beyond a two-group comparison.
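
A toy illustration of why pairing can add power, using made-up log2 expression values for one gene (not data from this thread). The two replicates differ a lot in baseline but agree on the change, which is the situation described above. This uses simple t statistics only to show the effect; DESeq itself fits a negative binomial model, so treat this as a sketch of the idea, not of the tool.

```python
from math import sqrt
from statistics import mean, stdev, variance

# Hypothetical log2 expression, one value per replicate (A, B).
t0 = [5.0, 7.0]    # time point 0 hrs
t1 = [11.0, 13.4]  # a later time point


def welch_t(a, b):
    """Unpaired (Welch) t statistic: replicate-to-replicate spread inflates the error."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(b) - mean(a)) / se


def paired_t(a, b):
    """Paired t statistic: only the spread of within-replicate differences counts."""
    diffs = [y - x for x, y in zip(a, b)]
    return mean(diffs) / (stdev(diffs) / sqrt(len(diffs)))


unpaired = welch_t(t0, t1)
paired = paired_t(t0, t1)
print("unpaired t:", round(unpaired, 2))
print("paired t:  ", round(paired, 2))
```

With these numbers the unpaired statistic is about 4, while the paired one is about 31, because the baseline difference between replicates cancels out. With only two replicates the paired test has a single degree of freedom, so real significance still depends on how the tool shares information across genes.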


        • #5
          Thanks, Simon, I'll give DESeq a try.


          • #6
            Hi amcloon

            I am interested in the outcome of your analysis with DESeq, since I have a similar issue with multiple-timepoint analysis and variability between samples.

            Did you end up using the paired analysis, or did you stay with single analyses comparing everything to time zero?

            Cheers

            Sam
