  • rskr
    Senior Member
    • Oct 2010
    • 249

    #16
    Originally posted by whataBamBam View Post
    Great. Actually my original interpretation (before I posted this) was correct then. That the p values are perfectly valid (in fact conservative) and the problem of no replicates is actually low statistical power.

    So basically you are saying that you have less statistical power because you have overestimated the variance. And if you see significant differences DESPITE this low statistical power then go for it.

    To be fair it says in the vignette (or the paper I can't remember which) that there is simply low statistical power if you have no replicates.
    If the hypothesis test was significant, this would indicate that there isn't a problem with power, since a lower-powered test should make it more difficult to get significant results, if the test was actually answering the question you were asking. In theory this may seem a little confusing, because getting a hypothesis to fit thousands of samples should be harder than fitting just a few. However, with thousands of samples the population means can be estimated very accurately, so even trivial differences, like the color of two placebos, can come out significant.
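The point about large samples making trivial differences significant can be sketched with a quick two-sample z-test. The numbers below (effect size 0.01, SD 1) are invented for illustration, not taken from this thread:

```python
import math

def z_statistic(mean_diff, sd, n_per_group):
    """Two-sample z statistic for equal group sizes and a common SD."""
    se = math.sqrt(2 * sd**2 / n_per_group)
    return mean_diff / se

# A tiny true difference of 0.01 (SD = 1) is invisible with n = 10 per group ...
small_n = z_statistic(0.01, 1.0, 10)          # ~0.022, nowhere near significant
# ... but overwhelmingly "significant" with n = 1,000,000 per group.
large_n = z_statistic(0.01, 1.0, 1_000_000)   # ~7.07, p << 0.001

print(round(small_n, 3), round(large_n, 2))
```

The effect itself never changes; only the standard error shrinks with n, which is the mechanism behind the placebo-color example above.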

    Comment

    • whataBamBam
      Member
      • May 2013
      • 27

      #17
      Originally posted by rskr View Post
      If the hypothesis test was significant, this would indicate that there isn't a problem with power, since a lower-powered test should make it more difficult to get significant results, if the test was actually answering the question you were asking. In theory this may seem a little confusing, because getting a hypothesis to fit thousands of samples should be harder than fitting just a few. However, with thousands of samples the population means can be estimated very accurately, so even trivial differences, like the color of two placebos, can come out significant.
      http://en.wikipedia.org/wiki/Lindley's_paradox
      Yeah, the first part is what I meant - well, kind of. Yes, a lower-powered test makes it more difficult to observe significant results - but we do observe them. So the test had enough power to detect the differences it detected, but there could be other differences it did not detect because it did not have enough power. This is what I mean by saying it's conservative.

      The next part I'm less sure about... but I think this paradox, Lindley's paradox, would only apply if there were a very large number of replicates? Which we aren't ever likely to see.


      • rskr
        Senior Member
        • Oct 2010
        • 249

        #18
        Originally posted by whataBamBam View Post
        Yeah, the first part is what I meant - well, kind of. Yes, a lower-powered test makes it more difficult to observe significant results - but we do observe them. So the test had enough power to detect the differences it detected, but there could be other differences it did not detect because it did not have enough power. This is what I mean by saying it's conservative.

        The next part I'm less sure about... but I think this paradox, Lindley's paradox, would only apply if there were a very large number of replicates? Which we aren't ever likely to see.
        I don't know - I've seen some impressive microarray datasets, and I see no reason why, once RNA-seq data drops a little in price, there won't be some large data sets as well.


        • tompoes
          Junior Member
          • Jul 2014
          • 2

          #19
          Hi

          I want to use the DESeq package to compare a control (3 biological replicates) with a treatment (1 biological replicate).

          In DESeq I therefore used the following code, and got 266 genes with padj < 0.05:

          # Load the count table; the first column holds the feature IDs
          table <- read.delim("test.txt")
          row.names(table) <- table$Feature_ID
          count_table <- table[, -1]
          conds <- c("ctrl", "ctrl", "ctrl", "treatment")
          cds <- newCountDataSet(count_table, conds)
          cds <- estimateSizeFactors(cds)
          # "blind" + "fit-only" is the setting the DESeq vignette suggests when replicates are scarce
          cds <- estimateDispersions(cds, method="blind", sharingMode="fit-only")
          results <- nbinomTest(cds, "ctrl", "treatment")

          In DESeq2 I used the following commands, but got > 10000 genes with padj < 0.05:

          table <- read.delim("test.txt")
          row.names(table) <- table$Feature_ID
          count_table <- table[, -1]
          colData <- DataFrame(condition=factor(c("ctrl", "ctrl", "ctrl", "treatment")))
          dds <- DESeqDataSetFromMatrix(count_table, colData, formula(~ condition))
          # DESeq() returns the fitted DESeqDataSet; the result table comes from results()
          dds <- DESeq(dds, minReplicatesForReplace=Inf)
          results <- results(dds)

          So I probably need to add extra parameters to the DESeq2 analysis, but so far I can't figure out which ones.

          Thank you for helping

          Wannes
          Last edited by tompoes; 12-02-2015, 12:31 PM.


          • Sow
            Member
            • Feb 2016
            • 16

            #20
            Hi,
            Can anyone advise me if it's okay to normalize my dataset before mapping my reads to the genome?

            Thanks


            • dpryan
              Devon Ryan
              • Jul 2011
              • 3478

              #21
              There's nothing to normalize before mapping.


              • FJwlf
                Junior Member
                • Feb 2019
                • 2

                #22
                Originally posted by Simon Anders View Post
                To be honest, we couldn't yet be bothered to explain how to analyse such data in DESeq2. It's tricky to write up because too many people will misinterpret whatever I write as if it were actually possible to conduct a meaningful statistical analysis when comparing just two samples.

                So, if you promise to not use any such comparisons for actual science, here is how you do it:

                Start as above:

                Code:
                library(DESeq2)
                library(pasilla)
                data("pasillaGenes")
                countData <- counts(pasillaGenes)
                countData<-countData[,c("treated1fb","untreated1fb")]
                colData <- pData(pasillaGenes)[c("treated1fb","untreated1fb"),c("condition","type")]
                dds <- DESeqDataSetFromMatrix(
                       countData = countData,
                       colData = colData,
                       design = ~ condition)
                Now use DESeq2's new "rlog transformation". This replaced the VST we had before. It transforms the average of the genes across samples to a log2 scale but "pulls in" those genes for which the evidence for strong fold changes is weak due to low counts.

                Code:
                rld <- rlogTransformation( dds )
                As this is a logarithm-like scale, the differences between the samples can be considered as a "regularized log2 fold change". Let's make a result data frame:
                Code:
                res <- data.frame(
                   assay(rld), 
                   avgLogExpr = ( assay(rld)[,2] + assay(rld)[,1] ) / 2,
                   rLogFC = assay(rld)[,2] - assay(rld)[,1] )
                And now we have a ranking of genes by regularized fold changes:

                Code:
                > head( res[ order(res$rLogFC), ] )
                            treated1fb untreated1fb avgLogExpr    rLogFC
                FBgn0011260   7.830359     6.627326   7.228842 -1.203033
                FBgn0001226  10.128636     8.929985   9.529311 -1.198652
                FBgn0034718   8.503006     7.315640   7.909323 -1.187366
                FBgn0003501   7.927864     6.743974   7.335919 -1.183889
                FBgn0033635  11.126300     9.973979  10.550139 -1.152321
                FBgn0033367  13.411814    12.269436  12.840625 -1.142378
                This ranking puts those genes on top which are strongly downregulated in the second sample compared to the first one. If you do this with ordinary log expressions instead, the weakly expressed genes will appear at the top, because they are the noisiest and hence tend to have exaggerated fold changes.

                The advantage of this procedure is that it does not produce any p values (which would be misleading anyway).
                Hi everyone!
                Just a silly question... why is it a subtraction (-) and not a division (/) that is used to calculate the rLogFC?

                res <- data.frame(
                assay(rld),
                avgLogExpr = ( assay(rld)[,2] + assay(rld)[,1] ) / 2,
                rLogFC = assay(rld)[,2] / assay(rld)[,1] )

                Thank you!!
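A rough way to see Simon's point about weak genes having exaggerated fold changes is the delta method: for a Poisson-distributed count with mean λ, the variance of log2 of the count is approximately 1/(λ·ln(2)²), so the log2 fold change between two replicates with the same true mean is far noisier at low counts. A small sketch (this back-of-the-envelope approximation is mine, not DESeq2's actual rlog shrinkage):

```python
import math

def log2fc_sd(mean_count):
    """Delta-method SD of the log2 fold change between two independent
    Poisson-distributed replicates sharing the same true mean."""
    # Var(log2 X) ≈ 1 / (mean * ln(2)^2) for X ~ Poisson(mean);
    # the fold change involves two such counts, hence the factor of 2.
    return math.sqrt(2 / mean_count) / math.log(2)

# A gene with ~5 counts is roughly 30x noisier than one with ~5000 counts.
print(round(log2fc_sd(5), 2), round(log2fc_sd(5000), 3))
```

This is why unshrunken fold-change rankings are dominated by low-count genes, and why the rlog "pulls in" genes whose evidence for a strong fold change rests on few reads.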


                • dpryan
                  Devon Ryan
                  • Jul 2011
                  • 3478

                  #23
                  Originally posted by FJwlf View Post
                  Hi everyone!
                  Just a silly question... why is it a subtraction (-) and not a division (/) that is used to calculate the rLogFC?

                  res <- data.frame(
                  assay(rld),
                  avgLogExpr = ( assay(rld)[,2] + assay(rld)[,1] ) / 2,
                  rLogFC = assay(rld)[,2] / assay(rld)[,1] )

                  Thank you!!
                  The values are already on a log scale, and log(a) - log(b) is the same as log(a/b).
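The identity is easy to check numerically; a quick sketch in Python, with arbitrary example values:

```python
import math

a, b = 7.83, 6.63  # two expression values on the linear scale

log_of_ratio = math.log2(a / b)               # log of the ratio
diff_of_logs = math.log2(a) - math.log2(b)    # difference of the logs

# The two agree (up to floating-point rounding), which is why rlog values,
# being already on a log2 scale, are compared by subtraction, not division.
print(math.isclose(log_of_ratio, diff_of_logs))  # True
```

Dividing two log-scale values, as in the question above, would instead compute a ratio of logarithms, which has no fold-change interpretation.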


                  • FJwlf
                    Junior Member
                    • Feb 2019
                    • 2

                    #24
                    Originally posted by dpryan View Post
                    The values are already on a log scale, and log(a) - log(b) is the same as log(a/b).
                    Thank you!

