Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #31
    Now i'm getting this error:

    Code:
    Warning messages:
    1: glm.fit: algorithm did not converge 
    2: In estimateDispersionsFit(object, fitType = fitType, quiet = quiet) :
      parametric fit failed, trying local fit. use plotDispEsts() to check the quality of the local fit
    This is my code:

    Code:
    #Count matrix input
    Cele_SPvsEB = read.csv (file.choose(), header=TRUE, row.names=1)
    CeleDesign <- data.frame(
      row.names = colnames(Cele_SPvsEB$target_id),
      condition = factor(c("SP", "SP", "EB", "EB")))
    dds <- DESeqDataSetFromMatrix(countData = Cele_SPvsEB,
                                  colData = CeleDesign,
                                  design = ~ condition)
    dds
    I obtained this:

    Code:
    class: DESeqDataSet 
    dim: 198821 4 
    exptData(0):
    assays(1): counts
    rownames(198821): comp100_c0_seq1 comp100004_c0_seq1 ... comp99994_c0_seq1 comp99999_c0_seq1
    rowData metadata column names(0):
    colnames(4): 1 2 3 4
    colData names(1): condition
    And after running:

    Code:
    #Differential expression analysis
    dds <- DESeq(dds)
    (...) the error warning is appearing.

    Any idea how to fix that?

    I also want to change the colnames, how can i do that?

    Thanks.

    Comment


    • #32
      You don't need to fix anything.

      There are 3 levels of messages to the user in R: message, warning and stop/error. A warning is a non-fatal problem. In this case, I am warning the user that the default method for fitting the dispersion trend line didn't work so the software is doing something else instead.

      The way to read the warning is that "algorithm did not converge" within the function estimateDispersionsFit. Then it says that the parametric fit failed, and that a local fit will be tried instead.

      If you look up ?estimateDispersionsFit it will link over to ?estimateDispersions, here I describe the 3 options for the fitType argument, and the default is parametric. The local fit usually does well as a substitute for the parametric fit, but I also warn the user that they should manually examine the plot generated by plotDispEsts(dds). You should check this plot to make sure the local fit is capturing the trend in the black points, as it should be. If the local fit does not seem to capture the trend in the black points, you can use the last option fitType="mean" instead. Note that the fitType arguments can be provided to DESeq() as well as to estimateDispersions().

      For changing columns names, check ?colnames.
      Last edited by Michael Love; 10-17-2013, 06:13 AM.

      Comment


      • #33
        The local fit usually does well as a substitute for the parametric fit, but I also warn the user that they should manually examine the plot generated by plotDispEsts(dds). You should check this plot to make sure the local fit is capturing the trend in the black points, as it should be.
        This is my dispersion plot, is not exactly like in the manual... so i don't know what to think.
        Can you give me your opinion, please?

        Thanks,
        Attached Files

        Comment


        • #34
          As there is not a clear dependence of the dispersion estimates on the mean of normalized counts, I would recommend to use fitType="mean".

          Comment


          • #35
            This is what I got with default setting (parametric), does it look ok or should I try mean or local fit?

            One question:
            What happens when I increase the "maxit" value? From 100 to 200, 159 was reduced to 110 rows that did not converge in beta. I dont know what this means.

            Thanks!

            Rplot.pdf

            Comment


            • #36
              This dispersion plot looks good/typical.

              re:maxit: Unlike with linear regression, there is not a closed form solution for generalized linear models (the model used by DESeq2). Instead there is an iterative algorithm to find the optimal betas (the log fold changes). 'maxit' is the maximum number of iterations to run, and if the convergence criteria has not been met then the software prints a warnings and tells you how to find these genes for which this was the case.

              However, I believe most datasets would have all genes converge if you used DESeq2 v1.2 which was just released.

              This involves upgrading R to 3.0.2 and Bioconductor to v2.13. Information here:

              The Bioconductor project aims to develop and share open source software for precise and repeatable analysis of biological data. We foster an inclusive and collaborative community of developers and data scientists.

              Comment


              • #37
                I upgraded all as you said, but still 159 not fitted at maxit 100.

                Comment


                • #38
                  Then these might just be difficult rows for some reason.

                  You can examine them with:

                  head(counts(dds[!mcols(dds)$betaConv,]))

                  You can exclude them from the results table with:

                  res <- results(dds[mcols(dds)$betaConv,])

                  Comment


                  • #39
                    Originally posted by Michael Love View Post
                    As there is not a clear dependence of the dispersion estimates on the mean of normalized counts, I would recommend to use fitType="mean".
                    Hi,

                    By doing what you suggested:

                    Code:
                    dds<- estimateSizeFactors(dds)
                    ddsMean <- estimateDispersions(dds, fitType="mean")
                    ddsMean <- nbinomWaldTest(ddsMean)
                    plotDispEsts(ddsMean)
                    I'm obtaining the subsequent plot.
                    It's really weird, right?
                    Should i continue with fitType="mean"?

                    Thanks.
                    Attached Files

                    Comment


                    • #40
                      There is a lot of shrinkage of dispersion estimates (the blue points are close to the red line) presumably because you have few degrees of freedom (number of samples minus number of parameters to estimate). Yes, I would use fitType="mean" for this dataset as there is not a clear trend.

                      Comment


                      • #41
                        Originally posted by Michael Love View Post
                        There is a lot of shrinkage of dispersion estimates (the blue points are close to the red line) presumably because you have few degrees of freedom (number of samples minus number of parameters to estimate). Yes, I would use fitType="mean" for this dataset as there is not a clear trend.
                        Hi Michael,

                        First of all, thank you for all the help.

                        I tried fitType="mean" with the code:

                        Code:
                        #Count matrix input
                        Cele_SPvsLR = read.csv (file.choose(), header=TRUE, row.names=1)
                        CeleDesign <- data.frame(
                          row.names = colnames(Cele_SPvsLR),
                          condition = factor(c("SP", "SP", "LR", "LR")))
                        dds <- DESeqDataSetFromMatrix(countData = Cele_SPvsLR,
                                                      colData = CeleDesign,
                                                      design = ~ condition)
                        dds
                        
                        
                        #Est size factor = normalize for library size
                        dds<- estimateSizeFactors(dds)
                        ddsMean <- estimateDispersions(dds, fitType="mean")
                        ddsMean <- nbinomWaldTest(ddsMean)
                        #For a parametric dispersion
                        dds <- DESeq(dds)
                        #plot dispersion
                        plotDispEsts(ddsMean)
                        plotDispEsts(dds)
                        
                        #Differential expression analysis
                        resultsNames(ddsMean)
                        res <- results(ddsMean, name= "condition_SP_vs_LR") 
                        res <- res[order(res$padj),]
                        head(res)
                        plotMA(ddsMean)
                        sum(res$padj < .1, na.rm=TRUE)
                        This resulted in:

                        Code:
                        > sum(res$padj < .1, na.rm=TRUE)
                        [1] 0
                        But when I tried fitType="local" I get:

                        Code:
                        > sum(res$padj < .1, na.rm=TRUE)
                        [1] 337
                        (*see attached plots)

                        The other "weird" (at least for me) thing is that by filtering by overall count to my fitType="local" results I get less genes:

                        Code:
                        > plot(res$baseMean, pmin(-log10(res$pvalue), 50),
                        + log="x", xlab="mean of normalized counts",
                        + ylab=expression(-log[10](pvalue)))
                        Warning message:
                        In xy.coords(x, y, xlabel, ylabel, log) :
                          19407 x values <= 0 omitted from logarithmic plot
                        > abline(v=10, col="red", lwd=1)
                        > use <- res$baseMean >= 10 & !is.na(res$pvalue)
                        > table(use)
                        use
                         FALSE   TRUE 
                         93445 105376 
                        > resFilt <- res [use, ]
                        > resFilt$padj <- p.adjust(resFilt$pvalue, method="BH")
                        > sum(resFilt$padj < .1, na.rm=TRUE)
                        [1] 152
                        Just as a refresher, I'm working with 4 samples (2 replicates/reproductive stages).

                        Do you think that it is a bad idea to use fitType="local" given that it is the fitType that gives me results?

                        Can you help me understand why by filtering my "local" results I am obtaining less genes?

                        Thanks, thanks, and thanks.
                        Attached Files

                        Comment


                        • #42
                          The local fit helps you out by dipping down at the high counts, giving you lower estimates of dispersion there than by taking the mean overall. The large circles are genes whose dispersion estimates are not shrunk because they deviate too far from the fitted line. The local fit is useable as well I would say.

                          I presume -- because you haven't posted the output of sessionInfo() -- that you could be using a version of DESeq2 which performs automatic independent filtering to optimize the number of rejections. Check the Details section of ?results and the argument 'independentFiltering'.

                          Comment


                          • #43
                            Hi!
                            Quick question; what is the default normalisation method in DESeq2?
                            Cant find anything in the manual.

                            Thanks!

                            Comment


                            • #44
                              estimateSizeFactors() is the same function used for normalization as in DESeq, so the method is the same median ratio method as in DESeq. I can try to make this more clear in the man page for DESeq().

                              Comment


                              • #45
                                Ok, but typing in "estimateSizeFactors()" to look for options gives no info just (object, ...)

                                But what is the default? Im curious because Im running edgeR with "upper quartile".

                                Thanks!

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Choosing Between NGS and qPCR
                                  by seqadmin



                                  Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                                  10-18-2024, 07:11 AM
                                • seqadmin
                                  Non-Coding RNA Research and Technologies
                                  by seqadmin




                                  Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                                  Nobel Prize for MicroRNA Discovery
                                  This week,...
                                  10-07-2024, 08:07 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 05:31 AM
                                0 responses
                                10 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-24-2024, 06:58 AM
                                0 responses
                                20 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-23-2024, 08:43 AM
                                0 responses
                                48 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-17-2024, 07:29 AM
                                0 responses
                                58 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X