Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • #91
    Can an updated ENSEMBL database release have an effect on DESeq2 results?

    Well, obviously it can, but the changes in the output I see are rather dramatic.

    Here's in brief my situtation:

    Data from an RNAseq experiment (mouse) was
    mapped to GRCm38p2 in Feb. 2014, using STAR,
    counted with HTSeq (latest version),
    DEG estimations done with DESeq2 (latest version at that time)
    and Ensemble release e75 GTF file.
    Results were interesting and made sense, biologically.

    Now we remapped everything to GRCm38p3
    using Ensemble release e78 GTF and latest versions of those programms (which have not changed much).
    Results are radically different, with some important changes gone, (which we validated by qPCR!)
    (example: a gene with log2FC=-0,63, padj=0.0688 in the first mapping, and log2FC=-0.21, padj=0.22 in the second) - however, the counts for each sample and on average (baseMean) are very very similar, so I guess it's not the mapping that makes the difference.

    Notably the number of annotated genes found in e78 is much higher compared to e75 so I expect this has an influence on the adjusted p-value - but how can it affect the log2FC!? Could it be that the number of genes tested affects the gene models / dispersions that DESeq2 estimates?

    I would be happy about any comments.

    Thanks.

    Comment


    • #92
      Presumably e78 added a bunch of overlapping genes that are resulting in heavily altered counts for those genes.

      Comment


      • #93
        Originally posted by dpryan View Post
        Presumably e78 added a bunch of overlapping genes that are resulting in heavily altered counts for those genes.
        Well, at least the raw counts are very(!) similar, not to say: almost exactly the same for most genes.
        Also, with more/better genome information, I think, the DEG estimations should be getting better, not worse, right? We confirmed a couple of genes by qPCR and in many cases they are showing surprisingly accurate matches of the RNAseq results, but then again, we do find some genes where this nice match is completely lost using e78...

        Comment


        • #94
          I wonder if the additional genes in the analysis are skewing the prior in either the fits or the p-value adjustment (just adding a background uniform p-value distribution in that case). That'd be the other likely possibility. What happens if you simply exclude the added genes from the analysis?

          Comment


          • #95
            re: changing the gene model from e75 to e78: Although it might seem at first that having the same read count for a given gene should mean that the results are identical, consider that all the parameter estimation steps involve looking at all the genes: including the library size normalization, the dispersion estimation, the posterior log fold changes, and the p-value adjustment (as Devon points out). If you want less dependence between genes, you could try using the setting betaPrior=FALSE with the DESeq() function, and then the inference is on maximum likelihood estimates of log fold change. This is a choice for the investigator, if they prefer slightly noisier LFC estimates, but an analysis such that individual genes will have less effect on each other's fold changes. But there will still be dependence between genes on dispersion estimation and p-value adjustment, so you shouldn't expect the p-values to be identical after changing gene models. The inference is nearly identical in terms of hypothesis testing, although genes on the margin of a significance boundary can have p-values move slightly one way or the other.

            Comment


            • #96
              Dear Michael and Devon,

              thank you for taking the time to think about this and write an answer.
              These are importnat consideradtions and we will take them into account now.

              The other thing that has changed between the two analyses is of course the DESeq2 version (v1.4.1 vs 1.6.3), which might also influence results slightly, I guess.

              Removing the new IDs is not really an option because it not only new ones, but also substitutes and other changes that are updated with new ensembl releases.

              We'll try the option beta-prior=false to see what effect this has on our data set.

              Thanks again,

              Roman

              Comment


              • #97
                Hi all,

                I am new to DEseq2. Maybe this is silly question, but I do not understand why meanSdPlot fails.... any ideas to solve this problem?


                library("vsn")
                par(mfrow=c(1,3))
                notAllZero <- (rowSums(counts(dds))>0)
                meanSdPlot(log2(counts(dds,normalized=TRUE)[notAllZero,] + 1))
                meanSdPlot(assay(rld[notAllZero,]))
                meanSdPlot(assay(vsd[notAllZero,]))

                Error: could not find function "meanSdPlot"


                Thanks

                Comment


                • #98
                  Hello!

                  I'm trying DESeq2 with synthetic RNA-seq count data without replicates (40% of the synthetic genes are differentially expressed (DE), with a up - and downregulation ratio of 50%). Although DESeq2 detects these ratios correctly, it doesn't detect the DE genes. The result table is empty, and my MAplot doesn't show differential expression either. Is there any solution to this? Do I have to change any parameter?

                  Comment


                  • #99
                    When you ran DESeq(), you should have seen the following message printed to console:

                    Code:
                    estimating dispersion by treating samples as replicates.
                    read the ?DESeq section on 'Experiments without replicates'
                    Did you read this section?

                    Comment


                    • Yes, I did. And I'm sorry, since I've seen that this problem is treated in other posts (such as "DESeq2 without biol replicates"). I tried the following code, but it doesn't work by now (I need to obtain a result table with statistics like the adjusted p-values and lfc):

                      design_vector <- c("C", "T")

                      coldat=DataFrame(grp=factor(design_vector), each=1)
                      dds <- DESeqDataSetFromMatrix(raw_filter, colData=coldat, design =~grp)
                      dds <- DESeq(dds)

                      rld <- rlogTransformation( dds )
                      NOTE: fitType='parametric', but the dispersion trend was not well captured by the
                      function: y = a/x + b, and a local regression fit was automatically substituted.
                      specify fitType='local' or 'mean' to avoid this message next time.

                      deseq2.res <- results(dds)


                      What do I have to change? Cause I'm sure I'm doing it completely wrong

                      Comment


                      • I guess the point I want to emphasize is the quote from the original DESeq paper, regarding the without-replicates analysis:

                        "Some overestimation of the variance may be expected, which will make that approach conservative."
                        Furthermore, "while one may not want to draw strong conclusions from such an analysis, it
                        may still be useful for exploration and hypothesis generation."

                        The analysis without replicates is very conservative, and only recommended for exploration (for example, you can look at which log fold changes are largest in the MA plot).

                        So to answer your question from before:

                        "it doesn't detect the DE genes. The result table is empty, and my MAplot doesn't show differential expression either"

                        The without-replicates analysis is not expected to produce small p-values or adjusted p-values. We have overestimated the variance by treating the samples from different conditions as replicates for the purpose of estimating variance.

                        Comment


                        • The without-replicates analysis is not expected to produce small p-values or adjusted p-values. We have overestimated the variance by treating the samples from different conditions as replicates for the purpose of estimating variance
                          Ok, so when estimating the variance this way, it is not sufficient to detect DE genes? I know that to work with no replicates is not appropiate, but in these cases I need to provide some information of the data for the end user . Isn't there some way to obtain a DE genes list (even if the results are to be taken with care)?
                          (I think that it was possible with DESeq)

                          Comment


                          • DESeq() in DESeq2 also estimates p-values, and adjusted p-values (for those genes which pass independent filtering step).

                            I would guess that you are filtering with a p-value or adjusted p-value threshold which is less than all the p-values or adjusted p-values in the object returned by results(), and generating a 0-row dataframe.

                            Comment


                            • Hi Michael,

                              You were absolutely right I had filtered the object returned from the results() function with a minimum log2foldChange of 1.5 and an adjusted p-value of 0.01. It was definitively too much. The lfc of the result data is less than 1 in most cases, and the adj p-value is superior than one...

                              With no filtering, I get the results table with the statistics I need, but no DE genes are detected and so none are plotted (with red points). Is there some way tu increase the statistical power (maybe in the results() function)?Or to change some parameters in order to mark genes as differentially expressed (even if their values are not magnific)?

                              Thank you for your time =)

                              Comment


                              • Here's a bit random and fascinating observation I've made when using DESeq2.

                                I've had three groups of experiments - with conditions labelled as "B", "12", and "24". I've ran pairwise comparison with donor-sensitive design on all three groups, getting differential expression for them in the same way:

                                B vs 12 (cond1 = B, cond2 = 12)
                                B vs 24 (cond1 = B, cond2 = 24)
                                12 vs 24 (cond1 = 12, cond2 = 24)

                                so, in first two cases, log2FC was reported for cond1/cond2 (i.e. positive numbers were for genes down-regulated at 12 or 24). In the latter case, log2FC was reported for cond2/cond1.

                                The differences were subtle enough that it took me a long long while to catch this error. I figured it's probably due to some sorting - either internal or in the condition file. Maybe it would make sense to write explicitly in the logs which one is cond1 and which one is cond2?

                                Cheers

                                Comment

                                Latest Articles

                                Collapse

                                • seqadmin
                                  Choosing Between NGS and qPCR
                                  by seqadmin



                                  Next-generation sequencing (NGS) and quantitative polymerase chain reaction (qPCR) are essential techniques for investigating the genome, transcriptome, and epigenome. In many cases, choosing the appropriate technique is straightforward, but in others, it can be more challenging to determine the most effective option. A simple distinction is that smaller, more focused projects are typically better suited for qPCR, while larger, more complex datasets benefit from NGS. However,...
                                  10-18-2024, 07:11 AM
                                • seqadmin
                                  Non-Coding RNA Research and Technologies
                                  by seqadmin




                                  Non-coding RNAs (ncRNAs) do not code for proteins but play important roles in numerous cellular processes including gene silencing, developmental pathways, and more. There are numerous types including microRNA (miRNA), long ncRNA (lncRNA), circular RNA (circRNA), and more. In this article, we discuss innovative ncRNA research and explore recent technological advancements that improve the study of ncRNAs.

                                  Nobel Prize for MicroRNA Discovery
                                  This week,...
                                  10-07-2024, 08:07 AM

                                ad_right_rmr

                                Collapse

                                News

                                Collapse

                                Topics Statistics Last Post
                                Started by seqadmin, Yesterday, 05:31 AM
                                0 responses
                                10 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-24-2024, 06:58 AM
                                0 responses
                                20 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-23-2024, 08:43 AM
                                0 responses
                                48 views
                                0 likes
                                Last Post seqadmin  
                                Started by seqadmin, 10-17-2024, 07:29 AM
                                0 responses
                                58 views
                                0 likes
                                Last Post seqadmin  
                                Working...
                                X