  • roseadele
    Junior Member
    • Nov 2011
    • 1

    #31
    DESeq

    I have a table in Excel containing the gene names, the conditions, and the number of reads for each gene in each condition. How can I use this data with the DESeq package? How do I get the table into R?


    • Gators
      Member
      • Feb 2011
      • 22

      #32
      Originally posted by cascoamarillo:

      So it has been removed from the new version, or what does it mean?

      Thanks
      Yes, it has been replaced.


      • arvid
        Senior Member
        • Jul 2011
        • 156

        #33
        Originally posted by roseadele:
        I have a table in Excel containing the gene names, the conditions, and the number of reads for each gene in each condition. How can I use this data with the DESeq package? How do I get the table into R?

        Googling "read data from Excel R" gives me 136 million answers, and the first ten looked clear and simple. The rest of the information is in the DESeq manual (search for "Analysing RNA-Seq data with the "DESeq" package"), which is nicely written with clear examples.
        You might need an R tutorial if you are not familiar with R; you could start here: http://cran.r-project.org/doc/manuals/R-intro.html.
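A minimal sketch of the usual route, assuming the spreadsheet has been saved from Excel as a CSV file with gene names in the first column and one column of raw counts per sample. The file name `counts.csv` is hypothetical; the CSV is inlined here so the snippet runs as-is, using the counts of three genes from the table posted in #34. The DESeq call is left as a comment because it needs the Bioconductor package.

```r
# Inline stand-in for a CSV exported from Excel (File > Save As > CSV).
# For a real file, use: read.csv("counts.csv", row.names = 1)  # hypothetical name
csv_text <- "gene,nano,ctrl
9600,174,13
10604,227,19
8063,593,59"

# row.names = 1 turns the gene column into row names, leaving only counts.
countsTable <- read.csv(text = csv_text, row.names = 1)
countsTable <- as.matrix(countsTable)   # genes x samples matrix of counts

# One condition label per column, in the same order as the columns.
conds <- factor(c("nano", "ctrl"))
stopifnot(ncol(countsTable) == length(conds))
print(countsTable)

# With DESeq installed (not run here):
# library(DESeq)
# cds <- newCountDataSet(countsTable, conds)
```

If the Excel sheet is in "long" form (one row per gene and condition), it has to be reshaped into this one-column-per-sample layout first.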


        • crh
          Member
          • Dec 2009
          • 46

          #34
          DESeq without replicates: padj

          Hi, I'm new to RNA-Seq.

          We are looking at data without replicates (bad, I know, but cost prohibited them).
          Can someone explain how to interpret padj values of 1? I believe padj is a measure of the FDR (type I error)?

          Do we appear to have 4 significantly DE genes in the data below?
          I know that without replicates we are underestimating the true number of DE genes.

          Charles

          deseq_id  gene_counts(nano)  gene_counts(ctrl)  baseMean     baseMeanA    baseMeanB    foldChange   log2FoldChange  pval         padj
          9600      174                13                 83.74641874  16.32121019  151.1716273  9.262280527  3.211367453     0.000890382  0.740886757
          10604     227                19                 110.5361169  23.85407643  197.2181574  8.267692025  3.047484649     0.001206005  0.771936026
          8063      593                59                 294.6365205  74.07318469  515.1998562  6.955281569  2.79810892      0.001591703  0.88218547
          9821      245                23                 120.8662944  28.87598725  212.8566016  7.371405167  2.881939658     0.001793433  0.88218547
          680       61                 4                  29.00943031  5.021910827  52.9969498   10.55314434  3.399601013     0.002231307  1
          8550      402                44                 202.2498031  55.24101909  349.2585872  6.322450109  2.660483748     0.002796612  1


          • Simon Anders
            Senior Member
            • Feb 2010
            • 995

            #35
            No, you have nothing.

            An FDR of 0.1 (i.e., 10%), for example, means that your gene list contains at most an estimated 10% false positives. To get such a list, you take all genes with padj < 0.1.

            Thus, padj=1 means that you cannot include the gene even if you are willing to accept 99% false positives.
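To make the FDR reading concrete: as far as I know, DESeq's padj column is the raw p-value adjusted with the Benjamini-Hochberg procedure, which base R provides as `p.adjust(..., method = "BH")`. The p-values below are made up purely to show the mechanics; taking all genes with padj < 0.1 gives a list whose estimated FDR is at most 10%.

```r
# Made-up raw p-values for six genes (illustration only).
pval <- c(0.001, 0.01, 0.03, 0.04, 0.5, 1)

# Benjamini-Hochberg: each p-value is scaled by n / rank, then made monotone,
# so padj is never smaller than pval and never larger than 1.
padj <- p.adjust(pval, method = "BH")
print(round(padj, 3))   # 0.006 0.030 0.060 0.060 0.600 1.000

# Genes you may call at a 10% FDR:
hits <- which(padj < 0.1)
print(hits)             # 1 2 3 4
```

Note how many tests enter the adjustment: with thousands of genes, a raw p-value around 0.001, as in the table above, can easily end up with a padj close to 1.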

            What I never understand is why people claim that lack of money prevented them from doing replicates. First, you have now wasted all the money you paid for the sequencing run, because without replicates you are highly unlikely ever to get useful results.

            Second, while it may have been expensive to obtain replicate samples, it is not expensive to sequence additional samples. After all, having twice as many samples does not mean that you need twice as many lanes. You simply use multiplexing to sequence each sample to half the depth and still get more statistical power than with fewer samples sequenced more deeply. The only extra expense is the additional library prep kits, not the sequencing itself.


            • crh
              Member
              • Dec 2009
              • 46

              #36
              No replicates

              Duly noted.

              c


              • vyellapa
                Member
                • Oct 2011
                • 59

                #37
                Originally posted by Simon Anders:
                The purpose of the 'blind' method was never to offer a proper analysis method for experiments without replication, because it is simply not possible (not just "dangerous") to draw conclusions. The whole point of replicates is to allow you to draw the line for significance, i.e., to know how much fold change you need to see to consider an effect real. Without replicates, you can guess, of course, but it has to be a wild guess, unless you are happy with the extremely over-careful guess that, e.g., the "blind" method gives you.

                Is this "guesswork" similar to what Cuffdiff does when replicates are not provided? Cuffdiff seems to do some mathematical modeling that I don't completely understand. Is the 'blind' method explained anywhere in a way a non-statistician can understand?


                • Simon Anders
                  Senior Member
                  • Feb 2010
                  • 995

                  #38
                  If no replicates are provided, there is no way to know the real biological variability, and hence there are at least two options:

                  (i) You can ignore the issue by (implicitly) postulating the biological variance to be zero. Unfortunately, this is the option most commonly chosen in the literature, even though it is clearly untenable: if you have sequenced deeply, it will lead to nearly all strongly expressed genes being called differentially expressed. Cuffdiff, in the versions described in the papers, also suffered from this flaw, but I don't know what the current version does. One way to find out might be to apply the tool of your choice first to a dataset with replicates and then to only two samples from that dataset, one from each treatment group, and compare whether you get more or fewer hits. If you get more significant hits with less data, this would hint that biological variation is not being properly accounted for.
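The check suggested in (i) could be set up roughly as follows. This is a sketch under assumptions: a hypothetical design with three replicates per group, and results tables with a padj column like the one nbinomTest returns. Only the bookkeeping is runnable base R; the actual DESeq (or other tool) runs are left as comments, and `n_hits` is a hypothetical helper.

```r
# Hypothetical design: three replicates per treatment group.
conds <- factor(c("ctrl", "ctrl", "ctrl", "treat", "treat", "treat"))

# Column index of the first sample of each group: an unreplicated subset.
one_per_group <- match(levels(conds), conds)
sub_conds <- droplevels(conds[one_per_group])
print(one_per_group)   # 1 4

# Number of significant hits in a results table with a padj column.
n_hits <- function(res, alpha = 0.1) sum(res$padj < alpha, na.rm = TRUE)

# With DESeq or another tool (not run here): analyse countsTable with conds,
# then countsTable[, one_per_group] with sub_conds, and compare n_hits() of
# the two results. More hits from the two-sample subset than from the full
# data would hint that biological variation is not being accounted for.
```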

                  (ii) If you think that only very few genes are differentially expressed, you can pretend that your two samples are replicates with respect to the majority of genes, and use this to assess variability. You may strongly overestimate the variance that way and lose a great deal of power. In other words, you only consider those genes differentially expressed that differ between the two samples so much more than nearly all other genes do that they "stick out" very prominently. This is what DESeq's "blind" approach attempts. Obviously, you typically get only very few hits this way, and even these could be flukes. See the vignette and the paper for details.

                  Wu et al. (BMC Bioinformatics 2010, 11:564) tried to find a middle ground here, but I have not heard of any practical experience with their approach. Has anybody here tried it?


                  • gstitan
                    Junior Member
                    • Oct 2009
                    • 7

                    #39
                    Problem with DESeq's estimateVarianceFunctions

                    Originally posted by Simon Anders:
                    Start R, load DESeq, and type "?estimateVarianceFunctions". If you don't see anything there about 'method', you have an old DESeq version.

                    Simon
                    Hey Simon,

                    I am trying to use DESeq. "?estimateVarianceFunctions" gives me:
                    ...
                    "Usage:

                    estimateVarianceFunctions(cds, method = c( "normal", "blind", "pooled" ),
                    pool = NULL, locfit_extra_args = list(), lp_extra_args = list(),
                    modelFrame = NULL )"

                    but when I call it, I get an error message:
                    "cds <- estimateVarianceFunctions(cds, method="blind")
                    Error: attempt to apply non-function"

                    I don't understand why. Before this line, I run:
                    cds <- newCountDataSet(countsTable, conds)

                    cds <- estimateSizeFactors(cds)

                    and that works, but estimateVarianceFunctions does not.

                    Can you help me, please?

                    Thanks


                    • Simon Anders
                      Senior Member
                      • Feb 2010
                      • 995

                      #40
                      You must have managed to overwrite the definition of estimateVarianceFunctions earlier in your session. Independently of that, please update to a current version of R and Bioconductor.
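One way to check for such an overwrite, sketched with base R's `find()` (from the utils package), which lists every environment on the search path that holds a given name. To keep the snippet runnable without DESeq, base R's `mean` stands in for estimateVarianceFunctions; with DESeq loaded you would call `find("estimateVarianceFunctions")` and look for `.GlobalEnv` in the result.

```r
# Simulate an accidental redefinition in the global environment.
mean <- function(x) "masked"

# find() lists every attached environment containing the name, in search order.
where_defined <- find("mean")
print(where_defined)   # .GlobalEnv comes first, so it shadows package:base

# Removing the global copy restores the package version.
rm(mean)
restored <- find("mean")
print(restored)
```

Starting a fresh R session has the same effect as `rm()`, as long as the workspace is not restored from an old `.RData` file.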


                      • vyellapa
                        Member
                        • Oct 2011
                        • 59

                        #41
                        I am trying to find differentially expressed genes between, say, tumor and relapse samples, and I have 3 samples each from tumor and relapse patients. Can I group the 3 tumor patients as replicates, do the same for the relapse samples, and then look for genes differentially expressed between tumor and relapse cases?

                        Would such grouping cause odd results or inaccurate variance estimates because of 1) biological noise or 2) between-sample variation?


                        • Simon Anders
                          Senior Member
                          • Feb 2010
                          • 995

                          #42
                          Sure, it is correct to group in this manner. Of course, you will not get any results because of the high variance between patients within each group, but I guess you know that there is no chance of finding differences between tumour types with so few samples.
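A minimal sketch of that grouping, assuming the count table has one column per patient, named as below (the sample names are hypothetical). Each patient simply becomes one replicate of its group. The DESeq calls are comments; they use estimateDispersions, which as far as I know replaced estimateVarianceFunctions in later DESeq releases.

```r
# Hypothetical column names: one count column per patient.
samples <- c("tumor_1", "tumor_2", "tumor_3", "relapse_1", "relapse_2", "relapse_3")

# Condition labels in the same order as the columns; patients within a
# group are treated as biological replicates.
conds <- factor(rep(c("tumor", "relapse"), each = 3), levels = c("tumor", "relapse"))
names(conds) <- samples
print(table(conds))    # 3 tumor, 3 relapse

# With DESeq installed (not run here):
# cds <- newCountDataSet(countsTable[, samples], conds)
# cds <- estimateSizeFactors(cds)
# cds <- estimateDispersions(cds)
# res <- nbinomTest(cds, "tumor", "relapse")
```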


                          • BFM
                            Member
                            • Jun 2014
                            • 10

                            #43
                            Hi, I am using DESeq with no replicates:

                            conds <- factor( c( "A-Mock", "A-Infect", "B-Mock", "B-infect" ) )

                            I need to compare differential expression between A-Mock and A-Infect, and similarly between B-Mock and B-infect. It doesn't seem to work. I am using

                            res <- nbinomTest( cds, "A-mock", "A-infect" )
                            res <- nbinomTest( cds, "B-mock", "B-infect" )

                            but at the end I get only one set of p-values. How can I solve this problem? Please help.
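Separately from the overwriting issue, R compares strings case-sensitively, so the labels passed to nbinomTest in this post ("A-mock", "A-infect", "B-mock") do not exactly match the factor levels ("A-Mock", "A-Infect", "B-Mock"), and presumably DESeq will not find those conditions. A quick base-R check before calling nbinomTest, sketched with the labels from this post:

```r
conds <- factor(c("A-Mock", "A-Infect", "B-Mock", "B-infect"))

# Labels as typed in the nbinomTest() calls.
requested <- c("A-mock", "A-infect", "B-mock", "B-infect")

# TRUE only where a label exactly matches an existing level.
found <- requested %in% levels(conds)
print(found)   # FALSE FALSE FALSE TRUE

# Report the offending labels instead of waiting for a downstream error.
not_found <- requested[!found]
if (length(not_found) > 0) {
  message("Not a condition level: ", paste(not_found, collapse = ", "))
}
```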


                            • Jeremy
                              Senior Member
                              • Nov 2009
                              • 190

                              #44
                              Those are two different tests, but you are overwriting the first result object (res) with the second.

                              res <- nbinomTest( cds, "A-Mock", "A-Infect" )
                              res <- nbinomTest( cds, "B-Mock", "B-infect" ) # replaces the above result with the new one

                              Call one resA and the other resB, for example. In the end you might want to merge the two for comparison, or just write out both as separate tables.

                              resA <- nbinomTest( cds, "A-Mock", "A-Infect" )
                              resB <- nbinomTest( cds, "B-Mock", "B-infect" )
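A sketch of both options (merging, or writing out separate tables). So that it runs without DESeq, resA and resB are replaced by tiny data frames with made-up values that carry only three of the columns nbinomTest returns (id, log2FoldChange, padj); the output file names are hypothetical.

```r
# Stand-ins for nbinomTest() results; values are made up for illustration.
resA <- data.frame(id = c("g1", "g2", "g3"),
                   log2FoldChange = c(1.5, -0.2, 3.1),
                   padj = c(0.04, 0.90, 0.01),
                   stringsAsFactors = FALSE)
resB <- data.frame(id = c("g1", "g2", "g3"),
                   log2FoldChange = c(0.3, -2.4, 2.8),
                   padj = c(0.70, 0.02, 0.03),
                   stringsAsFactors = FALSE)

# Side-by-side comparison: one row per gene, columns suffixed .A and .B.
both <- merge(resA, resB, by = "id", suffixes = c(".A", ".B"))
print(both)

# Genes called at 10% FDR in both comparisons.
shared <- both$id[both$padj.A < 0.1 & both$padj.B < 0.1]
print(shared)   # "g3"

# Or write each table out separately (hypothetical file names).
out_dir <- tempdir()
write.csv(resA, file.path(out_dir, "A_mock_vs_infect.csv"), row.names = FALSE)
write.csv(resB, file.path(out_dir, "B_mock_vs_infect.csv"), row.names = FALSE)
```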
