Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • DE analysis with multiple groups

    Hi,

    How can I do DE analysis on multiple groups. I did small RNA seq (miRNA) and I've fours group. I want to do a statisticak test like ANOVA or Kruskal-Wallis do detect a difference of expression between the groups . DESeq ? edgeR ?

    Thanks

    N.

  • #2
    nobody knows ?

    Comment


    • #3
      DESeq and edgeR both offer ANODEV tests.

      Comment


      • #4
        what's the function to do this ?

        I've a dataframe A with 20 samples ( 4 groups of 5 samples )

        Code:
        groups <- c(rep("A",5),rep("B",5),rep("C",5),rep("D",5))
        cds <- newCountDataSet(A,groups)
        cds <- estimateSizeFactors(cds)
        cds <- estimateVarianceFunctions(cds)
        
        res <- nbinomTest(cds,condA="A",condB="B") # two group test

        Comment


        • #5
          Sorry, I need to add this to the vignette. In your case, this should work:

          Code:
          groups <- c( rep("A",5), rep("B",5), rep("C",5), rep("D",5) )
          cds <- newCountDataSet( A, groups )
          cds <- estimateSizeFactors( cds )
          cds <- estimateVarianceFunctions( cds, "pooled" )
          
          fit0 <- nbinomFitGLM( cds, count ~ 1 )
          fit1 <- nbinomFitGLM( cds, count ~ condition )
          pvals <- nbinomGLMTest( fit1, fit0 )
          Note that this is a good deal slower than the standard test.

          If you have more than one factor, pass a data frame to 'newCountDataSet' instead of 'groups' and then, you can write down more interesting model formulae than the one in this example, using the data frame's column names instead of 'condition'.

          Simon

          Comment


          • #6
            ok I'll try this.

            Thanks Simon

            Comment


            • #7
              Dear Simon,
              Is there any place where I can find more information about performing ANODEV analysis using DESeq?
              Alternatively, when do you think the vignette will include this information?
              Thanks!
              RSK

              Comment


              • #8
                Dear Simon,

                I am currently working with RNA-seq datasets in 5 conditions without replicates. It is so lucky for me to see this vignette, Howerer, I found that I can't use "pooled" methods in estimateVarianceFunctions. Or alternatively, I can't go through nbinomFitGLM with "blind" methods in estimateVarianceFunctions. Both error msgs is attached below. But in the nbinomFitGLM function description, it says:

                Use this function to estimate coefficients and calculate deviance
                from a GLM for each gene. The GLM uses the ‘nbkd.sf’ family, with
                the dispersion estimate according to ‘getVarianceFunction(cds)’.
                Note that this requires that the variance functions were estimated
                with method "pooled" or "blind".

                When I was trying to use "method=pooled" in estimateVarianceFunctions

                Code:
                Error in rowSums(sapply(tapply((1:ncol(counts))[replicated_sample], factor(conditions[replicated_sample]),  : 
                  'x' must be an array of at least two dimensions
                When I was using "method=blind" instead.

                Code:
                Error in nbinomFitGLM(cds, count ~ 1) : 
                  No pooled variance function found. Have you called 'estimateVarianceFunctions' with 'method="pooled"'?



                my code is:

                Code:
                library(DESeq);
                list=read.table("../data/5_condition.txt",row.names=1,header=T);
                conds <- c("CFGU","CFGW","CFHG","CFHH","CFHI");
                cds <- newCountDataSet( list,conds);
                cds <- estimateSizeFactors( cds );
                cds <- estimateVarianceFunctions(cds, method='blind');
                fit0 <- nbinomFitGLM( cds, count ~ 1 )
                fit1 <- nbinomFitGLM( cds, count ~ condition )
                result <- nbinomGLMTest( fit1, fit0 )
                Could you tell me what is my mistake here?
                Thanks in advance,

                Hao Chen



                Originally posted by Simon Anders View Post
                Sorry, I need to add this to the vignette. In your case, this should work:

                Code:
                groups <- c( rep("A",5), rep("B",5), rep("C",5), rep("D",5) )
                cds <- newCountDataSet( A, groups )
                cds <- estimateSizeFactors( cds )
                cds <- estimateVarianceFunctions( cds, "pooled" )
                
                fit0 <- nbinomFitGLM( cds, count ~ 1 )
                fit1 <- nbinomFitGLM( cds, count ~ condition )
                pvals <- nbinomGLMTest( fit1, fit0 )
                Note that this is a good deal slower than the standard test.

                If you have more than one factor, pass a data frame to 'newCountDataSet' instead of 'groups' and then, you can write down more interesting model formulae than the one in this example, using the data frame's column names instead of 'condition'.

                Simon

                Comment


                • #9
                  hi,

                  just for reference the function is now called
                  fitNbinomGLMs

                  Comment


                  • #10
                    Differential expression DESeq using GLM for timecourse experiment

                    Dear Members
                    I am currently working on NGS data and performed some differential expression analysis using DESeq. At the moment I am trying to find differentially expressed miRNAs using GLM functionality.
                    My experimental design consists of 4 samples (4stages of developing flower) with 2 biological replicates each. I would like to identify the differentially expressed miRNA during different developmental stages(Stage 1,2,3 and 4). However, I have some doubts regarding my final data interpretation and here is the code I used and the output.

                    My Dataset:
                    >Str(SSSA)
                    Dataframe: 662 obs. of 8 variables:
                    $ Flower1 : int 1 1 6 0 0 0 6709 0 3 0 ...
                    $ Flower1.1: int 1 1 1 0 0 0 2261 2 1 2 ...
                    $ Flower2 : int 0 4 49 0 0 0 7311 0 4 0 ...
                    $ Flower2.1: int 1 14 69 0 0 0 11211 0 1 0 ...
                    $ Flower3 : int 0 8 89 1 0 0 1978 0 0 1 ...
                    $ Flower3.1: int 0 4 72 0 0 0 5392 0 1 1 ...
                    $ Flower4 : int 0 1 67 0 0 0 4675 0 0 0 ...
                    $ Flower4.1: int 0 0 21 0 0 0 4629 0 1 5 ...


                    My code
                    groups <- c( rep("F1",2), rep("F2",2), rep("F3",2), rep("F4",2) )
                    cds <- newCountDataSet(SSSA , groups )
                    cds <- estimateSizeFactors( cds )
                    cds <- estimateDispersions( cds, "pooled" )
                    fit0 <- fitNbinomGLMs ( cds, count ~ 1 )
                    fit1 <- fitNbinomGLMs ( cds, count ~ groups )
                    pvals <- nbinomGLMTest( fit1, fit0 )
                    padjGLM <- p.adjust( pvals, method="BH" )
                    fit1.pval <- cbind(fit1, padjGLM)

                    My Result
                    > str(fit1.pval)
                    'data.frame': 662 obs. of 7 variables:
                    $ (Intercept): num 1.23 1.16 2.77 -32.46 NA ...
                    $ groupsF2 : num -3.17 1.07 2.17 -2.1 NA ...
                    $ groupsF3 : num -32.569 0.859 2.973 30.973 NA ...
                    $ groupsF4 : num -31.675 -2.008 2.835 -0.875 NA ...
                    $ deviance : num 0.746 1.895 2.041 0.234 NA ...
                    $ converged : logi TRUE TRUE TRUE TRUE NA NA ...
                    $ pval : num 0.28395 0.44052 0.00622 0.86603 NA ...

                    My doubts.....
                    1.In my case, is it better to set the conditions as a data frame instead of a vector- “groups <- c( rep("F1",2), rep("F2",2), rep("F3",2), rep("F4",2) )”?
                    2.In the code, I calculated the object fit0 setting the value 1 after the tilde -“fit0 <- fitNbinomGLMs ( cds, count ~ 1 )” but I am not very clear.. what count ~ 1 stand for/refers to ?

                    3.My final results( above) shows 7 variables(including the adjusted pvalue) in the data.frame and I am not clear in interpreting this data. In particular , by using ‘groups’ in calling fit 1 i.e. fit1 <- fitNbinomGLMs ( cds, count ~ groups ),I can’t understand if I really compared F1 against all other stages F2,F3,F4, as I intend or If I made any other comparisions.

                    Could anyone help me on this?

                    Jay

                    Comment


                    • #11
                      No body knows ?

                      Comment


                      • #12
                        Please do not cross-post. The reply is in the other thread.

                        Comment

                        Latest Articles

                        Collapse

                        • seqadmin
                          Addressing Off-Target Effects in CRISPR Technologies
                          by seqadmin






                          The first FDA-approved CRISPR-based therapy marked the transition of therapeutic gene editing from a dream to reality1. CRISPR technologies have streamlined gene editing, and CRISPR screens have become an important approach for identifying genes involved in disease processes2. This technique introduces targeted mutations across numerous genes, enabling large-scale identification of gene functions, interactions, and pathways3. Identifying the full range...
                          08-27-2024, 04:44 AM
                        • seqadmin
                          Selecting and Optimizing mRNA Library Preparations
                          by seqadmin



                          Sequencing mRNA provides a snapshot of cellular activity, allowing researchers to study the dynamics of cellular processes, compare gene expression across different tissue types, and gain insights into the mechanisms of complex diseases. “mRNA’s central role in the dogma of molecular biology makes it a logical and relevant focus for transcriptomic studies,” stated Sebastian Aguilar Pierlé, Ph.D., Application Development Lead at Inorevia. “One of the major hurdles for...
                          08-07-2024, 12:11 PM

                        ad_right_rmr

                        Collapse

                        News

                        Collapse

                        Topics Statistics Last Post
                        Started by seqadmin, 08-27-2024, 04:40 AM
                        0 responses
                        16 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 08-22-2024, 05:00 AM
                        0 responses
                        293 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 08-21-2024, 10:49 AM
                        0 responses
                        135 views
                        0 likes
                        Last Post seqadmin  
                        Started by seqadmin, 08-19-2024, 05:12 AM
                        0 responses
                        124 views
                        0 likes
                        Last Post seqadmin  
                        Working...
                        X