  • Two-way ANOVA, replicates, software.

    Hi,

    I have a couple of quick questions regarding experimental setup and analysis. I am trying to figure out how best to design my experiment to get the information I need while also minimizing cost. I am using an inducible transgene to identify the targets of my transcription factor with mRNA-seq. I am setting up my experiment with a two-way ANOVA design:

    Inducible Line (not treated) ; Inducible line (treated with inducing chemical)
    Wild-type control (not treated); Wild-type control (treated with inducing chemical)

    At a minimum, I plan to submit two biological replicates for sequencing for each line/condition combination. I am also hoping to use additional time points post induction (again, this will depend on cost).

    Question 1) I know more replicates are better, but there is the cost limitation. How much will my statistical power increase by including a 3rd replicate? Is it likely that just two replicates would be useful, or is this something that needs to be determined empirically? I have read the other threads on the website about Fisher's exact test (two replicates) vs. the t-test (needs 3 replicates), but don't know if the value of the 3rd replicate similarly holds true for the two-way ANOVA.

    Question 2) When I have my data, I will need software to accommodate the two-way ANOVA analysis. Do most NGS analysis software packages offer two-way ANOVA analysis? Does anyone have any recommendations? I am working with Arabidopsis.

    If anyone has any insight into either of these questions, I would be grateful for your input. Thanks!

    eggplant72

  • #2
    Some comments in random order:

    - Fisher's exact test, as usually employed, cannot deal with any replicates, and hence should not be used.

    - Replicates are not as much of a cost issue as people seem to think, because you can multiplex several samples on one lane. Hence, you should decide how many lanes you can afford and how many samples you can obtain. The best approach, in my opinion, is to tag all the samples with multiplexing tags, pool them and spread them over the available lanes. (See e.g. Auer and Doerge, 2010.)

    - Power depends more on sequencing depth than replicate number. This is because once you share information across genes, as done by edgeR and DESeq, it does not make that much difference whether you have two or three replicates. If you don't share information (e.g., do a standard t-test), you won't get anywhere with fewer than, say, six or seven replicates.

    - Nevertheless, if you want 100 counts for a given gene, you are better off getting them from four replicate samples, each sequenced to 25, than two sequenced to 50 each.

    - Outliers are an annoying issue, and more replicates help here.

    - edgeR and DESeq both support two-way ANOVA; baySeq does not (if I recall correctly), and neither does cuffdiff.
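
    To put rough numbers on the "four samples at 25 versus two at 50" point above, here is a back-of-the-envelope sketch in base R. It assumes negative-binomial counts with variance mu + phi * mu^2 (the model family that edgeR and DESeq work with) and treats the dispersion as known. The function name pooled_cv2 and the value phi = 0.1 are made up for illustration only.

```r
# Squared coefficient of variation (CV^2) of a gene's pooled count when a
# fixed total count is split evenly across n_rep biological replicates.
# Assumption: each replicate's count is negative binomial with
# variance = mu + phi * mu^2; phi = 0.1 is illustrative, not an estimate.
pooled_cv2 <- function(total_count, n_rep, phi = 0.1) {
  mu <- total_count / n_rep             # expected count per replicate
  var_sum <- n_rep * (mu + phi * mu^2)  # independent replicates: variances add
  var_sum / total_count^2
}

two_deep     <- pooled_cv2(100, n_rep = 2)  # 2 replicates x 50 counts
four_shallow <- pooled_cv2(100, n_rep = 4)  # 4 replicates x 25 counts

# Algebraically, CV^2 = 1 / total_count + phi / n_rep: the shot-noise part
# depends only on the total count, the biological part only on n_rep.
print(c(two_deep = two_deep, four_shallow = four_shallow))
```

    Under these assumptions, the same 100 counts give a smaller relative variance when spread over four replicates, because only the biological term shrinks as replicates are added. In practice the dispersion also has to be estimated, which this sketch ignores.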


    • #3
      In reply to Simon Anders (post #2):
      Hi Simon, I will look into edgeR and DESeq. Thank you very much for your help!


      • #4
        Hi Simon,
        How do I run a two-way ANOVA using DESeq?
        Thanks,


        • #5
          See the vignette of the devel version: http://www.bioconductor.org/packages.../doc/DESeq.pdf
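
          For orientation before opening the vignette, here is a rough sketch of what a two-way design looks like as a model formula, using only base R (no DESeq calls). The sample sheet and factor labels are hypothetical and are chosen to match the 2 x 2 layout in the first post.

```r
# Hypothetical sample sheet for the 2 x 2 layout from the first post,
# two biological replicates per cell (all labels are made up).
samples <- data.frame(
  line      = factor(rep(c("wildtype", "inducible"), each = 4),
                     levels = c("wildtype", "inducible")),
  treatment = factor(rep(rep(c("untreated", "treated"), each = 2), times = 2),
                     levels = c("untreated", "treated"))
)

# Full model: both main effects plus their interaction.
design_full    <- model.matrix(~ line * treatment, data = samples)
# Reduced model: main effects only.
design_reduced <- model.matrix(~ line + treatment, data = samples)

# The one extra column in the full model is the interaction term: the part
# of the treatment response that is specific to the inducible line.
print(setdiff(colnames(design_full), colnames(design_reduced)))
```

          Testing a gene for the interaction then comes down to comparing the fit of the full model against the reduced one. The interaction is the term of interest here because it separates the effect of the induced transgene from the effect of the inducing chemical on its own.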


          • #6
            Thanks. How do I download or find the pasilla data package?


            • #7
              You will need the development version of R (2.14); then run:

              Code:
              source("http://bioconductor.org/biocLite.R")
              biocLite("pasilla")
              Or from here:

              This package provides per-exon and per-gene read counts computed for selected genes from RNA-seq data that were presented in the article


              • #8
                You can also download the pasilla package source and install it on other versions of R (e.g., mine is 2.13.1):

                Code:
                install.packages("/home/xxx/pasilla_0.2.5.tar.gz", repos=NULL, type="source")
