  • Two-way ANOVA, replicates, software.

    Hi,

    I have a couple of quick questions regarding experimental setup and analysis. I am trying to figure out how best to design my experiment to get the information I need while minimizing cost. I am using an inducible transgene to identify the targets of my transcription factor with mRNA-seq. I am setting up my experiment with a two-way ANOVA design:

    Inducible Line (not treated) ; Inducible line (treated with inducing chemical)
    Wild-type control (not treated); Wild-type control (treated with inducing chemical)

    At a minimum, I plan to submit two biological replicates for sequencing for each line and condition. I am also hoping to use additional time points post induction (again, this will depend on cost).
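For concreteness, the layout described above can be written down as a sample table. This is only a sketch in base R: the column names (`line`, `treatment`, `replicate`, `sample_id`) and the level labels are made up for illustration and are not required by any particular package.

```r
# Hypothetical sample sheet for the 2 x 2 design with two biological
# replicates per cell. All names and labels here are illustrative.
samples <- expand.grid(
  replicate = 1:2,
  treatment = c("untreated", "induced"),
  line      = c("wildtype", "inducible"),
  stringsAsFactors = TRUE
)

# Build a readable identifier for each library.
samples$sample_id <- with(samples, paste(line, treatment, replicate, sep = "_"))

# 2 lines x 2 treatments x 2 replicates = 8 libraries in total.
print(samples)

# Cross-tabulate to confirm every line/treatment cell has the same
# number of replicates (a balanced design).
cell_counts <- table(samples$line, samples$treatment)
print(cell_counts)
```

Adding a third replicate or extra time points would only mean extending the vectors passed to `expand.grid`, which makes it easy to count how many libraries each option would cost.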

    Question 1) I know more replicates are better, but there is the cost limitation. How much will my statistical power increase by including a 3rd replicate? Is it likely that just two replicates would be useful, or is this something that needs to be determined empirically? I have read the other threads on the website about Fisher's exact test (two replicates) vs. the t-test (needs 3 replicates), but I don't know whether the value of the 3rd replicate similarly holds true for the two-way ANOVA.

    Question 2) When I have my data, I will need software to accommodate the two-way ANOVA analysis. Do most NGS analysis software packages offer two-way ANOVA analysis? Does anyone have any recommendations? I am working with Arabidopsis.

    If anyone has any insight into either of these questions, I would be grateful for your input. Thanks!

    eggplant72

  • #2
    Some comments in random order:

    - Fisher's exact test, as usually employed, cannot deal with any replicates, and hence should not be used.

    - Replicates are not as much a cost issue as people seem to think, because you can multiplex several samples on one lane. Hence, you should decide how many lanes you can afford and how many samples you can obtain. The best approach, in my opinion, is to tag all the samples with multiplexing tags, pool them and spread them over the available lanes. (See e.g. Doerge and Auer, 2010.)

    - Power depends more on sequencing depth than on replicate number. This is because once you share information across genes, as done by edgeR and DESeq, it does not make that much difference whether you have two or three replicates. If you don't share information (e.g., do a standard t-test), you won't get anywhere with fewer than, say, six or seven replicates.

    - Nevertheless, if you want 100 counts for a given gene, you are better off getting them from four replicate samples, each sequenced to 25 counts, than from two samples sequenced to 50 counts each.

    - Outliers are an annoying issue, and more replicates help here.

    - edgeR and DESeq both support two-way ANOVA; BaySeq does not (if I recall correctly), and neither does cuffdiff.
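To make the two-way setup concrete, here is a minimal base R sketch of the kind of test involved: fit a model with and without the line-by-treatment interaction and compare them. It is not the edgeR or DESeq API. It uses a plain Poisson GLM on simulated counts for a single gene, so it ignores the overdispersion that those packages model with a negative binomial and does not share information across genes. The factor labels and the means (50 and 200) are invented for illustration.

```r
set.seed(1)

# Hypothetical 2 x 2 layout with two replicates per cell (8 samples).
line      <- factor(rep(c("wildtype", "inducible"), each = 4),
                    levels = c("wildtype", "inducible"))
treatment <- factor(rep(rep(c("untreated", "induced"), each = 2), times = 2),
                    levels = c("untreated", "induced"))

# Design matrix: intercept, line effect, treatment effect, interaction.
design <- model.matrix(~ line * treatment)

# Simulated counts for one gene: only induced samples of the inducible
# line are shifted upwards (made-up means of 50 and 200).
mu     <- ifelse(line == "inducible" & treatment == "induced", 200, 50)
counts <- rpois(length(mu), lambda = mu)

# Full model versus the model without the interaction term.
fit_full    <- glm(counts ~ line * treatment, family = poisson())
fit_reduced <- glm(counts ~ line + treatment, family = poisson())

# Likelihood-ratio (analysis of deviance) test for the interaction.
lrt     <- anova(fit_reduced, fit_full, test = "Chisq")
p_value <- lrt[2, "Pr(>Chi)"]

print(colnames(design))
print(p_value)
```

The interaction coefficient is the quantity of interest in the design from the first post: it picks out genes that respond to the inducing chemical in the inducible line but not in the wild-type control.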

    • #3
      Originally posted by Simon Anders (post #2)
      Hi Simon, I will look into edgeR and DESeq. Thank you very much for your help!


      • #4
        Hi Simon,
        How do I run a two-way ANOVA using DESeq?
        Thanks,


        • #5
          See the vignette of the devel version: http://www.bioconductor.org/packages.../doc/DESeq.pdf


          • #6
            Thanks. How do I download or find the pasilla data package?


            • #7
              You will need the development version of R (2.14); then run:

              Code:
              source("http://bioconductor.org/biocLite.R")
              biocLite("pasilla")
              Or from here:

              This package provides per-exon and per-gene read counts computed for selected genes from RNA-seq data that were presented in the article


              • #8
                You can also download the pasilla package source and install it on other versions of R (e.g., mine is 2.13.1):

                Code:
                install.packages("/home/xxx/pasilla_0.2.5.tar.gz", repos=NULL, type="source")
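After either install route, a quick check along these lines can confirm that R can see the package. It is a sketch using only base R functions. The assumption that the package ships example files under an `extdata` directory is not stated in this thread, so the listing may come back empty.

```r
# Check whether the pasilla data package is available in this R library.
has_pasilla <- requireNamespace("pasilla", quietly = TRUE)

if (has_pasilla) {
  # List any example files shipped in the package's extdata directory.
  # (Assumed location; system.file() returns "" if it does not exist.)
  extdata_dir <- system.file("extdata", package = "pasilla")
  print(list.files(extdata_dir))
} else {
  message("pasilla is not installed; install it with one of the commands above.")
}
```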
