  • Biological replicates for RNA-seq

    Hi all,

    I'm not sure if this has already been discussed elsewhere, but after looking around I didn't find anything directly answering my question, so if it has already been discussed, sorry for the repeat and please point me in the right direction!

    I'm going to be doing RNA-seq for DE analysis of a bacterium growing in two different environments. I'm trying to determine the number of biological replicates that would be required to provide statistically meaningful results. I saw that technical replicates aren't really necessary, and since cost is of course an issue, we were hoping to run 2 biological replicates of each environment, but we don't want to find out afterwards that we should have included more. We will be multiplexing our samples and using Illumina technology.

    I'm a microbiologist, with little background in stats, so any input or thoughts on this would be greatly appreciated!

    Thanks in advance.

    - Vanessa

  • #2
    This will depend on what statistics you use to determine statistical significance.

    If you are using something like a t-test, you really want as many replicates as possible. You can't do these tests without at least 3 replicates. Depth isn't particularly useful here, but more replicates are. It's an oversimplified approach to the statistics, IMO.
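A minimal sketch of the replicate-based approach, assuming library-size-normalized counts and a log2(count + 1) transform (both are assumptions, not something the post specifies): Welch's two-sample t statistic for a single gene, where the within-group variance is estimated from the biological replicates themselves. The counts are made up for illustration. `statistics.variance` needs at least two values per group, and with only two or three the variance estimate is very noisy, which is why more replicates help. Turning t into a p-value needs the t distribution (for example `scipy.stats.ttest_ind(..., equal_var=False)`), which is left out here to keep the sketch dependency-free.

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom.

    statistics.variance needs at least two values per group; with only two
    or three, the variance estimate (and hence t) is very unstable.
    """
    n_a, n_b = len(group_a), len(group_b)
    se2_a = statistics.variance(group_a) / n_a  # squared standard error, group A
    se2_b = statistics.variance(group_b) / n_b  # squared standard error, group B
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (se2_a ** 2 / (n_a - 1) + se2_b ** 2 / (n_b - 1))
    return t, df

def log2_counts(counts, pseudocount=1):
    """log2-transform counts; the pseudocount avoids log2(0)."""
    return [math.log2(c + pseudocount) for c in counts]

# Hypothetical normalized counts for one gene: three biological replicates per environment
env1 = log2_counts([520, 610, 480])
env2 = log2_counts([1010, 1250, 890])
t, df = welch_t(env1, env2)
print(f"t = {t:.2f}, df = {df:.1f}")
```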

    If you use something like Fisher's Exact Test (hypergeometric or Poisson distribution) then two biological replicates should be reasonable. You can actually mix multiple biological replicates in the same barcoded sample and it will likely give the same answer (since the reads from replicates are just added together). In this case the read depth is more useful than additional replicates. This method is what we use, but a number of people (probably more knowledgeable than me) have raised concerns with it.
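To make the pooled approach concrete, here is a small, dependency-free sketch of a two-sided Fisher's exact test on a 2x2 table (reads on the gene vs. all other reads, environment 1 vs. environment 2). The replicate counts and library sizes are hypothetical and are simply summed, as the post describes, so the test only ever sees one count per condition. If SciPy is available, `scipy.stats.fisher_exact` should give the same two-sided p-value.

```python
from math import comb

def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Rows are the two conditions; columns are reads on the gene and reads elsewhere.
    The p-value sums the hypergeometric probabilities of every table with the same
    margins that is no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    total = comb(n, row1)

    def prob(x):
        # probability that the first cell equals x, given fixed margins
        return comb(col1, x) * comb(n - col1, row1 - x) / total

    p_obs = prob(a)
    lo, hi = max(0, row1 - (n - col1)), min(row1, col1)
    # the small relative tolerance counts tables tied with the observed one
    p = sum(q for q in (prob(x) for x in range(lo, hi + 1)) if q <= p_obs * (1 + 1e-7))
    return min(1.0, p)

# Hypothetical counts for one gene in two biological replicates per environment,
# pooled by summing the replicates
gene_env1, lib_env1 = [12, 15], [1000, 1100]
gene_env2, lib_env2 = [30, 26], [1050, 980]
a, c = sum(gene_env1), sum(gene_env2)
b, d = sum(lib_env1) - a, sum(lib_env2) - c
print(f"two-sided p = {fisher_exact_two_sided(a, b, c, d):.3g}")
```

Note that the replicate structure disappears at the `sum(...)` step: the test cannot tell whether the two replicates agreed with each other or not.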

    If you use something like DESeq I don't have an answer for you, because I don't know anything about their statistical models. My guess is that two biological replicates would be fine for this type of analysis, though you may need 3.

    I think this is a useful question, though, and I am interested to see what others think.



    • #3
      Like mrawlins, we've mixed multiple biological replicates in the same barcoded sample, but we're working on small RNA-seq.

      It works great.



      • #4
        mrawlins, can you suggest an article that explains the difference between these different statistical tests and why they require different numbers of biological replicates of RNA-seq data to be as powerful?

        I've been reading the Cufflinks paper and trying to understand the statistical model it uses to analyze RNA-seq data, as described in the Supplementary Methods (Trapnell et al., 2010, in Nature Biotechnology). Can someone explain it to me in simple terms, with as little math as possible?



        • #5
          By "mix multiple biological replicates in the same barcoded sample", do you mean that each sample has its own barcode and the samples are pooled together and sequenced? Or do you mean that multiple samples are pooled together, given the same barcode, and then sequenced?

          If you mix samples and give them the same barcode, how do you calculate variance?



          • #6
            We always recommend at least 3 biological replicates. If you do two, how do you know one isn't bad? If you do three, and one is bad, you can at least eliminate it and continue.
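One way to act on that advice is sketched below, under assumptions that are not in the post: raw per-gene counts, a log2(count + 1) transform, and Pearson correlation between replicates as the similarity measure. With three replicates, the one whose mean correlation to the others is lowest is the candidate to drop; with only two, both replicates share the same single correlation, so there is no way to say which one is bad. The function names and toy counts are hypothetical.

```python
import math
from itertools import combinations

def pearson(x, y):
    """Pearson correlation of two equal-length lists."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def flag_outlier_replicate(replicates):
    """Return (name, mean correlation) of the replicate least correlated with the others.

    replicates maps a replicate name to its per-gene counts (same gene order
    in every list). Counts are compared on a log2(count + 1) scale.
    """
    if len(replicates) < 3:
        raise ValueError("need at least 3 replicates to tell which one is the odd one out")
    logs = {name: [math.log2(c + 1) for c in counts] for name, counts in replicates.items()}
    corr = {name: [] for name in logs}
    for r1, r2 in combinations(logs, 2):
        r = pearson(logs[r1], logs[r2])
        corr[r1].append(r)
        corr[r2].append(r)
    mean_corr = {name: sum(v) / len(v) for name, v in corr.items()}
    worst = min(mean_corr, key=mean_corr.get)
    return worst, mean_corr[worst]

# Hypothetical counts for five genes; rep3 behaves very differently from rep1 and rep2
replicates = {
    "rep1": [100, 200, 50, 800, 10],
    "rep2": [110, 190, 55, 760, 12],
    "rep3": [400, 20, 300, 90, 150],
}
print(flag_outlier_replicate(replicates))
```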

            I think 4 would be ideal especially from a statistical standpoint, but that's not always possible because of cost. However, talk with your sequencing core facility to determine if barcoding multiple samples is an option. If it is, you may be able to sequence more samples for the same cost.

            Take a look at this paper as well:

            Statistical Design and Analysis of RNA Sequencing Data.
            Genetics, Vol. 185, No. 2 (1 June 2010), pp. 405-416.



            • #7
              I've explained this in a number of posts before, so I just repeat the core points.

              - If you use Fisher's exact test or something similar, you don't need any replicates, because it cannot accommodate them. The results, though, will be wrong, especially for strongly expressed genes.

              This is because Fisher's exact test tests whether two samples differ in the concentration of a given transcript. This is, however, not the question you want to ask. What you want to know is whether the difference between two samples with different treatment is stronger than what you expect to see between two samples that are replicates, because otherwise, you cannot attribute the difference to the treatment.
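A quick way to see why this matters for strongly expressed genes: under a purely Poisson (technical-noise-only) model with equal library sizes, the normal-approximation z-score for a difference between two counts grows with depth. The sketch below uses that approximation as a stand-in for Fisher's exact test, which behaves similarly at high counts; the counts are hypothetical. For 100 vs. 110 reads, z is about 0.69; for 10,000 vs. 11,000 reads, the same 10% difference gives z of about 6.9, far past any usual significance threshold, even though a 10% difference could easily arise between two biological replicates of the same condition.

```python
import math

def poisson_z(count_a, count_b):
    """Normal-approximation z-score for the difference between two counts,
    assuming equal library sizes and only Poisson (technical) noise."""
    return (count_b - count_a) / math.sqrt(count_a + count_b)

# The same 10% difference at low and at high expression (hypothetical counts)
for a, b in [(100, 110), (10_000, 11_000)]:
    print(a, b, round(poisson_z(a, b), 2))
```

Nothing in this calculation looks at how much replicates of the same condition differ, which is exactly the comparison the post says is needed.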

              This criticism also applies to cuffdiff, at least to the version described in the paper. (There is a new version of cuffdiff that allows for biological replicates but there is no documentation on its method yet, and hence it is unclear whether it now asks the relevant question.)

              - If you have many replicates, use a t test.

              - With only two or three replicates, you need to pool across genes, i.e., assume that similar genes have similar variance. Our DESeq package assumes that genes with similar expression strength have similar variance, and so pools information from these in order to get a reasonable estimate of biological variability, which is then used for the test.
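The idea of pooling across genes can be illustrated with a deliberately simplified sketch: sort genes by mean expression, cut them into bins of neighbouring genes, and give every gene the median sample variance of its bin. This is not DESeq's actual procedure (DESeq models counts with a negative binomial distribution and fits a smooth mean-variance relationship); the function name, bin size, and counts below are hypothetical and only show how a gene with an unusually noisy two-replicate variance estimate borrows information from genes of similar expression strength.

```python
import statistics

def pooled_variances(counts_by_gene, bin_size=3):
    """Assign each gene a variance pooled across genes of similar mean expression.

    counts_by_gene maps a gene to its normalized counts in the replicates of one
    condition. Genes are sorted by mean, cut into consecutive bins of bin_size
    genes, and each gene gets the median of the per-gene sample variances in its bin.
    """
    per_gene = sorted(
        (statistics.mean(c), statistics.variance(c), g) for g, c in counts_by_gene.items()
    )
    pooled = {}
    for start in range(0, len(per_gene), bin_size):
        bin_genes = per_gene[start:start + bin_size]
        typical = statistics.median([var for _, var, _ in bin_genes])
        for _, _, gene in bin_genes:
            pooled[gene] = typical
    return pooled

# Hypothetical normalized counts, two replicates per gene
counts = {
    "geneA": [10, 12], "geneB": [11, 9], "geneC": [10, 30],
    "geneD": [1000, 1100], "geneE": [1050, 1000], "geneF": [990, 1010],
}
print(pooled_variances(counts))
```

Here `geneC`'s own two-replicate variance of 200 is replaced by 2, the typical value among genes at that expression level, while the highly expressed genes share a much larger pooled variance.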

              Simon



              • #8
                Thank you everyone for the replies!!

                golharam - thank you for pointing me to that paper!

                Simon - sorry for making you repeat everything over again -- if there is a "better" post for me to look at, please point me in that direction!



                • #9
                  Originally posted by vpp605
                  Simon - sorry for making you repeat everything over again -- if there is a "better" post for me to look at, please point me in that direction!
                  No problem. We had a couple of discussions on the subject of replicates, but they are spread over several threads, so they may be hard to find.

                  Here are a few of them:

                  Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

                  Discussion of next-gen sequencing related bioinformatics: resources, algorithms, open source efforts, etc

                  Application of sequencing to RNA analysis (RNA-Seq, whole transcriptome, SAGE, expression analysis, novel organism mining, splice variants)

                  Application of sequencing to RNA analysis (RNA-Seq, whole transcriptome, SAGE, expression analysis, novel organism mining, splice variants)


                  Note that some of the software packages mentioned have gained new functionality quite recently, so some of the arguments in these threads about their limitations are out of date.

                  Simon



                  • #10
                    Cuffdiff - differential expression analysis between groups of samples

                    This criticism also applies to cuffdiff, at least to the version described in the paper. (There is a new version of cuffdiff that allows for biological replicates but there is no documentation on its method yet, and hence it is unclear whether it now asks the relevant question.)

                    Hello,

                    Simon mentioned the existence of a new version of Cuffdiff that allows for biological replicates. Does anyone know anything else about this new version? Will it be released soon or is it already available somewhere?

                    Given the currently available Cuffdiff version (v0.9.3), is there any viable workaround to analyze groups of samples (e.g. control samples compared with treated samples)?

                    Thank you,
                    Alexandra





                      • #12
                        Originally posted by Jeremy
                        By "mix multiple biological replicates in the same barcoded sample", do you mean that each sample has its own barcode and the samples are pooled together and sequenced? Or do you mean that multiple samples are pooled together, given the same barcode, and then sequenced?

                        If you mix samples and give them the same barcode, how do you calculate variance?
                        Hi Jeremy, did you get an answer to your question?



                        • #13
                          Originally posted by eastasiasnow
                          Hi Jeremy, did you get an answer to your question?
                          Based on my quote marks, I think I was asking the OP what they meant and then pointing out (via a rhetorical question) that you can't get the within-group variance using a pooled approach. But it was so long ago that I can't remember, and the phrase I quoted no longer seems to be there.
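The variance point can be shown in a few lines. In this sketch (hypothetical counts for one gene; pooling is approximated by summing the individual libraries), three individually barcoded replicates give a within-group variance, while a single pooled library leaves only one value per condition, and a sample variance needs at least two.

```python
import statistics

def within_group_variance(values):
    """Sample variance of one gene's counts within a group, or None if it cannot be estimated."""
    if len(values) < 2:
        # one pooled library per group: no replicate-to-replicate spread to measure
        return None
    return statistics.variance(values)

individually_barcoded = [520, 610, 480]   # hypothetical: one library per biological replicate
pooled = [sum(individually_barcoded)]     # the same material mixed into one barcoded library
print(within_group_variance(individually_barcoded))
print(within_group_variance(pooled))
```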



                          • #14
                            Originally posted by Jeremy
                            Based on my quote marks, I think I was asking the OP what they meant and then pointing out (via a rhetorical question) that you can't get the within-group variance using a pooled approach. But it was so long ago that I can't remember, and the phrase I quoted no longer seems to be there.
                            Yes, pooling biological replicate samples loses the within-group variance. But could I still use this design for the downstream analysis? Would reviewers accept this design in a paper? If so, which tools can handle it?

                            Thank you very much.



                            • #15
                              Originally posted by eastasiasnow
                              Yes, pooling biological replicate samples loses the within-group variance. But could I still use this design for the downstream analysis? Would reviewers accept this design in a paper? If so, which tools can handle it?

                              Thank you very much.
                              For differential expression analysis, I wouldn't. That design would have a lot of trouble getting published. For almost the same price you can sequence biological replicates that have been individually tagged and get results that are far more biologically relevant.
