**mrawlins** · 11-16-2010, 04:41 PM

This will depend on what statistics you use to determine statistical significance.

If you are using something like a t-test, you really want as many replicates as possible. You can't do these tests without replicates in every group, and with fewer than 3 per group the variance estimate is very unstable. Depth isn't particularly useful here, but more replicates are. It's an oversimplified approach to the statistics, IMO.
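To make the replicate requirement concrete, here is a minimal sketch of Welch's t statistic for one gene. The expression values are hypothetical, and the function name `welch_t` is just for illustration. Strictly, the variance estimate needs two values per group, but more replicates make it far more stable.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Return Welch's t statistic and its approximate degrees of freedom."""
    na, nb = len(group_a), len(group_b)
    if na < 2 or nb < 2:
        # A single value per group gives no estimate of variability.
        raise ValueError("need at least two replicates per group")
    # Squared standard error of each group mean.
    va = variance(group_a) / na
    vb = variance(group_b) / nb
    t = (mean(group_a) - mean(group_b)) / sqrt(va + vb)
    # Welch-Satterthwaite approximation for the degrees of freedom.
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    return t, df

# Hypothetical normalized expression values, three replicates per condition.
control = [100.0, 110.0, 90.0]
treated = [150.0, 160.0, 140.0]
t, df = welch_t(control, treated)
print(f"t = {t:.3f}, df = {df:.1f}")
```

Extra sequencing depth does not change `na` or `nb`, which is why depth helps little with this kind of test.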

If you use something like Fisher's exact test (hypergeometric or Poisson distribution) then two biological replicates should be reasonable. You can actually mix multiple biological replicates in the same barcoded sample and it will likely give the same answer (since the reads from replicates are just added together). In this case the read depth is more useful than additional replicates. This method is what we use, but a number of people (probably more knowledgeable than me) have raised concerns with it.
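As a rough sketch of that approach, the code below sums the replicate counts for one gene per condition and runs a two-sided Fisher's exact test on the resulting 2x2 table (this gene vs. all other reads, condition A vs. condition B). The counts and library sizes are made up, and the helper names are hypothetical. It uses only the standard library, with log-gamma to keep the binomial coefficients manageable.

```python
from math import exp, lgamma

def log_comb(n, k):
    """Natural log of the binomial coefficient C(n, k)."""
    return lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)

def fisher_exact_two_sided(gene_a, total_a, gene_b, total_b):
    """Two-sided Fisher's exact test for one gene's reads in two libraries."""
    k = gene_a + gene_b      # reads from this gene in both libraries
    n = total_a + total_b    # all reads in both libraries

    def log_pmf(x):
        # Hypergeometric probability that x of the k gene reads fall in library A.
        return log_comb(total_a, x) + log_comb(total_b, k - x) - log_comb(n, k)

    observed = log_pmf(gene_a)
    lo, hi = max(0, k - total_b), min(k, total_a)
    # Sum every table that is no more likely than the observed one.
    p = sum(exp(lp) for lp in map(log_pmf, range(lo, hi + 1))
            if lp <= observed + 1e-7)
    return min(1.0, p)

# Hypothetical per-replicate counts for one gene; pooling just adds them up.
cond_a = sum([30, 34])
cond_b = sum([60, 58])
p_value = fisher_exact_two_sided(cond_a, 1_000_000, cond_b, 1_000_000)
print(f"p = {p_value:.2e}")
```

Note that the replicate structure disappears at the `sum(...)` step, so the test sees only counting noise, not the variation between replicates.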

If you use something like DESeq I don't have an answer for you, because I don't know anything about their statistical models. My guess is that two biological replicates would be fine for this type of analysis, though you may need 3.

I think this is a useful question, though, and I am interested to see what others think.

**NicoBxl** · 11-17-2010, 12:21 AM

Like mrawlins, we've mixed multiple biological replicates in the same barcoded sample, but we're working on small RNA-seq.

It works great.

**ecofriendly** · 11-24-2010, 01:31 PM

mrawlins, can you suggest an article that explains the difference between these different statistical tests and why they require different numbers of biological replicates of RNA-seq data to be as powerful?

I've been reading the Cufflinks paper and trying to understand the statistical model they use to analyze RNA-seq data, as written in the Supplementary Methods (Trapnell et al., 2010, in Nature Biotechnology). Can someone explain it to me in simple terms, with as little math as possible?

**Jeremy** · 11-24-2010, 07:26 PM

By "mix multiple biological replicates in the same barcoded sample", do you mean each sample has its own barcode and is pooled together and sequenced? Or do you mean that multiple samples are pooled together, given the same barcode and then sequenced?

If you mix samples and give them the same barcode, how do you calculate variance?

**golharam** · 11-24-2010, 07:36 PM

We always recommend at least 3 biological replicates. If you do two, how do you know one isn't bad? If you do three, and one is bad, you can at least eliminate it and continue.

I think 4 would be ideal especially from a statistical standpoint, but that's not always possible because of cost. However, talk with your sequencing core facility to determine if barcoding multiple samples is an option. If it is, you may be able to sequence more samples for the same cost.

Take a look at this paper as well:

Auer PL, Doerge RW. Statistical Design and Analysis of RNA Sequencing Data.
Genetics, Vol. 185, No. 2 (1 June 2010), pp. 405-416.

**Simon Anders** · 11-25-2010, 03:26 AM

I've explained this in a number of posts before, so I just repeat the core points.

- If you use Fisher's exact test or something similar, you don't need any replicates because it cannot accommodate them. The results, though, will be wrong, especially for strongly expressed genes.

This is because Fisher's exact test tests whether two samples differ in the concentration of a given transcript. This is, however, not the question you want to ask. What you want to know is whether the difference between two samples with different treatment is stronger than what you expect to see between two samples that are replicates, because otherwise, you cannot attribute the difference to the treatment.
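A small numerical sketch of why strongly expressed genes suffer most. It assumes, purely for illustration, that two replicates of the same condition differ by 10% for a gene, and it uses a normal-approximation two-proportion test as a stand-in for Fisher's exact test (the two behave similarly at these counts). All numbers are hypothetical.

```python
from math import erfc, sqrt

def two_proportion_p(count_a, total_a, count_b, total_b):
    """Two-sided p-value for equal transcript fractions (normal approximation)."""
    pooled = (count_a + count_b) / (total_a + total_b)
    se = sqrt(pooled * (1 - pooled) * (1 / total_a + 1 / total_b))
    z = (count_a / total_a - count_b / total_b) / se
    return erfc(abs(z) / sqrt(2))  # two-sided tail of the standard normal

depth = 10_000_000  # assumed reads per library

# Same 10% replicate-to-replicate difference at two expression levels.
weak = two_proportion_p(20, depth, 22, depth)
strong = two_proportion_p(20_000, depth, 22_000, depth)

print(f"weakly expressed gene:   p = {weak:.2f}")
print(f"strongly expressed gene: p = {strong:.1e}")
```

The test only accounts for counting noise, which shrinks relative to the count as expression rises. A 10% difference between replicates therefore looks highly significant for the strongly expressed gene even though no treatment was applied.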

This criticism also applies to cuffdiff, at least to the version described in the paper. (There is a new version of cuffdiff that allows for biological replicates but there is no documentation on its method yet, and hence it is unclear whether it now asks the relevant question.)

- If you have many replicates, use a t test.

- With only two or three replicates, you need to pool across genes, i.e., assume that similar genes have similar variance. Our DESeq package assumes that genes with similar expression strength have similar variance, and so pools information from these in order to get a reasonable estimate of biological variability, which is then used for the test.
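To illustrate the idea of pooling across genes, here is a deliberately simplified sketch. It is not DESeq's actual fitting procedure; it simply groups genes into bins of similar mean expression (by log2 of the mean) and averages the per-gene variances within each bin. The gene names and counts are hypothetical.

```python
from collections import defaultdict
from math import floor, log2
from statistics import mean, variance

def pooled_variances(counts):
    """counts maps gene -> replicate counts from one condition.

    Returns gene -> variance averaged over genes of similar expression strength.
    """
    bins = defaultdict(list)
    gene_bin = {}
    for gene, reps in counts.items():
        # Bin by order of magnitude of the mean (log2 scale); guard against log2(0).
        b = floor(log2(max(mean(reps), 1)))
        gene_bin[gene] = b
        bins[b].append(variance(reps))
    # Every gene in a bin shares the bin's average variance.
    return {gene: mean(bins[b]) for gene, b in gene_bin.items()}

# Two replicates per gene: each per-gene variance alone is very noisy.
counts = {
    "geneA": [10, 14],
    "geneB": [12, 12],
    "geneC": [11, 15],
    "geneD": [1000, 1200],
    "geneE": [1100, 1050],
}
pooled = pooled_variances(counts)
for gene, var in pooled.items():
    print(gene, round(var, 1))
```

With two replicates, `geneB` alone would report a variance of zero, which would make any difference look significant. Borrowing from `geneA` and `geneC` gives a more plausible estimate.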

Simon

**vpp605** · 11-29-2010, 06:04 AM

Thank you everyone for the replies!!

golharam - thank you for pointing me to that paper!

Simon - sorry for making you repeat everything over again -- if there is a "better" post for me to look at, please point me in that direction!

**Simon Anders** · 11-29-2010, 09:03 AM

Originally posted by vpp605:

> Simon - sorry for making you repeat everything over again -- if there is a "better" post for me to look at, please point me in that direction!

No problem. We had a couple of discussions on the subject of replicates, but they are spread over several threads, so they may be hard to find.

Here are a few of them:

- Multiple DGE libraries comparison. (EdgeR baySeq DESeq): http://seqanswers.com/forums/showthread.php?t=4349
- Differential gene expression: Can Cufflinks/Cuffcompare handle biological replicates?: http://seqanswers.com/forums/showthread.php?t=5180
- RNA-seq output: http://seqanswers.com/forums/showthread.php?t=5248
- DEG for paired samples, biological replicates: http://seqanswers.com/forums/showthread.php?t=7108

Note that some of the software packages mentioned have gained new functionality quite recently, so some arguments in these threads about their limitations are out of date.

Simon

**adumitri** · 12-07-2010, 07:47 AM

Cuffdiff - differential expression analysis between groups of samples

Originally posted by Simon Anders:

> This criticism also applies to cuffdiff, at least to the version described in the paper. (There is a new version of cuffdiff that allows for biological replicates but there is no documentation on its method yet, and hence it is unclear whether it now asks the relevant question.)

Hello,

Simon mentioned the existence of a new version of Cuffdiff that allows for biological replicates. Does anyone know anything else about this new version? Will it be released soon or is it already available somewhere?

Given the currently available Cuffdiff version (v0.9.3), is there any viable workaround to analyze groups of samples (e.g. control samples compared with treated samples)?

Thank you,
Alexandra

**jminich444** · 03-03-2011, 02:23 PM

http://www.genetics.org/cgi/content/abstract/185/2/405

**eastasiasnow** · 08-28-2014, 01:59 AM

Originally posted by Jeremy:

> By "mix multiple biological replicates in the same barcoded sample", do you mean each sample has its own barcode and is pooled together and sequenced? Or do you mean that multiple samples are pooled together, given the same barcode and then sequenced?
>
> If you mix samples and give them the same barcode, how do you calculate variance?

Hi Jeremy, did you ever get an answer to your question?

**Jeremy** · 08-28-2014, 07:28 PM

Originally posted by eastasiasnow:

> Hi Jeremy, did you ever get an answer to your question?

Based on my quote marks I think I was asking the OP what they meant and then pointing out (via rhetorical question) that you can't get within group variance using a pooled approach. But it was so long ago I can't remember and the phrase that I quoted seems to no longer be there.
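A tiny sketch of that point, using hypothetical counts for one gene: with separately barcoded replicates you get several numbers and can estimate the within-group variance, while a pooled, single-barcode sample yields one number and no variance at all.

```python
from statistics import StatisticsError, variance

def within_group_variance(counts):
    """Sample variance of the counts, or None when it cannot be estimated."""
    try:
        return variance(counts)
    except StatisticsError:
        # A single observation carries no information about variability.
        return None

replicates = [80, 100, 150]   # each replicate barcoded separately
pooled = [sum(replicates)]    # same material under one barcode: one number

separate = within_group_variance(replicates)
single = within_group_variance(pooled)
print("separate barcodes:", separate)
print("single barcode:   ", single)
```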

**eastasiasnow** · 08-28-2014, 07:48 PM

Originally posted by Jeremy:

> Based on my quote marks I think I was asking the OP what they meant and then pointing out (via rhetorical question) that you can't get within group variance using a pooled approach. But it was so long ago I can't remember and the phrase that I quoted seems to no longer be there.

Yeah, pooling biological replicates loses the within-group variance. But could I still use this design for the downstream analysis? Would people accept this design if I use it in my paper? If so, what kind of tools can handle it?

Thank you very much.

**Jeremy** · 08-28-2014, 07:53 PM

Originally posted by eastasiasnow:

> Yeah, pooling biological replicates loses the within-group variance. But could I still use this design for the downstream analysis? Would people accept this design if I use it in my paper? If so, what kind of tools can handle it?
>
> Thank you very much.

For differential expression analysis, I wouldn't. That design would have a lot of trouble getting published. For almost the same price you can sequence biological replicates that have been individually tagged and get results that are far more biologically relevant.

Biological replicates for RNA-seq