Hello-
First off, I've been using cufflinks/cuffdiff since it was first released, so I'm familiar enough with the software. I only recently discovered this issue because we've had some RNA-Seq runs on an Illumina HiSeq 2000 where we barcoded samples and ran 6 samples per lane, for a total of 48 samples sequenced at a time. We've seen a large amount of variability in the number of reads per sample with this method.
I'm sharing a less dramatic example here, but one that's good enough to illustrate a strange issue.
I've got a "wt" versus "mutant" experiment with 2 biological replicates per condition. The aligned read counts are roughly 40M for three of the samples and 18M for the fourth, so the "wt" condition has 40M + 18M and the "mutant" has 40M + 40M.
I've run cuffdiff including both replicates per condition and created a scatter plot of the FPKM values. I've also run cuffdiff excluding the sample with fewer reads (a single "wt" sample versus two "mutant" samples) and created another scatter plot. In both cases I tried upper-quartile normalization as well as the default. The attached image shows both plots (1 vs 2 on the left, 2 vs 2 on the right). The diagonal blue line is the 1:1 line. If the normalization step is done correctly, the main body of the data should be centered on this line, since we expect most genes to have a fold change near 1. Points in red are genes cuffdiff identified as significantly differentially expressed.
You can see that in the 2 vs 2 plot on the right, the main body is NOT centered on the blue line; in fact it's pulled in the direction of the "wt" condition, and there's a large body of red points within the main body of the scatter. The plot on the left shows proper normalization, and that large body of points is correctly shown as not significant. On the log scale, the points furthest to the upper right have FPKM values thousands higher in the "wt" condition in the right-hand plot than in the left-hand plot. By including the low-read-count sample, cuffdiff artificially inflated all of the gene expression values for those samples.
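For anyone who wants to check their own cuffdiff runs for this shift without eyeballing a plot: if normalization centered the main body on the 1:1 line, the median log2 FPKM ratio across expressed genes should sit near 0. A minimal Python sketch (the `value_1`/`value_2` column names follow cuffdiff's `gene_exp.diff` output; the `min_fpkm` cutoff is an arbitrary noise filter, not anything cuffdiff uses):

```python
import csv
import math
from statistics import median

def median_log2_shift(pairs, min_fpkm=1.0):
    """Median log2(FPKM_2 / FPKM_1) over genes expressed in both
    conditions.  Near 0 means the main body of the scatter is
    centered on the 1:1 line; a large value means the kind of
    systematic offset described above."""
    return median(math.log2(b / a) for a, b in pairs
                  if a >= min_fpkm and b >= min_fpkm)

def load_fpkm_pairs(gene_exp_diff):
    """Pull the per-gene FPKM columns from a cuffdiff gene_exp.diff
    file (tab-delimited, with value_1/value_2 headers)."""
    with open(gene_exp_diff) as fh:
        return [(float(row["value_1"]), float(row["value_2"]))
                for row in csv.DictReader(fh, delimiter="\t")]
```

For example, `median_log2_shift(load_fpkm_pairs("gene_exp.diff"))` returning something like 0.5 would mean the typical gene looks 1.4-fold higher in condition 2 purely from the normalization offset.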
So why does cuffdiff fail to normalize these samples properly? We're talking about a simple division step, and I can see that it correctly identifies the different normalization factors at the start of its run. I haven't read anywhere that it's dangerous to mix samples with different read counts. If I take the same data and run it through a DESeq-based pipeline, I get results that look very much like the plot on the left, even when I include the low-read-count sample.
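For comparison, the normalization DESeq applies is the median-of-ratios estimator from Anders & Huber (2010): each sample's size factor is the median, across genes, of that sample's count divided by the gene's geometric mean across all samples. A minimal numpy sketch of that idea (not DESeq's actual code):

```python
import numpy as np

def deseq_size_factors(counts):
    """Median-of-ratios size factors in the style of DESeq.

    counts: genes x samples array of raw read counts.
    Returns one size factor per sample; dividing each sample's
    counts by its factor puts the samples on a common scale.
    """
    counts = np.asarray(counts, dtype=float)
    with np.errstate(divide="ignore"):
        log_counts = np.log(counts)
    # Reference "pseudo-sample": per-gene geometric mean.  Genes with
    # a zero count in any sample get -inf here and are excluded.
    log_geomeans = log_counts.mean(axis=1)
    usable = np.isfinite(log_geomeans)
    # Per-sample median log-ratio to the reference, back on the
    # linear scale.
    return np.exp(np.median(
        log_counts[usable] - log_geomeans[usable, None], axis=0))
```

The point is that this estimator keys off the typical gene rather than total read depth, which is presumably why the DESeq pipeline keeps the scatter centered even with the 18M-read sample included.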
I've seen a much more extreme example where the entire body of points in the scatter was pulled so far to one side of the blue line that cuffdiff called over 13,000 genes significant. Again, running that data through the DESeq pipeline gave a more sensible output, with the normalization correctly centering the scatter on the blue line.
Has anyone else seen this issue? This seems very misleading.