
**sudders** · 02-21-2012, 09:57 AM

I am also having the same problem with both splicing.diff and promoters.diff.

I have 5 samples in each of two groups, but only 25M reads per sample (100bp paired end).

Promoters

Cufflinks 1.0.3

OK 11,309
NOTEST 8,637
FAIL 648

with 7,902 significant

Cuffdiff 1.3

OK 1,383
NOTEST 10,243
FAIL 8,948

2 significant

Splicing

Cufflinks 1.0.3

OK 13,996
NOTEST 51,240
FAIL 2,933

8,645 genes with sig. differential splicing

Cufflinks 1.3

OK 1,017
NOTEST 28,032
FAIL 37,736
LOWDATA 1,401

6 genes with sig. differential splicing.
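Counts like these can be tallied directly from the status column of a cuffdiff output table. Below is a minimal sketch. It assumes a tab-separated file with a header row that includes a column named status, as in gene_exp.diff. `tally_status` is a hypothetical helper name, not part of Cufflinks.

```python
import csv
import io
from collections import Counter


def tally_status(diff_file):
    """Count each value of the 'status' column in a cuffdiff *.diff table.

    Assumes tab-separated text with a header row containing 'status'.
    """
    reader = csv.DictReader(diff_file, delimiter="\t")
    return Counter(row["status"] for row in reader)


if __name__ == "__main__":
    # Tiny made-up table; for real data, pass an open handle to a .diff file.
    example = io.StringIO(
        "test_id\tstatus\n"
        "g1\tOK\n"
        "g2\tNOTEST\n"
        "g3\tFAIL\n"
        "g4\tOK\n"
    )
    counts = tally_status(example)
    total = sum(counts.values())
    for status, n in counts.most_common():
        print(f"{status}\t{n}\t{n / total:.1%}")
```

You can run the same helper on the promoter and splicing tables from each version, assuming they carry the same status column. This puts the OK/NOTEST/FAIL/LOWDATA split on a per-run percentage basis.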

**Cole Trapnell** · 02-28-2012, 04:23 PM

Just to be clear: this is happening only for the splicing.diff, promoters.diff (and maybe cds.diff?) files? Not gene_exp.diff or isoform_exp.diff?

Can one or both of you send us (to the support list) a gene's worth of reads in one of these loci that are failing? And a snippet of GTF to run it against? The new code uses a sampling-based approach to estimate a null distribution of relative isoform abundances in each condition, rather than an analytic null model based on the gradient. The upside of this approach is that it's more accurate and more conservative - the downside is that the sampling method can fail under some conditions. I haven't seen this happening in any of my datasets (and I have one that looks extremely similar to yours), so I probably can't do anything about it without a small test data set that reproduces the problem. It's possible it's something easy I overlooked and can fix quickly.
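The following toy sketch illustrates the general idea of estimating a null distribution by sampling rather than from an analytic formula. It is not Cuffdiff's implementation. The Dirichlet null, the `concentration` parameter and all function names are hypothetical choices made only for illustration. The one grounded piece is the statistic: the square root of the Jensen-Shannon divergence between relative isoform abundances, which Cuffdiff reports in the sqrt(JS) column of splicing.diff. The sketch draws abundance pairs under a shared-abundance null and counts how often they diverge at least as much as the observed pair.

```python
import math
import random


def js_divergence(p, q):
    """Jensen-Shannon divergence (natural log) between two discrete distributions."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Terms with x_i == 0 contribute nothing; y_i > 0 wherever x_i > 0 here.
        return sum(a * math.log(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def sqrt_js(p, q):
    # Clamp tiny negative rounding errors before taking the square root.
    return math.sqrt(max(0.0, js_divergence(p, q)))


def sample_dirichlet(alpha, rng):
    """Draw one Dirichlet sample by normalising independent gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sampled_p_value(p, q, concentration=50.0, n_samples=2000, seed=0):
    """Toy sampling-based test of a shift in relative isoform abundances.

    Null hypothesis (assumed): both conditions share the pooled abundances.
    `concentration` is a hypothetical stand-in for measurement uncertainty;
    every pooled fraction must be > 0 (gammavariate needs a positive shape).
    """
    rng = random.Random(seed)
    pooled = [(a + b) / 2 for a, b in zip(p, q)]
    alpha = [concentration * x for x in pooled]
    observed = sqrt_js(p, q)
    hits = 0
    for _ in range(n_samples):
        null_p = sample_dirichlet(alpha, rng)
        null_q = sample_dirichlet(alpha, rng)
        if sqrt_js(null_p, null_q) >= observed:
            hits += 1
    # Add-one correction keeps the estimate away from exactly zero.
    return (hits + 1) / (n_samples + 1)


if __name__ == "__main__":
    print(sampled_p_value([0.5, 0.5], [0.5, 0.5]))  # no shift
    print(sampled_p_value([0.9, 0.1], [0.1, 0.9]))  # strong isoform switch
```

A sampling approach like this swaps an analytic approximation for Monte Carlo estimation. Its behaviour depends heavily on how the null is parameterised, which is one plausible way such a method could misbehave on unusual data.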

**turnersd** · 04-04-2012, 04:30 AM

Just came across this. Anecdotally, a colleague here has seen something similar, switching from the old pipeline (using 1.0.3, with a workflow similar to Jeremy's Galaxy exercise - tophat, cufflinks -g, cuffcompare -R, cuffdiff -N) to the new pipeline (using v1.3 from the protocols paper - tophat -G [gtf], cufflinks (no RABT), cuffmerge -g (RABT), cuffdiff -u -b). Unfortunately I don't have example data to send. Just wondering if you guys or others were able to figure out what was happening.

**francicco** · 04-12-2012, 06:44 AM

Dear all,
I am also using the Cufflinks package for our RNA-seq analysis, and I ran into the same problem.

On examining the cuffdiff results, we also noticed a striking number of transcripts (38%) and genes (25%) with status FAIL. We find this result very hard to justify. How can we end up losing 25% of the examined elements?

To get better results by reducing the number of FAILs, we ran several tests under different conditions.

1) We have three samples, each with three biological replicates. Plot1 (attached) shows the number of FAILs (cuffdiff v1.3.0) obtained with 1, 2 or 3 biological replicates.
As you can see, the number of FAIL elements (in the tracking file) increases dramatically with the number of replicates.

2) As DerSeb showed previously, I also tested cuffdiff's behavior across different versions using all three replicates.
Plot2 (attached) clearly shows very different results: from cuffdiff 1.2.0 onward, the number of FAILs gets markedly worse.

From what I can see, the FAIL count does not improve with version 1.2.0 as they say it should. Essentially, the number of FAILs increases with the number of biological replicates and with Cufflinks versions after v1.1.0.

Has anybody found a solution to this issue? Could anybody explain these results?

Thanks
Francesco

**Cole Trapnell** · 04-12-2012, 07:22 AM

Hi francicco,

We've fixed this issue in the upcoming release of Cufflinks 1.4.0, which is right around the corner. We've been a little swamped with the release of TopHat 2 and other items, but we're working hard to get this out because I know several groups have run into this. The explanation of what was going on is a bit complicated, but we were able to reproduce the issue on one of our test sets, and came up with a nice fix for it. The newest version produces a handful of FAIL genes at most, and when we've looked at those, the genes are ones where Cuffdiff has flagged a genuine structural problem that prevents us from calling gene expression.

**francicco** · 04-12-2012, 11:29 PM

Dear Cole,

Thank you for your rapid answer! Do you know when the new version will be publicly available?

I'd be happy to test the new version on my data; do you think that would be possible?

Cheers
Francesco

**gcoppola** · 04-13-2012, 12:56 PM

Hi,

I am having a similar issue. I am running the TopHat/Cufflinks pipeline.

I have two groups of individuals (5 each), test and control, with two tissue samples from each individual.

Cuffdiff gives only one DE gene and one differential splicing hit. I get about 400 DE genes with either DESeq or edgeR, and about 800 hits for differential exon usage.

I am running the latest versions of everything (although the data were mapped with an older TopHat release, I think 1.0.3).

Can I use the older version (1.0.3) of Cuffdiff with the files produced by cuffmerge and cufflinks 1.3.0?

Thanks

**francicco** · 04-16-2012, 04:06 AM

Originally posted by gcoppola

Can I use the older version (1.0.3) of Cuffdiff with the files produced by cuffmerge and cufflinks 1.3.0?
Thanks

That is also what I'm doing; can somebody confirm whether that is correct?
Cheers
F

**billstevens** · 05-17-2012, 01:53 PM

Hi guys,

Has anyone tried this with Cufflinks 2.0? Is the problem resolved? I also have approximately 40% of my genes as NOTEST right now with the old Cufflinks.

**lshen** · 05-19-2012, 05:44 AM

Cuffdiff or DESeq - SEQanswers: http://seqanswers.com/forums/showthread.php?t=17678&page=2

I ran into another issue with the new Cufflinks 2:

When I quantify directly against the Ensembl GTF, Cufflinks returns 0 expression for most genes. This only happens when I use replicates; a single-sample group is fine. It also seems to happen only when transcripts match a known gene annotation.

#command:
cufflinks-2.0.0.Linux_x86_64/cuffdiff -p 8 -L P1,P2 -c 1 -b anFam2.fa -o cuffdiff.P1.P2.ensembl canFam2.67.gtf TOPHAT2.C1.bam,TOPHAT2.C2.bam,TOPHAT2.C3.bam,TOPHAT2.C4.bam TOPHAT2.C5.bam,TOPHAT2.C6.bam,TOPHAT2.C7.bam,TOPHAT2.C8.bam,TOPHAT2.C9.bam

Here is the number of genes with FPKM 0 in the cuffdiff output:

In the output, column $8 is for treatment P1 and column $9 is for treatment P2.

awk ' $8 ==0 { i++}; END {print i " of " NR " = " i/NR*100 "%"} ' cuffdiff.P1.P2.ensembl/gene_exp.diff

cufflinks-2.0.0:

P1: 24649 of 24661 = 99.9513%
P2: 24645 of 24661 = 99.9351%

cufflinks 1.3.0 seems right:

P1: 6345 of 24661 = 25.7289%
P2: 6564 of 24661 = 26.6169%
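Below is a minimal Python sketch of the same count that reads columns by name instead of by awk field position. It assumes the gene_exp.diff header names the two FPKM columns value_1 and value_2, the columns $8 and $9 refer to above. `zero_fpkm_counts` is a hypothetical helper. One small difference from the awk version: awk's NR also counts the header line, so the awk percentages use a denominator one larger than the number of genes.

```python
import csv
import io


def zero_fpkm_counts(diff_file, columns=("value_1", "value_2")):
    """Count rows whose FPKM is exactly 0 in each of `columns`.

    Assumes a tab-separated table with a header row; the header is not
    counted in `total` (awk's NR, by contrast, includes it).
    """
    reader = csv.DictReader(diff_file, delimiter="\t")
    zeros = {column: 0 for column in columns}
    total = 0
    for row in reader:
        total += 1
        for column in columns:
            if float(row[column]) == 0.0:
                zeros[column] += 1
    return zeros, total


if __name__ == "__main__":
    # Made-up three-gene table for illustration; for real data, pass an open
    # file handle for gene_exp.diff instead.
    example = io.StringIO(
        "test_id\tvalue_1\tvalue_2\n"
        "g1\t0\t3.2\n"
        "g2\t0\t0\n"
        "g3\t5.1\t0\n"
    )
    zeros, total = zero_fpkm_counts(example)
    for column, n in zeros.items():
        print(f"{column}: {n} of {total} = {100 * n / total:.4f}%")
```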

Now I am using edgeR and DESeq to identify DE genes, and I use the cuffdiff (v1.3.0) results (p-value, FC >= 1.5) as additional evidence when filtering.

But it seems edgeR and DESeq only work at the gene level and cannot do isoform-level analysis.
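Here is a sketch of that filtering step. The column names (status, gene_id, p_value, log2(fold_change)) are assumed from the gene_exp.diff header. The 0.05 p-value cut-off is a hypothetical default, since the post does not give one. Skipping rows whose status is not OK is a design choice for this sketch. A fold change of at least 1.5 in either direction corresponds to |log2 FC| >= log2(1.5) ≈ 0.585.

```python
import csv
import io
import math


def filter_de(diff_file, max_p=0.05, min_fold_change=1.5):
    """Yield gene_ids passing a p-value cut-off and an absolute fold-change cut-off.

    `max_p` is a hypothetical default; the thread does not state the threshold.
    """
    min_log2 = math.log2(min_fold_change)
    reader = csv.DictReader(diff_file, delimiter="\t")
    for row in reader:
        if row["status"] != "OK":
            continue  # NOTEST/FAIL rows carry no usable test
        # float() also parses "inf"/"-inf", which appear when one side is 0.
        log2_fc = float(row["log2(fold_change)"])
        if float(row["p_value"]) <= max_p and abs(log2_fc) >= min_log2:
            yield row["gene_id"]


if __name__ == "__main__":
    # Made-up rows for illustration only.
    example = io.StringIO(
        "test_id\tgene_id\tstatus\tlog2(fold_change)\tp_value\n"
        "t1\tA\tOK\t1.0\t0.01\n"
        "t2\tB\tOK\t0.3\t0.001\n"
        "t3\tC\tOK\t-0.7\t0.02\n"
        "t4\tD\tNOTEST\t2.0\t0.01\n"
    )
    print(list(filter_de(example)))
```

You could then intersect the resulting gene list with the edgeR/DESeq calls, treating the cuffdiff filter as the additional evidence described above.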

**francicco** · 05-22-2012, 07:54 AM

I personally do not trust the Cufflinks 2 results. For instance, it gives 0 FPKM to transcripts that are clearly expressed.

Developers need to do something, sooner or later...

**glados** · 05-30-2012, 03:40 AM

I have a similar issue. When I add more replicates, the number of significant genes drops drastically. After much searching, I discovered that the number of FAILs in gene_exp.diff increases with more replicates. I reran everything with TopHat 2 and Cufflinks 2, and now I get 0 significant genes with all replicates, which shouldn't be the case. Looking at the gene_exp.diff file, the vast majority of status values this time were not FAIL but NOTEST.

Here are some statistics to back up this statement.

2+2 replicates (cufflinks 1.3.0)
NOTEST 8130
OK 34495
FAIL 271

3+3 replicates (cufflinks 1.3.0)
NOTEST 8271
OK 29908
FAIL 4887

4+4 replicates (cufflinks 1.3.0)
NOTEST 8645
OK 25996
FAIL 8823

Notice how the number of FAILs increases with more replicates.

Below are the statistics from the Cufflinks 2 runs, with a very large number of NOTESTs resulting in 0 significant genes.

4+4 replicates (cufflinks 2)
NOTEST 35560
OK 9142
FAIL 9

7+8 replicates (cufflinks 2)
NOTEST 38875
OK 6269
FAIL 0

7+8 replicates (cufflinks 2) but without frag-bias-correct, upper-quartile-norm and multiread-correct in the cuffdiff run
NOTEST 17534
OK 27558
FAIL 52

I would very much like to know the reason for this and whether I can correct it somehow.
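The counts posted above are easier to compare as shares of each run's total. The short sketch below does only that arithmetic. The run labels are shorthand chosen here; the numbers are exactly those in the post.

```python
# Status counts posted above: (Cufflinks version, replicate design) -> counts.
RUNS = {
    ("1.3.0", "2+2"): {"NOTEST": 8130, "OK": 34495, "FAIL": 271},
    ("1.3.0", "3+3"): {"NOTEST": 8271, "OK": 29908, "FAIL": 4887},
    ("1.3.0", "4+4"): {"NOTEST": 8645, "OK": 25996, "FAIL": 8823},
    ("2", "4+4"): {"NOTEST": 35560, "OK": 9142, "FAIL": 9},
    ("2", "7+8"): {"NOTEST": 38875, "OK": 6269, "FAIL": 0},
    ("2", "7+8, no bias/UQ/multiread correction"): {"NOTEST": 17534, "OK": 27558, "FAIL": 52},
}


def status_share(counts, status):
    """Fraction of all loci in one run that received the given status."""
    return counts[status] / sum(counts.values())


if __name__ == "__main__":
    for (version, design), counts in RUNS.items():
        total = sum(counts.values())
        print(
            f"Cufflinks {version}, {design}: {total} loci, "
            f"FAIL {status_share(counts, 'FAIL'):.1%}, "
            f"NOTEST {status_share(counts, 'NOTEST'):.1%}"
        )
```

With Cufflinks 1.3.0, the FAIL share climbs from about 0.6% (2+2) to about 11% (3+3) and 20% (4+4). Meanwhile the total stays near 43,000 loci and the OK count falls, so loci are moving from OK to FAIL. With Cufflinks 2, NOTEST accounts for about 80% (4+4) and 86% (7+8) of loci with the default options, and about 39% when the three correction options are dropped.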

**sudders** · 05-30-2012, 04:41 AM

We eventually came to the conclusion that the original problem in Cufflinks 1.3 was being caused by excessive variance between our samples. As more samples were added, the variance was getting bigger - this is why we only saw the problems in datasets with large numbers of samples. This made biological sense for us: our samples were from different patients with each patient given a before and after treatment sample.

In cufflinks 2, the large variance no longer caused the model to fall over, but it didn't find any significant genes: presumably because the variances were so large (which can be seen in the confidence limits on the FPKM estimation). We didn't see the large number of NOTESTs though.
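A toy way to see why large between-sample variance can leave no significant genes is sketched below. It makes simplifying assumptions purely for illustration, and these are not Cuffdiff's model. It treats the test statistic as a log2 fold change divided by a Welch-style standard error, built from per-group variances of log2 expression. `toy_z` is a hypothetical name.

```python
import math


def toy_z(log2_mean1, log2_mean2, var1, var2, n1, n2):
    """Log2 fold change divided by a Welch-style standard error.

    Inputs are per-group means and variances of log2 expression across
    replicates. This is a toy statistic, not Cuffdiff's test.
    """
    log2_fc = log2_mean2 - log2_mean1
    standard_error = math.sqrt(var1 / n1 + var2 / n2)
    return log2_fc / standard_error


if __name__ == "__main__":
    # Same two-fold change (log2 FC = 1) with 5 replicates per group,
    # under increasing between-replicate variance.
    for variance in (0.1, 1.0, 4.0):
        print(variance, round(toy_z(0.0, 1.0, variance, variance, 5, 5), 2))
    # More replicates do not help if the added samples raise the variance.
    print(round(toy_z(0.0, 1.0, 0.2, 0.2, 2, 2), 2))
    print(round(toy_z(0.0, 1.0, 2.0, 2.0, 5, 5), 2))
```

The last two lines mirror the situation described above. Going from 2 to 5 replicates per group shrinks the standard error by the sample size. If the extra patients raise the variance tenfold, though, the statistic still ends up smaller.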

**Cole Trapnell** · 05-31-2012, 05:47 AM

Originally posted by glados

I would very much like to know the reason for this and whether I can correct it somehow.

Can you try re-running this analysis with --min-outlier-p 0 to see if it's the inline model checking that's causing the increase in NOTESTs?

New differential testing of cuffdiff/cufflinks since 1.3.0

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Comment

Latest Articles

ad_right_rmr

News