Originally posted by francicco
Originally posted by Cole Trapnell:
Can you try re-running this analysis with --min-outlier-p 0 to see if it's the inline model checking that's causing the increase in NOTESTs?
I reran cuffdiff on two different sets of cufflinks output with the same parameters (4+4 replicates) plus --min-outlier-p 0, and the results were exactly the same as before without --min-outlier-p: 35560 NOTESTs in gene_exp.diff for one set and 11240 NOTESTs for the other.
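Counts like these can be obtained by tallying the status column of gene_exp.diff. The helper below is a hypothetical sketch, not part of Cufflinks; it assumes a tab-delimited file whose header row contains a column named exactly status (Cuffdiff writes values such as OK, NOTEST, LOWDATA, HIDATA and FAIL there).

```shell
#!/usr/bin/env bash
# count_status FILE
# Hypothetical helper: prints "<status>\t<count>" for each distinct value in the
# column named "status" of a tab-delimited file such as output_path/gene_exp.diff.
count_status() {
  awk -F'\t' '
    # Locate the "status" column from the header row.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == "status") col = i; next }
    # Tally every data row (nothing is counted if no "status" column was found).
    col { n[$col]++ }
    END { for (s in n) print s "\t" n[s] }
  ' "$1" | sort
}
```

Usage, assuming the output directory from the command posted later in the thread: count_status output_path/gene_exp.diff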
The difference between the two cufflinks groups I tested is that the first was run with --multi-read-correct --upper-quartile-norm --frag-bias-correct in both cufflinks and cuffdiff, and the other group only in cuffdiff (with --frag-bias-correct in cufflinks). Both were run with --GTF-guide. I am wondering whether this influences the number of NOTESTs. These parameters can be used in both programs, but is it better to use them in only one of them, and if so, which one? In any case, I still think 11240 NOTESTs is too many: it leaves me with practically 0 significant genes, which I'm sure is incorrect.
edit: I also tested using these parameters in cufflinks only and not in cuffdiff, and got the same result as when I used them in cuffdiff but not in cufflinks, i.e. 11240 NOTESTs.
Additionally, I wonder why the variance on every gene is so large. With this many replicates I would expect it to shrink, but the error bars always reach zero (i.e. the FPKM conf_lo is 0 and conf_hi is extremely high). Could this have anything to do with my not getting any significant DE genes?
Last edited by glados; 06-01-2012, 01:25 AM.
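To put a number on the "error bars reach zero" observation, one could count how many genes have a lower confidence bound of exactly 0. This is a hypothetical helper; the exact name of the conf_lo column in your genes.fpkm_tracking header is an assumption here, so it is passed in as an argument and should be checked against the file.

```shell
#!/usr/bin/env bash
# zero_conf_lo FILE COLUMN
# Hypothetical helper: reports how many data rows have COLUMN equal to 0,
# e.g. a conf_lo column in a tab-delimited genes.fpkm_tracking file.
zero_conf_lo() {
  awk -F'\t' -v name="$2" '
    # Find the requested column in the header row.
    NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
    # Count rows, and rows whose value in that column is numerically zero.
    col { total++; if ($col + 0 == 0) zero++ }
    END { printf "%d of %d rows have %s = 0\n", zero, total, name }
  ' "$1"
}
```

Usage (column name hypothetical): zero_conf_lo output_path/genes.fpkm_tracking X_conf_lo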
-
Originally posted by glados:
Absolutely. Thank you so much for trying to help. I really need this to work soon.
-
Originally posted by Cole Trapnell:
I'm confused about what you did: are you following the protocol from the Cufflinks website (the Nature Protocols one)? If not, can you provide the full sequence of commands that you ran? Cufflinks doesn't emit NOTESTs; that's a Cuffdiff-only thing.
What I mean is that --multi-read-correct, --upper-quartile-norm and --frag-bias-correct are available in both cufflinks and cuffdiff, so I've tried using them only in cufflinks, only in cuffdiff, and in both. I get many more NOTESTs in cuffdiff's gene_exp.diff file when I use these parameters in both cufflinks and cuffdiff (35560) than when I use them in only one of them (11240), so I wondered whether that has anything to do with it. The number of NOTESTs is still high, though.
My cuffdiff command looks something like this:
cuffdiff -o output_path --labels X,Y --num-threads 12 \
    --frag-bias-correct genome.fa --upper-quartile-norm --multi-read-correct \
    merged.gtf X1.bam,X2.bam,X3.bam,X4.bam Y1.bam,Y2.bam,Y3.bam,Y4.bam
Last edited by glados; 06-01-2012, 03:41 AM.
-
Originally posted by glados:
Yes, I'm following the protocol: cufflinks on each individual sample's BAM file from tophat2, then cuffmerge on the assemblies text file listing the paths to the transcripts.gtf files, and finally cuffdiff on merged.gtf with 2 groups and the paths to each sample's BAM file.
Based on your comment that the variances are huge, I'm wondering if the problem is with the assembly. Cuffdiff takes into consideration both cross-replicate variability and fragment assignment uncertainty (disambiguating how many reads came from each isoform). In general, the more isoforms a gene has, the more uncertainty there will be in assigning reads to each isoform, and the more uncertainty there will be in the overall gene expression level. That means more variance, so if you have a ton of isoforms (possibly because of a bad assembly), you'll see very few differentially expressed genes.
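If too many isoforms are the culprit, that should show up as a long tail in the number of transcripts per gene in merged.gtf. The hypothetical helper below prints a small histogram (isoforms per gene, then number of genes); it assumes standard GTF attribute syntax with gene_id "..." and transcript_id "..." in column 9, as Cuffmerge writes them.

```shell
#!/usr/bin/env bash
# isoforms_per_gene GTF
# Hypothetical helper: prints "<isoforms per gene>\t<number of genes>", one line per
# distinct isoform count, by counting distinct transcript_id values per gene_id.
isoforms_per_gene() {
  awk -F'\t' '
    !/^#/ && NF >= 9 {
      # Pull the quoted gene_id and transcript_id values out of the attribute column.
      g = $9; sub(/.*gene_id "/, "", g); sub(/".*/, "", g)
      t = $9; sub(/.*transcript_id "/, "", t); sub(/".*/, "", t)
      # Count each (gene, transcript) pair once, however many exon lines it has.
      if (!((g, t) in seen)) { seen[g, t] = 1; n[g]++ }
    }
    END { for (g in n) hist[n[g]]++; for (k in hist) print k "\t" hist[k] }
  ' "$1" | sort -n
}
```

Comparing this histogram for the merged.gtf produced with and without the bias/multi-read/normalization options would be one way to see whether the assemblies differ in isoform complexity.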
Another thing to check is whether you still see this when using a reference GTF. Have you tried that as a sanity check?
-
Originally posted by Cole Trapnell:
Hmm. What happens when you cuffcompare the merged GTF files from cuffmerge produced using the different methods? Does Cufflinks produce substantially different assemblies when bias correction + multireads + quartile norm is enabled/disabled?
Based on your comment that the variances are huge, I'm wondering if the problem is with the assembly. Cuffdiff takes into consideration both cross-replicate variability and fragment assignment uncertainty (disambiguating how many reads came from each isoform). In general, the more isoforms a gene has, the more uncertainty there will be in assigning reads to each isoform, and the more uncertainty there will be in the overall gene expression level. That means more variance, so if you have a ton of isoforms (possibly because of a bad assembly), you'll see very few differentially expressed genes.
I think the assembly went all right. The reads were quality-filtered and trimmed before tophat. About 75% mapped with tophat 1.4.1 and many more with tophat2; I haven't checked the tophat2 mapping statistics for every sample yet, but one sample gives me 86% mapped. I used the --GTF option in tophat. When I look at the BAM files in IGV they look good to me, at least: a lot of reads map to the exons. But I'm not an expert on how the assembly is supposed to look.
Another thing to check is whether you still see this when using a reference GTF. Have you tried that as a sanity check?
What is strange is that I did not get this many NOTESTs with Cufflinks 1.3; instead I got more FAILs, and far fewer significant genes as I added more replicates in cuffdiff.
-
Hello all,
I have similar problems. First, cuffdiff reports zero FPKM for almost all genes. I analyzed the same dataset with an older version and got non-zero FPKMs, and I can see reads on these genes when I load the BAM files into IGV.
Second, another dataset with 2000 DEGs yields only 200 DEGs when analyzed with cuffdiff 2.
I would appreciate it if the developers of cuffdiff could help us figure out these issues.
Thanks,
Robert
-
Originally posted by glados:
I'm not sure what you're asking. I have used the --GTF-guide option in cufflinks and the --ref-gtf option in cuffmerge. In tophat I also used the --GTF option. Do you want me to try running cufflinks with --GTF instead of --GTF-guide?
What is weird is that I did not get this many NOTESTs with Cufflinks 1.3, but instead more FAIL and much less significant genes when I added more replicates in cuffdiff.
It sounds like there are two different things going on here that aren't supposed to be happening:
1) When you run Cuffdiff with --frag-bias-correct, --multi-read-correct, and --upper-quartile-norm, you see more NOTEST genes than when you leave all three off.
2) You see a very high number of NOTEST genes, and this number grows with more replicates.
I can't reproduce #1 with the datasets I've looked at. I have seen the number of NOTESTs grow with more replicates (see below for why this can happen), but never to numbers this large.
A gene can be marked as NOTEST for one of several reasons:
1) There are not enough reads falling on the gene in either condition. The default threshold is 10 (though the threshold is applied to the common-scale normalized count). Genes with no detectable expression thus get marked NOTEST. You can control this behavior with the -c option.
2) Before testing, Cuffdiff 2 checks that its variance model is a good fit for the gene. For each gene, Cuffdiff 2 has a mean expression across replicates, a variance derived from its model (which gives you the confidence intervals), and an expression measurement from each replicate. If one or more of the replicates lies outside the 99% confidence interval (the default, controlled with --min-outlier-p), Cuffdiff 2 concludes that the variance model is a bad fit for the gene, so it doesn't perform any testing and marks the gene NOTEST. Cuffdiff 1.3.0 doesn't do this; it's new behavior.
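The two NOTEST reasons above can be mimicked in a toy function to make the logic concrete. This is only an illustration under stated assumptions, not Cuffdiff's implementation: the count rule (skip the gene if neither condition reaches the threshold) paraphrases reason 1, and the outlier rule uses a normal approximation with a z cutoff in place of Cuffdiff's own variance model. The function name and argument order are hypothetical.

```shell
#!/usr/bin/env bash
# notest_reason MIN_COUNT Z COUNT_A COUNT_B MEAN SD REP1 [REP2 ...]
# Toy illustration of the two NOTEST reasons; NOT Cuffdiff's implementation.
# Assumptions made for this sketch:
#   - the gene is skipped if neither condition's normalized count reaches MIN_COUNT
#     (analogous to -c, default 10);
#   - a replicate is an outlier if it lies more than Z standard deviations from the
#     model mean (Z = 2.576 approximates a two-sided 99% normal interval);
#   - Z = 0 switches the outlier check off, standing in for --min-outlier-p 0.
notest_reason() {
  awk -v min="$1" -v z="$2" -v a="$3" -v b="$4" -v mu="$5" -v sd="$6" \
      -v reps="$(shift 6; echo "$*")" '
    BEGIN {
      if (a + 0 < min + 0 && b + 0 < min + 0) { print "NOTEST (low count)"; exit }
      if (z + 0 > 0 && sd + 0 > 0) {
        n = split(reps, r, " ")
        for (i = 1; i <= n; i++) {
          d = r[i] - mu
          if (d < 0) d = -d
          if (d > z * sd) { print "NOTEST (replicate " i " outside interval)"; exit }
        }
      }
      print "tested"
    }'
}
```

For example, notest_reason 10 2.576 50 60 100 10 95 105 140 reports replicate 3 as outside the interval, while the same call with Z = 0 reports "tested".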
So what might be happening is that as you add more and more replicates, you're increasing the number of genes for which one of these replicates will lie outside of the model's variance estimate, causing the gene to get marked NOTEST. That's why I asked if you had set --min-outlier-p 0, because that should disable this whole model checking behavior. The model checking is meant to improve robustness of the results when you have very few replicates (2 or 3); with 7 or 8 it's probably not helping much anyway.
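The replicate-count effect can be put into rough numbers. Assuming, purely for illustration, that the variance model is correct and replicates are independent, each replicate lands outside a 99% interval with probability 0.01, so the chance that at least one of n replicates does is 1 - 0.99^n. This back-of-the-envelope sketch is not how Cuffdiff computes anything; 15 corresponds to the 7+8 run mentioned in this thread.

```shell
#!/usr/bin/env bash
# p_any_outlier N P
# Probability that at least one of N replicates falls outside the interval, assuming
# each does so independently with probability P (P = 0.01 for a 99% interval):
#   1 - (1 - P)^N
p_any_outlier() {
  awk -v n="$1" -v p="$2" 'BEGIN { printf "%.3f\n", 1 - (1 - p) ^ n }'
}

for n in 4 8 15; do
  echo "$n replicates: $(p_any_outlier "$n" 0.01)"
done
# prints:
# 4 replicates: 0.039
# 8 replicates: 0.077
# 15 replicates: 0.140
```

Under these assumptions, roughly 14% of otherwise testable genes would be flagged by chance alone with 15 replicates, which is consistent with the explanation above.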
A few more questions to help me figure out where the problem is:
1) What happens when you set -c 0? Does the number of NOTESTs go down?
2) Can you figure out which of --multi-read-correct, --frag-bias-correct, or --upper-quartile-norm is causing the increase in NOTESTs in that 7+8 run?
3) Do the replicates segregate together when you cluster them using CummeRbund's csDendro function? You can check this easily by passing replicates=T to csDendro. I just want to rule out one of the replicates being bad.
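For question 2, one way to isolate the effect is to change one option at a time: a baseline with none of the three options, plus one run per option. The sketch below only prints the commands (it does not run cuffdiff). The paths, labels and thread count are the placeholders from the command posted earlier in the thread; the per-run output directory names are hypothetical.

```shell
#!/usr/bin/env bash
# Print (without running) one cuffdiff command per option plus a baseline, so the
# NOTEST counts in each run's gene_exp.diff can be compared side by side.
X_BAMS=X1.bam,X2.bam,X3.bam,X4.bam
Y_BAMS=Y1.bam,Y2.bam,Y3.bam,Y4.bam

print_ablation_cmds() {
  local name opts
  for name in baseline frag_bias multi_read uq_norm; do
    case "$name" in
      baseline)   opts="" ;;
      frag_bias)  opts="--frag-bias-correct genome.fa" ;;
      multi_read) opts="--multi-read-correct" ;;
      uq_norm)    opts="--upper-quartile-norm" ;;
    esac
    # ${opts:+$opts } adds the option and a trailing space only when opts is non-empty.
    echo "cuffdiff -o output_path_$name --labels X,Y --num-threads 12 ${opts:+$opts }merged.gtf $X_BAMS $Y_BAMS"
  done
}

print_ablation_cmds
```

The same pattern could be extended to the cufflinks step, since the thread suggests the combination of cufflinks and cuffdiff settings matters.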
-
Originally posted by Cole Trapnell:
A few more questions to help me figure out where the problem is:
1) What happens when you set -c 0? Does the number of NOTESTs go down?
2) Can you figure out which of --multi-read-correct, --frag-bias-correct, or --upper-quartile-norm is causing the increase in NOTESTs in that 7+8 run?
3) Do the replicates segregate together when you cluster them using CummeRbund's csDendro function? You can check this easily by passing replicates=T to csDendro. I just want to rule out one of the replicates being bad.
1) Yes, I got far fewer NOTESTs with -c 0 (1188 instead of 11240). --min-outlier-p 0 didn't affect the number of NOTESTs, as I mentioned in an earlier post.
2) I have finally figured out that it is --frag-bias-correct that gives an extremely high number of NOTESTs (35560) when I use it in both cufflinks and cuffdiff. It does not seem to affect the number of NOTESTs when used in only one of them or disabled (11240). --multi-read-correct and --upper-quartile-norm do not seem to affect the number of NOTESTs whether used in cufflinks, cuffdiff, or both, at least in my tests.
3) The csDendro plot with replicates=T looks good for the 2+2 and 3+3 replicate runs: the two conditions end up in different clades with almost equal branch lengths. For 4+4, it's the only plot in CummeRbund that doesn't work for me, and I don't know why. It gives this error message:
Error in plot.window(...) : need finite 'ylim' values
In addition: There were 32 warnings (use warnings() to see them)
To summarize:
--frag-bias-correct gave more NOTESTs when used in both cufflinks and cuffdiff. I will avoid doing this; lesson learned.
Adding -c 0 gave fewer NOTESTs. Should I use this?
Using reference.gtf in cuffmerge and then cuffdiff did not give better confidence intervals; I see the same problem, with the error bars growing as I add more replicates.
Any idea what the problem might be? Again, thanks for helping me; I really appreciate it.