That was crazy helpful, thanks.
-MW
-
With Cufflinks you can have three different normalizations: fragments mapped to the genome (in millions), fragments mapped to the transcriptome (in millions; --compatible-hits-norm), or upper quartile (-N). Regardless of the normalization, the same number of reads is quantified at each gene. I've looked into it myself. If you run Cufflinks once with each of those three normalizations and then look at the separate isoforms.fpkm_tracking files, you can confirm it. Check the coverage and FPKM columns. You should see different FPKMs but identical coverages across the three quantifications. Furthermore, if you divide the FPKMs from one run by those from another, you should see the same constant ratio at every gene.
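The check described above can be sketched in a few lines of Python. This is only a rough sketch, not anything Cufflinks ships: it assumes each isoforms.fpkm_tracking file is tab-separated with `tracking_id`, `coverage` and `FPKM` columns holding plain numbers, and the helper names (`read_tracking`, `compare_runs`) and the two tiny demo tables are made up for illustration.

```python
import csv
import io
import math


def read_tracking(handle):
    """Map tracking_id -> (coverage, FPKM) from a tab-separated tracking table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["tracking_id"]: (float(row["coverage"]), float(row["FPKM"]))
        for row in reader
    }


def compare_runs(run_a, run_b, rel_tol=1e-3):
    """Compare two runs that differ only in normalization.

    Returns (same_coverage, constant_ratio, ratios), where ratios holds
    FPKM_a / FPKM_b for every shared isoform with nonzero FPKM in run_b.
    """
    shared = [t for t in run_a if t in run_b]
    same_coverage = all(
        math.isclose(run_a[t][0], run_b[t][0], rel_tol=rel_tol, abs_tol=1e-9)
        for t in shared
    )
    ratios = [run_a[t][1] / run_b[t][1] for t in shared if run_b[t][1] > 0]
    constant_ratio = all(
        math.isclose(r, ratios[0], rel_tol=rel_tol) for r in ratios
    )
    return same_coverage, constant_ratio, ratios


# Made-up tables standing in for two real isoforms.fpkm_tracking files.
DEMO_RUN_A = "tracking_id\tcoverage\tFPKM\nT1\t10.0\t2.0\nT2\t5.0\t1.0\n"
DEMO_RUN_B = "tracking_id\tcoverage\tFPKM\nT1\t10.0\t4000.0\nT2\t5.0\t2000.0\n"

run_a = read_tracking(io.StringIO(DEMO_RUN_A))
run_b = read_tracking(io.StringIO(DEMO_RUN_B))
same_coverage, constant_ratio, ratios = compare_runs(run_a, run_b)
print(same_coverage, constant_ratio, ratios)
```

With real output you would pass `open(path)` handles for two of your own runs instead of the `io.StringIO` demo tables. The tolerance is there because the files store rounded values.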
If you calculate FPKMs yourself you can see why the numbers shift around. To be honest, the "FPKM" designation is misleading when you're using any normalization other than "mapped reads in millions". Right? Fragments per kilobase per million mapped reads is what you're used to.
So say we have a gene that's 2500 bases long. We've got 121 fragments that mapped to it and 34.7 million fragments mapped to the genome. We can get the FPKM like so; note that switching normalization changes only the first term in the denominator:
Code: FPKM = 121/(34.7*2.5) = 1.394813
Code: FPKM = 121/(27.4*2.5) = 1.766423
Code: FPKM = 121/(0.012*2.5) = 4033.333
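The same arithmetic as a small Python sketch, in case it helps to play with the numbers. The `fpkm` helper is just an illustrative name. The 34.7 figure is the genome-mapped count from the example; 27.4 and 0.012 are the other two denominators from the calculations above, and reading them as the transcriptome-compatible and upper-quartile values (in that order) is my assumption.

```python
def fpkm(fragments, length_bases, norm_millions):
    """Fragments per kilobase of transcript per 'million' of the chosen normalizer."""
    length_kb = length_bases / 1000.0
    return fragments / (norm_millions * length_kb)


fragments = 121
length_bases = 2500

# Normalization terms from the worked example, in millions.
# Only 34.7 (genome-mapped) is labeled in the post; the other labels are assumed.
norms = {
    "genome-mapped": 34.7,
    "second value (assumed transcriptome-compatible)": 27.4,
    "third value (assumed upper quartile)": 0.012,
}

values = {name: fpkm(fragments, length_bases, n) for name, n in norms.items()}
for name, value in values.items():
    print(f"{name}: {value:.6f}")

# The ratio between two normalizations depends only on the normalizers,
# not on the gene, which is why it is constant across genes.
other_gene = fpkm(900, 1200, 34.7) / fpkm(900, 1200, 27.4)
this_gene = fpkm(fragments, length_bases, 34.7) / fpkm(fragments, length_bases, 27.4)
print(other_gene, this_gene)
```

The fragment count and length cancel out of the ratio, so every gene gives 27.4/34.7 here whatever its size or expression.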
These different normalizations only matter when you're comparing samples to each other. So if your goal is to see whether gene X is expressed higher in Sample A versus B, then regardless of the normalization used (as long as you use the same one on both samples) you'll find your answer. The upper quartile normalization has been shown to be more robust, so maybe it's better to use it for comparing samples to one another. Also, obviously, for the expression levels to make sense to other people we all need to be using the same normalization. We should probably all be using upper quartile normalization, but that puts the numbers on a different scale than we're used to seeing.
Hope that helped.
-
Large FPKM with Cufflinks
This issue has been discussed several times on this forum, and I have also raised it here myself: Cufflinks will give very high FPKMs for some genes irrespective of their size (Cole suggested it happens for short ones). I would be interested to know:
1. The range of FPKM values with and without the -N option.
2. What exactly do you think may have happened with the -N option to give high FPKMs?
I have seen no solution to this high-FPKM problem, but if you have found one it will be interesting to learn more about it.
Thanks
-
Cufflinks large FPKM with -g and -N options
Perplexing,
Running Cufflinks (2.0.0 and previous version) with
the -g/--GTF-guide AND -N/--upper-quartile-norm options
can result in enormous FPKM values (~e+11), and not only
for short/novel transcripts---for many many genes.
Of 16725 genes, 12094 are nonzero (min= 432, max = 5.7e+11).
For nonzero genes:
25th %ile = 216,582
50th %ile = 1e+06
75th %ile = 3.12e+06
I appreciate that -N should just be re-scaling FPKMs
and have seen reasonable results from Cufflinks
in -G (non-deNovo) mode, but these -g levels
seem strange to me.
Removing the -N option brings the FPKM levels
back to a typical range of values.
Unfortunately the error bounds appear to be identically
equal to the mean FPKM, with and without -N, at least
for Cufflinks 2.0.0. Removing the -b option (which was
also originally used) restored the error-bars.
It seems that -g & -N together give an FPKM range that
is quite different from the range produced by -G & -N.
I would guess that the reason for this is related to tiling of
the annotated transcripts with faux reads in RABT.
Perhaps, somehow, this is making almost all genes
look like upper-quartile expressors, but this is only a guess.
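For anyone wanting to report the same kind of summary for their own run (nonzero count, min, max, quartiles), here is a rough Python sketch. The `summarize_nonzero` name and the toy input are made up, and the quartile method (`statistics.quantiles` with `method="inclusive"`) is my choice; the figures quoted in this post may have been computed differently.

```python
import statistics


def summarize_nonzero(fpkms):
    """Summarize the nonzero FPKM values: count, min, quartiles, max."""
    nonzero = sorted(v for v in fpkms if v > 0)
    q1, median, q3 = statistics.quantiles(nonzero, n=4, method="inclusive")
    return {
        "total": len(fpkms),
        "nonzero": len(nonzero),
        "min": nonzero[0],
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": nonzero[-1],
    }


# Toy values standing in for the FPKM column of a genes.fpkm_tracking file.
demo_fpkms = [0.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
summary = summarize_nonzero(demo_fpkms)
print(summary)
```

Running this once on the -g -N output and once on the -G -N output would give directly comparable ranges.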