Do you have replicates or any other means to assess sample-to-sample variability? Then, you could use DESeq. (The real reason why Fisher's test does not work is that it implicitly assumes biological and extra-Poisson technical variation to be zero.)
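As a rough illustration of what "sample-to-sample variability" means here, the sketch below compares the variance of replicate counts for one window with their mean. Under pure Poisson counting noise the two should be about equal; a variance far above the mean indicates the extra variation that Fisher's test ignores. The replicate counts are hypothetical, and this is not how DESeq estimates dispersion internally (it fits a negative binomial model); it is only a quick sanity check.

```python
from statistics import mean, variance

# Hypothetical read counts for one window in four replicate libraries
# of similar sequencing depth (illustrative numbers only).
replicate_counts = [95, 130, 70, 160]

m = mean(replicate_counts)
v = variance(replicate_counts)  # sample variance (n - 1 in the denominator)

# For Poisson noise alone, variance / mean should be close to 1.
dispersion_index = v / m
print(f"mean={m:.2f} variance={v:.2f} variance/mean={dispersion_index:.2f}")
```

With only a handful of replicates this ratio is itself noisy, which is one reason tools like DESeq pool information across many genes or windows.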
-
Yes, this is for a form of IP, so I'm trying to gauge the enrichment of the IP over the control in a given window. I've heard that RPKM is no longer considered a good measure, and that length normalization actually increases variance, so I agree with your point there.
So we've opted to just use a read count ratio, normalized by the total number of reads mapped in the IP and the control, respectively. Fisher's exact test produces too many p-values of 0, because the enrichment is too high to be quantified with the test.
Thanks!
-
The division by length is plain wrong. For an enrichment score, you want to divide some measure of signal strength in the IP by the same measure in the control. If your colleagues insist that these measures should be normalized for length, they can do so. However, as both measures are divided by the same length, it cancels out. Incidentally, this is why RPKM is not so useful for differential expression, either. Dividing by length just obscures how much evidence you have: 5 to 2 reads is the same ratio as 500 to 200 reads, but in the latter case you can be more sure that this is a real enrichment and not just chance. This is why the raw number of reads (without normalization) is useful and also why looking at the ratio only is not sufficient.
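To put numbers on the 5:2 versus 500:200 comparison, here is a minimal sketch using a one-sided binomial test. It assumes equal sequencing depth in IP and control, so that under "no enrichment" each read in the window is equally likely to come from either library. The function name is made up for this example, and, like Fisher's test, it ignores biological variability, so it only illustrates the counting-noise argument.

```python
from math import comb

def one_sided_binomial_p(ip_reads, ctl_reads):
    """P(at least ip_reads of the pooled reads fall in the IP) when each
    read lands in IP or control with probability 0.5 (equal depth assumed)."""
    n = ip_reads + ctl_reads
    # Exact integer tail sum, converted to a float only at the end.
    tail = sum(comb(n, i) for i in range(ip_reads, n + 1))
    return tail / 2**n

p_small = one_sided_binomial_p(5, 2)      # ratio 2.5, few reads
p_large = one_sided_binomial_p(500, 200)  # same ratio, many reads

print(f"5 vs 2:     ratio={5 / 2}, p={p_small:.3g}")
print(f"500 vs 200: ratio={500 / 200}, p={p_large:.3g}")
```

Both windows have the same ratio, but the first is entirely compatible with chance while the second is not, which is the information a length-normalized ratio throws away.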
BTW, are you talking about CLIP, or how come you have IP and control?
-
Computing Enrichment and RPKM
I'm conducting analysis of RNA HiSeq data, and we are trying to compute enrichment for a given window of reads in the IP over reads in our control. This window could be an entire gene, or a very small 25 bp segment within an exon. Working with some collaborators, we've been in discussion about specifically how to compute enrichment and whether or not that includes RPKM. I've now thoroughly confused myself and I was wondering if anyone had insight into better ways of computing this.
My initial method of computing enrichment was the ratio of reads in the IP to the reads in the control, normalized by total number of reads sequenced in each:
Enrichment = (#IPw / Σ IP) / (#CNTLw / Σ CNTL),
where #IPw and #CNTLw are the numbers of reads that mapped to the given window w in the IP and the control, and Σ IP and Σ CNTL are the total numbers of reads that were mapped to the genome in each sample (as normalization factors).
However, our collaborators insisted that we incorporate RPKM as a normalization factor (that is, divide by it) to account for differing gene lengths, so our final equation then became:
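For concreteness, the ratio defined above can be sketched in a few lines of Python. The function and argument names, and the read counts, are hypothetical and chosen only for illustration.

```python
def enrichment(ip_window, ip_total, ctl_window, ctl_total):
    """Depth-normalized ratio of IP reads to control reads in one window."""
    if ctl_window == 0:
        # The ratio is undefined when the control has no reads in the window.
        raise ValueError("control window has no reads")
    return (ip_window / ip_total) / (ctl_window / ctl_total)

# Hypothetical counts: 50 of 10M IP reads and 10 of 20M control reads
# fall in the window.
score = enrichment(ip_window=50, ip_total=10_000_000,
                   ctl_window=10, ctl_total=20_000_000)
print(score)  # about 10
```

Windows with zero control reads need separate handling (for example a pseudocount), which this sketch deliberately leaves out.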
Enrichment = (#IPw / Σ IP) / (#CNTLw / Σ CNTL) / (10^9 * #CNTLg / Σ CNTL / length),
where here #CNTLg is the number of reads that map to the gene exons (so excluding introns) and length refers to the length of the mature transcript (CDS + UTRs, no introns).
However, our results are very strange, since low RPKM values (< 1) result in a very high enrichment score, and this doesn't make sense for computing enrichment. Furthermore, through answers on this forum, it sounds like RPKM is used more for differential expression between two samples, e.g., two biological replicates, and not necessarily to be used for computing the enrichment of our IP over the control. We're not trying to find DE genes here, but trying to determine an enrichment of our IP over our control for any given window.
Discussing this with my PI, we thought perhaps excluding RPKM but normalizing solely over the transcript length might be better. One odd result of dividing the enrichment by RPKM is that you're essentially multiplying by the transcript length, which is opposite of what I'd think we're trying to achieve.
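The effect described above can be checked numerically. This is a minimal sketch with hypothetical counts and helper names: it divides the plain enrichment by the control RPKM of the gene and shows that the score scales with transcript length and grows large when RPKM is below 1.

```python
def rpkm(reads, total_reads, length_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    return 1e9 * reads / total_reads / length_bp

def enrichment(ip_window, ip_total, ctl_window, ctl_total):
    """Depth-normalized IP/control ratio for one window."""
    return (ip_window / ip_total) / (ctl_window / ctl_total)

# Hypothetical window and gene counts (illustrative only).
plain = enrichment(50, 10_000_000, 10, 20_000_000)  # about 10

# Same control gene count, two different transcript lengths.
rpkm_short = rpkm(reads=20, total_reads=20_000_000, length_bp=2000)
rpkm_long = rpkm(reads=20, total_reads=20_000_000, length_bp=4000)

score_short = plain / rpkm_short
score_long = plain / rpkm_long

print(f"RPKM={rpkm_short:.2f} -> score={score_short:.1f}")
print(f"RPKM={rpkm_long:.2f} -> score={score_long:.1f}")
```

Doubling the transcript length halves the RPKM and therefore doubles the score, even though the reads in the window have not changed.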
Another possibility I thought of is to compute the RPKM of the window for the control and for the IP, and take the ratio of the two. This at least seems consistent with what RPKM seems to have been designed for, if I'm understanding RPKM correctly, but I'm still not sure if that makes any more sense or is better than the other approaches.
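A quick check of this idea, as a sketch with hypothetical counts: when the IP and control RPKM are computed over the same window, the length appears in both numerator and denominator, so the ratio is identical to the depth-normalized read ratio regardless of window size.

```python
from math import isclose

def rpkm(reads, total_reads, length_bp):
    """Reads per kilobase per million mapped reads."""
    return 1e9 * reads / total_reads / length_bp

def rpkm_ratio(ip_reads, ip_total, ctl_reads, ctl_total, length_bp):
    """Ratio of IP RPKM to control RPKM over the same window."""
    return rpkm(ip_reads, ip_total, length_bp) / rpkm(ctl_reads, ctl_total, length_bp)

# Depth-normalized ratio without any length term (hypothetical counts).
plain = (50 / 10_000_000) / (10 / 20_000_000)

# The same counts treated as a 25 bp window and as a 2500 bp gene.
ratio_25bp = rpkm_ratio(50, 10_000_000, 10, 20_000_000, length_bp=25)
ratio_gene = rpkm_ratio(50, 10_000_000, 10, 20_000_000, length_bp=2500)

print(plain, ratio_25bp, ratio_gene)
print(isclose(plain, ratio_25bp), isclose(plain, ratio_gene))
```

So taking a ratio of RPKMs is harmless but adds nothing over the original formula.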
Thank you very much and I greatly appreciate your help if anyone has any ideas!