  • Simon Anders
    replied
    Do you have replicates or any other means to assess sample-to-sample variability? If so, you could use DESeq. (The real reason why Fisher's test does not work is that it implicitly assumes biological and extra-Poisson technical variation to be zero.)
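    As a rough illustration of what replicates make possible, the sketch below estimates how far the variance across replicates exceeds the Poisson expectation (variance equal to the mean). It assumes the negative-binomial model var = mu + alpha * mu^2 and uses a simple method-of-moments estimate of alpha. This is not DESeq's actual procedure (DESeq pools information across genes to fit a mean-variance relationship); the function name is hypothetical and the replicate counts are made-up numbers for illustration.

```python
from statistics import mean, variance


def moment_dispersion(counts):
    """Method-of-moments negative-binomial dispersion (alpha) for one window.

    Assumes `counts` are replicate read counts already on a comparable scale
    (e.g. equal library sizes) and the variance model var = mu + alpha * mu**2.
    """
    if len(counts) < 2:
        raise ValueError("need at least two replicates to estimate variance")
    mu = mean(counts)
    if mu == 0:
        return 0.0
    var = variance(counts)  # sample variance, n - 1 denominator
    # Poisson noise alone predicts var == mu; any excess is attributed to alpha.
    return max((var - mu) / mu**2, 0.0)


# Hypothetical replicate counts for one window (illustration only).
print(moment_dispersion([90, 100, 110]))  # spread consistent with Poisson noise
print(moment_dispersion([60, 100, 140]))  # clear extra-Poisson variation
```

    Fisher's test effectively fixes alpha at 0. An alpha of 0.15, as in the second call, means that at a mean of 100 reads the variance is 16 times what Poisson noise alone would predict.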

  • ysaletore
    replied
    Yes, this is for a form of IP, so I'm trying to gauge the enrichment of the IP over the control in a given window. I've heard that RPKM is no longer considered a good measure, and that length normalization actually increases variance, so I agree with your point there.

    So we've opted to just use a read-count ratio, normalized by the total number of reads mapped in the IP and control, respectively. Fisher's exact test returns too many p-values of 0, because the enrichment is so strong that the p-values are too small to be represented.
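    A p-value reported as exactly 0 usually means the true value is below the smallest positive double (roughly 1e-308), not that it is zero. One workaround is to compute in log space and report log10 p. The sketch below swaps Fisher's exact test for the closely related conditional binomial test: given n = #IPw + #CNTLw reads in the window, the IP count is Binomial(n, Σ IP / (Σ IP + Σ CNTL)) under no enrichment, which is what Fisher's test approaches when the library totals dwarf the window counts. The function names are hypothetical, and like Fisher's test this still assumes purely Poisson noise, so it does not address the replicate issue.

```python
import math


def log_binom_pmf(k, n, p):
    """Natural log of the Binomial(n, p) probability mass at k."""
    return (math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p))


def log_sum_exp(values):
    """Stable log(sum(exp(v))), so tiny probabilities never underflow to 0."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def log10_enrichment_pvalue(ip_w, cntl_w, ip_total, cntl_total):
    """One-sided log10 p-value for IP enrichment in a window (binomial test)."""
    n = ip_w + cntl_w
    p_ip = ip_total / (ip_total + cntl_total)  # expected IP share with no enrichment
    tail = [log_binom_pmf(k, n, p_ip) for k in range(ip_w, n + 1)]
    return log_sum_exp(tail) / math.log(10)


# Hypothetical window counts with equal (made-up) library sizes.
print(log10_enrichment_pvalue(5, 2, 10**7, 10**7))
print(log10_enrichment_pvalue(50_000, 20_000, 10**7, 10**7))
```

    The second call returns a finite log10 p-value far below -308, which is exactly the kind of result a plain p-value would report as 0.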

    Thanks!

  • Simon Anders
    replied
    The division by length is plain wrong. For an enrichment score, you want to divide some measure of signal strength in IP by a measure of signal strength in CNTL. If your colleagues insist that these measures should be normalized for length, they can do so. However, as both measures are divided by the same length, it cancels out. Incidentally, this is why RPKM is not so useful for differential expression, either. Dividing by length just obscures how much evidence you have: 5 reads versus 2 gives the same ratio as 500 reads versus 200, but in the latter case you can be much more sure that this is a real enrichment and not just chance. This is why the raw number of reads (without normalization) is useful, and also why looking at the ratio alone is not sufficient.
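    To put a number on the 5:2 versus 500:200 example, here is a rough sketch that reports the log2 enrichment together with an approximate delta-method standard error, sqrt(1/#IPw + 1/#CNTLw) on the natural-log scale. It assumes purely Poisson counting noise and treats the library totals as fixed; the function name and the equal library sizes in the usage lines are hypothetical.

```python
import math


def log2_enrichment_with_se(ip_w, cntl_w, ip_total, cntl_total):
    """Return (log2 enrichment, approximate standard error in log2 units).

    Assumes Poisson counts in the window; the delta method gives
    Var(ln count) ~ 1 / count, so SE(ln ratio) ~ sqrt(1/ip_w + 1/cntl_w).
    """
    if ip_w == 0 or cntl_w == 0:
        raise ValueError("zero counts: log ratio and its SE are undefined")
    ratio = (ip_w / ip_total) / (cntl_w / cntl_total)
    se_ln = math.sqrt(1 / ip_w + 1 / cntl_w)
    return math.log2(ratio), se_ln / math.log(2)


# Same ratio, 100x more reads: the SE shrinks by a factor of 10.
small = log2_enrichment_with_se(5, 2, 10**7, 10**7)
large = log2_enrichment_with_se(500, 200, 10**7, 10**7)
print(small)
print(large)
```

    Because this estimate ignores biological variation, it tends to understate the real uncertainty when there are no replicates.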

    BTW, are you talking about CLIP, or how come you have IP and control?

  • ysaletore
    started a topic Computing Enrichment and RPKM

    Computing Enrichment and RPKM

    I'm analyzing RNA-seq data from a HiSeq, and we are trying to compute the enrichment of reads in the IP over reads in our control for a given window. This window could be an entire gene or a very small 25 bp segment within an exon. Working with some collaborators, we've been discussing exactly how to compute enrichment and whether it should include RPKM. I've now thoroughly confused myself, and I was wondering if anyone had insight into better ways of computing this.

    My initial method of computing enrichment was the ratio of reads in the IP to reads in the control, each normalized by the total number of mapped reads in its library:
    Enrichment = (#IPw / Σ IP) / (#CNTLw / Σ CNTL),
    where #IPw and #CNTLw are the numbers of IP and control reads that mapped to window w, and Σ IP and Σ CNTL are the total numbers of IP and control reads that mapped to the genome (the normalization factors).
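    The formula above translates directly into code. A minimal sketch, with a hypothetical function name and made-up library totals in the usage line:

```python
def enrichment(ip_w, cntl_w, ip_total, cntl_total):
    """(#IPw / Σ IP) / (#CNTLw / Σ CNTL): library-size-normalized IP/control ratio.

    ip_w, cntl_w: reads mapped to window w in the IP and control libraries.
    ip_total, cntl_total: total reads mapped to the genome in each library.
    """
    if cntl_w == 0:
        raise ValueError("no control reads in window: ratio is undefined")
    return (ip_w / ip_total) / (cntl_w / cntl_total)


# Hypothetical counts: 500 IP vs 200 control reads, 20M vs 25M mapped reads.
print(enrichment(500, 200, 20_000_000, 25_000_000))
```

    No window or transcript length appears anywhere in this formula.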

    However, our collaborators insisted that we also divide by RPKM as a normalization factor, to account for differing gene lengths, so our final equation became:
    Enrichment = (#IPw / Σ IP) / (#CNTLw / Σ CNTL) / (10^9 * #CNTLg / Σ CNTL / length),
    where here #CNTLg is the number of reads that map to the gene exons (so excluding introns) and length refers to the length of the mature transcript (CDS + UTRs, no introns).
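    A sketch of the collaborators' formula, with hypothetical names and counts, shows why low-RPKM genes get inflated scores: holding the window counts fixed, the score scales with 1 / RPKM of the gene in the control.

```python
def rpkm(reads, total_mapped, length_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    return 1e9 * reads / (total_mapped * length_bp)


def collaborator_score(ip_w, cntl_w, ip_total, cntl_total, cntl_gene, gene_length):
    """(#IPw / Σ IP) / (#CNTLw / Σ CNTL) / RPKM of the gene in the control."""
    enrichment = (ip_w / ip_total) / (cntl_w / cntl_total)
    return enrichment / rpkm(cntl_gene, cntl_total, gene_length)


# Hypothetical: identical window counts in a 2 kb gene, 10M mapped reads per library.
low = collaborator_score(25, 5, 10_000_000, 10_000_000, cntl_gene=10, gene_length=2000)
high = collaborator_score(25, 5, 10_000_000, 10_000_000, cntl_gene=1000, gene_length=2000)
print(rpkm(10, 10_000_000, 2000), low)     # gene with control RPKM 0.5
print(rpkm(1000, 10_000_000, 2000), high)  # gene with control RPKM 50
```

    The window's IP/control ratio is identical in both calls, yet the low-RPKM gene scores 100 times higher.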

    However, our results look strange: low RPKM values (< 1) produce very high enrichment scores, which doesn't make sense for an enrichment measure. Furthermore, from answers on this forum, it sounds like RPKM is used mostly for differential expression between two samples, e.g., two biological replicates, and not for computing the enrichment of an IP over its control. We're not trying to find DE genes here; we want to determine the enrichment of our IP over our control for any given window.

    Discussing this with my PI, we thought that dropping RPKM and normalizing solely by transcript length might be better. One odd consequence of dividing the enrichment by RPKM is that you're essentially multiplying by the transcript length, which is the opposite of what I'd think we're trying to achieve.
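    Writing the collaborators' formula out makes the length effect explicit. With E = (#IPw / Σ IP) / (#CNTLw / Σ CNTL) and the gene's control RPKM defined as in the formula above:

```latex
\mathrm{RPKM}_g = \frac{10^9 \cdot \#\mathrm{CNTL}_g}{\Sigma\,\mathrm{CNTL} \cdot \mathrm{length}}
\quad\Longrightarrow\quad
\frac{E}{\mathrm{RPKM}_g} = E \cdot \frac{\Sigma\,\mathrm{CNTL} \cdot \mathrm{length}}{10^9 \cdot \#\mathrm{CNTL}_g}
```

    So the adjusted score grows linearly with transcript length and inversely with the number of control reads on the gene.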

    Another possibility I considered is to compute the RPKM for the control, compute it the same way for the IP, and take the ratio of the two. This at least seems consistent with what RPKM was designed for, if I'm understanding RPKM correctly, but I'm still not sure whether that makes more sense than, or is better than, the other approaches.
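    Assuming both RPKMs are computed over the same window (so they share one length), their ratio reduces exactly to the original enrichment formula: the 10^9 factor and the length cancel, which is the cancellation Simon Anders points out. A minimal check, with hypothetical names and counts:

```python
import math


def rpkm(reads, total_mapped, length_bp):
    """Reads per kilobase of transcript per million mapped reads."""
    return 1e9 * reads / (total_mapped * length_bp)


def rpkm_ratio(ip_w, cntl_w, ip_total, cntl_total, window_length):
    """RPKM of the window in the IP divided by its RPKM in the control."""
    return rpkm(ip_w, ip_total, window_length) / rpkm(cntl_w, cntl_total, window_length)


# Hypothetical counts; the window length drops out of the ratio.
for length in (25, 2000):
    ratio = rpkm_ratio(500, 200, 20_000_000, 25_000_000, length)
    plain = (500 / 20_000_000) / (200 / 25_000_000)
    print(length, ratio, math.isclose(ratio, plain))
```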

    Thank you very much and I greatly appreciate your help if anyone has any ideas!
