**Meligethes** · 11-12-2012, 04:31 AM

I have the same "problem"!

**mbblack** · 11-13-2012, 05:43 AM

You cannot compute a fold change for a gene you did not detect, simple as that. You would not use missing data in other analyses, would you? Had you done an array experiment, would you include genes that did not appear as expressed on the array? So why would you do so in this instance? If you do not have a transcript abundance value for a gene, you will have to remove that gene from your comparisons.

Personally, I'm now in the habit of only including genes with a raw mapped count of > 10 in all my normalizations for differential gene expression analysis (I just filter the raw count table and only retain rows with a "count > 10" for all samples, and whatever remains is what I have for normalization and differential gene expression). I've also seen publications which have only used genes with RPKM values of > 0.1 for differential gene expression. The thinking is that genes with very low counts (e.g. < 10) give estimates of transcript abundance that are too unreliable for inclusion in differential gene expression analysis.
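A minimal sketch of that kind of row filter, in plain Python. The gene names and counts are made up for illustration, and `filter_by_min_count` is a hypothetical helper, not part of any DGE package. It only shows the "count > 10 in every sample" rule described above.

```python
# Hypothetical raw count table: gene -> mapped read counts, one per sample.
raw_counts = {
    "geneA": [120, 98, 143, 110],
    "geneB": [0, 3, 15, 22],     # not detected in one sample
    "geneC": [11, 12, 40, 35],
    "geneD": [10, 50, 60, 70],   # 10 is not strictly greater than 10
}

MIN_COUNT = 10  # cutoff mentioned in the post (strictly greater than)


def filter_by_min_count(counts, min_count=MIN_COUNT):
    """Keep only genes whose count exceeds min_count in every sample."""
    return {
        gene: values
        for gene, values in counts.items()
        if all(value > min_count for value in values)
    }


kept = filter_by_min_count(raw_counts)
print(sorted(kept))
```

Only the rows that survive the filter would then go on to normalization and testing. Note that the comparison is strict, so a gene with exactly 10 reads in any sample is dropped.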

But the bottom line is, you cannot compute fold change at all for a gene unless it was actually detected in BOTH of your sample groups. No data is no data - ignore those and go with the genes you actually have data for.
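To illustrate why a zero breaks the calculation: a fold change is a ratio, so a zero in either group gives either a division by zero or a log of zero. The sketch below uses a hypothetical helper, `log2_fold_change`, that returns `None` in that case instead of a number. The input values are invented.

```python
import math


def log2_fold_change(mean_a, mean_b):
    """Return log2(mean_b / mean_a), or None if either group mean is zero.

    With a zero in the denominator the ratio is undefined, and with a zero
    in the numerator the log is undefined, so no fold change is reported.
    """
    if mean_a <= 0 or mean_b <= 0:
        return None
    return math.log2(mean_b / mean_a)


print(log2_fold_change(10.0, 40.0))  # detected in both groups
print(log2_fold_change(0.0, 40.0))   # not detected in group A
```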

P.S. I've also seen publications using a RPKM cutoff of > 0.5. Regardless, the growing consensus in published work seems to be that a minimum value cutoff should be a best practice for DGE analysis.

**maize** · 11-17-2012, 11:57 PM

I had the same problem before. Mbblack, thank you for the clear answers!

I understand that differential expression can only be calculated for genes expressed across all sample groups. A sample group with missing data cannot be included.

How should I deal with missing data within biological replicates if each sample group consists of 3 biological replicates? Should only genes expressed in all biological replicates be considered? I have seen many cases where one replicate has missing data. The idea of having 3 replicates is to do a statistical comparison between samples (t-test, each with 3 observations). Missing values in replicates make the test impossible. Any suggestions? Thanks.

**mbblack** · 11-18-2012, 07:08 AM

As I said, for myself, I am now in the habit of only performing DGE on genes where I have a raw mapped count > 10 for ALL samples (meaning all replicates as well). That is my minimum inclusive cutoff for any gene - all samples/replicates must have a mapped read count of > 10. Any gene(s) with any sample(s) with a count not passing that cutoff are excluded from further DGE analyses.

Other published results, using RPKM, have used minimum cutoffs of 0.1 or 0.5.

But, the bottom line is, you need to set some minimum limit for inclusion of any gene in your analyses, and then exclude those genes that fail to meet that minimum detection threshold.

**wetSEQer** · 12-26-2013, 12:33 PM

If you have 0 reads in one experimental group and more reads in another, you shouldn't discard those genes. That is the thing you are chasing, right? Some genes are completely on or off at a given sequencing depth...
I never apply a cutoff based on raw counts, since raw counts are biased by gene length (short genes vs. long genes).
I always go with RPKM and apply the cutoff based only on the sample with the higher RPKM. If you trust the statistics, you can set a "small" threshold; if you need qPCR to confirm, I guess you need a larger threshold. I used 10.
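A rough sketch of that approach, assuming the usual RPKM definition (reads per kilobase of transcript per million mapped reads). The function names, the example numbers, and the choice of `>=` for the comparison are assumptions made for illustration. Only the threshold of 10 comes from the post.

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def passes_higher_rpkm_cutoff(rpkm_a, rpkm_b, threshold=10.0):
    """Keep a gene if the higher of the two group RPKMs reaches the threshold.

    A gene with RPKM 0 in one group is kept as long as the other group
    is clearly expressed, which preserves on/off genes.
    """
    return max(rpkm_a, rpkm_b) >= threshold


# Hypothetical gene: 2,000 bp long, library of 25 million mapped reads.
group_a = rpkm(0, 2000, 25_000_000)    # not detected in group A
group_b = rpkm(500, 2000, 25_000_000)  # 500 reads in group B

print(group_a, group_b, passes_higher_rpkm_cutoff(group_a, group_b))
```

Because the filter looks only at the higher of the two values, it differs from the "every sample must pass" rule discussed earlier in the thread: it keeps genes that are silent in one group, at the cost of having no finite fold change to report for them.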

Rpkm=0
