**mbblack** · 08-19-2014, 12:04 PM

In a case such as that, the first thing I would do is include only the subset of genes that you actually detected in all samples in the comparison. As you indicated, you cannot argue that a failure to detect equates to an absence of expression, so you really should not be considering such genes in your comparison at all.

For my own analyses, the first thing I do after mapping reads is derive the subset of features that actually have a raw count greater than zero in all my samples. I only analyze that feature set for differential expression.
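A minimal sketch of that filter, assuming the raw counts are held in a NumPy array with genes as rows and samples as columns (the gene names and counts below are made up for illustration):

```python
import numpy as np

def detected_in_all(counts: np.ndarray) -> np.ndarray:
    """Boolean mask of rows (genes) whose raw count is > 0 in every column (sample)."""
    return (counts > 0).all(axis=1)

# Hypothetical toy count matrix: rows are genes, columns are samples.
gene_ids = np.array(["geneA", "geneB", "geneC", "geneD"])
counts = np.array([
    [120, 95, 30, 40],  # detected everywhere
    [3, 0, 5, 2],       # missing in one sample
    [0, 0, 0, 0],       # never detected
    [15, 22, 1, 8],     # detected everywhere, even if barely
])

mask = detected_in_all(counts)
kept_genes = gene_ids[mask]    # only these go on to DE testing
kept_counts = counts[mask]
print(kept_genes)
```

Note that a gene with a count of 1 in one sample still passes; the filter only removes genes that have no data at all in some sample.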

**mistrm** · 08-19-2014, 12:40 PM

Removing features that have a zero count in more than one sample will leave me with a very small subset. DESeq2 applies filters that remove genes with all-zero counts AND rows that have samples with extreme count outliers, which already reduces the feature set to less than half. Although I suppose that is one way to ensure that a lack of reads is not the reason for the observed change.

**dpryan** · 08-19-2014, 01:17 PM

Have a look at a PCA plot and/or a hierarchical clustering plot and see if the difference in library size is causing one or more samples to be obvious outliers. I've not seen that happen for ~5x size differences, though I certainly have for >=10x, and I wouldn't rule it out in any case.
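For illustration, here is a rough Python sketch of that check: median-of-ratios size factors (the idea DESeq2 uses for normalization), log2(x + 1) of the normalized counts, then PCA via SVD. This is a simplification, not DESeq2 itself; in R, DESeq2's own vst() and plotPCA() are the usual route. The simulated means and the 5x depth difference are hypothetical.

```python
import numpy as np

def size_factors(counts):
    """Median-of-ratios size factors, computed only over genes with no zero counts."""
    positive = (counts > 0).all(axis=1)
    log_counts = np.log(counts[positive])
    log_geo_means = log_counts.mean(axis=1, keepdims=True)  # per-gene geometric mean (log scale)
    return np.exp(np.median(log_counts - log_geo_means, axis=0))

def pca_scores(counts, n_components=2):
    """Sample coordinates on the top principal components of log2 normalized counts."""
    sf = size_factors(counts)
    logged = np.log2(counts / sf + 1.0)                     # genes x samples
    centered = logged - logged.mean(axis=1, keepdims=True)  # center each gene across samples
    # SVD of the samples x genes matrix; U * S gives the sample scores.
    u, s, _ = np.linalg.svd(centered.T, full_matrices=False)
    return u[:, :n_components] * s[:n_components]

# Hypothetical data: 500 genes, 3 shallow and 3 deeper (5x) libraries.
rng = np.random.default_rng(42)
true_means = rng.gamma(shape=2.0, scale=50.0, size=500)
depths = np.array([1.0, 1.0, 1.0, 5.0, 5.0, 5.0])
counts = rng.poisson(np.outer(true_means, depths))
print(pca_scores(counts))
```

A sample sitting far from all the others on PC1/PC2 would be the kind of obvious outlier worth investigating before trusting the DE results.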

**mbblack** · 08-20-2014, 04:27 AM

Originally posted by mistrm:

Removing features that have a zero count in more than one sample will leave me with a very small subset. DESeq2 applies filters that remove genes with all-zero counts AND rows that have samples with extreme count outliers, which already reduces the feature set to less than half. Although I suppose that is one way to ensure that a lack of reads is not the reason for the observed change.

Not to sound harsh, but to my mind, it is immaterial how much it reduces your feature set. The reality is that reporting DE calls for genes where one side of the comparison is a sample for which you have no data (a failure to detect) is simply not valid. If you ran two qPCR reactions, and one worked and gave valid data while the other failed and gave no data, would you include that gene in your results? For any gene you want to describe as differentially expressed, you need an actual measure of expression in each sample in the comparison. There is a certain stochasticity in the detection of low expressors, since those are inherently the rarer transcripts in your sample, so having detected nothing at all in one sample makes any statement about differential expression relative to another sample highly suspect.
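To put a number on that stochasticity, here is a small sketch assuming reads are sampled independently, so a transcript with expected count mu in a given library yields zero reads with probability exp(-mu) under a Poisson model. Real RNA-seq counts are overdispersed; DESeq2 models them as negative binomial, where P(0) = (1 + alpha * mu)^(-1/alpha), which is higher still. The dispersion of 0.5 and the expected counts are hypothetical values chosen for illustration.

```python
import math

def prob_zero_poisson(expected_count: float) -> float:
    """P(observed count == 0) when reads are Poisson-sampled with the given mean."""
    return math.exp(-expected_count)

def prob_zero_negbin(expected_count: float, dispersion: float) -> float:
    """P(observed count == 0) under a negative binomial with mean mu and
    variance mu + dispersion * mu**2."""
    return (1.0 + dispersion * expected_count) ** (-1.0 / dispersion)

# Hypothetical gene: ~2 expected reads in the deeper library, ~0.4 in a library 5x smaller.
for mu in (0.4, 2.0):
    print(f"mu={mu}: Poisson P(0)={prob_zero_poisson(mu):.3f}, "
          f"NB P(0)={prob_zero_negbin(mu, 0.5):.3f}")
```

With these made-up numbers the gene comes back as a zero roughly two times in three in the smaller library even though it is expressed, which is why a zero is not evidence of absence.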

If dpryan's suggestion doesn't reveal any obviously aberrant samples and you need a larger feature set, then you should add more replicates or more reads per sample. Do you still have any material left that you could sequence further to increase read depth?

**dpryan** · 08-20-2014, 04:30 AM

Originally posted by mbblack:

Not to sound harsh, but to my mind, it is immaterial how much it reduces your feature set.

I couldn't agree more. The name of the game is not creating undue extra work and headaches for yourself.

**mistrm** · 08-20-2014, 05:49 AM

I agree with you both. Though (just for the sake of discussion), instead of removing features that have a zero count in any sample across both groups, wouldn't it make sense to remove only features that have a zero count in Group1 (the group with the lower-depth samples)? For Group2, which has enough reads, a zero count more reliably indicates a low expressor rather than a failure to detect. In particular, if these low expressors show increased expression in Group1, we would want to capture those changes.

There is still material left, and we will likely sequence further, as that seems the best solution. Thanks for all the help!

**mbblack** · 08-20-2014, 08:01 AM

Not to my mind. You cannot say anything about differential expression based on the absence of data, regardless of what you see in the other sample. Nor can you, to my mind, say that an absence of data, at any read depth, is equal to an absence of expression. There is simply far too much variability in low expressor detection to say that, regardless of read depth. Again, an absence of count data cannot be taken as an absence of a transcript nor absence of expression of that transcript.

Typically, as you increase read depth, you see an ever-increasing accumulation of counts for transcripts already detected. Your probability of detecting very low expressors does not change all that much, and even at read depths of hundreds of millions of reads per sample, the probability of detecting novel transcripts remains low (relative to higher-count features) but persistent.

You say "if there is increased expression of these low expressors in Group1", but how can you say anything about relative expression (increased or decreased) if you do not have any actual data for that transcript in Group2? All you know is that you saw it in Group1 and did not see it in Group2, and you have no conclusive information about why you did not see it in Group2 (was it truly not expressed, or was it expressed and simply missed due to the inherent vagaries of detection in every RNA-seq experiment?).

The only valid contrasts you can make are between samples/groups for which you actually have data in both. For those where you have no data in one group, all you can say is you detected gene "x" in one, and did not detect it in the other - that's it. To infer anything else about the relative relationship of the two groups is pure speculation, and one for which you do not have supportive data since you have no data at all for one group.
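One way to keep those two kinds of genes apart in practice is sketched below, assuming a genes x samples count array and hypothetical column indices for each group. Genes with data in every sample are candidates for DE testing; genes seen only in one group are reported purely as detected/not detected, with no fold change attached.

```python
import numpy as np

def partition_by_detection(counts, group_a, group_b):
    """Return three boolean masks over genes (rows of a genes x samples count array).

    testable: count > 0 in every sample of both groups (DE testing is meaningful).
    only_a / only_b: seen in at least one sample of one group, never in the other.
    Genes in none of the masks were detected in only some samples of a group.
    """
    a = counts[:, group_a] > 0
    b = counts[:, group_b] > 0
    testable = a.all(axis=1) & b.all(axis=1)
    only_a = a.any(axis=1) & ~b.any(axis=1)
    only_b = b.any(axis=1) & ~a.any(axis=1)
    return testable, only_a, only_b

# Hypothetical counts: columns 0-1 are one group, columns 2-3 the other.
counts = np.array([
    [50, 60, 40, 45],
    [5, 3, 0, 0],
    [0, 0, 7, 2],
    [1, 0, 4, 6],
])
testable, only_a, only_b = partition_by_detection(counts, [0, 1], [2, 3])
print(testable, only_a, only_b)
```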

If your goal is to truly demonstrate the absence of expression in one group, then RNA-seq was never the appropriate experiment to use in the first place.

DESeq2 finding differential expression changes with libraries of different sizes
