
**dpryan** · 08-22-2012, 10:27 AM

Have a look at this paper.

**aoifemc** · 08-23-2012, 01:11 AM

Great, thanks!!

**Simon Anders** · 08-23-2012, 01:06 PM

Note also that we have recently expanded the DESeq vignette with a section discussing such filtering.

**pinki999** · 01-16-2013, 04:57 AM

Can we carry out VST (variance stabilizing transformation) after the filtering step?

**rfilbert** · 01-16-2013, 07:10 AM

Pre-filtering is a bad idea. Why do people do it? Because the software can't handle large data?

**dpryan** · 01-16-2013, 07:18 AM

> Originally posted by rfilbert
>
> Pre-filtering is a bad idea. Why do people do it? Because the software can't handle large data?

There's no benefit to performing tests on genes that have no chance at showing a difference due to counts being too low. Had you bothered to read the paper I referenced, you would have known that.
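To make "no chance at showing a difference" concrete, here is a minimal sketch under a deliberately simplified model that is an assumption of this example, not what DESeq does: one sample per condition, equal library sizes, and an exact two-sided binomial test on how a gene's reads split between the two conditions. The function name `min_two_sided_p` is hypothetical. Even the most extreme split (all reads in one condition) cannot produce a p-value below a floor set by the total read count.

```python
def min_two_sided_p(total_reads: int) -> float:
    """Smallest two-sided p-value an exact binomial test (p = 0.5) can give
    when `total_reads` reads are split between two conditions."""
    if total_reads <= 0:
        return 1.0
    # Most extreme outcome: every read falls in one condition.
    # Two-sided p-value = P(all in A) + P(all in B) = 2 * 0.5 ** n.
    return min(1.0, 2 * 0.5 ** total_reads)


for n in (1, 3, 5, 10):
    print(n, min_two_sided_p(n))
# 1 1.0
# 3 0.25
# 5 0.0625
# 10 0.001953125
```

Under these assumptions a gene with three reads in total can never get a raw p-value below 0.25, so testing it only adds to the number of tests without any possibility of a discovery.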

**chadn737** · 01-16-2013, 07:20 AM

> Originally posted by rfilbert
>
> Pre-filtering is a bad idea. Why do people do it? Because the software can't handle large data?

1) Specifically, why is prefiltering "a bad idea"?

2) It has nothing to do with not being able to handle "large data". Detection power drops as the number of genes tested grows, because the multiple testing correction becomes stricter, and this is true for ALL software. Prefiltering addresses the issue by removing genes that are unlikely to be called differentially expressed, which reduces the overall number of tests performed.
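The effect of the number of tests on detection power can be sketched with a small Benjamini-Hochberg adjustment. The thread does not say which correction is meant, so BH is an assumption here, and `bh_adjust` and the p-values are hypothetical illustration rather than output from any real data set. The same four small raw p-values are adjusted once among 20 tests and once among 1000.

```python
def bh_adjust(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values for four genes with a real change.
signals = [0.001, 0.004, 0.008, 0.012]

few_tests = signals + [0.5] * 16     # 20 genes tested in total
many_tests = signals + [0.5] * 996   # 1000 genes tested in total

hits_few = sum(p <= 0.1 for p in bh_adjust(few_tests))
hits_many = sum(p <= 0.1 for p in bh_adjust(many_tests))
print(hits_few, hits_many)  # 4 0
```

The raw p-values of the four genes are identical in both runs; only the number of accompanying tests differs, and that alone decides whether they pass a 10% FDR cut-off.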

**pinki999** · 01-16-2013, 07:20 AM

But, is it a good idea to filter before variance stabilizing?

**rfilbert** · 01-16-2013, 07:26 AM

Why pre-filter at all? It is only an opportunity for false negatives. I think the only reason for filtering is software that can't handle the whole genome.

**rfilbert** · 01-16-2013, 07:29 AM

> Originally posted by dpryan
>
> There's no benefit to performing tests on genes that have no chance at showing a difference due to counts being too low. Had you bothered to read the paper I referenced, you would have known that.

Clearly you have little background or access to a real statistician. Hello World! You must filter out low abundance transcripts - they are clearly not important!

**chadn737** · 01-16-2013, 07:47 AM

> Originally posted by rfilbert
>
> Why pre-filter at all? It is only an opportunity for false negatives. I think the only reason for filtering is software that can't handle the whole genome.
>
> Clearly you have little background or access to a real statistician. Hello World! You must filter out low abundance transcripts - they are clearly not important!

Clearly you don't either, other than the salesman at Partek.

Prefiltering reduces False Negatives. When one prefilters, one is usually removing genes with zero or very few counts. At very low count numbers the shot noise can dominate, and all but the largest changes will not be called differentially expressed. Since there are typically hundreds, if not thousands, of genes with zero to only a few reads, this has a huge effect on multiple testing correction and can lead to a large number of False Negatives. Removing these genes relaxes the multiple testing correction so that more of the remaining genes pass the test.

Prefiltering can increase False Positives, but you said False Negatives, not False Positives. However, this can largely be mitigated if you filter using methods like those described in the PNAS paper linked to previously.
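The thread does not spell out the filtering method, so the following is only a sketch of one commonly used kind of criterion, assumed here: a filter that never looks at which condition a sample belongs to, such as the total count across all samples. The function name `filter_by_total_count`, the gene names, the counts, and the cut-off of 10 are all hypothetical.

```python
def filter_by_total_count(counts, min_total):
    """Keep genes whose count summed over ALL samples reaches `min_total`.

    The criterion ignores the condition labels, so it cannot favour genes
    that happen to differ between conditions.
    """
    return {gene: row for gene, row in counts.items() if sum(row) >= min_total}


# Hypothetical count table: gene -> counts in four samples.
counts = {
    "geneA": [0, 1, 0, 0],
    "geneB": [120, 95, 310, 280],
    "geneC": [2, 0, 1, 3],
    "geneD": [40, 55, 38, 61],
}

kept = filter_by_total_count(counts, min_total=10)
print(sorted(kept))  # ['geneB', 'geneD']
```

A filter that did use the labels, for example keeping only genes with a large fold change between conditions, would select on the very quantity being tested, which is where the extra False Positives would come from.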

Every program for differential expression has no problem handling entire genomes. I have used DESeq and edgeR on genomes twice the size of the human genome with ease. Try using them before you blindly criticize.

**chadn737** · 01-16-2013, 07:49 AM

> Originally posted by pinki999
>
> But, is it a good idea to filter before variance stabilizing?

I'm not sure. What do you want to do with the variance-stabilized data? Since filtering is usually done to increase detection of differential expression, there may not be any advantage in doing it for other purposes.

**rfilbert** · 01-16-2013, 06:06 PM

> Originally posted by chadn737
>
> Clearly you don't either, other than the salesman at Partek.
>
> Prefiltering reduces False Negatives. When one prefilters, one is usually removing genes with zero or very few counts. At very low count numbers the shot noise can dominate, and all but the largest changes will not be called differentially expressed. Since there are typically hundreds, if not thousands, of genes with zero to only a few reads, this has a huge effect on multiple testing correction and can lead to a large number of False Negatives. Removing these genes relaxes the multiple testing correction so that more of the remaining genes pass the test.
>
> Prefiltering can increase False Positives, but you said False Negatives, not False Positives. However, this can largely be mitigated if you filter using methods like those described in the PNAS paper linked to previously.
>
> Every program for differential expression has no problem handling entire genomes. I have used DESeq and edgeR on genomes twice the size of the human genome with ease. Try using them before you blindly criticize.

Um, if you filter out a gene that is truly differentially expressed, that is a false negative. Are you a statistician? Seems you are not.

**chadn737** · 01-16-2013, 06:12 PM

> Originally posted by rfilbert
>
> Um, if you filter out a gene that is truly differentially expressed, that is a false negative. Are you a statistician? Seems you are not.

I see you aren't getting it.

Genes with very few reads, of which there can be hundreds or thousands, are unlikely to be called as differentially expressed, unless the differences are very large.

On the other hand, because of multiple testing correction, many genes of higher expression will not pass the threshold of being considered differentially expressed.

Filtering will result in a handful of genes with low expression that are differentially expressed being discarded, but will allow for many more genes of higher expression to pass the threshold of being differentially expressed.

So overall MORE genes are called differentially expressed and the overall number of false negatives is decreased.
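The trade-off described above can be put into numbers with a toy example. Everything here is made up for illustration: the gene list, the raw p-values, the cut-off of 30 total reads, and the use of a Bonferroni threshold (chosen only because it is the simplest correction to read; it is not claimed to be what DESeq reports). The helper `n_significant` is hypothetical.

```python
ALPHA = 0.05


def n_significant(pvalues, alpha=ALPHA):
    """Number of tests passing a Bonferroni threshold of alpha / m."""
    m = len(pvalues)
    return sum(p <= alpha / m for p in pvalues) if m else 0


# Hypothetical genes as (total count across samples, raw p-value).
genes = (
    [(500, 2e-4)] * 30     # well-expressed, truly changed
    + [(500, 0.6)] * 170   # well-expressed, unchanged
    + [(25, 1e-5)] * 2     # low-count genes with a very large change
    + [(3, 0.7)] * 798     # near-zero counts, nothing detectable
)

before = n_significant([p for _, p in genes])

MIN_TOTAL = 30  # hypothetical filter cut-off on the total count
kept = [(c, p) for c, p in genes if c >= MIN_TOTAL]
after = n_significant([p for _, p in kept])

print(before, after)  # 2 30
```

With all 1000 genes tested, only the two low-count genes with a very large change pass. After filtering, those two are discarded, but the threshold relaxes from 0.05/1000 to 0.05/200 and the 30 well-expressed genes pass instead, so the total number of false negatives goes down.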

You should take a day or two to actually read some papers on the matter, such as the linked PNAS paper and the DESeq, edgeR, etc. papers and vignettes. These all explain exactly what they do so that nothing is hidden, unlike the black boxes that you are putting your data into.


DESeq filtering
