**swbarnes2** · 12-01-2013, 02:18 PM

I don't think that's horribly wrong, though it's not as sophisticated an approach as using the software everyone else uses. If all you want to do is flag genes for further testing, it might be okay.

One small problem: when you are testing multiple things at the same time, like a few thousand genes, well, see the link below.

Significant

http://xkcd.com/882/
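To put rough numbers on the comic, here is a minimal sketch rather than anything from the original analysis. It assumes 5000 genes with no real expression change, so that each test's p-value is uniform on [0, 1], and it compares a raw 0.05 cutoff with a Benjamini-Hochberg adjustment. The names `null_pvalues` and `benjamini_hochberg` are made up for this illustration, and the gene count is arbitrary.

```python
import random


def null_pvalues(n_genes, seed=0):
    """Simulated p-values for genes with no true change (uniform on [0, 1])."""
    rng = random.Random(seed)
    return [rng.random() for _ in range(n_genes)]


def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so each adjusted value is the
    # minimum of p * m / rank over this rank and all higher ranks.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


pvals = null_pvalues(5000)  # assumption: 5000 genes, none truly changed
raw_hits = sum(p < 0.05 for p in pvals)
adj_hits = sum(q < 0.05 for q in benjamini_hochberg(pvals))
print("raw p < 0.05:", raw_hits)
print("BH-adjusted < 0.05:", adj_hits)
```

With a raw cutoff, roughly 5% of the null genes (about 250 here) pass by chance alone. The adjusted values are never smaller than the raw ones, so far fewer null genes survive the same cutoff.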

**SNPsaurus** · 12-01-2013, 02:44 PM

Let's imagine that none of your genes have actually changed expression. You will eliminate the genes without much of a fold-change, enriching for the false positives that have unlikely distributions of expression. Now you t-test the enriched set, and find some set with P values less than 0.05. Of course, just from the multiple testing you'd expect 5% of the genes to be significant, even though they actually have no difference in expression (just variation that distributes in unlikely ways). The elimination of low-fold changers will just add to the problem, and should likely be counted as a test even if you don't explicitly do so.

Throwing out genes with few reads is more acceptable, I would think. But your way is going to lead to headaches down the road as you try to make sense of the genes you found.
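The thought experiment above can be sketched as a small simulation. This is only an illustration under stated assumptions, not the poster's data or pipeline: log2 expression is drawn from a standard normal for every gene, there are 3 replicates per condition, the fold-change filter is |log2 FC| > 1, and the test is a pooled-variance two-sample t-test. The function name `simulate_null` and all of the parameter values are hypothetical.

```python
import random
import statistics

N_REP = 3            # replicates per condition (assumed)
T_CRIT_DF4 = 2.776   # two-sided 5% critical value of Student's t, 4 degrees of freedom


def simulate_null(n_genes=20000, min_abs_log2fc=1.0, seed=0):
    """Simulate genes with no true change and compare t-test hit rates
    over all genes versus only the genes that pass a fold-change filter."""
    rng = random.Random(seed)
    sig_all = kept = sig_kept = 0
    for _ in range(n_genes):
        a = [rng.gauss(0.0, 1.0) for _ in range(N_REP)]
        b = [rng.gauss(0.0, 1.0) for _ in range(N_REP)]
        diff = statistics.mean(a) - statistics.mean(b)  # log2 fold change
        pooled_var = (statistics.variance(a) + statistics.variance(b)) / 2
        t = diff / (pooled_var * 2 / N_REP) ** 0.5
        significant = abs(t) > T_CRIT_DF4  # same as p < 0.05 for df = 4
        sig_all += significant
        if abs(diff) > min_abs_log2fc:
            kept += 1
            sig_kept += significant
    return sig_all / n_genes, sig_kept / kept


rate_all, rate_kept = simulate_null()
print(f"significant among all genes:      {rate_all:.3f}")
print(f"significant among filtered genes: {rate_kept:.3f}")
```

Across all genes the hit rate should sit near the nominal 5%. Among the genes that survive the fold-change filter it comes out noticeably higher, even though nothing in the simulation is truly differentially expressed.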

**gringer** · 12-01-2013, 03:33 PM

Obviously I don't get RFP, but could anyone tell me which way (common way or my way) could lead to more reliable result(s)?

You already mentioned it in your description: use edgeR, cuffdiff (or DESeq) to adjust for multiple testing based on the transcript read distribution.

**lucer105** · 12-01-2013, 08:52 PM

Originally posted by SNPsaurus View Post

Let's imagine that none of your genes have actually changed expression. You will eliminate the genes without much of a fold-change, enriching for the false positives that have unlikely distributions of expression. Now you t-test the enriched set, and find some set with P values less than 0.05. Of course, just from the multiple testing you'd expect 5% of the genes to be significant, even though they actually have no difference in expression (just variation that distributes in unlikely ways). The elimination of low-fold changers will just add to the problem, and should likely be counted as a test even if you don't explicitly do so.

Correct me if I am wrong. Do you mean my t-test actually enriches the false positive readout? As far as I know, edgeR takes the inter-replicate variance into consideration when calculating the p-value, and I think the t-test also returns a high p-value if the variance is big. I think part of the answer would come from how edgeR, cuffdiff, etc. distinguish false positive results. Do you know?

Thanks for your comments, sincerely~

Y.L

**lucer105** · 12-01-2013, 08:55 PM

Originally posted by gringer View Post

You already mentioned it in your description: use edgeR, cuffdiff (or DESeq) to adjust for multiple testing based on the transcript read distribution.

Thanks for the comments. What I will do is combine my way with the differentially expressed gene programs. The reason I was asking is to make sure that the first small set of data analysis I did is not P.O.S....

Again, thanks.

**lucer105** · 12-01-2013, 08:59 PM

Originally posted by swbarnes2 View Post

I don't think that's horribly wrong, though it's not as sophisticated an approach as using the software everyone else uses. If all you want to do is flag genes for further testing, it might be okay.

One small problem: when you are testing multiple things at the same time, like a few thousand genes, well, see the link below.

http://xkcd.com/882/

Thanks for the comments. That was exactly the purpose: to find a few hundred genes well beyond the range of noise. I was thinking that by dumping the low-read genes and the small fold-change genes I would get a more solid result, although I sacrifice a lot of real readout. But I am not quite sure whether I was right about this, because I have no way to compare my method with the common method.

Y.L.


Am I wrong by doing this?
