Hello, SEQanswers! I'm a biotech student currently doing my bachelor's thesis, which is somewhat related to bioinformatics, and part of it is to look at changes in gene expression levels from RNA-seq data. I know basically nothing about this field, but I'm trying to learn, so I'm sorry if my questions are way too basic and/or stupid. I feel that bioinformatics could be something I want to do for a master's degree, so I really want to dive as deep into it as I can at this opportunity!
First question: fold change. The way I understand it, this is simply the abundance level (I read that RPKM is a measure of abundance, but I haven't read that much about it yet) of gene X in sample A divided by its abundance level in sample B, correct? But when I search for articles on fold change, I mostly find various software for "differential expression" (which I assume is sort of the same as fold change?). I cannot find any articles that calculate fold change as sample A/sample B... Why is this? I assume there is some reason that you can't or shouldn't do this calculation, but I don't understand what it is.
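To make sure I have the calculation right, here is a minimal Python sketch of what I mean by sample A/sample B. It uses the usual definition of RPKM (reads per kilobase of transcript per million mapped reads); the function names and all of the numbers are made up for illustration and do not come from any real dataset or tool.

```python
import math


def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of transcript per million mapped reads."""
    return read_count * 1e9 / (gene_length_bp * total_mapped_reads)


def fold_change(abundance_a, abundance_b):
    """Naive fold change: abundance in sample A divided by abundance in sample B."""
    return abundance_a / abundance_b


# Hypothetical gene X, 2000 bp long, in two made-up samples.
rpkm_a = rpkm(read_count=500, gene_length_bp=2000, total_mapped_reads=20_000_000)
rpkm_b = rpkm(read_count=100, gene_length_bp=2000, total_mapped_reads=10_000_000)

fc = fold_change(rpkm_a, rpkm_b)
log2_fc = math.log2(fc)  # fold changes are often reported on a log2 scale

print(rpkm_a, rpkm_b, fc, round(log2_fc, 3))
```

Note that this gives a single ratio per gene and says nothing about how much that ratio would vary between replicates, which may be part of what I am missing.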
Second question (which I discovered when trying to find answers to the first one): genes with very low abundance. Regardless of how you calculate fold change, how do you account for genes that have a very low abundance level, i.e. close to the limit of detection? For example, if the abundance levels in sample A and sample B are both close to 0 but still yield a fold change you are interested in, can you really say that the gene has a different abundance level? I mean, if both abundance levels are so close to the limit of detection, they could both be unreliable, right? How do you generally account for this kind of thing, or do I just misunderstand how RNA-seq detection limits work? I read that an RPKM of 1 is approximately equivalent to 1 RNA molecule per cell, so if you have RPKMs of (for example) 0.8 and 0.2 you get a fold change of 4, but can you really trust that number?
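To make the second question concrete, here is a small Python sketch of what worries me. The read counts are invented, and shifting each count by one read is only my own crude stand-in for measurement noise, not how any real tool models it. The point is just that the same fold change of 4 looks very different depending on how many reads are behind it.

```python
def fold_change(a, b):
    """Ratio a / b; returns infinity when b is zero."""
    return float("inf") if b == 0 else a / b


def fold_change_range(count_a, count_b, shift=1):
    """Smallest and largest ratio if each count is off by up to `shift` reads."""
    low = fold_change(max(count_a - shift, 0), count_b + shift)
    high = fold_change(count_a + shift, max(count_b - shift, 0))
    return low, high


# Both hypothetical genes have a nominal fold change of 4.
low_expr = fold_change_range(4, 1)       # gene near the detection limit
high_expr = fold_change_range(400, 100)  # well-expressed gene

print("low expression: ", low_expr)
print("high expression:", high_expr)
```

For the low-count gene, a single read more or less moves the ratio anywhere from 1.5 to infinity, while for the high-count gene it stays very close to 4.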