Hi all, I am not sure whether this is the right place to ask this question, but I would really appreciate any suggestions or comments.
As we all know, by mapping RNA-seq reads back to the reference genome and counting the reads that fall within a gene of interest, we can roughly estimate that gene's expression level as RPKM: reads per kilobase of gene length per million mapped reads. We also know that gene expression means making RNA copies from DNA, which carries the information for a functional product such as a protein. The number of RNA copies of a gene should relate to the amount of product (e.g., protein): more RNA copies, more of the associated protein. (If not, why would we care about gene expression at the RNA level?)
However, I am not sure whether the relationship between the number of RNA copies and the amount of protein is linear. A linear relationship would mean that 10 and 11 RNA copies of a gene yield 10 and 11 units of protein (or proportional amounts), respectively. I think that in real living systems, making the final product would be more robust if the RNA, regarded as an intermediate product, were saturated. By that I mean the functional product may not be very sensitive to the exact number of RNA copies (otherwise, cells would have to learn to count everything). I wonder whether, in most cases, the RNA of a gene is saturated. If so, it would make little sense to count the exact number of RNA copies and then compare those numbers precisely between samples.
Some statistical tests, such as Fisher's exact test, have more power as the counts get bigger; on the other hand, bigger counts make it more likely that the intermediate product is saturated. In the microarray era, fold change was regarded as the best measure for identifying differences in gene expression. Now that RNA-seq is becoming widely used, it is commonly thought that RNA-seq measures gene expression digitally, so fold change may no longer be the best measure of expression differences.
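As a minimal sketch of the RPKM definition above (the function name and the example numbers are hypothetical, chosen only for illustration), the calculation could look like this in Python:

```python
def rpkm(read_count, gene_length_bp, total_mapped_reads):
    """Reads per kilobase of gene length per million mapped reads."""
    if gene_length_bp <= 0 or total_mapped_reads <= 0:
        raise ValueError("gene length and total mapped reads must be positive")
    kilobases = gene_length_bp / 1_000             # gene length in kb
    millions = total_mapped_reads / 1_000_000      # library size in millions
    return read_count / kilobases / millions


# Hypothetical example: 500 reads on a 2 kb gene in a library of 10 million mapped reads.
print(rpkm(read_count=500, gene_length_bp=2_000, total_mapped_reads=10_000_000))  # 25.0
```

The two divisions normalise for gene length and sequencing depth, so values are comparable across genes and across libraries of different sizes, at least roughly.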
My argument is that we should still use fold change, even for RNA-seq data. Living systems are probably not that exact.
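The contrast between fold change and a count-based test such as Fisher's exact test can be made concrete with a small sketch. Everything here is an assumption for illustration: the counts and library sizes are made up, the function names are hypothetical, and the Fisher test is a plain standard-library implementation rather than any particular package's method. The same two-fold change is tested once with small counts and once with counts ten times larger.

```python
from math import comb, log2


def log2_fold_change(count_a, count_b, pseudocount=1.0):
    """log2 ratio of sample B to sample A; the pseudocount avoids division by zero."""
    return log2((count_b + pseudocount) / (count_a + pseudocount))


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return comb(row1, x) * comb(row2, col1 - x) / denom

    p_obs = prob(a)
    lo, hi = max(0, col1 - row2), min(row1, col1)
    # Sum the probabilities of all tables at least as extreme as the observed one.
    return sum(p for p in map(prob, range(lo, hi + 1)) if p <= p_obs * (1 + 1e-7))


# Hypothetical data: reads on one gene in samples A and B, equal library sizes.
library = 100_000
for gene_a, gene_b in [(10, 20), (100, 200)]:
    lfc = log2_fold_change(gene_a, gene_b, pseudocount=0.0)
    p = fisher_exact_two_sided(gene_a, library - gene_a, gene_b, library - gene_b)
    print(f"{gene_a} vs {gene_b}: log2 fold change = {lfc:.2f}, Fisher p = {p:.3g}")
```

Both comparisons have the same fold change, but the test returns a much smaller p-value for the larger counts, which is the behaviour the post is questioning.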
OK, I wrote a lot here; thanks for reading. As I don’t have a biology background, my view may be incorrect (please correct me if so). Any comments are welcome.
Xi