Hi Guys
This might seem like a very trivial question, but strangely enough I have not been able to come up with an acceptable answer.
I am trying to calculate the size of the human exome and the human transcriptome in bases, for coverage purposes.
Here is what I did:
I downloaded the mRNA, exons, and RefSeq Genes BED files from UCSC and summed the total number of bases in each file / feature. Clearly there are overlapping regions in each of these annotation files, but the base counts I am getting are far from the numbers one would expect. Here is what I am seeing:
1. Total #bases in mRNA: 14,881,824,369
2. Total #bases in exons: 99,752,470
3. Total #bases in RefSeq Genes: 2,011,862,672
Just wondering whether I should count the bases common to two genes twice, or whether only unique regions should be counted.
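To make the two options concrete, here is a minimal sketch of both counts in Python. It assumes tab- or space-separated BED records whose first three columns (chrom, start, end) are the interval to count, with the usual 0-based, half-open coordinates. The function name `total_and_unique_bases` and the example intervals are made up for illustration; this is not the exact script I ran.

```python
from collections import defaultdict


def total_and_unique_bases(bed_lines):
    """Return (total, unique) base counts for BED records.

    total  : every interval counted in full, so overlaps count repeatedly.
    unique : overlapping intervals merged per chromosome, so each base
             is counted once.
    """
    by_chrom = defaultdict(list)
    total = 0
    for line in bed_lines:
        # Skip blank lines and common header/comment lines.
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        start, end = int(start), int(end)
        by_chrom[chrom].append((start, end))
        total += end - start  # half-open interval length

    unique = 0
    for intervals in by_chrom.values():
        intervals.sort()
        cur_start, cur_end = intervals[0]
        for start, end in intervals[1:]:
            if start <= cur_end:
                # Overlapping or touching: extend the current merged block.
                cur_end = max(cur_end, end)
            else:
                unique += cur_end - cur_start
                cur_start, cur_end = start, end
        unique += cur_end - cur_start
    return total, unique


# Hypothetical intervals: two overlapping on chr1, one on chr2.
example = [
    "chr1\t100\t200",
    "chr1\t150\t250",
    "chr2\t0\t50",
]
print(total_and_unique_bases(example))  # (250, 200)
```

In the example, the 50 bases shared by the two chr1 intervals are counted twice in the first number and once in the second. Note that this only looks at the first three columns, so for a multi-block (BED12) record it would count the whole span of the feature rather than its individual blocks.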
Any pointers from your experience will help.
Thanks!
-Abhi