Hi All,
I am very new to the sequencing world. To be honest, I haven't touched any sequencing data yet, so forgive me if my question is trivial.
I plan to run RNA-seq on 5 human samples in a single lane of an Illumina GA IIx, which can currently provide about 40 million reads per lane. I have ordered 100 bp paired-end reads.
I want to estimate the coverage, i.e. how well my experiment will cover lowly expressed genes. My estimate goes like this:
Each sample gets 40 million / 5 = 8 × 10^6 reads, so at 100 bp per read that is 8 × 10^8 bp.
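The per-sample yield can be reproduced with a few lines of Python. This is only a sketch: it assumes the 40 million reads are split evenly across the 5 samples, and `per_sample_yield` and its `reads_per_fragment` parameter are hypothetical names for illustration. If the 40 million figure counts read pairs (clusters) rather than individual reads, each pair contributes 2 × 100 bp and the base yield doubles.

```python
def per_sample_yield(lane_reads, n_samples, read_length_bp, reads_per_fragment=1):
    """Return (reads, bases) per sample, assuming an even split across samples.

    reads_per_fragment is an assumption: use 1 if lane_reads already counts
    every individual read, or 2 if lane_reads counts paired-end fragments
    (clusters) and both mates are read_length_bp long.
    """
    reads = lane_reads // n_samples
    bases = reads * reads_per_fragment * read_length_bp
    return reads, bases


# Numbers from the post: 40 million reads per lane, 5 samples, 100 bp reads.
print(per_sample_yield(40_000_000, 5, 100))                        # (8000000, 800000000)
print(per_sample_yield(40_000_000, 5, 100, reads_per_fragment=2))  # (8000000, 1600000000)
```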
Assume that the average human gene length is 1000 bp, and that a coverage of 10× per base is enough to be sure the signal is real. Then each gene needs 1000 × 10 = 10^4 sequenced bases.
1/(8 × 10^8 / 10^4) = 1/(8 × 10^4) ≈ 1.25 × 10^-5 ≈ 0.001%, so a gene whose transcripts make up about 0.001% of the whole transcriptome should be detectable.
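The whole estimate reduces to one formula: the smallest detectable fraction is roughly (gene length × required coverage) / (bases sequenced per sample). Here is a minimal Python sketch of that formula, using the same simplifying assumptions as the estimate above (1000 bp average gene length, 10× coverage, 8 × 10^8 bases per sample); the function name `min_detectable_fraction` is hypothetical.

```python
def min_detectable_fraction(bases_per_sample, gene_length_bp, coverage):
    """Smallest share of sequenced bases a gene must account for to reach
    `coverage`-fold depth over `gene_length_bp` bases (simple model only)."""
    bases_needed_per_gene = gene_length_bp * coverage                # 1000 * 10 = 10**4
    genes_at_full_depth = bases_per_sample / bases_needed_per_gene  # 8 * 10**4
    return 1 / genes_at_full_depth


frac = min_detectable_fraction(8e8, 1000, 10)
print(f"{frac:.3g} = {frac * 100:.5f}%")  # 1.25e-05 = 0.00125%
```

Changing `coverage` or `gene_length_bp` shows how sensitive the detection limit is to those assumptions. For example, requiring 20× coverage instead of 10× doubles the smallest detectable fraction to 2.5 × 10^-5.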
Since I have to write a proposal, I need some way to estimate this. Can you tell me whether this reasoning is totally wrong, or whether the approach is OK and only the numbers need to be modified?