Hi all,
Recently, there have been many posts about computing "per base" sequence coverage and empirically calculating a histogram of sequence coverage. I just finished a rewrite of genomeCoverageBed in the BEDTools suite. The new version is pretty fast, and for human it uses roughly 2 GB of RAM (~ max(chromosome size) * 8 bytes).
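To make the memory estimate above concrete, here is a small Python sketch of the arithmetic. The function name is hypothetical, and the chromosome lengths are hg19/GRCh37 values that I am supplying as example input; only the "largest chromosome times 8 bytes" rule comes from the post itself.

```python
BYTES_PER_BASE = 8  # per-base cost quoted in the post


def estimate_peak_bytes(chrom_sizes):
    """Estimate peak RAM as max(chromosome size) * 8 bytes.

    chrom_sizes: dict mapping chromosome name -> length in bases.
    (Hypothetical helper for illustration, not part of BEDTools.)
    """
    return max(chrom_sizes.values()) * BYTES_PER_BASE


# Example input: a few hg19/GRCh37 chromosome lengths (assumed values).
human = {"chr1": 249250621, "chr2": 243199373, "chrX": 155270560}

peak = estimate_peak_bytes(human)
print(f"{peak / 1e9:.2f} GB")
```

Only the largest chromosome matters in this estimate, which is why the figure stays near 2 GB for human no matter how many reads are in the input.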
On my Linux box, I was able to compute a histogram of the coverage of the mouse genome from 100 million aligned reads (in BED format) in 8 minutes, with memory use peaking at about 2 GB of RAM.
Note that per base coverage can be reported with the "-d" option.
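For anyone unsure what "per base" coverage and a coverage histogram mean here, the following is a minimal Python sketch of the idea for a single chromosome. It is only an illustration under my own assumptions (intervals already parsed from BED into 0-based, end-exclusive pairs, and hypothetical function names); it is not how genomeCoverageBed is implemented internally.

```python
from collections import Counter
from itertools import accumulate


def per_base_depth(intervals, chrom_size):
    """Return the depth at every base of one chromosome.

    intervals: iterable of (start, end) pairs in BED convention
    (0-based start, end exclusive). Hypothetical helper.
    """
    # Difference array: +1 where a read starts, -1 just past where it ends.
    delta = [0] * (chrom_size + 1)
    for start, end in intervals:
        start = max(start, 0)
        end = min(end, chrom_size)  # clip reads that run off the end
        if start < end:
            delta[start] += 1
            delta[end] -= 1
    # A running sum of the differences gives the depth at each base.
    return list(accumulate(delta[:chrom_size]))


def depth_histogram(depths):
    """Count how many bases were observed at each depth."""
    return Counter(depths)


# Three toy reads on a 10-base chromosome.
reads = [(0, 5), (3, 8), (4, 6)]
depths = per_base_depth(reads, 10)
hist = depth_histogram(depths)

print(depths)                # [1, 1, 1, 2, 3, 2, 1, 1, 0, 0]
print(sorted(hist.items()))  # [(0, 2), (1, 5), (2, 2), (3, 1)]
```

The per-base list corresponds to what a per-base report would show, and the histogram is simply a tally of that list by depth.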
I hope others find this to be of use.
Aaron