Is anybody familiar with the relative speeds of .bam file I/O and decompression? My algorithm bottlenecks when it loads reads from a .bam file.
Using BAMTools, I jump to specific locations in the genome and load the reads in a small region at each one (this involves decompression). I repeat this many times, loading many reads in total.
I am thinking it may be more efficient to create a mini-BAM file containing only the regions of concern, load it into memory (cache it), and then search for and load the reads from there. But will the decompression required by the compressed .bam format still make things too slow?
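To make the caching idea concrete, here is a minimal sketch of the lookup side, assuming the reads of interest have already been decompressed and decoded once. It uses only the standard library, not the BAMTools API. `CachedRead`, `ReadCache`, and `fetch` are hypothetical names made up for illustration, and positions are assumed to be 0-based with an exclusive end.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, already-decoded read: only the fields needed for a region lookup.
struct CachedRead {
    int32_t refId;     // index of the reference sequence
    int32_t start;     // 0-based leftmost aligned position
    int32_t end;       // 0-based end position, exclusive
    std::string name;  // read name
};

// Hypothetical in-memory cache: reads are sorted once, then queried many times
// without touching the compressed file again.
class ReadCache {
public:
    explicit ReadCache(std::vector<CachedRead> reads) : reads_(std::move(reads)) {
        std::sort(reads_.begin(), reads_.end(),
                  [](const CachedRead& a, const CachedRead& b) {
                      return std::make_pair(a.refId, a.start) <
                             std::make_pair(b.refId, b.start);
                  });
        // Remember the longest read so a query knows how far left it must look.
        for (const auto& r : reads_) {
            maxLen_ = std::max(maxLen_, r.end - r.start);
        }
    }

    // Return every cached read on refId that overlaps the interval [left, right).
    std::vector<CachedRead> fetch(int32_t refId, int32_t left, int32_t right) const {
        assert(left <= right);
        std::vector<CachedRead> out;
        // No overlapping read can start before left - maxLen_.
        const std::pair<int32_t, int32_t> key(refId, left - maxLen_);
        auto it = std::lower_bound(
            reads_.begin(), reads_.end(), key,
            [](const CachedRead& r, const std::pair<int32_t, int32_t>& k) {
                return std::make_pair(r.refId, r.start) < k;
            });
        // Scan forward until reads start at or beyond the right edge.
        for (; it != reads_.end() && it->refId == refId && it->start < right; ++it) {
            if (it->end > left) {
                out.push_back(*it);
            }
        }
        return out;
    }

private:
    std::vector<CachedRead> reads_;
    int32_t maxLen_ = 0;
};
```

With something like this, each region query is a binary search plus a short scan in memory, so decompression is paid once when the cache is filled instead of on every jump. Whether that one-time cost is acceptable is still the open question.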