BAM files are compressed using a variant of GZIP (GNU ZIP) called BGZF (Blocked GNU Zip Format). Anyone who has read the SAM/BAM Specification will have seen the terms BGZF and virtual offsets, but what you may not realise is how generally useful this is for random access to sections of any large compressed file.
I wrote the above blog post looking at BGZF applied to FASTA, SwissProt and UniProt-XML sequences. In short: BGZF files are bigger than GZIP files, but they are much faster for random access.
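The virtual offsets mentioned above are what make that random access work. As described in the SAM/BAM Specification, a virtual offset is a 64-bit number that packs the byte position of a BGZF block's start in the compressed file into the upper 48 bits, and the position within that block's decompressed data into the lower 16 bits. Here is a minimal Python sketch of that arithmetic; the function names `pack_voffset` and `unpack_voffset` are made up for illustration and are not taken from any library.

```python
def pack_voffset(block_start, within_block):
    """Combine a BGZF block start and an in-block offset into a virtual offset.

    block_start:  byte offset of the block in the compressed file (48 bits).
    within_block: byte offset inside the decompressed block (16 bits).
    """
    if not 0 <= within_block < 2**16:
        raise ValueError("within-block offset must fit in 16 bits")
    if not 0 <= block_start < 2**48:
        raise ValueError("block start must fit in 48 bits")
    return (block_start << 16) | within_block


def unpack_voffset(voffset):
    """Split a virtual offset back into (block_start, within_block)."""
    return voffset >> 16, voffset & 0xFFFF


# Hypothetical example: a record 10 bytes into the block starting at byte 100000.
voffset = pack_voffset(100000, 10)
block_start, within_block = unpack_voffset(voffset)
```

To jump to a record you seek to `block_start` in the compressed file, decompress just that one block, and skip `within_block` bytes, rather than decompressing everything before it as plain GZIP would require.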
So, should we all be considering using BGZF in preference to GZIP?