Originally posted by Fabien Campagne
We want to keep that information for posterity for many reasons, not the least of which is the ability to bring old data "up to date" and make it comparable to new data.
Now, can you save space by not storing the bases that do not differ from the reference? Absolutely. I'd wager one could probably reduce the size of the data by about 50%, which is fantastic.
But simply saving start and end positions plus variations from the reference genome as fpepin suggested is not adequate to completely reconstruct the raw reads, I'm afraid.
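To make the point concrete, here is a minimal Python sketch of the "positions plus variations" idea. The names (`CompactRead`, `encode`, `decode`) and the toy sequences are made up for illustration, and it assumes substitutions only (no indels or clipping), so treat it as a sketch rather than anyone's actual format.

```python
from dataclasses import dataclass


@dataclass
class CompactRead:
    """Hypothetical compact record: alignment span plus mismatches only."""
    start: int        # 0-based start position on the reference
    end: int          # exclusive end position on the reference
    mismatches: dict  # reference position -> base observed in the read


def encode(reference: str, start: int, read_bases: str) -> CompactRead:
    """Keep only the bases that differ from the reference (substitutions only)."""
    end = start + len(read_bases)
    mismatches = {
        start + i: base
        for i, base in enumerate(read_bases)
        if reference[start + i] != base
    }
    return CompactRead(start, end, mismatches)


def decode(reference: str, compact: CompactRead) -> str:
    """Rebuild the read bases from the reference and the stored mismatches."""
    return "".join(
        compact.mismatches.get(pos, reference[pos])
        for pos in range(compact.start, compact.end)
    )


# Toy data, invented for this example.
reference = "ACGTACGTACGTACGT"
read_bases = "ACGAACGT"   # one substitution relative to reference[4:12]
read_quals = "IIII#III"   # per-base qualities as they would appear in FASTQ

compact = encode(reference, 4, read_bases)
restored = decode(reference, compact)
```

The bases come back intact (`restored == read_bases`), and only one mismatch had to be stored. But nothing in `CompactRead` holds `read_quals`, the read name, or a read that did not align at all, so the original record cannot be rebuilt from it. That is one likely reason the compact form falls short of the raw reads.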