Originally posted by skruglyak
We are running three HiSeqs and a few GAs; reading and rewriting a few hundred gigabytes of compressed sequence data just to fix a deficient header is quite annoying IMHO.
I do agree SAM would be a nice option for data storage (it should probably not replace fastq yet, since many people still use fastq as input for their programs).
Whether it is wise to use a binary (sequencing-specific) storage format like BAM ... I don't know, just a bad feeling :-)
Strangely enough, this is never mentioned, but lots of IT folks would appreciate it if the "we create many, many files" madness were limited to some reasonable number.
1,629,325 files for a 2x120 run is far too many ...
just my 2p,
Sven