Originally posted by oria34
If, however, you're working on a genome which isn't assembled into chromosomes but instead has a large number of assembly contigs, then Bismark will try to open a file for each contig. Every operating system limits the number of files a process can have open at the same time, and if the number of contigs is larger than the number of allowed filehandles then the script will fail. On the Linux systems we checked here the limit is 1024 files, so if you have more contigs than this you will hit the problem.
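If you want to check whether your assembly is likely to run into this, here is a rough sketch in Python (not part of Bismark). It counts the FASTA header lines and compares that count against the per-process open-file limit reported by the standard `resource` module, which is Unix-only. The function names and the `reserved` allowance of 16 handles are my own assumptions for illustration; how many filehandles Bismark needs beyond one per contig isn't something I've measured.

```python
import resource  # Unix-only standard-library module


def count_contigs(fasta_lines):
    """Count FASTA records by counting header lines that start with '>'."""
    return sum(1 for line in fasta_lines if line.startswith(">"))


def open_file_limits():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


def exceeds_limit(n_contigs, soft_limit, reserved=16):
    """True if one filehandle per contig would not fit under the soft limit.

    `reserved` is an assumed allowance for stdin/stdout/stderr and any
    other files the process keeps open; it is a guess, not a measured value.
    """
    if soft_limit == resource.RLIM_INFINITY:
        return False
    return n_contigs + reserved > soft_limit


# Tiny in-memory example; in practice you would pass an open genome FASTA file.
example = [">contig1\n", "ACGT\n", ">contig2\n", "GGCC\n"]
soft, hard = open_file_limits()
n = count_contigs(example)
print(n, "contigs; soft limit", soft, "; hard limit", hard)
print("likely to hit the limit:", exceeds_limit(n, soft))
```

The soft limit is the same number that `ulimit -n` prints in your shell. Note that this only reports the limit for the Python process itself, which normally inherits it from the shell you would also launch Bismark from.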
The quick and ugly fix is to increase the number of allowed filehandles on your system. Another option would be to remove very short contigs from your assembly, as these usually make up the majority of the contigs but contribute very little uniquely mappable sequence.
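For the second option, a minimal sketch of a length filter in plain Python is below. Again this is not anything Bismark provides, and the names `read_fasta`, `drop_short_contigs` and `format_fasta` are made up for the example. The right `min_length` cutoff depends on your assembly, so treat any value you pick as an assumption to check against how much sequence you lose.

```python
def read_fasta(lines):
    """Yield (header, sequence) pairs from an iterable of FASTA lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None and line:
            chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def drop_short_contigs(lines, min_length):
    """Keep only records whose sequence has at least `min_length` bases."""
    return [(h, s) for h, s in read_fasta(lines) if len(s) >= min_length]


def format_fasta(records, width=60):
    """Render (header, sequence) pairs as FASTA text, wrapped at `width`."""
    out = []
    for header, seq in records:
        out.append(">" + header)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n" if out else ""


# Tiny in-memory example with an arbitrary cutoff of 10 bases.
example = [">long\n", "ACGTACGT\n", "ACGT\n", ">short\n", "AC\n"]
kept = drop_short_contigs(example, min_length=10)
print(format_fasta(kept), end="")
```

For a real genome you would pass an open file handle instead of the list and write the result to a new FASTA file, then build the Bismark genome index from the filtered file. Reads that came from the removed contigs will no longer have anywhere to map, so check how many contigs remain and how much sequence was dropped.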
We have thought about whether we could easily fix this within Bismark, but the compromises in terms of efficiency to dynamically close and reopen filehandles as required are quite nasty, and this probably isn't something we're going to implement in the near future.