Code:
--scaffolds/--gazillion Users working with unfinished genomes sporting tens or even hundreds of thousands of scaffolds/contigs/chromosomes frequently encountered errors with pre-sorting reads to individual chromosome files. These errors were caused by the operating system's limit of the number of filehandle that can be written to at any one time (typically 1024; to find out this limit on Linux, type: ulimit -a). To bypass the limitation of open filehandles, the option --scaffolds does not pre-sort methylation calls into individual chromosome files. Instead, all input files are temporarily merged into a single file (unless there is only a single file), and this file will then be sorted by both chromosome AND position using the Unix sort command. Please be aware that this option might take a looooong time to complete, depending on the size of the input files, and the memory you allocate to this process (see --buffer_size). Nevertheless, it seems to be working.
Leave a comment: