@blam: A not so small correction. The only solution that is actually correct was to concatenate files together, but then only keep the resulting multi-fasta file (i.e. "cat *.fa > genome.fa" and then either move genome.fa to its own directory or delete the other .fa files). The other solution is not guaranteed to always work correctly. I'm finishing a new release that will fix this. In the new version, bison_index will in fact accept a directory of fasta files, rather than needing one to specify files individually (in fact, I will explicitly remove the ability for it to handle that since it can have unintended consequences).
The previous implementation could work incorrectly in cases where the input file list didn't match the order in which the files appeared in the directory entry, which can actually change over time. What that would mean is that the files could have been indexed in one order (e.g., chr1, then chr2, then chr3, ...) but then later read into memory in a different order (e.g., chr3, then chr1, then chr2, ...), which could cause all sorts of problems. This could only occur if you passed bison_index a list of files, rather than a single multi-fasta file. While I don't expect people to get bitten by this bug, it's very much possible and I consider it a major issue. I'm testing a fix and will upload a new version within the next couple hours.
For anyone who stores the genome in a single file, this won't be an issue for you. If, however, you store chromosomes/contigs in individual files, then I recommend deleting the current indices (just "rm -rf bisulfite_genome" in the directory with the fasta files) and reindexing. The version I'm testing will always process files in the same order, regardless of their order in the dirent structure on disk, so this problem will be resolved.
The previous implementation could work incorrectly in cases where the input file list didn't match the order in which the files appeared in the directory entry, which can actually change over time. What that would mean is that the files could have been indexed in one order (e.g., chr1, then chr2, then chr3, ...) but then later read into memory in a different order (e.g., chr3, then chr1, then chr2, ...), which could cause all sorts of problems. This could only occur if you passed bison_index a list of files, rather than a single multi-fasta file. While I don't expect people to get bitten by this bug, it's very much possible and I consider it a major issue. I'm testing a fix and will upload a new version within the next couple hours.
For anyone who stores the genome in a single file, this won't be an issue for you. If, however, you store chromosomes/contigs in individual files, then I recommend deleting the current indices (just "rm -rf bisulfite_genome" in the directory with the fasta files) and reindexing. The version I'm testing will always process files in the same order, regardless of their order in the dirent structure on disk, so this problem will be resolved.
Comment