With Oxford Nanopore data, basecalling is a crucial and well-documented step: you pick one of their assorted free and open-source basecallers depending on the accuracy and speed you need. For Illumina data, by contrast, basecalling algorithms are barely ever mentioned. How come?
You used to be able to get .CIF files (Cluster Intensity Files), which contain the raw signal values of the different dyes and are the main input of any basecaller program. These no longer seem to be supported, at least not on high-throughput platforms like HiSeq/NovaSeq. Is it still possible to do your own basecalling on Illumina data, and which alternative algorithms exist that actually improve accuracy?