Algorithms for Short Read Assembly, Alignment & Variation Analysis
Special Interest Group, ISMB 2008. Toronto, July 19 2008
Next generation, rapid, low-cost genome sequencing promises to address a broad range of genetic analysis applications. To increase throughput, the new sequencing platforms appearing in the marketplace carry out many reactions in parallel. This results in much shorter reads (down to 25-50 bp), but the overall throughput is enormous, with each run producing billions of base pairs of sequence data.
While the promise of next generation sequencing (NGS) technologies has become a reality, computational methods for assembly, alignment, and variation detection using such short reads are still in their infancy. Programs and algorithms developed for Sanger-style reads must be scaled or completely reinvented to match the characteristics of NGS data. For example, SNP discovery from Sanger sequencing reads is reasonably well understood; the situation is very different for NGS, where error rates are higher than those of Sanger sequencing and alignment of reads to reference sequences is complicated by their short length. NGS platforms present new opportunities as well as new challenges, owing both to the massive number of short reads and to the different sequencing methodologies (e.g. dibase sequencing) and underlying error models, which are critical to distinguishing real variations from false positives. Because some NGS technologies can generate mated reads, it is possible to infer genome rearrangements; however, this will require new approaches, as many reads will map to multiple locations in the genome. Short reads can also potentially be used for inferring copy number variants, but methods for copy number detection from sequence data have not yet been developed. The problem of short read assembly and co-assembly is equally important, as solving it promises to drastically lower the cost of sequencing a new genome.
This one-day meeting will provide a forum for in-depth presentations of methods and for discussion among the scientists working in this field.