We have a lot of comparisons of DINDEL, complete genomes, samtools, and other advanced indel callers, for deep whole genomes, 1000x sample low-pass whole genomes (1000G), and multi-sample exomes. Guillermo del Angel, who is writing the indel caller [plus a recalibrator and filtering program] as well as developing evaluation models, just submitted our 1000G calls (like 5 minutes ago). Over the next few weeks we'll be pushing a lot of our slide decks up to our public dropbox (have a look at the GSA wiki for the link), so there'll be a treasure trove of analyses, evaluation material, etc. soon. All of these tools, including our own, are doing an OK job at indel calling, but there's still a long way to go before indels are as well handled as SNPs.
Glad everyone is enjoying the GATK! If you want to see some crazy fun software engineering, the slide archive has a presentation on automated distributed parallelism in the GATK, which is live in the codebase. I'd be very interested to hear if people start using this. Also, as of yesterday the engine is doing AWS S3 distributed logging, so we should soon be hearing about bugs as they occur, anywhere in the world.