Hi all,
I am going to be running HaplotypeCaller on a bunch of genomes. I estimate that each sample will take around 24 hours to run on our machines, which is too long. I was thinking of splitting the genome into smaller pieces (say, one per chromosome) and running the caller on each piece separately. This way I can use multiple machines and many cores at once, which should greatly speed up the process. My question is: after everything has run, I will have many gVCF files for each sample. Will GenotypeGVCFs be able to handle all of these files? Specifically, is the program smart enough to match the sample names in the gVCF headers and merge the individual files that way? Or should I first combine all of the files for each sample, then run GenotypeGVCFs on the merged files?
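To make the plan concrete, here is a rough sketch of the scatter step I have in mind. It assumes GATK4-style command-line flags (GATK 3 uses different ones), chromosome names of the form chr1 to chrY, and placeholder sample and file names. It only builds the per-chromosome command lines; it does not run anything, and it does not cover the merging step I am asking about.

```python
from pathlib import Path

# Assumed contig names; adjust to match the reference actually in use.
CHROMOSOMES = [f"chr{i}" for i in range(1, 23)] + ["chrX", "chrY"]


def scatter_commands(sample, bam, reference, out_dir, intervals=CHROMOSOMES):
    """Return one HaplotypeCaller command (as an argument list) per interval.

    All file names are hypothetical placeholders. Flags follow GATK4 syntax,
    which is an assumption about the installed version.
    """
    commands = []
    for interval in intervals:
        # One gVCF per sample per chromosome, e.g. sampleA.chr1.g.vcf.gz
        out = Path(out_dir) / f"{sample}.{interval}.g.vcf.gz"
        commands.append([
            "gatk", "HaplotypeCaller",
            "-R", str(reference),   # reference FASTA
            "-I", str(bam),         # aligned reads for this sample
            "-L", interval,         # restrict calling to one chromosome
            "-ERC", "GVCF",         # emit reference confidence (gVCF output)
            "-O", str(out),
        ])
    return commands


if __name__ == "__main__":
    # Print the commands so they can be submitted to separate machines or cores.
    for cmd in scatter_commands("sampleA", "sampleA.bam", "ref.fasta", "gvcfs"):
        print(" ".join(cmd))
```

Each printed line would be submitted as its own job, so a single sample ends up with one gVCF per chromosome.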
Thanks for any help