I'm using HaplotypeCaller with the -L option to state the interval explicitly. I am trying to break my task into pieces and assign them to separate jobs so that it completes faster. In one run, I call the whole interval, say,
Chr1:1-1000000
and in another run, I am running it twice, on
Chr1:1-500000 in one job and
Chr1:500001-1000000 in another,
essentially splitting the interval into two jobs. What I am seeing in this second run is that SNPs are identified within roughly 100 bases on either side of position 500000 that are not found in the first run.
My guess is that asking HC to focus only on a region (i.e. -L 1-500000) does not allow the local aligner to properly reassemble the reads near the edge of that region, which results in spurious alignments and SNPs. I was hoping that, when -L is specified, it would run the local aligner over a larger region and report only the SNPs inside the -L region.
Has anybody seen this, or does anyone know a way around it?
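To make the setup concrete, here is a minimal Python sketch of the splitting described above, plus one possible workaround: give each job a padded interval and afterwards keep only the calls that fall inside the unpadded "core" interval. The function names and the 100-base padding are hypothetical, chosen only for illustration, and whether padding actually removes the edge SNPs is an assumption that would need to be checked against the single-interval run.

```python
def split_interval(contig, start, end, n_chunks, padding=0):
    """Split contig:start-end (1-based, inclusive) into n_chunks pieces.

    Returns a list of (core, padded) interval strings. The padded interval
    extends the core by `padding` bases on each side, clipped to the
    original start/end. Hypothetical helper, not part of GATK.
    """
    total = end - start + 1
    size, remainder = divmod(total, n_chunks)
    chunks = []
    core_start = start
    for i in range(n_chunks):
        # Spread any remainder over the first chunks.
        core_end = core_start + size - 1 + (1 if i < remainder else 0)
        pad_start = max(start, core_start - padding)
        pad_end = min(end, core_end + padding)
        chunks.append((f"{contig}:{core_start}-{core_end}",
                       f"{contig}:{pad_start}-{pad_end}"))
        core_start = core_end + 1
    return chunks


def in_core(position, core):
    """Return True if a 1-based position lies inside a core interval string."""
    _, span = core.split(":")
    lo, hi = (int(x) for x in span.split("-"))
    return lo <= position <= hi


if __name__ == "__main__":
    for core, padded in split_interval("Chr1", 1, 1000000, 2, padding=100):
        # Each job would be given `padded` as its -L value; calls outside
        # `core` would be dropped afterwards (assumed workaround).
        print(core, padded)
```

With two chunks and no padding this reproduces the intervals from the question (Chr1:1-500000 and Chr1:500001-1000000); with padding, the two jobs overlap around position 500000 and `in_core` decides which job's calls to keep.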