The GATK RealignerTargetCreator has two options for inputting data about known SNPs:
java -Xmx1g -jar /path/to/GenomeAnalysisTK.jar \
-T RealignerTargetCreator \
-R /path/to/reference.fasta \
-o /path/to/output.intervals \
[-I /path/to/input.bam] \
[-L intervals] \
[-B:snps,VCF /path/to/SNP_calls.vcf] \
[-B:indels,VCF /path/to/indel_calls.vcf] \
[-D /path/to/dbsnp.rod]
Explanation of Arguments
The -L option is used to restrict the search to a specific region or set of regions instead of the whole genome.
The -o argument is used to specify the list of intervals being output and that should in turn be passed to the realigner in the next step.
The -B snps binding would be used to pass in SNP calls so that the target creator can find clustered SNPs.
The -B indels and dbsnp bindings would be used to pass in known indel sites for the realigner to target.
I don't understand the difference between the -B and the -D options. I have used the -B option often with this file (from the GATK resource bundle):
00-All.vcf
I saw that the resource bundle also has a file called "dbsnp_132.b37.vcf", and I'm tempted to use that with the -D option, but I don't know whether that is a valid use of -D. Can anyone explain the difference between -B and -D, and which one a dbSNP VCF file should be passed with?
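To make the question concrete, here is a minimal sketch of the two invocations I am comparing. It only prints the command lines and does not run GATK. The paths are placeholders from the usage text above, the helper name build_cmd is just something I made up for illustration, and using the "snps" binding for a dbSNP file is my assumption. Whether -D accepts a VCF at all is exactly what I am unsure about, so variant B in particular is unverified.

```shell
#!/bin/sh
# Placeholder paths copied from the usage text; substitute real files.
GATK_JAR=/path/to/GenomeAnalysisTK.jar
REF=/path/to/reference.fasta
OUT=/path/to/output.intervals
KNOWN_VCF=/path/to/dbsnp_132.b37.vcf

# Hypothetical helper: print (not run) a RealignerTargetCreator command
# line with the given known-sites arguments appended at the end.
build_cmd() {
    printf '%s\n' "java -Xmx1g -jar $GATK_JAR -T RealignerTargetCreator -R $REF -o $OUT $*"
}

# Variant A: pass the VCF through a named -B binding (binding name assumed).
cmd_b=$(build_cmd "-B:snps,VCF" "$KNOWN_VCF")

# Variant B: pass the same VCF through -D (unverified; this is the open question).
cmd_d=$(build_cmd "-D" "$KNOWN_VCF")

echo "$cmd_b"
echo "$cmd_d"
```

The only difference between the two printed commands is the last argument pair, which is the part I need help choosing.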
Thank you.
Eric