I'm trying to get variant calling to work using a small set of simulated reads of the phi X genome. The problem I'm seeing is that indels are assigned a quality of zero and their vcf entries are incorrect.
I used wgsim from samtools to generate simulated paired-end fastq read files of varying depths of coverage, then used bwa to align the simulated reads to the phi X reference. I followed the GATK recommendations to realign around possible indels, then to recalibrate, then I ran the UnifiedGenotyper using the -glm INDEL option along with --output-mode EMIT_ALL_SITES.
The analysis info shows something like this:
The corresponding vcf file shows (I added a space after the "GT:" to keep from happening):
The vcf entries correspond to actual indels in the simulated data. For example, at position 870 there is an insertion of CC, and 876 there is a deletion of CT. The problem is the the "ALT" column shows a "." where it should have the actual alternate allele (TCC and G in those two cases).
What can I do to 1) make the quality not zero; and 2) get the vcf file to correctly report the alternative allele?
I used wgsim from samtools to generate simulated paired-end fastq read files of varying depths of coverage, then used bwa to align the simulated reads to the phi X reference. I followed the GATK recommendations to realign around possible indels, then to recalibrate, then I ran the UnifiedGenotyper using the -glm INDEL option along with --output-mode EMIT_ALL_SITES.
The analysis info shows something like this:
INFO 12:20:08,764 UnifiedGenotyper - Visited bases 5416
INFO 12:20:08,765 UnifiedGenotyper - Callable bases 7
INFO 12:20:08,765 UnifiedGenotyper - Confidently called bases 0
INFO 12:20:08,765 UnifiedGenotyper - % callable bases of all loci 0.129
INFO 12:20:08,765 UnifiedGenotyper - % confidently called bases of all loci 0.000
INFO 12:20:08,766 UnifiedGenotyper - % confidently called bases of callable loci 0.000
INFO 12:20:08,766 UnifiedGenotyper - Actual calls made 7
INFO 12:20:08,766 TraversalEngine - Total runtime 1.08 secs, 0.02 min, 0.00 hours
INFO 12:20:08,765 UnifiedGenotyper - Callable bases 7
INFO 12:20:08,765 UnifiedGenotyper - Confidently called bases 0
INFO 12:20:08,765 UnifiedGenotyper - % callable bases of all loci 0.129
INFO 12:20:08,765 UnifiedGenotyper - % confidently called bases of all loci 0.000
INFO 12:20:08,766 UnifiedGenotyper - % confidently called bases of callable loci 0.000
INFO 12:20:08,766 UnifiedGenotyper - Actual calls made 7
INFO 12:20:08,766 TraversalEngine - Total runtime 1.08 secs, 0.02 min, 0.00 hours
The corresponding vcf file shows (I added a space after the "GT:" to keep from happening):
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT somesample
phix.fa 869 . T . 0 LowQual DP=183;MQ=48.18;MQ0=0;QD=0.00 GT: DP:PL ./.:183:0,0,0
phix.fa 875 . GCT . 0 LowQual DP=190;MQ=49.29;MQ0=0;QD=0.00 GT: DP:PL ./.:190:0,0,0
phix.fa 1408 . T . 0 LowQual DP=209;MQ=52.38;MQ0=0;QD=0.00 GT: DP:PL ./.:209:0,0,0
phix.fa 1409 . A . 0 LowQual DP=213;MQ=52.17;MQ0=0;QD=0.00 GT: DP:PL ./.:213:0,0,0
phix.fa 1949 . G . 0 LowQual DP=205;MQ=57.70;MQ0=0;QD=0.00 GT: DP:PL ./.:205:0,0,0
phix.fa 1950 . A . 0 LowQual DP=205;MQ=57.82;MQ0=0;QD=0.00 GT: DP:PL ./.:205:0,0,0
phix.fa 4225 . AC . 0 LowQual DP=179;MQ=57.51;MQ0=0;QD=0.00 GT: DP:PL ./.:179:0,0,0
phix.fa 869 . T . 0 LowQual DP=183;MQ=48.18;MQ0=0;QD=0.00 GT: DP:PL ./.:183:0,0,0
phix.fa 875 . GCT . 0 LowQual DP=190;MQ=49.29;MQ0=0;QD=0.00 GT: DP:PL ./.:190:0,0,0
phix.fa 1408 . T . 0 LowQual DP=209;MQ=52.38;MQ0=0;QD=0.00 GT: DP:PL ./.:209:0,0,0
phix.fa 1409 . A . 0 LowQual DP=213;MQ=52.17;MQ0=0;QD=0.00 GT: DP:PL ./.:213:0,0,0
phix.fa 1949 . G . 0 LowQual DP=205;MQ=57.70;MQ0=0;QD=0.00 GT: DP:PL ./.:205:0,0,0
phix.fa 1950 . A . 0 LowQual DP=205;MQ=57.82;MQ0=0;QD=0.00 GT: DP:PL ./.:205:0,0,0
phix.fa 4225 . AC . 0 LowQual DP=179;MQ=57.51;MQ0=0;QD=0.00 GT: DP:PL ./.:179:0,0,0
What can I do to 1) make the quality not zero; and 2) get the vcf file to correctly report the alternative allele?
Comment