Hi all,
I need help to understand vcf format. i have visited all sites (1000genomes, vcf specifications etc) still need some understanding,
Called SNPs with GATK,
GT means genotype but what is phased and unphased? what does 0/1 or 1/1 means for GT?
what is the meaning of PL, HaplotypeScore, SB?
Is AF depends on AC and AN value?
Should DP (11) of INFO tag and sum of DP (1,10:ref reads, alt reads) of FORMAT tag always be the same?
I need help to understand vcf format. i have visited all sites (1000genomes, vcf specifications etc) still need some understanding,
Called SNPs with GATK,
##fileformat=VCFv4.0
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=3,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not applicable if site is not biallelic">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=Dels,Number=1,Type=Float,Description="Fraction of Reads Containing Spanning Deletions">
##INFO=<ID=HRun,Number=1,Type=Integer,Description="Largest Contiguous Homopolymer Run of Variant Allele In Either Direction">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with two (and only two) segregating haplotypes">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=SB,Number=1,Type=Float,Description="Strand Bias">
##UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[../../samtools-0.1.12a/samfiles/q20/GKUNU9Q04_chr1_q20_RG_sort.bam] sample_metadata=[] read_buffer_size=null phone_home=STANDARD read_filter=[] intervals=null excludeIntervals=null reference_sequence=chr1.fa rodBind=[] rodToIntervalTrackName=null BTI_merge_rule=UNION DBSNP=null downsampling_type=null downsample_to_fraction=null downsample_to_coverage=50 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 interval_merging=ALL read_group_black_list=null processingTracker=null restartProcessingTracker=false processingTrackerStatusFile=null processingTrackerID=-1 logging_level=INFO log_to_file=null quiet_output_mode=false debug_mode=false help=false genotype_likelihoods_model=SNP p_nonref_model=EXACT heterozygosity=0.001 pcr_error_rate=1.0E-4 sites_only=false genotype=false output_all_callable_bases=false standard_min_confidence_threshold_for_calling=50.0 standard_min_confidence_threshold_for_emitting=10.0 trigger_min_confidence_threshold_for_calling=30.0 trigger_min_confidence_threshold_for_emitting=30.0 noSLOD=false assume_single_sample_reads=null min_base_quality_score=17 min_mapping_quality_score=20 max_mismatches_in_40bp_window=3 use_reads_with_bad_mates=false max_deletion_fraction=0.05 get_indel_alleles_from_vcf=false min_indel_count_for_genotyping=5 indel_heterozygosity=1.25E-4 out=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub NO_HEADER=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub verbose_mode=null metrics_file=null annotation=[]"
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT buffalo
chr1 10740313 . A G 188.30 PASS AC=2;AF=1.00;AN=2;DP=11;Dels=0.00;HRun=1;HaplotypeScore=6.9635;MQ=26.82;MQ0=0;QD=17.12;SB=-72.04;sumGLbyD=20.12 GT:AD: DP :GQ:PL 1/1:1,10:7:21.05:221,21,0
chr1 147668098 . A T 12.05 LowQual AC=1;AF=0.50;AN=2;DP=6;Dels=0.00;HRun=0;HaplotypeScore=3.7429;MQ=121.66;MQ0=0;QD=2.01;SB=-0.01;sumGLbyD=7.25 GT:AD: DP: GQ :PL 0/1:0,6:1:1.76:42,3,0
##FILTER=<ID=LowQual,Description="Low quality">
##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Read Depth (only filtered reads used for calling)">
##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=PL,Number=3,Type=Float,Description="Normalized, Phred-scaled likelihoods for AA,AB,BB genotypes where A=ref and B=alt; not applicable if site is not biallelic">
##INFO=<ID=AC,Number=.,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
##INFO=<ID=AF,Number=.,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
##INFO=<ID=DP,Number=1,Type=Integer,Description="Total Depth">
##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
##INFO=<ID=Dels,Number=1,Type=Float,Description="Fraction of Reads Containing Spanning Deletions">
##INFO=<ID=HRun,Number=1,Type=Integer,Description="Largest Contiguous Homopolymer Run of Variant Allele In Either Direction">
##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with two (and only two) segregating haplotypes">
##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
##INFO=<ID=SB,Number=1,Type=Float,Description="Strand Bias">
##UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[../../samtools-0.1.12a/samfiles/q20/GKUNU9Q04_chr1_q20_RG_sort.bam] sample_metadata=[] read_buffer_size=null phone_home=STANDARD read_filter=[] intervals=null excludeIntervals=null reference_sequence=chr1.fa rodBind=[] rodToIntervalTrackName=null BTI_merge_rule=UNION DBSNP=null downsampling_type=null downsample_to_fraction=null downsample_to_coverage=50 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 interval_merging=ALL read_group_black_list=null processingTracker=null restartProcessingTracker=false processingTrackerStatusFile=null processingTrackerID=-1 logging_level=INFO log_to_file=null quiet_output_mode=false debug_mode=false help=false genotype_likelihoods_model=SNP p_nonref_model=EXACT heterozygosity=0.001 pcr_error_rate=1.0E-4 sites_only=false genotype=false output_all_callable_bases=false standard_min_confidence_threshold_for_calling=50.0 standard_min_confidence_threshold_for_emitting=10.0 trigger_min_confidence_threshold_for_calling=30.0 trigger_min_confidence_threshold_for_emitting=30.0 noSLOD=false assume_single_sample_reads=null min_base_quality_score=17 min_mapping_quality_score=20 max_mismatches_in_40bp_window=3 use_reads_with_bad_mates=false max_deletion_fraction=0.05 get_indel_alleles_from_vcf=false min_indel_count_for_genotyping=5 indel_heterozygosity=1.25E-4 out=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub NO_HEADER=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub verbose_mode=null metrics_file=null annotation=[]"
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT buffalo
chr1 10740313 . A G 188.30 PASS AC=2;AF=1.00;AN=2;DP=11;Dels=0.00;HRun=1;HaplotypeScore=6.9635;MQ=26.82;MQ0=0;QD=17.12;SB=-72.04;sumGLbyD=20.12 GT:AD: DP :GQ:PL 1/1:1,10:7:21.05:221,21,0
chr1 147668098 . A T 12.05 LowQual AC=1;AF=0.50;AN=2;DP=6;Dels=0.00;HRun=0;HaplotypeScore=3.7429;MQ=121.66;MQ0=0;QD=2.01;SB=-0.01;sumGLbyD=7.25 GT:AD: DP: GQ :PL 0/1:0,6:1:1.76:42,3,0
what is the meaning of PL, HaplotypeScore, SB?
Is AF depends on AC and AN value?
Should DP (11) of INFO tag and sum of DP (1,10:ref reads, alt reads) of FORMAT tag always be the same?
Comment