Hello All,
I ran Dindel and am now trying to understand its output. I could not find an example describing each column of the output.
Can someone please point me to the right place, or help me understand the glf and VCF formats?
Thanks!
EHC
(1) glf file:
- What are the columns marked nBQT, nmmBQT, mLogBQ, nMMLeft, nMMRight, and glf?
- Which column is the likelihood score? Can one infer a quality from it?
- What is the dip.map line?
- Why do some positions appear multiple times? Are these multiple indels?
msg index analysis_type tid lpos rpos center_position realigned_position was_candidate_in_window ref_all nref_all num_reads post_prob_variant qual est_freq logZ hapfreqs indidx msq numOffAll num_indel num_cover_forward num_cover_reverse num_unmapped_realigned var_coverage_forward var_coverage_reverse nBQT nmmBQT mLogBQ nMMLeft nMMRight glf
ok 17 dip.map Chr1 11843 11962 11903 11903 1 NA -C 6 NA 0.000429941 NA NA NA 0 0 NA NA 0 0 0 0 1 NA NA NA NA NA 0/1:0.000429941
ok 17 dip Chr1 11843 11962 11903 11892 0 NA R=>T 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 0 200 0 -6.88 0 0 0/0:-67.6037,0/1:-69.906,1/1:-69.9063
ok 17 dip Chr1 11843 11962 11903 11900 0 NA R=>C 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 0 200 0 -6.88 0 0 0/0:-67.6037,0/1:-69.906,1/1:-69.9063
ok 17 dip Chr1 11843 11962 11903 11903 1 NA -C 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 1 200 0 -6.88 0 0 0/0:-67.6037,0/1:-67.6134,1/1:-67.6238
ok 17 dip Chr1 11843 11962 11903 11913 0 NA R=>A 6 NA NA NA -67.6043 NA 0 24.1039 6 4 0 0 0 0 0 200 0 -6.88 0 0 0/0:-67.6037,0/1:-69.906,1/1:-69.9063
ok 18 dip.map Chr1 13262 13381 13321 13292 0 NA R=>G 294 NA 2545.49 NA NA NA 0 0 NA NA 0 0 0 0 0 NA NA NA NA NA 1/1:15.3077
ok 18 dip.map Chr1 13262 13381 13321 13298 0 NA R=>G 294 NA 2545.49 NA NA NA 0 0 NA NA 0 0 0 0 0 NA NA NA NA NA 1/1:24.2244
ok 18 dip.map Chr1 13262 13381 13321 13301 0 NA R=>G 294 NA 2545.49 NA NA NA 0 0 NA NA 0 0 0 0 0 NA NA NA NA NA 1/1:39.2562
ok 18 dip.map Chr1 13262 13381 13321 13322 1 NA +AGTGAAAGTACCGGTCCATGGTTC 294 NA 2545.49 NA NA NA 0 29 NA NA 79 0 0 68 0 NA NA NA NA NA 1/1:56.3833
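To keep the columns straight, here is a minimal Python sketch that pairs each header name with its value for one of the "dip" rows above and splits the final glf field into genotype/value pairs. It assumes the columns are whitespace-separated and that each glf entry has the form genotype:number. The names HEADER, ROW, parse_row and parse_glf are mine, not part of Dindel, and the code makes no claim about what the numbers mean.

```python
# Header and one "dip" row copied from the glf output above.
HEADER = (
    "msg index analysis_type tid lpos rpos center_position realigned_position "
    "was_candidate_in_window ref_all nref_all num_reads post_prob_variant qual "
    "est_freq logZ hapfreqs indidx msq numOffAll num_indel num_cover_forward "
    "num_cover_reverse num_unmapped_realigned var_coverage_forward "
    "var_coverage_reverse nBQT nmmBQT mLogBQ nMMLeft nMMRight glf"
)
ROW = (
    "ok 17 dip Chr1 11843 11962 11903 11903 1 NA -C 6 NA NA NA -67.6043 NA 0 "
    "24.1039 6 4 0 0 0 0 1 200 0 -6.88 0 0 "
    "0/0:-67.6037,0/1:-67.6134,1/1:-67.6238"
)


def parse_row(header, row):
    """Pair each column name with the value at the same position."""
    names = header.split()
    values = row.split()
    if len(names) != len(values):
        raise ValueError(f"{len(names)} column names but {len(values)} values")
    return dict(zip(names, values))


def parse_glf(field):
    """Split 'genotype:value,genotype:value,...' into {genotype: float}."""
    result = {}
    for entry in field.split(","):
        genotype, value = entry.split(":")
        result[genotype] = float(value)
    return result


record = parse_row(HEADER, ROW)
glf = parse_glf(record["glf"])

# Show the columns asked about in question (1).
for name in ("nBQT", "nmmBQT", "mLogBQ", "nMMLeft", "nMMRight"):
    print(name, "=", record[name])
print("glf =", glf)
```

This only lines the values up with their names, so that for example nBQT can be read off as 200 and mLogBQ as -6.88 for this row. What those values mean is still my question.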
(2) variantCalls.VCF
- What is the last column (SAMPLE)? What does 1/1:124 mean?
- What is the meaning of GT:GQ? I saw these defined in the header but still cannot understand them.
(##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype quality">)
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE
Chr3 6048 . T TG 1712 PASS DP=93;NF=31;NR=11;NRS=32;NFS=12;HP=1 GT:GQ 1/1:124
Chr3 6535 . C CG 37 PASS DP=98;NF=1;NR=0;NRS=2;NFS=0;HP=3 GT:GQ 1/1:7
Chr3 7873 . C CT,CTTT 49 PASS DP=70;NF=4;NR=5;NRS=18;NFS=23;HP=1 GT:GQ 1/2:3
Chr3 8105 . TAAA T 435 PASS DP=69;NF=12;NR=6;NRS=14;NFS=6;HP=1 GT:GQ 1/1:60
Chr3 8703 . T TTTA 423 hp10 DP=77;NF=1;NR=7;NRS=3;NFS=12;HP=15 GT:GQ 1/1:16
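From my reading of the VCF specification, the FORMAT column lists colon-separated keys and the sample column gives the values in the same order, so GT:GQ with 1/1:124 would pair GT with 1/1 and GQ with 124. Below is a minimal Python sketch of that pairing, using the first record above. The function name parse_vcf_line is mine, and the sketch assumes whitespace-separated columns and a single sample column.

```python
# Column names from the #CHROM header line, and the first record above.
COLUMNS = "CHROM POS ID REF ALT QUAL FILTER INFO FORMAT SAMPLE".split()
LINE = (
    "Chr3 6048 . T TG 1712 PASS "
    "DP=93;NF=31;NR=11;NRS=32;NFS=12;HP=1 GT:GQ 1/1:124"
)


def parse_vcf_line(line):
    """Split one VCF record and pair FORMAT keys with SAMPLE values."""
    fields = dict(zip(COLUMNS, line.split()))

    # INFO is a semicolon-separated list of key=value items.
    info = {}
    for item in fields["INFO"].split(";"):
        key, _, value = item.partition("=")
        info[key] = value
    fields["INFO"] = info

    # FORMAT keys and SAMPLE values are both colon-separated, in the same order.
    keys = fields["FORMAT"].split(":")
    values = fields["SAMPLE"].split(":")
    fields["SAMPLE"] = dict(zip(keys, values))
    return fields


record = parse_vcf_line(LINE)
print(record["SAMPLE"])
print(record["INFO"])
```

In the GT value the numbers index alleles: 0 is REF and 1, 2, ... are the ALT alleles in the order listed, so 1/2 on the Chr3 7873 line would refer to CT and CTTT. What I still cannot tell is how Dindel computes the GQ number and how it relates to the QUAL column.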