Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • VCF Indel encoding question

    Hi,

    I'm having problems understanding a GATK output VCF. I have read the VCF standard, but I'm obviously missing something.

    I /think/ I understand how SNPs and short indels are represented, but clearly I do not. Below is an excerpt that illustrates sites which I do not understand. I suspect it may be something to do with GATK quality filters that I'm not understanding...

    The excerpt below was generated using

    GATK -l INFO -I my.bam -R my.fa -T UnifiedGenotyper -S LENIENT -nt 8 --heterozygosity 0.1 -o test.vcf --genotype_likelihoods_model BOTH --min_base_quality_score 10 --output_mode EMIT_ALL_SITES -ploidy 2

    Thanks!

    Darren

    -------------------------------------------------------
    Code:
    CH1	225	.	T	G	12.71	LowQual	AC=1;AF=0.500;AN=2;BaseQRankSum=1.978;DP=59;Dels=0.03;FS=0.000;HaplotypeScore=10.2840;MLEAC=1;MLEAF=0.500;MQ=70.25;MQ0=8;MQRankSum=-5.349;QD=0.22;ReadPosRankSum=-3.188	GT:AD:DP:GQ:PL	0/1:41,16:55:20:20,0,1435
    CH1	226	.	T	.	121.53	.	AN=2;DP=59;MQ=70.25;MQ0=8	GT:DP	0/0:43
    CH1	227	.	A	.	121.53	.	AN=2;DP=59;MQ=70.25;MQ0=8	GT:DP	0/0:43
    CH1	228	.	T	.	121.53	.	AN=2;DP=59;MQ=70.25;MQ0=8	GT:DP	0/0:43
    CH1	229	.	A	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:38
    CH1	230	.	C	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:38
    CH1	231	.	T	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:36
    CH1	232	.	G	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:36
    CH1	233	.	C	.	115.53	.	AN=2;DP=57;MQ=69.66;MQ0=8	GT:DP	0/0:37
    CH1	234	.	A	.	139.53	.	AN=2;DP=70;MQ=59.20;MQ0=14	GT:DP	0/0:63
    CH1	235	.	A	.	175.53	.	AN=2;DP=84;MQ=51.67;MQ0=15	GT:DP	0/0:79
    CH1	236	.	A	.	175.53	.	AN=2;DP=84;MQ=51.67;MQ0=15	GT:DP	0/0:79
    CH1	237	.	T	.	175.53	.	AN=2;DP=85;MQ=51.37;MQ0=16	GT:DP	0/0:80
    CH1	238	.	A	.	175.53	.	AN=2;DP=102;MQ=46.90;MQ0=28	GT:DP	0/0:97
    CH1	238	.	A	AGAAAGAAAGCTTGTA	83.73	.	AC=1;AF=0.500;AN=2;BaseQRankSum=6.172;DP=102;FS=0.000;MLEAC=1;MLEAF=0.500;MQ=46.90;MQ0=0;MQRankSum=-6.190;QD=0.05;ReadPosRankSum=-5.733	GT:AD:DP:GQ:PL	0/1:27,25:57:99:121,0,4853
    CH1	239	.	A	.	175.53	.	AN=2;DP=102;MQ=46.90;MQ0=28	GT:DP	0/0:101
    CH1	240	.	T	.	175.53	.	AN=2;DP=102;MQ=46.90;MQ0=28	GT:DP	0/0:98
    CH1	241	.	A	.	169.53	.	AN=2;DP=108;MQ=44.14;MQ0=29	GT:DP	0/0:107
    CH1	242	.	T	.	169.53	.	AN=2;DP=109;MQ=43.94;MQ0=29	GT:DP	0/0:103
    CH1	242	.	T	.	118.27	.	AN=2;DP=109;MQ=43.94;MQ0=29	GT:AD:DP	0/0:27:55
    CH1	243	.	C	.	172.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:108
    CH1	243	.	CTTTT	.	118.27	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:AD:DP	0/0:27:56
    CH1	244	.	T	.	91.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:61
    CH1	245	.	T	.	91.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:53
    CH1	246	.	T	.	73.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:41
    CH1	247	.	T	.	91.53	.	AN=2;DP=110;MQ=43.76;MQ0=29	GT:DP	0/0:46
    CH1	248	.	A	.	172.53	.	AN=2;DP=116;MQ=42.61;MQ0=31	GT:DP	0/0:100
    CH1	249	.	A	.	172.53	.	AN=2;DP=116;MQ=42.61;MQ0=31	GT:DP	0/0:100
    CH1	250	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:101
    CH1	251	.	T	.	169.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:96
    CH1	251	.	T	.	118.27	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:AD:DP	0/0:27:56
    CH1	252	.	C	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:113
    CH1	253	.	C	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:110
    CH1	254	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:111
    CH1	255	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:111
    CH1	256	.	T	.	172.53	.	AN=2;DP=117;MQ=42.43;MQ0=32	GT:DP	0/0:111
    Line 1 is a SNP
    Lines 14 and 15 are an indel that I do understand
    Lines 19 and 20 I do /not/ understand
    Lines 21 and 22 I do /not/ understand
    ---------------------------------------------------

  • #2
    Darren, maybe you should re-post with the lines numbered. You can use the Unix "nl" command to do this.

    Comment

    Latest Articles

    Collapse

    • seqadmin
      Understanding Genetic Influence on Infectious Disease
      by seqadmin




      During the COVID-19 pandemic, scientists observed that while some individuals experienced severe illness when infected with SARS-CoV-2, others were barely affected. These disparities left researchers and clinicians wondering what causes the wide variations in response to viral infections and what role genetics plays.

      Jean-Laurent Casanova, M.D., Ph.D., Professor at Rockefeller University, is a leading expert in this crossover between genetics and infectious...
      09-09-2024, 10:59 AM
    • seqadmin
      Addressing Off-Target Effects in CRISPR Technologies
      by seqadmin






      The first FDA-approved CRISPR-based therapy marked the transition of therapeutic gene editing from a dream to reality1. CRISPR technologies have streamlined gene editing, and CRISPR screens have become an important approach for identifying genes involved in disease processes2. This technique introduces targeted mutations across numerous genes, enabling large-scale identification of gene functions, interactions, and pathways3. Identifying the full range...
      08-27-2024, 04:44 AM

    ad_right_rmr

    Collapse

    News

    Collapse

    Topics Statistics Last Post
    Started by seqadmin, Today, 06:25 AM
    0 responses
    13 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, Yesterday, 01:02 PM
    0 responses
    12 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 09-18-2024, 06:39 AM
    0 responses
    14 views
    0 likes
    Last Post seqadmin  
    Started by seqadmin, 09-11-2024, 02:44 PM
    0 responses
    14 views
    0 likes
    Last Post seqadmin  
    Working...
    X