  • najoshi
    Member
    • Feb 2010
    • 20

    SnpEff effect annotation confusion

    I am trying to figure out why this particular complex variant was annotated with an effect of "start_lost" when I don't see any evidence that the start codon has changed. Here is the full VCF line:

    Code:
    chr1	20717668	.	ACGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAA	GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC	566.598	.	AB=0.347826;ABP=12.2627;AC=5;AF=0.5;AN=10;AO=16;CIGAR=1X57M1X;DP=46;DPB=46.1525;DPRA=0;EPP=5.18177;EPPR=4.16842;GTI=0;LEN=59;MEANALT=1;MQM=60;MQMR=60;NS=5;NUMALT=1;ODDS=5.38955;PAIRED=0.5625;PAIREDR=0.466667;PAO=9;PQA=333;PQR=0;PRO=0;QA=586;QR=1103;RO=30;RPL=13;RPP=16.582;RPPR=31.9633;RPR=3;RUN=1;SAF=7;SAP=3.55317;SAR=9;SRF=8;SRP=17.1973;SRR=22;TYPE=complex;ANN=GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|start_lost|HIGH|KIF17|KIF17|transcript|NM_020816.4|protein_coding|1/15|c.-20_39delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC|p.MetAlaSerGluAlaValLysValValValArgCysArg1?|340/3961|1/3090|1/1029||,GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|start_lost|HIGH|KIF17|KIF17|transcript|NM_001122819.3|protein_coding|1/15|c.-20_39delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC|p.MetAlaSerGluAlaValLysValValValArgCysArg1?|340/3958|1/3087|1/1028||,GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|5_prime_UTR_variant|MODIFIER|KIF17|KIF17|transcript|NM_020816.4|protein_coding|1/15|c.-20_39delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC||||||,GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|5_prime_UTR_variant|MODIFIER|KIF17|KIF17|transcript|NM_001122819.3|protein_coding|1/15|c.-20_39delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC||||||,GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|5_prime_UTR_variant|MODIFIER|KIF17|KIF17|transcript|NM_001287212.2|protein_coding|1/15|c.-447_-389delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGC
CTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC|||||2098|,GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|downstream_gene_variant|MODIFIER|SH2D5|SH2D5|transcript|NM_001103161.2|protein_coding||c.*4066_*4124delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC|||||2063|,GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|downstream_gene_variant|MODIFIER|SH2D5|SH2D5|transcript|XM_011541459.2|protein_coding||c.*4066_*4124delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC|||||2063|,GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|downstream_gene_variant|MODIFIER|SH2D5|SH2D5|transcript|XM_011541460.2|protein_coding||c.*4066_*4124delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC|||||2063|,GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|downstream_gene_variant|MODIFIER|SH2D5|SH2D5|transcript|XM_011541462.2|protein_coding||c.*4066_*4124delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC|||||2063|,GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|downstream_gene_variant|MODIFIER|SH2D5|SH2D5|transcript|XM_011541461.2|protein_coding||c.*4066_*4124delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC|||||2063|,GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|downstream_gene_variant|MODIFIER|SH2D5|SH2D5|transcript|NM_001103160.2|protein_coding||c.*4066_*4124delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC|||||2063|,GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|intron_variant|MODIFIER|LOC107985528|LOC107985528|transcript|unknown_transcript_1|protein_coding|987/1510|c.167449+904178_167
449+904236delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC||||||WARNING_TRANSCRIPT_MULTIPLE_STOP_CODONS;LOF=(KIF17|KIF17|3|0.67)	GT:DP:AD:RO:QR:AO:QA:GL	0/1:4:3,1:3:111:1:37:-2.49371,0,-9.14896	0/1:9:6,3:6:221:3:109:-13.5154,0,-16.9355	0/1:11:6,5:6:223:5:182:-16.4518,0,-16.8162	0/1:12:8,4:8:294:4:149:-22.2684,0,-21.995	0/1:10:7,3:7:254:3:109:-13.2144,0,-19.5986
    The first annotation looks like this:

    Code:
    GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC|start_lost|HIGH|KIF17|KIF17|transcript|NM_020816.4|protein_coding|1/15|c.-20_39delTTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGTinsGTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC|p.MetAlaSerGluAlaValLysValValValArgCysArg1?|340/3961|1/3090|1/1029||
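    To make the annotation above easier to read, here is a minimal Python sketch that splits it on "|" and labels each subfield. The field names are an assumption: they follow the ANN header that SnpEff normally writes into the VCF, so check them against the `##INFO=<ID=ANN...>` line in your own file.

```python
# Field names assumed from the usual SnpEff ANN header line; verify against your VCF.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank", "HGVS.c",
    "HGVS.p", "cDNA.pos / cDNA.length", "CDS.pos / CDS.length",
    "AA.pos / AA.length", "Distance", "ERRORS / WARNINGS / INFO",
]

# The first annotation from the post, split over several literals for readability.
annotation = (
    "GCGGCAGCGCACGACAACCTTCACCGCCTCGGAGGCCATGGCGCCGCGCCCAGGACCAC"
    "|start_lost|HIGH|KIF17|KIF17|transcript|NM_020816.4|protein_coding|1/15|"
    "c.-20_39del"
    "TTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGT"
    "ins"
    "GTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC"
    "|p.MetAlaSerGluAlaValLysValValValArgCysArg1?|340/3961|1/3090|1/1029||"
)

values = annotation.split("|")
record = dict(zip(ANN_FIELDS, values))

# Print only the short fields; the allele and HGVS.c strings are very long.
for name in ("Annotation", "Annotation_Impact", "Feature_ID", "Rank",
             "HGVS.p", "CDS.pos / CDS.length", "AA.pos / AA.length"):
    print(f"{name}: {record[name]}")
```

    Laid out this way, the effect (start_lost), the transcript (NM_020816.4), and the reported CDS and protein positions (1/3090 and 1/1029) sit next to each other.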
    For some reason SnpEff reports this as a single deletion-insertion instead of two separate SNPs. Looking at the deleted and inserted sequences:

    Code:
    TTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGT
    GTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC
    You can see that all of the bases are the same except for the first and the last. The start codon (ATG) is in the middle, i.e. this variant spans the boundary between the 5' UTR and the coding sequence within exon 1. However, the start codon itself does not change, so why is the variant being annotated as "start_lost"?
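    The comparison can also be done programmatically. This is a rough Python sketch, not anything SnpEff does internally: it walks the two sequences from the HGVS.c string, reports each mismatch in c. coordinates, and checks the ATG. The constant `UTR_BASES` and the helper `hgvs_c_position` are names made up for this illustration; the value 20 comes from the range starting at c.-20.

```python
# Deleted and inserted sequences copied from the HGVS.c string (transcript strand).
deleted = "TTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGT"
inserted = "GTGGTCCTGGGCGCGGCGCCATGGCCTCCGAGGCGGTGAAGGTTGTCGTGCGCTGCCGC"

# The range is c.-20_39, so 20 UTR bases come before the A of the ATG.
UTR_BASES = 20


def hgvs_c_position(index: int) -> int:
    """Map a 0-based index in the window to an HGVS c. coordinate (there is no c.0)."""
    offset = index - UTR_BASES
    return offset if offset < 0 else offset + 1


# Collect every position where the two sequences disagree.
differences = [
    f"c.{hgvs_c_position(i)}{r}>{a}"
    for i, (r, a) in enumerate(zip(deleted, inserted))
    if r != a
]

# The start codon occupies c.1 to c.3, i.e. indices 20 to 22 of the window.
start = slice(UTR_BASES, UTR_BASES + 3)
start_codon_intact = deleted[start] == inserted[start] == "ATG"

print(differences)         # ['c.-20T>G', 'c.39T>C']
print(start_codon_intact)  # True
```

    So the block reduces to two substitutions, c.-20T>G in the 5' UTR and c.39T>C in the third base of codon 13 (CGT to CGC, both Arg), with the ATG untouched.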
