  • Insertions follow reference in VCF file

    Hi,
    I'm working with gVCF files that also contain non-variant positions. I have noticed some insertions that I'm unable to interpret or handle correctly. A simplified example:
    Code:
    NC_013991.2 1403 . G GAA 176.32 . GT:AD:DP:GQ:PGT:PID:PL:PS ./. 0/0 0/0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 1|1 0/0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/0 ./. ./. 1/1 ./. ./. 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 ./. 0/0 0/0 0/0 0/0 ./. 0/0 ./. 0/0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./.
    NC_013991.2 1404 . A . . . GT:AD:DP:RGQ ./. 0/0 0/0 0/0 ./. ./. ./. ./. ./. ./. ./. 0/0 ./. ./. ./. ./. 0/0 ./. 0/0 0/0 0/0 ./. ./. ./. ./. ./. ./. 0/0 ./. ./. ./. ./. ./. ./. 0/0 0/0 0/0 0/0 ./. ./. 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 ./. 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 ./. 0/0 0/0 0/0 0/0 ./. ./. ./. 0/0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/0 ./. ./. ./.
    NC_013991.2 1405 . A . . . GT:AD:DP:RGQ ./. 0/0 0/0 0/0 ./. ./. ./. ./. ./. ./. ./. 0/0 ./. ./. ./. ./. 0/0 ./. 0/0 0/0 0/0 ./. ./. ./. ./. ./. ./. 0/0 ./. ./. ./. ./. ./. ./. 0/0 0/0 0/0 0/0 ./. ./. 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 ./. 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 0/0 ./. 0/0 0/0 0/0 0/0 ./. ./. ./. 0/0 ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. ./. 0/0 ./. ./. ./.
    Here, the reference sequence is "GAA". Why, at site 1403, do I get the alternative allele "GAA"? Two samples at this position appear to be "1/1", i.e. homozygous for GAA. But they are also homozygous for A at the next two positions. What, then, is their genotype? GAAAA, i.e. containing an insertion of AA?
    I've seen multiple cases like this, including ones with longer alternative alleles, and each time the alternative allele sequence exactly recapitulates the reference sequence. This looks like an error to me, but how can it be corrected?
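To make the GAAAA reading concrete, here is a minimal sketch of applying the record at 1403 to the reference by hand. It assumes the usual VCF convention that the REF allele starting at POS is replaced by the ALT allele; `apply_variant` is a hypothetical helper written for this post, not part of any library.

```python
def apply_variant(ref_seq, seq_start, pos, ref, alt):
    """Replace REF with ALT at 1-based position `pos`.

    `ref_seq` is a stretch of reference whose first base sits at `seq_start`.
    """
    offset = pos - seq_start
    # Sanity check: the REF allele must match the reference at this position.
    if ref_seq[offset:offset + len(ref)] != ref:
        raise ValueError("REF allele does not match the reference sequence")
    return ref_seq[:offset] + alt + ref_seq[offset + len(ref):]


reference = "GAA"  # reference bases at positions 1403-1405 (from the example)
haplotype = apply_variant(reference, seq_start=1403, pos=1403, ref="G", alt="GAA")
inserted = "GAA"[len("G"):]  # bases added after the anchoring G

print(haplotype)  # GAAAA
print(inserted)   # AA
```

Under that assumption, only the G at 1403 is replaced, so the reference A bases at 1404 and 1405 stay in place after the inserted AA. That would be consistent with the same samples being 0/0 at those two positions.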
    The mapping was done with either bwa aln or bwa mem, and we've observed this behaviour when genotyping with both GATK4 and bcftools.

    I would be very grateful if someone could clarify this.
    Thanks,
    Jos
