alignment view and vcf disagree

Mark

Member

Join Date: Nov 2008

Posts: 51
- Share
- Tweet
#1

alignment view and vcf disagree

09-28-2020, 05:53 AM

I have a pacbio CCS fastq file that I have aligned to a small (5037 bp) reference which is part of a sequenced BAC clone. I have aligned this data with multiple aligners (minimap2 and smalt), called variants with multiple variant callers (bbmap, freebayes, and gatk) and viewed the bam file with multiple viewers (igv and tablet). All produce similar results. I will focus here on minimap2 and callvariants (bbmap), as the problem likely stems from the variant caller and seqanswers is the preferred avenue for support for bbmap.
Here is what I did:

Code:

minimap2 -t 5 -ax asm20 reference.fa sample.ccs.fastq > out.sam samtools view -O BAM -o out.bam out.sam && samtools sort -o out.sort.bam out.bam && samtools index out.sort.bam callvariants.sh ploidy=1 in=out.sort.bam ref=reference.fa vcf=out.vcf

A partial reading of callvariants log output:

Code:

1 of 3311 variants passed primary filters (0.0302%). Type Count Rate AD Depth AF Score Qual Substitutions: 0 0.0% 0.0 0.0 0.000 0.0 0.0 Deletions: 1 100.0% 619.0 1000.0 0.619 23.0 34.0 Insertions: 0 0.0% 0.0 0.0 0.000 0.0 0.0 Variation Rate: 1/5037 Homozygous: 1 100.0%

The single called variant (a deletion at the site of a homopolymer) is reported in the vcf.

Code:

#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT out.sort reference 1401 . Tc T 23.66 PASS SN=0;STA=1401;STO=1402;TYP=DEL;R1P=338;R1M=281;R2P=0;R2M=0;AD=619;DP=1000;MCOV=-1;PPC=0;AF=0.6190;RAF=0.6190;LS=2996497;MQS=37140;MQM=60;BQS=21455;BQM=41;EDS=956749;EDM=4467;IDS=617766;IDM=999;NVC=0;FLG=0;CED=601;HMP=6;SB=0.9997 GT:DP:AD:AF:RAF:NVC:FLG:SB:SC:PF 1:1000:619:0.6190:0.6190:0:0:0.9997:23.66:PASS

To my interpretation of this, a single "c" has been lost from the poly-C which starts at this position. Moreover this deletions occurs in 619 of 1000 reads covering this position.
Now the strange part. When I view the alignment (out.sort.bam) in IGV (or Tablet), I see at most 1-2% indels at our near this site while the rest of the sequence is largely free of indels. Why is there this large discrepancy between what is observed in an alignment viewer and what is called by the variant caller (i.e. there are many more variant reads called than are observed on the viewer?

Thank you for your help

Last edited by Mark; 09-28-2020, 11:37 AM. Reason: Added [code] tags
Tags: bbmap, freebayes, igv, minimap2, smalt
Mark

Member

Join Date: Nov 2008

Posts: 51
- Share
- Tweet
#2

09-29-2020, 08:02 AM

Turns out the problem was, not surprisingly, in IGV. By default IGV will not show one base pair deletions for what it perceives as PacBio sequence. I had unchecked the "hide indels" box in the 'alignment' preferences but there is a equivalent checkbox in the ' Third Gen' preferences I had overlooked. Uncheck it and viola.
Comment

Previous template Next

Exploring the Dynamics of the Tumor Microenvironment

by seqadmin

The complexity of cancer is clearly demonstrated in the diverse ecosystem of the tumor microenvironment (TME). The TME is made up of numerous cell types and its development begins with the changes that happen during oncogenesis. “Genomic mutations, copy number changes, epigenetic alterations, and alternative gene expression occur to varying degrees within the affected tumor cells,” explained Andrea O’Hara, Ph.D., Strategic Technical Specialist at Azenta. “As...
- Channel: Articles
07-08-2024, 03:19 PM

Topics	Statistics	Last Post
Gene Misexpression in the Healthy Human Population by seqadmin Started by seqadmin, Yesterday, 06:46 AM	0 responses 9 views 0 likes	Last Post by seqadmin Yesterday, 06:46 AM
New Method for Rapid Genetic Diagnosis of Mendelian Disorders by seqadmin Started by seqadmin, 07-24-2024, 11:09 AM	0 responses 26 views 0 likes	Last Post by seqadmin 07-24-2024, 11:09 AM
Advancing Nanopore Technology for Portable Sensing Devices by seqadmin Started by seqadmin, 07-19-2024, 07:20 AM	0 responses 159 views 0 likes	Last Post by seqadmin 07-19-2024, 07:20 AM
New RNA-Based Gene Writing Technology Achieves Precise Gene Integration by seqadmin Started by seqadmin, 07-16-2024, 05:49 AM	0 responses 127 views 0 likes	Last Post by seqadmin 07-16-2024, 05:49 AM

Seqanswers Leaderboard Ad

Announcement

alignment view and vcf disagree

Comment

Latest Articles

ad_right_rmr

News