Seqanswers Leaderboard Ad

Collapse

Announcement

Collapse
No announcement yet.
X
 
  • Filter
  • Time
  • Show
Clear All
new posts

  • Multi-sample VCF generated in GATK is not working with novoutil IUPAC

    Hi,

    We are trying to find SNPs in a few loci of interest in rhesus macaque. To that end, we have sequenced exome sequences and I've gone through the process of calling SNPs using GATK and I've come up with a multi-sample VCF file.

    Now we want to actually create alternative fasta references for each macaque at our loci of interest. I thought that Novocraft utility "novoutil iupac" would do exactly what I want it to do, which is (from the most recent version):
    Code:
    Novoutil iupac
          1. Add option ( -g ) to substitute genotype for the reference base. Without
             the -g option ambiguous bases are formed by combining the reference base
             and the genotype.
          2. Add option -i to substitute homozygous indel calls into the new reference.
             This is useful for reference guided assembly.
          3. Add option -q 99 that sets a lower quality limit (default 30) on usable VCF
             data lines.
          4. If the REF column of VCF file shows an N then IUPAC code is formed from the
             ALT alleles.
    However, when I run novoutil iupac with my VCF file and the reference genome, I get the following error:
    Code:
    # novoutil iupac (V2.08.02) - Build Jun 28 2012 @ 16:23:10
    # (C) 2008,2009,2010,2011 NovoCraft Technologies Sdn Bhd.
    Loading VCF file...
    novoutil: src/vcfsnps.cc:152: bool vcffile::hetero(tabbed&): Assertion `*gt == '|' || *gt == '/'' failed.
    So it clearly doesn't like something about my VCF file, which looks something like this:
    Code:
    ##fileformat=VCFv4.1
    ##FILTER=<ID=LowQual,Description="Low quality">
    ##FORMAT=<ID=AD,Number=.,Type=Integer,Description="Allelic depths for the ref and alt alleles in the order listed">
    ##FORMAT=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth (reads with MQ=255 or with bad mates are filtered)">
    ##FORMAT=<ID=GQ,Number=1,Type=Float,Description="Genotype Quality">
    ##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
    ##FORMAT=<ID=PL,Number=G,Type=Integer,Description="Normalized, Phred-scaled likelihoods for genotypes as defined in the VCF specification">
    ##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count in genotypes, for each ALT allele, in the same order as listed">
    ##INFO=<ID=AF,Number=A,Type=Float,Description="Allele Frequency, for each ALT allele, in the same order as listed">
    ##INFO=<ID=AN,Number=1,Type=Integer,Description="Total number of alleles in called genotypes">
    ##INFO=<ID=BaseQRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt Vs. Ref base qualities">
    ##INFO=<ID=DP,Number=1,Type=Integer,Description="Approximate read depth; some reads may have been filtered">
    ##INFO=<ID=DS,Number=0,Type=Flag,Description="Were any of the samples downsampled?">
    ##INFO=<ID=Dels,Number=1,Type=Float,Description="Fraction of Reads Containing Spanning Deletions">
    ##INFO=<ID=FS,Number=1,Type=Float,Description="Phred-scaled p-value using Fisher's exact test to detect strand bias">
    ##INFO=<ID=HRun,Number=1,Type=Integer,Description="Largest Contiguous Homopolymer Run of Variant Allele In Either Direction">
    ##INFO=<ID=HaplotypeScore,Number=1,Type=Float,Description="Consistency of the site with at most two segregating haplotypes">
    ##INFO=<ID=InbreedingCoeff,Number=1,Type=Float,Description="Inbreeding coefficient as estimated from the genotype likelihoods per-sample when compared against the Hardy-Weinberg expectation">
    ##INFO=<ID=MQ,Number=1,Type=Float,Description="RMS Mapping Quality">
    ##INFO=<ID=MQ0,Number=1,Type=Integer,Description="Total Mapping Quality Zero Reads">
    ##INFO=<ID=MQRankSum,Number=1,Type=Float,Description="Z-score From Wilcoxon rank sum test of Alt vs. Ref read mapping qualities">
    ##INFO=<ID=QD,Number=1,Type=Float,Description="Variant Confidence/Quality by Depth">
    ##INFO=<ID=ReadPosRankSum,Number=1,Type=Float,Description="Z-score from Wilcoxon rank sum test of Alt vs. Ref read position bias">
    ##INFO=<ID=SB,Number=1,Type=Float,Description="Strand Bias">
    ##UnifiedGenotyper="analysis_type=UnifiedGenotyper input_file=[sorted_realigned_RG_20120712_unmasked_lane_1_index_3.bam, sorted_realigned_RG_20120712_unmasked_lane_1_index_8.bam, sorted_realigned_RG_20120712_unmasked_lane_2_index_1.bam, sorted_realigned_RG_20120712_unmasked_lane_2_index_2.bam, sorted_realigned_RG_20120712_unmasked_lane_5_index_6.bam, sorted_realigned_RG_20120712_unmasked_lane_5_index_9.bam, sorted_realigned_RG_20120712_unmasked_lane_6_index_4.bam, sorted_realigned_RG_20120712_unmasked_lane_6_index_5.bam, sorted_realigned_RG_20120712_unmasked_lane_3_index_7.bam] read_buffer_size=null phone_home=STANDARD gatk_key=null read_filter=[] intervals=null excludeIntervals=null interval_set_rule=UNION interval_merging=ALL reference_sequence=/vrc1_data/douek_lab/reference_sets/novoalign/alt_macaque/rhesus_TR_IG_unmasked.fa nonDeterministicRandomSeed=false downsampling_type=BY_SAMPLE downsample_to_fraction=null downsample_to_coverage=250 baq=OFF baqGapOpenPenalty=40.0 performanceLog=null useOriginalQualities=false BQSR=null quantize_quals=-1 defaultBaseQualities=-1 validation_strictness=SILENT unsafe=null num_threads=1 num_cpu_threads=null num_io_threads=null num_bam_file_handles=null read_group_black_list=null pedigree=[] pedigreeString=[] pedigreeValidationType=STRICT allow_intervals_with_unindexed_bam=false logging_level=INFO log_to_file=null help=false genotype_likelihoods_model=BOTH p_nonref_model=EXACT heterozygosity=0.0010 pcr_error_rate=1.0E-4 genotyping_mode=DISCOVERY output_mode=EMIT_VARIANTS_ONLY standard_min_confidence_threshold_for_calling=50.0 standard_min_confidence_threshold_for_emitting=30.0 noSLOD=false annotateNDA=false alleles=(RodBinding name= source=UNBOUND) min_base_quality_score=17 max_deletion_fraction=0.05 max_alternate_alleles=3 min_indel_count_for_genotyping=5 min_indel_fraction_per_sample=0.25 indel_heterozygosity=1.25E-4 indelGapContinuationPenalty=10 indelGapOpenPenalty=45 indelHaplotypeSize=80 noBandedIndel=false indelDebug=false ignoreSNPAlleles=false dbsnp=(RodBinding name= source=UNBOUND) comp=[] out=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub NO_HEADER=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub sites_only=org.broadinstitute.sting.gatk.io.stubs.VCFWriterStub debug_file=null metrics_file=null annotation=[] excludeAnnotation=[] filter_mismatching_base_and_quals=false"
    ##contig=<ID=IgH1,length=610659>
    ##contig=<ID=IgH2,length=260169>
    ##contig=<ID=IgK1,length=977354>
    ##contig=<ID=IgK2,length=4152696>
    ##contig=<ID=IgL1,length=20629>
    ##contig=<ID=IgL2,length=7376870>
    ##contig=<ID=IgL3,length=2230>
    ##contig=<ID=IgL4,length=7654437>
    ##contig=<ID=IgL5,length=8461>
    ##contig=<ID=IgL6,length=8824>
    ##contig=<ID=IgL7,length=1082>
    ##contig=<ID=IgL8,length=4740>
    ##contig=<ID=IgL9,length=13131495>
    ##contig=<ID=TRA1,length=23168555>
    ##contig=<ID=TRB1,length=19618912>
    ##contig=<ID=TRD1,length=1136629>
    ##contig=<ID=TRG1,length=17450015>
    ##reference=file:///vrc1_data/douek_lab/reference_sets/novoalign/alt_macaque/rhesus_TR_IG_unmasked.fa
    #CHROM	POS	ID	REF	ALT	QUAL	FILTER	INFO	FORMAT	monkey1_3	monkey1_8	monkey2_1	monkey2_2	monkey3_7	monkey5_6	monkey5_9	monkey6_4	monkey6_5
    TRG1	2855	.	G	T	75.71	PASS	AC=6;AF=1.00;AN=6;DP=3;Dels=0.00;FS=0.000;HRun=3;HaplotypeScore=0.0000;MQ=41.02;MQ0=0;QD=25.24;SB=-0.02	GT:AD:DP:GQ:PL	./.	1/1:0,1:1:3.01:34,3,0	./.	1/1:0,1:1:3.01:34,3,0	./.	./.	./.	./.	1/1:0,1:1:3.01:42,3,0
    TRG1	3864	.	A	G	42.75	LowQual	AC=4;AF=0.50;AN=8;BaseQRankSum=0.000;DP=4;Dels=0.00;FS=0.000;HRun=1;HaplotypeScore=0.0000;MQ=150.00;MQ0=0;MQRankSum=-1.026;QD=21.37;ReadPosRankSum=1.026;SB=2.99	GT:AD:DP:GQ:PL	./.	1/1:0,1:1:3.01:42,3,0	./.	1/1:0,1:1:3.01:42,3,0	0/0:1,0:1:3.01:0,3,41	./.	0/0:1,0:1:3.01:0,3,42	./.	./.
    TRG1	4614	.	T	C	110.43	PASS	AC=7;AF=0.70;AN=10;BaseQRankSum=0.000;DP=6;Dels=0.00;FS=0.000;HRun=2;HaplotypeScore=0.0000;MQ=101.00;MQ0=0;MQRankSum=1.380;QD=22.09;ReadPosRankSum=0.000;SB=-30.01	GT:AD:DP:GQ:PL	./.	0/1:1,1:2:32.56:35,0,33	1/1:0,1:1:3.01:41,3,0	./.	./.	0/0:1,0:1:3.01:0,3,34	1/1:0,1:1:3.01:37,3,0	./.	1/1:0,1:1:3.01:42,3,0
    TRG1	4628	.	C	T	34.82	LowQual	AC=3;AF=0.75;AN=4;BaseQRankSum=-0.736;DP=3;Dels=0.00;FS=0.000;HRun=2;HaplotypeScore=0.0000;MQ=51.96;MQ0=0;MQRankSum=0.736;QD=11.61;ReadPosRankSum=0.736;SB=-29.60	GT:AD:DP:GQ:PL	./.	0/1:1,1:2:35.21:35,0,35	./.	./.	./.	1/1:0,1:1:3.01:34,3,0	./.	./.	./.
    TRG1	5257	.	T	A,G	75.47	PASS	AC=3,2;AF=0.38,0.25;AN=8;BaseQRankSum=0.622;DP=114;Dels=0.00;FS=0.000;HaplotypeScore=0.1835;MQ=13.79;MQ0=4;MQRankSum=-2.795;QD=1.54;ReadPosRankSum=-0.487;SB=-24.83	GT:AD:DP:GQ:PL	./.	1/1:2,4,14:21:3.01:34,3,0,34,3,34	./.	0/0:2,2,10:14:3.01:0,3,34,3,34,34	0/1:1,5,14:20:25.32:73,0,25,76,31,107	2/2:2,3,3:8:3.01:42,42,42,3,3,0	./.	./.	./.
    TRG1	5268	.	C	T	61.81	PASS	AC=3;AF=0.30;AN=10;BaseQRankSum=-0.431;DP=99;Dels=0.00;FS=0.000;HRun=1;HaplotypeScore=0.5441;MQ=14.90;MQ0=2;MQRankSum=2.302;QD=1.63;ReadPosRankSum=0.825;SB=-24.38	GT:AD:DP:GQ:PL	0/0:11,1:12:3:0,3,25	1/1:16,1:18:3.01:34,3,0	./.	0/0:9,0:10:3.01:0,3,34	0/1:17,3:20:25.32:69,0,25	0/0:8,0:8:3.01:0,3,42	./.	./.	./.
    TRG1	5275	.	G	C	62.34	PASS	AC=3;AF=0.30;AN=10;BaseQRankSum=-0.945;DP=56;Dels=0.00;FS=0.000;HRun=0;HaplotypeScore=1.0882;MQ=19.71;MQ0=1;MQRankSum=2.439;QD=2.97;ReadPosRankSum=0.154;SB=-24.38	GT:AD:DP:GQ:PL	0/0:6,1:8:3:0,3,25	1/1:9,1:11:3.01:34,3,0	./.	0/0:9,0:9:3.01:0,3,34	0/1:8,2:10:25.32:70,0,25	0/0:5,0:5:3.01:0,3,41	./.	./.	./.
    Does anyone have any experience with this sort of issue?

    Thanks,
    Sam

Latest Articles

Collapse

  • seqadmin
    The Impact of AI in Genomic Medicine
    by seqadmin



    Artificial intelligence (AI) has evolved from a futuristic vision to a mainstream technology, highlighted by the introduction of tools like OpenAI's ChatGPT and Google's Gemini. In recent years, AI has become increasingly integrated into the field of genomics. This integration has enabled new scientific discoveries while simultaneously raising important ethical questions1. Interviews with two researchers at the center of this intersection provide insightful perspectives into...
    02-26-2024, 02:07 PM
  • seqadmin
    Multiomics Techniques Advancing Disease Research
    by seqadmin


    New and advanced multiomics tools and technologies have opened new avenues of research and markedly enhanced various disciplines such as disease research and precision medicine1. The practice of merging diverse data from various ‘omes increasingly provides a more holistic understanding of biological systems. As Maddison Masaeli, Co-Founder and CEO at Deepcell, aptly noted, “You can't explain biology in its complex form with one modality.”

    A major leap in the field has
    ...
    02-08-2024, 06:33 AM

ad_right_rmr

Collapse

News

Collapse

Topics Statistics Last Post
Started by seqadmin, Yesterday, 06:12 AM
0 responses
17 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-23-2024, 04:11 PM
0 responses
67 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-21-2024, 08:52 AM
0 responses
73 views
0 likes
Last Post seqadmin  
Started by seqadmin, 02-20-2024, 08:57 AM
0 responses
62 views
0 likes
Last Post seqadmin  
Working...
X