  • a11msp
    Member
    • Jun 2010
    • 26

    GATK error with a VCF including missing genotypes

    Hello!

    UPDATE: solved. The error had nothing to do with missing genotypes; it was caused by triallelic SNPs, which the tool cannot process.

    I'm trying to phase my genotypes with PhaseByTransmission, using data for 8 trios of a sequenced organism. The run fails with the messages listed below.

    $ java -Xmx20g -jar ~/n/GenomeAnalysisTK-1.5-31-gadad76b/GenomeAnalysisTK.jar -R mygenome_sorted.fa -T PhaseByTransmission -V all_sorted_withHeader.vcf -ped Sample_info.ped -o all_phased_by_transmission.vcf

    INFO 20:54:02,919 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 20:54:02,920 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 20:54:02,920 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 20:54:02,920 HelpFormatter - Program Args: -R mygenome_sorted.fa -T PhaseByTransmission -V all_sorted_withHeader.vcf -ped Sample_info.ped -o all_phased_by_transmission.vcf
    INFO 20:54:02,921 HelpFormatter - Date/Time: 2012/04/22 20:54:02
    INFO 20:54:02,921 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:54:02,921 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:54:02,995 RodBindingArgumentTypeDescriptor - Dynamically determined type of all_sorted_withHeader.vcf to be VCF
    INFO 20:54:03,002 GenomeAnalysisEngine - Strictness is SILENT
    INFO 20:54:04,551 RMDTrackBuilder - Creating Tribble index in memory for file all_sorted_withHeader.vcf
    INFO 20:56:55,532 RMDTrackBuilder - Writing Tribble index to disk for file all_sorted_withHeader.vcf.idx
    INFO 20:57:00,377 PedReader - Reading PED file Sample_info.ped with missing fields: []
    INFO 20:57:00,489 PedReader - Phenotype is other? false
    INFO 20:57:01,371 TraversalEngine - [INITIALIZATION COMPLETE; TRAVERSAL STARTING]
    INFO 20:57:01,371 TraversalEngine - Location processed.sites runtime per.1M.sites completed total.runtime remaining
    INFO 20:57:04,604 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR stack trace
    java.lang.NumberFormatException: For input string: "."
    at java.lang.NumberFormatException.forInputString(NumberFormatException.java:48)
    at java.lang.Integer.parseInt(Integer.java:449)
    at java.lang.Integer.parseInt(Integer.java:499)
    at org.broadinstitute.sting.utils.variantcontext.GenotypeLikelihoods.parsePLsIntoLikelihoods(GenotypeLikelihoods.java:153)
    at org.broadinstitute.sting.utils.variantcontext.GenotypeLikelihoods.getAsVector(GenotypeLikelihoods.java:80)
    at org.broadinstitute.sting.utils.variantcontext.GenotypeLikelihoods.getAsMap(GenotypeLikelihoods.java:105)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.getLikelihoodsAsMapSafeNull(PhaseByTransmission.java:519)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.phaseTrioGenotypes(PhaseByTransmission.java:562)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:762)
    at org.broadinstitute.sting.gatk.walkers.phasing.PhaseByTransmission.map(PhaseByTransmission.java:74)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:78)
    at org.broadinstitute.sting.gatk.traversals.TraverseLoci.traverse(TraverseLoci.java:18)
    at org.broadinstitute.sting.gatk.executive.LinearMicroScheduler.execute(LinearMicroScheduler.java:63)
    at org.broadinstitute.sting.gatk.GenomeAnalysisEngine.execute(GenomeAnalysisEngine.java:246)
    at org.broadinstitute.sting.gatk.CommandLineExecutable.execute(CommandLineExecutable.java:128)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:236)
    at org.broadinstitute.sting.commandline.CommandLineProgram.start(CommandLineProgram.java:146)
    at org.broadinstitute.sting.gatk.CommandLineGATK.main(CommandLineGATK.java:92)
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A GATK RUNTIME ERROR has occurred (version 1.5-31-gadad76b):
    ##### ERROR
    ##### ERROR Please visit the wiki to see if this is a known problem
    ##### ERROR If not, please post the error, with stack trace, to the GATK forum
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: For input string: "."
    ##### ERROR ------------------------------------------------------------------------------------------

    I suspect this is because of missing genotypes in lines like this:

    #CHROM POS ID REF ALT QUAL FILTER INFO FORMAT ERS074168 ERS074167 ERS074166 ERS074171 ERS074170 ERS074169 ERS074174 ERS074173 ERS074172 ERS074177 ERS074176 ERS074175 ERS074180 ERS074179 ERS074178 ERS074183 ERS074182 ERS074181 ERS074186 ERS074185 ERS074184 ERS074189 ERS074188 ERS074187
    1 160248 . T C 40.00 . AC1=6;AC=6;AF1=1;AN=6;DP4=0,0,3,0;DP=6;FQ=-28.1;MQ=29;SF=2;VDB=0.0046 GT:GQ:PL . . . . 1/1:3:0,0,0 1/1:3:0,0,0 1/1:10:72,9,0 . . . . . . . . . . . . . . .

    Is there a different way I should format missing genotypes?
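
For anyone debugging a similar failure, here is a minimal Python sketch (not part of GATK) that scans a VCF for the two things discussed in this post: sites with more than one ALT allele, and sample entries whose PL field contains a ".". The function name `find_suspect_records` is made up for illustration, and the assumption that a "." inside PL is what triggers the NumberFormatException is a guess based on the stack trace, not something confirmed from the GATK source. It also assumes a properly tab-delimited file.

```python
def find_suspect_records(lines):
    """Yield (chrom, pos, reason) for VCF records that might upset a PL parser.

    Hypothetical helper: flags multi-allelic sites and PL fields containing '.'.
    """
    for line in lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        fields = line.split("\t")  # VCF columns are tab-separated
        if len(fields) < 8:
            continue  # not a complete VCF record
        chrom, pos, alt = fields[0], fields[1], fields[4]
        if "," in alt:
            # More than one ALT allele, e.g. a triallelic SNP
            yield chrom, int(pos), "multi-allelic ALT: " + alt
        if len(fields) < 10:
            continue  # no FORMAT/sample columns
        keys = fields[8].split(":")
        if "PL" not in keys:
            continue
        pl_index = keys.index("PL")
        for sample in fields[9:]:
            values = sample.split(":")
            if len(values) <= pl_index:
                continue  # whole entry missing (".") or trailing fields dropped
            if "." in values[pl_index].split(","):
                yield chrom, int(pos), "PL contains '.': " + sample
                break  # one report per record is enough


if __name__ == "__main__":
    import sys

    # Usage: python find_suspect.py all_sorted_withHeader.vcf
    if len(sys.argv) > 1:
        with open(sys.argv[1]) as handle:
            for chrom, pos, reason in find_suspect_records(handle):
                print(chrom, pos, reason, sep="\t")
```

A sample column that is just "." (as in the line above) is skipped rather than flagged, since it carries no PL value at all.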

    Many thanks!
    Last edited by a11msp; 04-23-2012, 07:12 AM.
