  • [SNP calling, GATK] this contig isn't present in the Fasta sequence dictionary

    Hi all

    I tried SNP calling with the GATK UnifiedGenotyper but got the error message below; there seems to be a problem with the FASTA sequence dictionary. I am currently following the <Exome analysis> page in the How-to Wiki section (http://seqanswers.com/wiki/How-to/ex..._recalibration). However, some of the procedures there do not use the latest versions of the programs, so I changed a few things.

    Any advice would be really appreciated.

    Here is the command:
    $ java -Xmx4g -jar /usr/local/bin/gatk/GenomeAnalysisTK-1.4-30-gf2ef8d1/GenomeAnalysisTK.jar -glm BOTH -R hg19.fa -T UnifiedGenotyper -I input.marked.realigned.fixed.recal.bam -D dbsnp132.txt -o snps.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 -A DepthOfCoverage -A AlleleBalance -L target_intervals.bed
    and the error message:
    INFO 20:44:34,610 RodBindingArgumentTypeDescriptor - Dynamically determined type of target_intervals.bed to be BED
    INFO 20:44:34,647 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:44:34,648 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-30-gf2ef8d1, Compiled 2012/02/17 20:18:04
    INFO 20:44:34,648 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 20:44:34,648 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 20:44:34,648 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 20:44:34,649 HelpFormatter - Program Args: -glm BOTH -R hg19.fa -T UnifiedGenotyper -I input.marked.realigned.fixed.recal.bam -D dbsnp132.txt -o snps.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 -A DepthOfCoverage -A AlleleBalance -L target_intervals.bed
    INFO 20:44:34,649 HelpFormatter - Date/Time: 2012/02/24 20:44:34
    INFO 20:44:34,649 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:44:34,649 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:44:34,669 RodBindingArgumentTypeDescriptor - Dynamically determined type of dbsnp132.txt to be VCF
    INFO 20:44:34,682 GenomeAnalysisEngine - Strictness is SILENT
    INFO 20:44:34,737 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 20:44:34,752 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
    INFO 20:44:34,762 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp132.txt
    INFO 20:44:38,171 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.4-30-gf2ef8d1):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: Badly formed genome loc: Contig chr6_apd_hap1 given as location, but this contig isn't present in the Fasta sequence dictionary
    ##### ERROR ------------------------------------------------------------------------------------------

  • #2
    Originally posted by sehrrot
    ...
    ##### ERROR MESSAGE: Badly formed genome loc: Contig chr6_apd_hap1 given as location, but this contig isn't present in the Fasta sequence dictionary
    Does chr6_apd_hap1 appear in your reference file (hg19.fa)? If so, try regenerating the dictionary file. You can use Picard's CreateSequenceDictionary.jar, or, probably, just delete hg19.dict and GATK will automatically recreate it.

    If chr6_apd_hap1 isn't in your reference file, then you must have used a different reference for mapping your reads. Make sure to call variants against the same reference you used for mapping.
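
The check suggested in this reply can be sketched in shell. This is a minimal sketch and not part of GATK or Picard: the function names `contig_in_fasta` and `contig_in_dict` are hypothetical, and it assumes the dictionary is a tab-separated, SAM-style header whose `@SQ` lines carry an `SN:<name>` field.

```shell
# Hypothetical helper: succeed if a contig name appears as a FASTA header.
# Only the first whitespace-delimited word after ">" is compared.
contig_in_fasta() {
    awk -v c="$1" '
        /^>/ { if (substr($1, 2) == c) found = 1 }
        END  { exit !found }
    ' "$2"
}

# Hypothetical helper: succeed if a contig name appears in a sequence
# dictionary, i.e. on an "@SQ" line with a field equal to "SN:<name>".
contig_in_dict() {
    awk -F'\t' -v c="$1" '
        $1 == "@SQ" { for (i = 2; i <= NF; i++) if ($i == "SN:" c) found = 1 }
        END         { exit !found }
    ' "$2"
}
```

For example, `contig_in_fasta chr6_apd_hap1 hg19.fa && echo present` would show whether the reference has the contig. If the FASTA has it but `contig_in_dict chr6_apd_hap1 hg19.dict` fails, the dictionary is likely stale and worth regenerating.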

    • #3
      Another possible problem is the hg19.fa.fai file. If it does not contain the contig name, GATK won't produce a proper hg19.dict file.
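
One way to test for a stale index is to compare the FASTA headers against the first column of the .fai file. The sketch below assumes the usual tab-separated .fai layout with the contig name in column 1; `missing_from_fai` is a hypothetical name, not a samtools or GATK command.

```shell
# Hypothetical helper: print every contig that has a header in the FASTA
# but no entry in the .fai index. Prints nothing when the two agree.
# Usage: missing_from_fai reference.fa reference.fa.fai
missing_from_fai() {
    awk -F'\t' '
        # First file (the index): remember each indexed contig name.
        FILENAME == ARGV[1] { indexed[$1] = 1; next }
        # Second file (the FASTA): take the first word of each header line.
        /^>/ {
            split(substr($0, 2), parts, /[ \t]/)
            if (!(parts[1] in indexed)) print parts[1]
        }
    ' "$2" "$1"
}
```

If the function prints any names, the index is out of date. Rebuilding it with `samtools faidx hg19.fa` and then regenerating hg19.dict should bring the files back in sync.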

      • #4
        Hi,

        The wiki was written ages ago, so it's no surprise that it's pretty much out of date now. I'm even surprised that you managed to find it. Anyway, a quick workaround for this problem would be to edit the .bed file that you downloaded from UCSC using sed, e.g.:

        Code:
        sed '/chr6_apd_hap1/d' target_intervals.bed > target_intervals2.bed
        mv target_intervals2.bed target_intervals.bed
        Last edited by ndbrown6; 05-30-2012, 10:24 PM.
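
The sed command above drops a single contig. If the BED file lists several contigs that are missing from the reference, a more general filter might keep only intervals whose contig appears in the reference index. This is a sketch that assumes hg19.fa.fai is up to date and tab-separated; `filter_bed_by_fai` is a hypothetical name.

```shell
# Hypothetical helper: keep only BED intervals whose contig (column 1)
# is listed in the .fai index. Header lines (track/browser/#) pass through.
# Usage: filter_bed_by_fai target_intervals.bed hg19.fa.fai > filtered.bed
filter_bed_by_fai() {
    awk -F'\t' '
        # First file (the index): remember each known contig name.
        FILENAME == ARGV[1]  { known[$1] = 1; next }
        # Second file (the BED): keep header lines unchanged.
        /^(track|browser|#)/ { print; next }
        # Keep an interval only if its contig is in the index.
        ($1 in known)
    ' "$2" "$1"
}
```

Note that this silently discards intervals, so no variants will be called in the removed regions. If those regions matter, mapping and calling against a reference that contains the contigs is the better fix.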
