
  • [SNP calling, GATK] this contig isn't present in the Fasta sequence dictionary

    Hi all

    I tried SNP calling with the GATK UnifiedGenotyper but got the error message below; there seems to be a problem with the FASTA sequence dictionary. I am following the <Exome analysis> page in the How-to Wiki section (http://seqanswers.com/wiki/How-to/ex..._recalibration). However, some of its steps do not use the latest versions of the programs, so I changed a few things to accommodate that.

    Any advice would be really appreciated.

    Here is the command:
    $ java -Xmx4g -jar /usr/local/bin/gatk/GenomeAnalysisTK-1.4-30-gf2ef8d1/GenomeAnalysisTK.jar -glm BOTH -R hg19.fa -T UnifiedGenotyper -I input.marked.realigned.fixed.recal.bam -D dbsnp132.txt -o snps.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 -A DepthOfCoverage -A AlleleBalance -L target_intervals.bed
    and the error message:
    INFO 20:44:34,610 RodBindingArgumentTypeDescriptor - Dynamically determined type of target_intervals.bed to be BED
    INFO 20:44:34,647 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:44:34,648 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-30-gf2ef8d1, Compiled 2012/02/17 20:18:04
    INFO 20:44:34,648 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 20:44:34,648 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 20:44:34,648 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 20:44:34,649 HelpFormatter - Program Args: -glm BOTH -R hg19.fa -T UnifiedGenotyper -I input.marked.realigned.fixed.recal.bam -D dbsnp132.txt -o snps.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 -A DepthOfCoverage -A AlleleBalance -L target_intervals.bed
    INFO 20:44:34,649 HelpFormatter - Date/Time: 2012/02/24 20:44:34
    INFO 20:44:34,649 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:44:34,649 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:44:34,669 RodBindingArgumentTypeDescriptor - Dynamically determined type of dbsnp132.txt to be VCF
    INFO 20:44:34,682 GenomeAnalysisEngine - Strictness is SILENT
    INFO 20:44:34,737 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 20:44:34,752 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
    INFO 20:44:34,762 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp132.txt
    INFO 20:44:38,171 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.4-30-gf2ef8d1):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: Badly formed genome loc: Contig chr6_apd_hap1 given as location, but this contig isn't present in the Fasta sequence dictionary
    ##### ERROR ------------------------------------------------------------------------------------------

  • #2
    Originally posted by sehrrot
    ...
    ##### ERROR MESSAGE: Badly formed genome loc: Contig chr6_apd_hap1 given as location, but this contig isn't present in the Fasta sequence dictionary
    Does chr6_apd_hap1 appear in your reference file (hg19.fa)? If so, try regenerating the dictionary file. You can use Picard's CreateSequenceDictionary.jar or, probably, just delete hg19.dict and let GATK recreate it automatically.

    If chr6_apd_hap1 isn't in your reference file, then you must have used a different reference for mapping your reads. Make sure to call variants against the same reference that you used for mapping.
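
The check described above can be sketched in a few lines of Python. This is only a rough illustration, not part of GATK or Picard: the function names are made up for this example, and it assumes that hg19.dict is the usual SAM-header style text file in which each contig appears on an @SQ line with an SN: tag.

```python
def fasta_contigs(lines):
    """Contig names from FASTA headers: text after '>' up to the first whitespace."""
    names = []
    for line in lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields:
                names.append(fields[0])
    return names


def dict_contigs(lines):
    """Contig names from the SN: tags of @SQ lines in a .dict file."""
    names = []
    for line in lines:
        if line.startswith("@SQ"):
            for field in line.rstrip("\n").split("\t")[1:]:
                if field.startswith("SN:"):
                    names.append(field[3:])
    return names


def diagnose(contig, fasta_lines, dict_lines):
    """Say which of the two cases from the post applies to one contig."""
    in_fasta = contig in set(fasta_contigs(fasta_lines))
    in_dict = contig in set(dict_contigs(dict_lines))
    if not in_fasta:
        return "not in FASTA: reads or intervals come from a different reference"
    if not in_dict:
        return "in FASTA but not in .dict: regenerate the dictionary"
    return "present in both"


# Hypothetical usage with the file names from this thread:
# with open("hg19.fa") as fa, open("hg19.dict") as dc:
#     print(diagnose("chr6_apd_hap1", fa, dc))
```

Reading only the header lines keeps this cheap even on a whole-genome FASTA, although it still has to scan the full file once.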

    • #3
      Another possible problem is the hg19.fa.fai file. If it does not contain the contig name, GATK won't produce a proper hg19.dict file.
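
To see whether the index is the culprit, one could compare the contig names in the FASTA headers with the first column of the .fai file. The sketch below assumes the standard samtools faidx layout (tab-separated, contig name in the first column); the function names are hypothetical.

```python
def fai_contigs(lines):
    """Contig names from a .fai index: the first tab-separated column."""
    return [line.rstrip("\n").split("\t", 1)[0] for line in lines if line.strip()]


def missing_from_index(fasta_lines, fai_lines):
    """Contigs named in FASTA headers but absent from the index, in FASTA order."""
    indexed = set(fai_contigs(fai_lines))
    missing = []
    for line in fasta_lines:
        if line.startswith(">"):
            fields = line[1:].split()
            if fields and fields[0] not in indexed:
                missing.append(fields[0])
    return missing


# Hypothetical usage with the file names from this thread:
# with open("hg19.fa") as fa, open("hg19.fa.fai") as fai:
#     print(missing_from_index(fa, fai))
```

If this reports any contigs, the index is probably stale; deleting hg19.fa.fai and rebuilding it (for example with samtools faidx hg19.fa) before regenerating the dictionary would be the obvious thing to try.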

      • #4
        Hi,

        The wiki was written ages ago, so it's no surprise that it's pretty much out of date now. I'm even surprised that you managed to find it. Anyway, a quick workaround for this problem would be to edit the .bed file that you downloaded from UCSC using sed, e.g.

        Code:
        sed '/chr6_apd_hap1/d' target_intervals.bed > target_intervals2.bed
        mv target_intervals2.bed target_intervals.bed
        Last edited by ndbrown6; 05-30-2012, 10:24 PM.
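
The sed command above removes a single contig. If the BED file lists other contigs that are missing from the reference, the same error would presumably just move on to the next one, so a more general variant is to keep only intervals whose contig is known to the reference. A minimal Python sketch, assuming the list of known contigs has already been read from hg19.fa.fai or hg19.dict; filter_bed is a made-up helper name, not a GATK tool.

```python
def filter_bed(bed_lines, known_contigs):
    """Keep BED intervals on known contigs; return (kept_lines, dropped_contigs)."""
    known = set(known_contigs)
    kept = []
    dropped = set()
    for line in bed_lines:
        fields = line.split()
        # Pass blank lines, comments, and UCSC track/browser headers through unchanged.
        if not fields or fields[0].startswith("#") or fields[0] in ("track", "browser"):
            kept.append(line)
            continue
        if fields[0] in known:
            kept.append(line)
        else:
            dropped.add(fields[0])
    return kept, sorted(dropped)


# Hypothetical usage with the file names from this thread:
# with open("hg19.fa.fai") as fai:
#     known = [row.split("\t", 1)[0] for row in fai if row.strip()]
# with open("target_intervals.bed") as bed:
#     kept, dropped = filter_bed(bed, known)
# with open("target_intervals.filtered.bed", "w") as out:
#     out.writelines(kept)
# print("dropped contigs:", dropped)
```

Writing to a new file instead of overwriting target_intervals.bed keeps the original intervals around, and the list of dropped contigs shows exactly what was excluded from calling.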
