  • [SNP calling, GATK] this contig isn't present in the Fasta sequence dictionary

    Hi all

    I tried SNP calling with the GATK UnifiedGenotyper but got the error message below; it seems there is a problem with the FASTA sequence dictionary. I am currently following <Exome analysis> in the How-to Wiki section (http://seqanswers.com/wiki/How-to/ex..._recalibration). However, some of the procedures there do not use the latest versions of the programs, so I changed a few things to account for that.

    Any advice would be really appreciated.

    Here is the command:
    $ java -Xmx4g -jar /usr/local/bin/gatk/GenomeAnalysisTK-1.4-30-gf2ef8d1/GenomeAnalysisTK.jar -glm BOTH -R hg19.fa -T UnifiedGenotyper -I input.marked.realigned.fixed.recal.bam -D dbsnp132.txt -o snps.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 -A DepthOfCoverage -A AlleleBalance -L target_intervals.bed
    and the error message:
    INFO 20:44:34,610 RodBindingArgumentTypeDescriptor - Dynamically determined type of target_intervals.bed to be BED
    INFO 20:44:34,647 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:44:34,648 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.4-30-gf2ef8d1, Compiled 2012/02/17 20:18:04
    INFO 20:44:34,648 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 20:44:34,648 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 20:44:34,648 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 20:44:34,649 HelpFormatter - Program Args: -glm BOTH -R hg19.fa -T UnifiedGenotyper -I input.marked.realigned.fixed.recal.bam -D dbsnp132.txt -o snps.vcf -metrics snps.metrics -stand_call_conf 50.0 -stand_emit_conf 10.0 -dcov 1000 -A DepthOfCoverage -A AlleleBalance -L target_intervals.bed
    INFO 20:44:34,649 HelpFormatter - Date/Time: 2012/02/24 20:44:34
    INFO 20:44:34,649 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:44:34,649 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 20:44:34,669 RodBindingArgumentTypeDescriptor - Dynamically determined type of dbsnp132.txt to be VCF
    INFO 20:44:34,682 GenomeAnalysisEngine - Strictness is SILENT
    INFO 20:44:34,737 SAMDataSource$SAMReaders - Initializing SAMRecords in serial
    INFO 20:44:34,752 SAMDataSource$SAMReaders - Done initializing BAM readers: total time 0.01
    INFO 20:44:34,762 RMDTrackBuilder - Loading Tribble index from disk for file dbsnp132.txt
    INFO 20:44:38,171 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.4-30-gf2ef8d1):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: Badly formed genome loc: Contig chr6_apd_hap1 given as location, but this contig isn't present in the Fasta sequence dictionary
    ##### ERROR ------------------------------------------------------------------------------------------

  • #2
    Originally posted by sehrrot
    ...
    ##### ERROR MESSAGE: Badly formed genome loc: Contig chr6_apd_hap1 given as location, but this contig isn't present in the Fasta sequence dictionary
    Does chr6_apd_hap1 appear in your reference file (hg19.fa)? If so, try regenerating the dictionary file. You can use Picard's CreateSequenceDictionary.jar or, probably, just delete hg19.dict and GATK will automatically recreate it.

    If chr6_apd_hap1 isn't in your reference file, then you must have used a different reference for mapping your reads. Make sure to call variants against the same reference that you used for mapping.
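
To make the first check concrete, here is a minimal sketch of a shell helper that tests whether a contig name appears as a header in the reference FASTA. The function name `contig_in_fasta` is made up for illustration, and the location of the Picard jar in the commented example is an assumption; adjust both to your setup.

```shell
# Return success (exit status 0) if a contig name appears as a FASTA header.
# Usage: contig_in_fasta chr6_apd_hap1 hg19.fa
# (contig_in_fasta is a hypothetical helper, not part of GATK or Picard.)
contig_in_fasta() {
    awk -v name="$1" '
        /^>/ {
            id = substr($1, 2)      # header up to the first whitespace, minus ">"
            if (id == name) { found = 1; exit }
        }
        END { exit (found ? 0 : 1) }
    ' "$2"
}

# Example with the file names from the original post:
#   if contig_in_fasta chr6_apd_hap1 hg19.fa; then
#       # The contig is in the reference, so rebuild the dictionary.
#       # The jar path is an assumption; point it at your Picard install.
#       java -jar CreateSequenceDictionary.jar R=hg19.fa O=hg19.dict
#   else
#       echo "chr6_apd_hap1 is not in hg19.fa; check which reference was used for mapping"
#   fi
```

The comparison is on the exact contig name, so `chr6` does not match `chr6_apd_hap1`.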

    • #3
      Another possible problem is the hg19.fa.fai file. If it does not contain the contig name, GATK won't produce a proper hg19.dict file.
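
One way to check for a stale index is to compare the FASTA headers against the first column of the .fai file. The sketch below assumes the index sits next to the reference as hg19.fa.fai; the function name `contigs_missing_from_fai` is hypothetical.

```shell
# Print every contig that has a header in the FASTA but no entry in the .fai index.
# Usage: contigs_missing_from_fai hg19.fa hg19.fa.fai
# (contigs_missing_from_fai is a hypothetical helper name.)
contigs_missing_from_fai() {
    awk '
        FILENAME == ARGV[1] { indexed[$1] = 1; next }   # first file read: the .fai
        /^>/ {
            id = substr($1, 2)                          # contig name without ">"
            if (!(id in indexed)) print id
        }
    ' "$2" "$1"
}

# If this prints anything, the index is out of date. Assuming samtools is
# installed, the index can be rebuilt with:
#   samtools faidx hg19.fa
```

No output means every contig in the FASTA is also in the index.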

      • #4
        Hi,

        The wiki was written ages ago, so it's no surprise that it's pretty much out of date now. I'm even surprised that you managed to find it. Anyway, a quick workaround for this problem would be to edit the .bed file that you downloaded from UCSC using sed, e.g.:

        Code:
        sed '/chr6_apd_hap1/d' target_intervals.bed > target_intervals2.bed
        mv target_intervals2.bed target_intervals.bed
        Last edited by ndbrown6; 05-30-2012, 10:24 PM.
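
The sed workaround removes one contig at a time, so if the BED file lists other contigs that are absent from the reference, the same error would likely come back for the next one. A more general sketch, assuming a FASTA index (hg19.fa.fai) exists for the reference, keeps only the intervals whose contig is listed in that index. The function name `filter_bed_by_fai` and the output file name are made up for illustration.

```shell
# Keep only BED records whose contig (column 1) is listed in the FASTA index.
# Usage: filter_bed_by_fai hg19.fa.fai target_intervals.bed > target_intervals.filtered.bed
# (filter_bed_by_fai is a hypothetical helper name.)
filter_bed_by_fai() {
    awk '
        FILENAME == ARGV[1] { known[$1] = 1; next }   # first file read: the .fai
        /^(track|browser|#)/ { print; next }          # pass BED header lines through
        $1 in known                                   # data line: print if the contig is known
    ' "$1" "$2"
}
```

Writing to a new file rather than overwriting target_intervals.bed keeps the original intervals around for comparison. Note that dropping intervals hides the mismatch rather than fixing it, so it is still worth confirming which reference the reads were mapped to.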
