
  • #16
    Hi.

    Thanks for all your help.

    I tried what you told me to do, and now the error is the following:

    java -Xmx4g -jar /GenoStorage/Software/GATK/GenomeAnalysisTK.jar \
      -R /GenoStorage/Genomas/hg19/hg19RefGenome.fa \
      -knownSites:name,VCF /GenoStorage/BasesDados/ucsc_hg19/snp132CodingDbSnp.txt \
      -I sample02187A_align_sorted.bam \
      -T CountCovariates \
      -cov ReadGroupCovariate \
      -cov QualityScoreCovariate \
      -cov CycleCovariate \
      -cov DinucCovariate \
      -recalFile sample02187A.recal_data.csv
    INFO 13:39:44,563 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 13:39:44,565 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.3-21-gcb284ee, Compiled 2011/11/29 16:46:58
    INFO 13:39:44,566 HelpFormatter - Copyright (c) 2010 The Broad Institute
    INFO 13:39:44,566 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
    INFO 13:39:44,566 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
    INFO 13:39:44,567 HelpFormatter - Program Args: -R /GenoStorage/Genomas/hg19/hg19RefGenome.fa -knownSites:name,VCF /GenoStorage/BasesDados/ucsc_hg19/snp132CodingDbSnp.txt -I sample02187A_align_sorted.bam -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -recalFile sample02187A.recal_data.csv
    INFO 13:39:44,568 HelpFormatter - Date/Time: 2012/02/06 13:39:44
    INFO 13:39:44,568 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 13:39:44,568 HelpFormatter - ---------------------------------------------------------------------------------
    INFO 13:39:44,590 GenomeAnalysisEngine - Strictness is SILENT
    INFO 13:39:44,688 RMDTrackBuilder - Creating Tribble index in memory for file /GenoStorage/BasesDados/ucsc_hg19/snp132CodingDbSnp.txt
    INFO 13:39:51,195 GATKRunReport - Uploaded run statistics report to AWS S3
    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.3-21-gcb284ee):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: Your input file has a malformed header: We never saw the required CHROM header line (starting with one #) for the input VCF file
    ##### ERROR ------------------------------------------------------------------------------------------


    I downloaded the file from UCSC.
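The error above says GATK never saw a `#CHROM` header line, which every VCF file must have. UCSC table dumps such as snp132CodingDbSnp.txt are, as far as I know, plain tab-delimited tables rather than VCF, which would explain the message. Below is a minimal shell sketch for checking a file before passing it to -knownSites. The function name `has_vcf_chrom_header` is hypothetical (it is not part of GATK), and it only tests for the header line, not for full VCF validity.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of GATK): succeeds if the file contains a
# line starting with "#CHROM", the column header line a VCF must have.
has_vcf_chrom_header() {
    grep -q '^#CHROM' "$1"
}
```

Called as `has_vcf_chrom_header snp132CodingDbSnp.txt && echo "header found"`, it prints nothing when the header line is missing, in which case the file would need to be replaced by (or converted to) a real VCF.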



    • #17
      Originally posted by ulz_peter
      OK, that's something else. GATK wants the chromosomes ordered this way: chr1, chr2, chr3, ..., chrX, chrY, chrM. It seems your reference FASTA file has the chromosomes in lexicographical order, with chr10 directly after chr1. When you reorder your SAM file, it orders the reads according to the order in your reference FASTA file. You could download the single-chromosome FASTA files from UCSC (http://hgdownload.cse.ucsc.edu/golde...9/chromosomes/) and concatenate them in order using cat, like this:
      Code:
      cat chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM > hg19.fa
      It might work to just run ReorderSam again with the newly sorted reference FASTA file; otherwise you'd need to repeat the alignment. If you're planning a pipeline, you might want to have the reference file in the right order from the start, to save the ReorderSam step.

      Hope that helps,
      Peter
      Hello Peter,
      I did order my reference this way: "cat chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY chrM > hg19.fa". The problem is that the latest GATK bundle (2.3) has the "dbsnp_137.hg19.vcf" and "1000G_phase1.indels.hg19.vcf" files ordered with chrM at the beginning!
      I tried to reorder the VCF with vcfsorter (http://code.google.com/p/vcfsorter/), but it took far too long: 24 h to get through half of the file. So I rebuilt the reference with cat chrM chr1 chr2 chr3 chr4 chr5 chr6 chr7 chr8 chr9 chr10 chr11 chr12 chr13 chr14 chr15 chr16 chr17 chr18 chr19 chr20 chr21 chr22 chrX chrY > hg19.fa, but I wasn't able to reorder the already constructed BAM file with ReorderSam.jar. After I built the sequence dictionary with CreateSequenceDictionary.jar, it gave me this error:

      java.lang.IllegalArgumentException: File is not a supported reference file type: /home/cox/ex_storage/cromosomi/hg19_2/Sequencedictionary.bam


      What do you think I should do now ?
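The ordering mismatch described in this post can be checked, and possibly worked around, with standard Unix tools. The following is only a sketch under stated assumptions: the three function names are hypothetical, the VCF is assumed to be uncompressed and tab-delimited, and records on contigs that are missing from the order list are silently dropped. Any `##contig` header lines are left in their original order, so whether GATK accepts the result is untested here.

```shell
#!/usr/bin/env bash
# Hypothetical helpers; none of these are part of GATK or Picard.

# Print FASTA sequence names in file order
# (the text after ">" up to the first whitespace).
fasta_contig_order() {
    grep '^>' "$1" | sed 's/^>//' | awk '{ print $1 }'
}

# Print the contig names of a VCF's records in order of first appearance.
vcf_contig_order() {
    grep -v '^#' "$1" | cut -f1 | awk '!seen[$1]++'
}

# Re-emit a VCF with its records ordered by a contig list (one name per
# line) and then by position. Header lines are passed through unchanged.
# Records on contigs that are absent from the list are dropped.
sort_vcf_by_contig_order() {
    local order_file=$1 vcf=$2
    local tab
    tab=$(printf '\t')
    grep '^#' "$vcf"
    grep -v '^#' "$vcf" \
        | awk -F'\t' -v OFS='\t' '
            NR == FNR { rank[$1] = FNR; next }      # first input: contig list
            ($1 in rank) { print rank[$1], $0 }     # prefix each record with its rank
          ' "$order_file" - \
        | sort -t "$tab" -k1,1n -k3,3n \
        | cut -f2-
}
```

A possible use would be `fasta_contig_order hg19.fa > order.txt`, then comparing that list with the output of `vcf_contig_order dbsnp_137.hg19.vcf`, and, if they differ, running `sort_vcf_by_contig_order order.txt dbsnp_137.hg19.vcf > dbsnp_137.sorted.vcf`. The sorting itself is delegated to sort(1), so the records are not held in shell or awk memory.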

