  • GATK error because of the order of reference chr.

    I ran this command:

    java -Xmx1g -jar ../GATK/GenomeAnalysisTK.jar -I P1_novo.reordered.sorted.bam -T RealignerTargetCreator -R ../reference/hg19_ucsc.fa -o P1.intervals --known ../reference/snp.vcf

    Then the following error message appeared:

    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 1.4-21-g30b937d):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
    ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
    ##### ERROR
    ##### ERROR MESSAGE: Lexicographically sorted human genome sequence detected in reads.
    ##### ERROR For safety's sake the GATK requires human contigs in karyotypic order: 1, 2, ..., 10, 11, ..., 20, 21, 22, X, Y with M either leading or trailing these contigs.
    ##### ERROR This is because all distributed GATK resources are sorted in karyotypic order, and your processing will fail when you need to use these files.
    ##### ERROR You can use the ReorderSam utility to fix this problem: http://www.broadinstitute.org/gsa/wi...php/ReorderSam
    ##### ERROR reads contigs = [chr1, chr10, chr11, chr11_gl000202_random, chr12, chr13, chr14, chr15, chr16, chr17, chr17_ctg5_hap1, chr17_gl000203_random, chr17_gl000204_random, chr17_gl000205_random, chr17_gl000206_random, chr18, chr18_gl000207_random, chr19, chr19_gl000208_random, chr19_gl000209_random, chr1_gl000191_random, chr1_gl000192_random, chr2, chr20, chr21, chr21_gl000210_random, chr22, chr3, chr4, chr4_ctg9_hap1, chr4_gl000193_random, chr4_gl000194_random, chr5, chr6, chr6_apd_hap1, chr6_cox_hap2, chr6_dbb_hap3, chr6_mann_hap4, chr6_mcf_hap5, chr6_qbl_hap6, chr6_ssto_hap7, chr7, chr7_gl000195_random, chr8, chr8_gl000196_random, chr8_gl000197_random, chr9, chr9_gl000198_random, chr9_gl000199_random, chr9_gl000200_random, chr9_gl000201_random, chrM, chrUn_gl000211, chrUn_gl000212, chrUn_gl000213, chrUn_gl000214, chrUn_gl000215, chrUn_gl000216, chrUn_gl000217, chrUn_gl000218, chrUn_gl000219, chrUn_gl000220, chrUn_gl000221, chrUn_gl000222, chrUn_gl000223, chrUn_gl000224, chrUn_gl000225, chrUn_gl000226, chrUn_gl000227, chrUn_gl000228, chrUn_gl000229, chrUn_gl000230, chrUn_gl000231, chrUn_gl000232, chrUn_gl000233, chrUn_gl000234, chrUn_gl000235, chrUn_gl000236, chrUn_gl000237, chrUn_gl000238, chrUn_gl000239, chrUn_gl000240, chrUn_gl000241, chrUn_gl000242, chrUn_gl000243, chrUn_gl000244, chrUn_gl000245, chrUn_gl000246, chrUn_gl000247, chrUn_gl000248, chrUn_gl000249, chrX, chrY]
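A minimal sketch of how the order reported in the error could be recognised, assuming one contig name per line on standard input. The helper name `is_lexicographic` is hypothetical, and this is not how GATK itself performs the check; it only tests whether the names are already in byte-wise sorted order.

```shell
# Hypothetical helper: succeeds if the contig names on stdin (one per line)
# are already in byte-wise (lexicographic) order.
is_lexicographic() {
    LC_ALL=C sort -c 2>/dev/null
}

# Abbreviated version of the order reported for the reads.
printf '%s\n' chr1 chr10 chr11 chr2 chrM chrX chrY | is_lexicographic \
    && echo "lexicographic"

# Karyotypic order, as in the reference FASTA.
printf '%s\n' chr1 chr2 chr10 chr11 chrX chrY chrM | is_lexicographic \
    || echo "not lexicographic"
```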

    I downloaded the reference files from UCSC and concatenated them in this order:

    grep chr ../reference/hg19_ucsc.fa
    >chr1
    >chr2
    >chr3
    >chr4
    >chr5
    >chr6
    >chr7
    >chr8
    >chr9
    >chr10
    >chr11
    >chr12
    >chr13
    >chr14
    >chr15
    >chr16
    >chr17
    >chr18
    >chr19
    >chr20
    >chr21
    >chr22
    >chrX
    >chrY
    >chrM
    >chrUn_gl000211
    >chrUn_gl000212
    >chrUn_gl000213
    >chrUn_gl000214
    >chrUn_gl000215
    >chrUn_gl000216
    >chrUn_gl000217
    >chrUn_gl000218
    >chrUn_gl000219
    >chrUn_gl000220
    >chrUn_gl000221
    >chrUn_gl000222
    >chrUn_gl000223
    >chrUn_gl000224
    >chrUn_gl000225
    >chrUn_gl000226
    >chrUn_gl000227
    >chrUn_gl000228
    >chrUn_gl000229
    >chrUn_gl000230
    >chrUn_gl000231
    >chrUn_gl000232
    >chrUn_gl000233
    >chrUn_gl000234
    >chrUn_gl000235
    >chrUn_gl000236
    >chrUn_gl000237
    >chrUn_gl000238
    >chrUn_gl000239
    >chrUn_gl000240
    >chrUn_gl000241
    >chrUn_gl000242
    >chrUn_gl000243
    >chrUn_gl000244
    >chrUn_gl000245
    >chrUn_gl000246
    >chrUn_gl000247
    >chrUn_gl000248
    >chrUn_gl000249

    As the error message suggests, I ran Picard's ReorderSam tool like this:

    java -jar ../picard/ReorderSam.jar I=P1_novo.sorted.bam O=P1_novo.reordered.sorted.bam REFERENCE=../reference/hg19_ucsc.fa

    However, running GATK on the reordered BAM file produces the same error message.

    Is there any solution?
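One way to narrow this down is to compare the contig order in the BAM header with the order in the reference FASTA. The sketch below assumes the header has been saved as plain SAM header text; the function names and the toy file contents are hypothetical and stand in for the real files.

```shell
# Print contig names from the @SQ lines of SAM header text read on stdin.
sam_header_contigs() {
    awk -F '\t' '$1 == "@SQ" {
        for (i = 2; i <= NF; i++)
            if ($i ~ /^SN:/) print substr($i, 4)
    }'
}

# Print contig names from a FASTA file, in file order.
fasta_contigs() {
    grep '^>' "$1" | sed -e 's/^>//' -e 's/[[:space:]].*$//'
}

# Succeed only if the header ($1) and the FASTA ($2) list the same
# contigs in the same order.
same_contig_order() {
    [ "$(sam_header_contigs < "$1")" = "$(fasta_contigs "$2")" ]
}

# Toy data standing in for real files (hypothetical contents).
tmp=$(mktemp -d)
printf '@HD\tVN:1.0\tSO:coordinate\n@SQ\tSN:chr1\tLN:10\n@SQ\tSN:chr10\tLN:10\n@SQ\tSN:chr2\tLN:10\n' > "$tmp/header.sam"
printf '>chr1\nACGT\n>chr2\nACGT\n>chr10\nACGT\n' > "$tmp/ref.fa"

if same_contig_order "$tmp/header.sam" "$tmp/ref.fa"; then
    echo "contig order matches"
else
    echo "contig order differs"
fi
rm -rf "$tmp"
```

Assuming samtools is installed, `samtools view -H P1_novo.reordered.sorted.bam > header.sam` would supply the real header text for this comparison.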

  • #2
    Hi,
    I really feel your pain, as I have struggled with the same thing a good few times.
    You could save yourself a lot of bother by getting both your reference FASTA and the dbSNP VCF file from GATK; they are more likely to play together.
    Chris

    • #3
      Hi, thank you for your reply.

      Could you let me know where I can download the dbSNP and reference FASTA files from GATK?

      I need dbSNP 135 and the hg19 reference.

      As far as I know, the GATK bundle contains dbSNP 131 (or 129?) and the hg18 reference.

      Is it possible to download more recent data from GATK?

      Please link the site.

      • #4
        If you are planning to use the Broad bundle, I believe a bundle for hg19 is available.

        1. Have you tried downloading from the following FTP site yet?
        ftp://[email protected]/1.2/hg19/

        As far as I know, the dbSNP version in the Broad hg19 bundle is dbSNP 132.
        However, unless there is a specific reason to use dbSNP 135 (I might be wrong!), I don't think using dbSNP 132 would cause any problem.

        2. Also, you must make sure your reference chromosome order and VCF chromosome order are the same.
        (Personally, I recall struggling because dbsnp132_b37 had "MT" at the top of its chromosome ID list.)
        Last edited by alexbmp; 03-20-2012, 01:48 AM.
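The check in point 2 can be sketched as follows, assuming the reference contig names have been written one per line to a text file. The function names and the toy data are hypothetical; the example mimics a VCF whose mitochondrial records come first while the reference lists chrM last.

```shell
# Print the contigs of a VCF in order of first appearance
# (consecutive duplicates collapsed).
vcf_contigs() {
    grep -v '^#' "$1" | cut -f 1 | uniq
}

# $1: file of reference contig names, in reference order.
# $2: file of VCF contig names, in VCF order.
# Succeeds only if every VCF contig exists in the reference and the
# VCF contigs appear in the same relative order as in the reference.
order_is_consistent() {
    awk 'NR == FNR { rank[$1] = NR; next }
         !($1 in rank) { exit 1 }
         rank[$1] <= last { exit 1 }
         { last = rank[$1] }' "$1" "$2"
}

# Toy data (hypothetical contents).
tmp=$(mktemp -d)
printf '%s\n' chr1 chr2 chrM > "$tmp/ref.txt"
printf '##fileformat=VCFv4.1\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nchrM\t73\t.\tA\tG\t.\t.\t.\nchr1\t100\t.\tC\tT\t.\t.\t.\n' > "$tmp/snp.vcf"
vcf_contigs "$tmp/snp.vcf" > "$tmp/vcf.txt"

if order_is_consistent "$tmp/ref.txt" "$tmp/vcf.txt"; then
    echo "order consistent"
else
    echo "order conflict"
fi
rm -rf "$tmp"
```

Contigs that are present in the reference but absent from the VCF are allowed by this check; only a missing or out-of-order VCF contig makes it fail.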

        • #5
          thank you

          Thank you alex!
          But I have some questions...

          1. As you can see in my reference file, the chromosomes are in this order: chr1 to chr22, chrX, chrY, chrM, then chrUn_*.
          However, after I ran Novoalign, the error message says the reads have a different chromosome order: chr1, chr10 to chr19, chr2, chr20, and so on.
          How can I handle this problem? Changing the alignment program is beyond my control.

          2. Do I need to include the chrUn_* sequences in my reference FASTA file?
          These chrUn_* contigs are not included in the VCF file, are they?
          If I include them, will the SNP calling step cause trouble again?

          • #6
            I haven't used Novoalign, so don't fully trust me.

            1-1. If you build an alignment index before aligning, check whether your index is in chromosomal order (chr1, chr2, chr3, ..., chrX, chrY, chrM or the equivalent).

            1-2. If it is, check whether your alignment program has an output option that emits chromosome ID headers in uncoordinated or lexicographic (chr1, chr10, chr11, ..., chrM, chrX, chrY) order. I haven't seen this kind of alignment output option yet; I strongly suspect your index file is ordered lexicographically, as described in 1-1 (I had the same error).

            2. If you are talking about contigs (that is, chromosome fragments that are not fully assembled), I think it is good to include them in your alignment step.

            I reckon reads that physically originate from such contigs will be mapped there, probably decreasing your error rate. Thinking about it, I'm not sure of this (but I'll write my thoughts anyway; somebody please correct me).

            I also think you can just exclude SNPs on contigs if their presence bothers you.
            Contigs are not fully assembled chromosomes in the first place.

            Did I understand your questions fully?
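The idea in point 2 of excluding SNPs that fall on contigs can be sketched as a simple filter, assuming UCSC-style names (chr1 to chr22, chrX, chrY, chrM) as in this thread. The function name `drop_unplaced_contigs` is hypothetical, and the filter does not rewrite any `##contig` header lines.

```shell
# Keep VCF header lines and records on chr1..chr22, chrX, chrY and chrM;
# drop records on chrUn_*, *_random and *_hap* contigs.
# Assumes UCSC-style contig names.
drop_unplaced_contigs() {
    awk -F '\t' '/^#/ || $1 ~ /^chr([0-9]+|X|Y|M)$/'
}

# Toy records (hypothetical contents).
printf '#CHROM\tPOS\nchr1\t100\nchrUn_gl000211\t5\nchr6_cox_hap2\t9\nchrX\t300\n' \
    | drop_unplaced_contigs
```

On the toy input, only the header line and the chr1 and chrX records are kept.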
