  • GATK UnifiedGenotyper with reducebam error

    I am running GATK UnifiedGenotyper on multiple BAM files from target-enrichment sequencing. I get an error that, although it is reported as a "user error", I can't work out. Any help would be greatly appreciated.

    The BAM files have been reduced with ReduceReads. I have many of them, but I get the same error even when I run the command on just a few. Here is an example:

    java -Xmx20g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper \
    -R human_g1k_v37.fasta \
    -D dbsnp_131_b37.final.rod \
    -L baitgroupfile.picard \
    -I sample1.reduced.bam \
    -I sample2.reduced.bam \
    -I sample3.reduced.bam \
    -o out.vcf \
    -stand_call_conf 50.0 \
    -stand_emit_conf 10.0 \
    -G Standard \
    -metrics out.metrics

    Here is the error:


    ##### ERROR ------------------------------------------------------------------------------------------
    ##### ERROR A USER ERROR has occurred (version 2.4-3-g2a7af43):
    ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
    ##### ERROR Please do not post this error to the GATK forum
    ##### ERROR
    ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
    ##### ERROR Visit our website and forum for extensive documentation and answers to
    ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
    ##### ERROR
    ##### ERROR MESSAGE: Invalid command line: No tribble type was provided on the command line and the type of the file could not be determined dynamically. Please add an explicit type tag :NAME listing the correct type from among the supported types:
    ##### ERROR Name FeatureType Documentation
    ##### ERROR BCF2 VariantContext http://www.broadinstitute.org/gatk/g...BCF2Codec.html
    ##### ERROR VCF VariantContext http://www.broadinstitute.org/gatk/g..._VCFCodec.html
    ##### ERROR VCF3 VariantContext http://www.broadinstitute.org/gatk/g...VCF3Codec.html




  • #2
    I think you need to supply the dbSNP file in VCF format, or state which format it is in, for this option:

    -D dbsnp_131_b37.final.rod \



    • #3
      Thanks for your help! Yes, I needed:

      -D:dbsnp,vcf dbsnp_132.b37.vcf
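For reference, here is a sketch of the command from the first post with the dbSNP argument replaced by the tagged form reported in post #3. All file names are copied from the thread and will differ on another system; the `ug_command` wrapper is a hypothetical helper that only prints the command, so nothing is run until the paths have been checked.

```shell
# Print (do not run) the UnifiedGenotyper command from post #1, with the
# dbSNP input given an explicit type tag as described in post #3.
# File names are the ones quoted in the thread; adjust them before running.
ug_command() {
cat <<'EOF'
java -Xmx20g -jar GenomeAnalysisTK.jar -T UnifiedGenotyper \
  -R human_g1k_v37.fasta \
  -D:dbsnp,vcf dbsnp_132.b37.vcf \
  -L baitgroupfile.picard \
  -I sample1.reduced.bam \
  -I sample2.reduced.bam \
  -I sample3.reduced.bam \
  -o out.vcf \
  -stand_call_conf 50.0 \
  -stand_emit_conf 10.0 \
  -G Standard \
  -metrics out.metrics
EOF
}

ug_command
```

Once the printed command looks right, it can be pasted into a terminal or run with `ug_command | sh`.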



      • #4
        Calling variants from multiple BAM files

        Hello everybody,

        I am pretty new to bioinformatics and still learning. Sorry if this has already been explained here; I could not find it. I have 96 paired-end FASTQ sets from a MiSeq (96 x 2 files) and the corresponding 96 BAM files. Each sample represents one patient (an amplicon sequencing workflow for BRCA1 and BRCA2). I would like to use GATK to find SNPs and annotate them all together. I know how to call variants with GATK -T UnifiedGenotyper, but when I pass a BAM list I still get a single VCF file as output, and I cannot assign the calls to the corresponding BAM files :-( So my question is: can I use my BAM list (each sample has a specific name) and get one VCF file per input BAM, with the same name as that BAM? I would then have, say, 96 BAMs and 96 corresponding VCF files with matching names, and could use GATK for annotation.

        I hope my question is clear. I am not a programmer, so could you show an example of the syntax, or offer some advice?

        Thank you very much for your time,

        Paul.



        • #5
          Hi Paul,
          If I understand correctly, you are trying to call SNPs from 96 BAM files but get one VCF file with only one individual?
          You should call all your BAM files together to get one VCF, and all 96 individuals should be contained in that one VCF. If you only get genotypes for one person, your BAM headers may be missing the sample label. Check whether the read group (@RG) section of each BAM header has the sample ID (the SM tag).
          Here is a good explanation of the BAM/SAM header:



          Hope that helps!
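One way to do the header check described above is to print the BAM header and pull out the SM values from the @RG lines. This is a minimal sketch: `rg_samples` is a hypothetical helper name, `patient1.bam` is a placeholder, and it assumes samtools is installed.

```shell
# Read a SAM/BAM header on stdin and list the distinct sample names (SM tags)
# found on its @RG lines. Prints nothing if no read group carries an SM tag.
rg_samples() {
  awk -F'\t' '/^@RG/ {
    for (i = 2; i <= NF; i++)
      if ($i ~ /^SM:/) print substr($i, 4)   # strip the leading "SM:"
  }' | sort -u
}

# Usage (patient1.bam is a placeholder file name):
#   samtools view -H patient1.bam | rg_samples
```

If each BAM prints a different sample name, the joint VCF should contain one genotype column per patient. An empty result suggests the read groups lack a sample label.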



          • #6
            Hello mimi lupton,

            First, thank you for the fast response :-)

            OK, that's right: I have 96 BAM files (96 individual patients). When I create a BAM list (each row in my list is path/to/my/vcf/file) and use GATK to call variants, I get just one single VCF file as output, and I don't know how to split it into 96 single VCF files :-( The read group is different in each BAM file.

            I would also like to keep the naming of my files, so that patient1.bam, patient2.bam ... patient96.bam give patient1.vcf, patient2.vcf ... patient96.vcf :-)

            It takes a long time to rename each sample back to its original name :-)

            Thank you for help!!

            Paul.
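Post #5 recommends calling all BAM files together. If one VCF per BAM is still wanted, a possible alternative is to run UnifiedGenotyper once per file and derive the output name from the input name. The sketch below only prints the commands. `reference.fasta` is a placeholder, the helper names are hypothetical, only the -T, -R, -I and -o options seen earlier in the thread are used, and paths containing spaces are not handled.

```shell
# Derive the output VCF name from a BAM path: /data/patient1.bam -> patient1.vcf
vcf_name_for() {
  base=$(basename "$1")
  printf '%s.vcf\n' "${base%.*}"   # drop the last extension (.bam or .BAM)
}

# Read a list file with one BAM path per line and print one UnifiedGenotyper
# command per BAM. reference.fasta is a placeholder for the real reference.
per_sample_commands() {
  while IFS= read -r bam; do
    [ -n "$bam" ] || continue      # skip empty lines
    printf 'java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R reference.fasta -I %s -o %s\n' \
      "$bam" "$(vcf_name_for "$bam")"
  done < "$1"
}

# Usage (bam.list is a placeholder): per_sample_commands bam.list
```

Reviewing the printed commands first makes it easy to confirm that every BAM maps to the expected VCF name before anything is run.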



            • #7
              OK, right now I have a multi-sample VCF file, but I don't know how to separate it by input BAM :-( I also don't know how to annotate my multi-sample VCF file :-( Can somebody please help?


              Thank you!!



              • #8
                Hi Paul,

                To separate out individuals from your VCF you can use VCFtools:



                ANNOVAR is good for annotation.
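The VCFtools suggestion above could look roughly like this: read the sample names from the #CHROM header line of the joint VCF, then print one `vcftools --indv` command per sample. The helper names and `out.vcf` are placeholders. The sample names come from the SM tags in the BAM headers, so they match the BAM file names only if the read groups were labelled that way.

```shell
# List the sample names in a VCF: columns 10 onward of the #CHROM header line.
vcf_samples() {
  awk -F'\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Print one vcftools command per sample. With --recode and --out NAME,
# vcftools writes the result to NAME.recode.vcf.
split_commands() {
  vcf_samples "$1" | while IFS= read -r sample; do
    printf 'vcftools --vcf %s --indv %s --recode --recode-INFO-all --out %s\n' \
      "$1" "$sample" "$sample"
  done
}

# Usage (out.vcf is a placeholder for the joint VCF):
#   split_commands out.vcf        # review the commands
#   split_commands out.vcf | sh   # run them
```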
