  • fongchun
    Member
    • May 2011
    • 55

    Originally posted by Jane M View Post
    Not yet...
    I think I might have figured out at least part of your question with regard to the file recal_data.grp. If you look at the GATK methods and workflow page under the "Base Quality Score Recalibrator" section, it shows the recal_data.grp being used as part of the -BQSR parameter:

    Code:
    java -jar GenomeAnalysisTK.jar \
       -T PrintReads \
       -R reference.fasta \
       -I input.bam \
       -BQSR recalibration_report.grp \
       -o output.bam

    Interestingly, the documentation for the PrintReads program doesn't list the -BQSR parameter...

    Comment

    • AJERYC
      Member
      • Jan 2012
      • 26

      Originally posted by Jane M View Post
      Thank you AJERYC!

      Because of some troubles with my version of dbSNP, I haven't managed to run:


      but I am still wondering if I should run the PrintReads step since I only have one bam file and if my recalibrated bam file will be the recal_data.grp file. Any idea?
      I'm not sure we are running the same version of GATK. For quality score recalibration I use the following two commands:

      Code:
      java -Xmx16G -jar gatk/GenomeAnalysisTK.jar \
         -I input.marked.realigned.fixed.bam \
         -R hg19/hg19.fa \
         -T CountCovariates \
         -cov ReadGroupCovariate \
         -cov QualityScoreCovariate \
         -cov CycleCovariate \
         -cov DinucCovariate \
         -recalFile input.recal_data.csv \
         -knownSites:dbsnp,VCF dbsnp135.hg19.vcf

      java -Xmx16G -jar gatk/GenomeAnalysisTK.jar \
         -l INFO \
         -R hg19.fa \
         -I input.marked.realigned.fixed.bam \
         -T TableRecalibration \
         --out input.marked.realigned.fixed.recal.bam \
         -recalFile input.recal_data.csv

      As you can see, I get two files: the recal_data file from the first command and the BAM file from the second. Maybe you are missing the second command, and that is why you don't get the BAM file.

      Comment

      • Jane M
        Senior Member
        • Aug 2011
        • 239

        Originally posted by AJERYC View Post
        I'm not sure we are running the same version of GATK. For quality score recalibration I use the following two commands:

        Code:
        java -Xmx16G -jar gatk/GenomeAnalysisTK.jar \
           -I input.marked.realigned.fixed.bam \
           -R hg19/hg19.fa \
           -T CountCovariates \
           -cov ReadGroupCovariate \
           -cov QualityScoreCovariate \
           -cov CycleCovariate \
           -cov DinucCovariate \
           -recalFile input.recal_data.csv \
           -knownSites:dbsnp,VCF dbsnp135.hg19.vcf

        java -Xmx16G -jar gatk/GenomeAnalysisTK.jar \
           -l INFO \
           -R hg19.fa \
           -I input.marked.realigned.fixed.bam \
           -T TableRecalibration \
           --out input.marked.realigned.fixed.recal.bam \
           -recalFile input.recal_data.csv

        As you can see, I get two files: the recal_data file from the first command and the BAM file from the second. Maybe you are missing the second command, and that is why you don't get the BAM file.
        The point is that we are not using the same version. You probably have a version before 2.0, and I have a version after 2.0. As of version 2.0, CountCovariates and TableRecalibration no longer exist. That's a pity, because the process was rather clear: the csv file generated at the CountCovariates step is then used at the TableRecalibration step...
        Last edited by Jane M; 09-07-2012, 02:05 AM.

        Comment

        • Jane M
          Senior Member
          • Aug 2011
          • 239

          Originally posted by fongchun View Post
          I think I might have figured out at least part of your question with regard to the file recal_data.grp. If you look at the GATK methods and workflow page under the "Base Quality Score Recalibrator" section, it shows the recal_data.grp being used as part of the -BQSR parameter:

          Code:
          java -jar GenomeAnalysisTK.jar \
             -T PrintReads \
             -R reference.fasta \
             -I input.bam \
             -BQSR recalibration_report.grp \
             -o output.bam

          Interestingly, the documentation for the PrintReads program doesn't list the -BQSR parameter...
          Ah, interesting.. I only noticed this information about PrintReads (http://www.broadinstitute.org/gatk/g...ntReads.html):
          java -Xmx2g -jar GenomeAnalysisTK.jar \
          -R ref.fasta \
          -T PrintReads \
          -o output.bam \
          -I input1.bam \
          -I input2.bam \
          --read_filter MappingQualityZero
          I hadn't checked the page you pointed me to, and it is much clearer there:
          java -jar GenomeAnalysisTK.jar \
          -T PrintReads \
          -R reference.fasta \
          -I input.bam \
          -BQSR recalibration_report.grp \
          -o output.bam
          The grp file is used as input, and there is an output bam file.
          Thanks fongchun!
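
To tie the version discussion together, the GATK 2.x recalibration can be sketched as two commands. The PrintReads line is the one quoted from the methods page. The BaseRecalibrator line is my understanding of how the .grp report is produced in 2.x, so please check it against the documentation for your version. All file names (dbsnp.vcf, output.recal.bam and so on) are placeholders, and the script only prints the commands instead of running GATK.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print the two GATK 2.x recalibration commands.
# Every file name below is a placeholder, not a path from this thread.
REF="reference.fasta"
IN_BAM="input.bam"
KNOWN="dbsnp.vcf"
REPORT="recal_data.grp"
OUT_BAM="output.recal.bam"

# Step 1: BaseRecalibrator reads the BAM and writes only the report (.grp).
step1_cmd() {
    printf '%s\n' "java -jar GenomeAnalysisTK.jar -T BaseRecalibrator -R $REF -I $IN_BAM -knownSites $KNOWN -o $REPORT"
}

# Step 2: PrintReads applies the report via -BQSR and writes the recalibrated BAM.
step2_cmd() {
    printf '%s\n' "java -jar GenomeAnalysisTK.jar -T PrintReads -R $REF -I $IN_BAM -BQSR $REPORT -o $OUT_BAM"
}

step1_cmd
step2_cmd
```

So the .grp file plays the role the csv file played before 2.0, and the recalibrated BAM only appears after the second command, even with a single input BAM.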

          Comment

          • rahilsethi
            Member
            • May 2010
            • 22

            GATK -dcov option???

            I have an additional question following raonyguimaraes's post.
            Does anyone know the details of the GATK -dcov option in UnifiedGenotyper? I looked in the GATK manual but could not find much about it other than the following:
            -dcov [50 for 4x, 200 for >30x WGS or Whole exome]
            in the link:


            Also, if it is not specified, what default value does this option take?

            If anyone knows about it, could you please send me a link to the information?

            Thanks in advance

            Comment

            • Jane M
              Senior Member
              • Aug 2011
              • 239

              I am wondering whether the variant quality score recalibration step, after variant calling, is still in use. If I remember correctly, I read somewhere that it is no longer performed. In addition, the publications I have read recently do not mention this step. Do you know why it has been abandoned?
              Or what was the point, in the first place, of recalibrating the variant qualities after variant calling, given that base quality scores were already recalibrated before variant calling?

              Comment

              • Jane M
                Senior Member
                • Aug 2011
                • 239

                Originally posted by Jane M View Post
                I am wondering whether the variant quality score recalibration step, after variant calling, is still in use. If I remember correctly, I read somewhere that it is no longer performed. In addition, the publications I have read recently do not mention this step. Do you know why it has been abandoned?
                Any suggestion?

                Comment

                • sdvie
                  Member
                  • Jul 2010
                  • 68

                  Originally posted by rahilsethi View Post
                  I have an additional question following raonyguimaraes's post.
                  Does anyone know the details of the GATK -dcov option in UnifiedGenotyper? I looked in the GATK manual but could not find much about it other than the following:
                  -dcov [50 for 4x, 200 for >30x WGS or Whole exome]
                  in the link:


                  Also, if it is not specified, what default value does this option take?

                  If anyone knows about it, could you please send me a link to the information?

                  Thanks in advance
                  We had some discussion on this in the GATK forum here. Maybe that is of interest to you.

                  cheers,
                  Sophia
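
For context, -dcov is the short form of --downsample_to_coverage, which caps how many reads are used at each position. Below is a minimal sketch of where it goes on a UnifiedGenotyper command line, using the two values quoted from the manual. The file names are placeholders, the script only prints the commands, and I'm deliberately not stating a default value since it may differ between GATK versions.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print a UnifiedGenotyper command with an explicit -dcov value.
# reference.fasta, input.bam and output.vcf are placeholder names.
ug_cmd() {
    local dcov="$1"
    printf '%s\n' "java -jar GenomeAnalysisTK.jar -T UnifiedGenotyper -R reference.fasta -I input.bam -dcov $dcov -o output.vcf"
}

ug_cmd 200   # value quoted above for >30x WGS or whole exome
ug_cmd 50    # value quoted above for 4x data
```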

                  Comment

                  • Jane M
                    Senior Member
                    • Aug 2011
                    • 239

                    Concerning the SAM-to-BAM conversion and PCR duplicate removal steps, is there any reason to prefer Picard to samtools?
                    I tried SortSam from Picard, and it seems to take much longer than samtools view + samtools sort.
                    I think I will use samtools, but I would like to know whether Picard has any advantages.
                    Thank you
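
For comparison, the two routes can be sketched side by side. This assumes the samtools 0.1.x syntax that was current when this was posted (sort takes an output prefix) and Picard distributed as one jar per tool. "sample" is a placeholder prefix, and the script only prints the commands. One difference I am aware of: MarkDuplicates only flags duplicates unless REMOVE_DUPLICATES=true is set, whereas samtools rmdup deletes them.

```shell
#!/usr/bin/env bash
# Dry-run sketch: print both routes from SAM to a sorted, duplicate-free BAM.
# "sample" is a placeholder prefix; nothing is executed.
PREFIX="sample"

samtools_route() {
    # samtools 0.1.x: "sort" takes an output prefix, not a file name.
    printf '%s\n' \
        "samtools view -bS $PREFIX.sam | samtools sort - $PREFIX.sorted" \
        "samtools rmdup $PREFIX.sorted.bam $PREFIX.dedup.bam"
}

picard_route() {
    printf '%s\n' \
        "java -jar SortSam.jar INPUT=$PREFIX.sam OUTPUT=$PREFIX.sorted.bam SORT_ORDER=coordinate" \
        "java -jar MarkDuplicates.jar INPUT=$PREFIX.sorted.bam OUTPUT=$PREFIX.dedup.bam METRICS_FILE=$PREFIX.metrics.txt REMOVE_DUPLICATES=true"
}

samtools_route
picard_route
```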

                    Comment

                    • ddaneels
                      Member
                      • Mar 2012
                      • 20

                      I get the following error when using GATK to perform local realignment around indels.

                      Does anyone have an idea what went wrong?

                      Code:
                      E:\EXOME DATA ANALYSIS\1 Unzipped fastq>java -jar GenomeAnalysisTK.jar -T RealignerTargetCreator -R hg19.fa -o Ot2363.bam.list -I Ot2363.marked.bam
                      INFO  13:45:07,701 HelpFormatter - --------------------------------------------------------------------------------
                      INFO  13:45:07,710 HelpFormatter - The Genome Analysis Toolkit (GATK) v2.1-9-gb90951c, Compiled 2012/09/19 21:18:53
                      INFO  13:45:07,710 HelpFormatter - Copyright (c) 2010 The Broad Institute
                      INFO  13:45:07,710 HelpFormatter - For support and documentation go to http://www.broadinstitute.org/gatk
                      INFO  13:45:07,712 HelpFormatter - Program Args: -T RealignerTargetCreator -R hg19.fa -o Ot2363.bam.list -I Ot2363.marked.bam
                      INFO  13:45:07,712 HelpFormatter - Date/Time: 2012/09/20 13:45:07
                      INFO  13:45:07,712 HelpFormatter - --------------------------------------------------------------------------------
                      INFO  13:45:07,713 HelpFormatter - --------------------------------------------------------------------------------
                      INFO  13:45:07,720 GenomeAnalysisEngine - Strictness is SILENT
                      INFO  13:45:07,723 ReferenceDataSource - Index file E:\EXOME DATA ANALYSIS\1 Unzipped fastq\hg19.fa.fai does not exist. Trying to create it now.
                      PROGRESS UPDATE: file is 15 percent complete
                      PROGRESS UPDATE: file is 28 percent complete
                      PROGRESS UPDATE: file is 39 percent complete
                      PROGRESS UPDATE: file is 54 percent complete
                      PROGRESS UPDATE: file is 67 percent complete
                      PROGRESS UPDATE: file is 77 percent complete
                      PROGRESS UPDATE: file is 89 percent complete
                      PROGRESS UPDATE: file is 99 percent complete
                      ##### ERROR ------------------------------------------------------------------------------------------
                      ##### ERROR A USER ERROR has occurred (version 2.1-9-gb90951c):
                      ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
                      ##### ERROR Please do not post this error to the GATK forum
                      ##### ERROR
                      ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
                      ##### ERROR Visit our website and forum for extensive documentation and answers to
                      ##### ERROR commonly asked questions http://www.broadinstitute.org/gatk
                      ##### ERROR
                      ##### ERROR MESSAGE: Couldn't write file E:\EXOME DATA ANALYSIS\1 Unzipped fastq\hg19.fa.fai because exception The process cannot access the file because another process has locked a portion of the file
                      ##### ERROR ------------------------------------------------------------------------------------------

                      Comment

                      • AJERYC
                        Member
                        • Jan 2012
                        • 26

                        Originally posted by ddaneels View Post
                        I get the following error when using GATK to perform local realignment around indels.

                        Does anyone have an idea what went wrong?

                        I think the error is here:
                        Couldn't write file E:\EXOME DATA ANALYSIS\1 Unzipped fastq\hg19.fa.fai
                        Check the write permissions on that directory, the free hard disk space...
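
The two checks suggested here can be sketched as a small script for a Unix-like shell (the log itself is from Windows, where the equivalent checks go through the folder's properties). The function name is hypothetical. Note also that the log message says another process had locked part of hg19.fa.fai, so closing anything else that has the reference open, or building the index beforehand with `samtools faidx hg19.fa` (assuming samtools is installed), may be worth trying as well.

```shell
#!/usr/bin/env bash
# Report whether a directory exists, can be written to, and how much space is free.
# check_output_dir is a hypothetical helper name, not part of GATK.
check_output_dir() {
    local dir="$1"
    if [ ! -d "$dir" ]; then
        echo "missing: $dir"
        return 1
    fi
    if [ ! -w "$dir" ]; then
        echo "not writable: $dir"
        return 1
    fi
    # df -Pk prints POSIX-format output in 1K blocks; field 4 of line 2 is "Available".
    local free_kb
    free_kb=$(df -Pk "$dir" | awk 'NR == 2 { print $4 }')
    echo "writable: $dir (${free_kb} KB free)"
}

# Check the current directory; pass the directory holding the reference instead.
check_output_dir .
```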

                        Comment

                        • wwhlazio
                          Junior Member
                          • Oct 2012
                          • 4

                          Have you got any answer to this issue?

                          Thanks!

                          Wen

                          Originally posted by blackgore View Post
                          In following the workflow mentioned above, I've come up against an error, and I'm wondering if I'm alone in this. Has anyone experienced difficulty with using CountCovariates tool, specifically with errors regarding accessing information from the input BAM file? I've tried this with several samples, but keep getting the same error, "Bad input: Could not find any usable data in the input BAM file(s)"

                          (for those interested, the BAM files in question are not empty, and work just fine with samtools view).



                          Code:
                          java -Xmx16g -jar /$Software/GenomeAnalysisTK-1.3-17-gc62082b/GenomeAnalysisTK.jar -T CountCovariates -R /$Genomes/Broad/Human/b37/human_g1k_v37.fasta -I $Projects/data/SampleA_bowtie.gatk.realign.bam -nt 8 -l INFO -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -log RECAL.log -recalFile RECAL.csv --knownSites $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
                          
                          INFO  14:01:25,870 HelpFormatter - ---------------------------------------------------------------------------------
                          INFO  14:01:25,875 HelpFormatter - The Genome Analysis Toolkit (GATK) v1.3-17-gc62082b, Compiled 2011/11/18 15:24:46
                          INFO  14:01:25,875 HelpFormatter - Copyright (c) 2010 The Broad Institute
                          INFO  14:01:25,876 HelpFormatter - Please view our documentation at http://www.broadinstitute.org/gsa/wiki
                          INFO  14:01:25,876 HelpFormatter - For support, please view our support site at http://getsatisfaction.com/gsa
                          INFO  14:01:25,877 HelpFormatter - Program Args: -T CountCovariates -R /$Genomes/Broad/Human/b37/human_g1k_v37.fasta -I $Projects/data/SampleA_bowtie.gatk.realign.bam -nt 8 -l INFO -cov ReadGroupCovariate -cov QualityScoreCovariate -cov CycleCovariate -cov DinucCovariate -log RECAL.log -recalFile RECAL.csv --knownSites $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
                          INFO  14:01:25,878 HelpFormatter - Date/Time: 2011/11/24 14:01:25
                          INFO  14:01:25,878 HelpFormatter - ---------------------------------------------------------------------------------
                          INFO  14:01:25,878 HelpFormatter - ---------------------------------------------------------------------------------
                          INFO  14:01:26,052 RodBindingArgumentTypeDescriptor - Dynamically determined type of $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf to be VCF
                          INFO  14:01:26,064 GenomeAnalysisEngine - Strictness is SILENT
                          INFO  14:01:26,815 RMDTrackBuilder - Loading Tribble index from disk for file $Genomes/Broad/Human/b37/dbsnp_132.b37.vcf
                          INFO  14:01:30,532 MicroScheduler - Running the GATK in parallel mode with 8 concurrent threads
                          INFO  14:01:32,326 CountCovariatesWalker - The covariates being used here:
                          INFO  14:01:32,327 CountCovariatesWalker -      ReadGroupCovariate
                          INFO  14:01:32,327 CountCovariatesWalker -      QualityScoreCovariate
                          INFO  14:01:32,327 CountCovariatesWalker -      CycleCovariate
                          INFO  14:01:32,328 CountCovariatesWalker -      DinucCovariate
                          INFO  14:01:41,189 CountCovariatesWalker - Writing raw recalibration data...
                          INFO  14:01:44,145 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                          INFO  14:01:44,146 HttpMethodDirector - Retrying request
                          INFO  14:01:44,149 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                          INFO  14:01:44,149 HttpMethodDirector - Retrying request
                          INFO  14:01:44,152 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                          INFO  14:01:44,153 HttpMethodDirector - Retrying request
                          INFO  14:01:44,155 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                          INFO  14:01:44,155 HttpMethodDirector - Retrying request
                          INFO  14:01:44,158 HttpMethodDirector - I/O exception (java.net.ConnectException) caught when processing request: Connection refused
                          INFO  14:01:44,158 HttpMethodDirector - Retrying request
                          ##### ERROR ------------------------------------------------------------------------------------------
                          ##### ERROR A USER ERROR has occurred (version 1.3-17-gc62082b):
                          ##### ERROR The invalid arguments or inputs must be corrected before the GATK can proceed
                          ##### ERROR Please do not post this error to the GATK forum
                          ##### ERROR
                          ##### ERROR See the documentation (rerun with -h) for this tool to view allowable command-line arguments.
                          ##### ERROR Visit our wiki for extensive documentation http://www.broadinstitute.org/gsa/wiki
                          ##### ERROR Visit our forum to view answers to commonly asked questions http://getsatisfaction.com/gsa
                          ##### ERROR
                          ##### ERROR MESSAGE: Bad input: Could not find any usable data in the input BAM file(s).
                          ##### ERROR ------------------------------------------------------------------------------------------
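
One thing that may be worth ruling out, and this is a guess from the file name rather than anything the log confirms: the BAM appears to come from bowtie. As far as I recall, bowtie 1 reports MAPQ 255 for aligned reads, and GATK treats 255 as "mapping quality unavailable" and filters those reads, which could leave CountCovariates with nothing usable. A quick way to look at the MAPQ values is sketched below; the function name is hypothetical and the usage line assumes samtools is installed.

```shell
#!/usr/bin/env bash
# Tabulate MAPQ (SAM column 5) for alignment records read from stdin.
# Header lines (starting with @) are skipped. Output: MAPQ <tab> read count.
mapq_histogram() {
    awk -F '\t' '!/^@/ { count[$5]++ } END { for (q in count) print q "\t" count[q] }' | sort -n
}

# Usage (sample.bam is a placeholder):
#   samtools view sample.bam | head -n 100000 | mapq_histogram
```

If nearly every read shows 0 or 255, the input reads rather than the command line would be the place to look.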

                          Comment

                          • bunburillo
                            Junior Member
                            • Nov 2010
                            • 2

                            Hi all, and congratulations on this useful thread.

                            I am trying to reproduce the pipeline posted at the beginning of the thread as an alternative approach to SNP analysis.
                            I am not really experienced in NGS, but I have to deal with the results of exome analyses coming from a MiSeq sequencer, and I would like to improve on (or at least compare against) the results obtained through the MiSeq machine (BWA, CASAVA...).
                            Following the tutorial posted by ulz_peter (thanks again), I performed the initial reference genome indexing (hg19) with the latest version of bwa (0.6.2) and obtained 5 different files as a result:

                            hg19.amb
                            hg19.ann
                            hg19.bwt
                            hg19.pac
                            hg19.sa

                            According to other threads (http://seqanswers.com/forums/showthread.php?t=20705), the expected number of resulting files seems to be 8. May I continue with these five files, or would it be better to work with an earlier version of bwa, just to be able to reproduce the pipeline described here?

                            On the other hand, thinking about the next step in the pipeline, the tutorial says of the suggested BWA alignment options:

                            "the -I option tells BWA to use Illumina1.3+ qualities"

                            but if I am not mistaken, MiSeq fastq results are in Sanger format (Illumina 1.8+), so should I use the -I option or not?

                            I know I am asking very basic things, but basic knowledge is crucial to understanding complexity, so I'll be grateful if anyone can help me. I promise to keep asking whenever I have a doubt.

                            Thanks in advance
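
On the -I question, one way to check rather than assume is to look at the lowest quality character in the FASTQ file. Sanger / Illumina 1.8+ qualities start at ASCII 33, while Illumina 1.3+ qualities start at ASCII 64, so any character below ASCII 59 can only come from an offset of 33. Below is a minimal sketch, assuming plain 4-line FASTQ records; the function name and the output labels are my own. If it reports phred33, the -I flag would presumably be left off.

```shell
#!/usr/bin/env bash
# Guess the Phred offset of a FASTQ stream read from stdin.
# Assumes 4-line records (no wrapped sequence or quality lines).
guess_phred_offset() {
    LC_ALL=C awk '
        BEGIN {
            # Map each printable ASCII character to its code.
            for (i = 33; i <= 126; i++) ord[sprintf("%c", i)] = i
            min = 127
        }
        NR % 4 == 0 {
            # Every 4th line holds the quality string; track its lowest character.
            n = length($0)
            for (i = 1; i <= n; i++) {
                ch = substr($0, i, 1)
                if ((ch in ord) && ord[ch] < min) min = ord[ch]
            }
        }
        END {
            if (min == 127)     print "unknown"
            else if (min < 59)  print "phred33"
            else if (min >= 64) print "likely phred64"
            else                print "ambiguous"
        }'
}

# Usage (reads.fastq is a placeholder):
#   head -n 40000 reads.fastq | guess_phred_offset
```

A file containing only high-quality bases can look like phred64 even when it is phred33, which is why that label is hedged; sampling more reads makes the guess more reliable.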

                            Comment

                            • sirmark
                              Member
                              • Feb 2013
                              • 24

                              I think it's important to add to the manual and to the wiki that the vcf files and hg19.fasta are in the GATK bundle, which can be accessed with an ftp client:
                              http://gatkforums.broadinstitute.org...lic-ftp-server

                              Comment

                              • carolW
                                Senior Member
                                • Apr 2013
                                • 103

                                bwa index file of hg19

                                Hi,
                                As building the bwa index of hg19 takes time, is it possible to download a prebuilt version from somewhere?

                                Thanks,

                                Carol

                                Originally posted by ulz_peter View Post
                                Hi Folks,

                                As I was writing a short guide of Exome analysis in our Institute, I thought it might be of some use to others especially for newbies, who need some kind of starting point to get to analysis of exome data (pretty much like the RNA-seq manual I once read in an older thread...). Instead of explaining everything in 100 new threads one could then point to that manual...

                                It is the way we do exome analysis at our Institute, but I would be happy if people help improve the manual, add their knowledge and expand it, like a common knowledge base for exome-level analysis.

                                I attached the pdf version and a .doc version within a zip folder, as the filesize was too large for uploading the doc file alone.

                                The most updated version can be found in the SeqWiki (http://seqanswers.com/wiki/How-to/exome_analysis)
                                (just to make it clear, it is not regularly updated, and its only goal is to get people started with the tools often used in exome sequencing)

                                Any comments highly appreciated!

                                P.S. I added a (very) short visualization chapter

                                Comment
