  • UnifiedGenotyper - Actual calls made 0

    Hi all,

    I am using GATK's UnifiedGenotyper to call variants from my exome dataset. Before calling variants I realigned and recalibrated the dataset as suggested by the GATK pipeline. Surprisingly, for one of the samples UnifiedGenotyper runs for the expected time and finishes without any error, BUT the VCF file it generates contains no variant calls. The output contains only the initial VCF header lines and nothing else.
    The log file shows the following:

    INFO 04:59:00,484 UnifiedGenotyper - Visited bases 3137891381
    INFO 04:59:00,484 UnifiedGenotyper - Callable bases 2860850500
    INFO 04:59:00,485 UnifiedGenotyper - Confidently called bases 2674460611
    INFO 04:59:00,485 UnifiedGenotyper - % callable bases of all loci 91.171
    INFO 04:59:00,485 UnifiedGenotyper - % confidently called bases of all loci 85.231
    INFO 04:59:00,485 UnifiedGenotyper - % confidently called bases of callable loci 93.485
    INFO 04:59:00,485 UnifiedGenotyper - Actual calls made 0
    INFO 04:59:00,486 TraversalEngine - Total runtime 13136.39 secs, 218.94 min, 3.65 hours
    INFO 04:59:00,486 TraversalEngine - 160540 reads were filtered out during traversal out of 34176517 total (0.47%)
    INFO 04:59:00,486 TraversalEngine - -> 71636 reads (0.21% of total) failing BadMateFilter
    INFO 04:59:00,486 TraversalEngine - -> 88904 reads (0.26% of total) failing UnmappedReadFilter
    INFO 04:59:19,240 GATKRunReport - Uploaded run statistics report to AWS S3

    Upon inspection, I found that a complete VCF file is generated from the BAM before recalibration, but from the recalibrated BAM the output is malformed!

    I need help with this!

    Thanks in advance.
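A quick way to confirm the symptom described above (header present, no records) on any VCF is to count its line types. A minimal sketch, assuming an uncompressed VCF; the function names are hypothetical helpers, not part of GATK:

```python
def summarize_vcf(lines):
    """Count meta-information, column-header and data lines in VCF text.

    Lines starting with '##' are meta-information, the line starting with
    '#CHROM' is the column header, and every other non-empty line is a record.
    """
    meta = header = records = 0
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # ignore blank lines
        if line.startswith("##"):
            meta += 1
        elif line.startswith("#CHROM"):
            header += 1
        else:
            records += 1
    return {"meta": meta, "header": header, "records": records}


def is_header_only(lines):
    """True when the VCF has its column header but no variant records."""
    counts = summarize_vcf(lines)
    return counts["header"] == 1 and counts["records"] == 0
```

For example, `with open("ns002_merged_realigned_recalibrated.vcf") as f: print(summarize_vcf(f))` (the file name is hypothetical) would report `records` as 0 for an output like the one described here.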

  • #2
    UnifiedGenotyper doesn't directly do recalibration of variants. What's the exact command you're running it with?



    • #3
      Below is the GATK command I used for recalibration (after indel realignment):

      nohup java -Xmx4g -jar /data1/GenomeAnalysisTK-1.5-0-g04cafff/GenomeAnalysisTK.jar -R /data1/ref_genome/gatk_ref/hg19_kayotypically.fasta -I ns002_merged_realigned.bam -T TableRecalibration -recalFile ns002_merged_countCovariates_before_reclbrtn.recal_data.csv -o ns002_merged_realigned_recalibrated.bam &

      When I use the recalibrated .bam generated this way to call variants with UnifiedGenotyper, I get a malformed .vcf file.
      But if I run UnifiedGenotyper on the realigned-only .bam file (i.e. before recalibration), I get a proper .vcf file.

      Is this a problem with recalibration or something else?
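One way to test whether recalibration itself damaged this sample is to compare base qualities in the realigned and recalibrated BAMs. A minimal sketch, assuming you export a subset of reads from each BAM as SAM text (for example with `samtools view`) and that qualities use the standard Phred+33 encoding of the SAM format; the helper names are hypothetical:

```python
def mean_phred(qual, offset=33):
    """Mean Phred base quality of one SAM QUAL string (Phred+33 by default).

    Returns None for '*', which the SAM format uses when no qualities are stored.
    """
    if not qual or qual == "*":
        return None
    return sum(ord(c) - offset for c in qual) / len(qual)


def mean_quality_of_sam_lines(sam_lines):
    """Average the per-read mean quality over SAM text records.

    Header lines (starting with '@') and incomplete records are skipped;
    QUAL is the 11th tab-separated column of a SAM record.
    """
    read_means = []
    for line in sam_lines:
        if line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # not a complete SAM record
        m = mean_phred(fields[10])
        if m is not None:
            read_means.append(m)
    return sum(read_means) / len(read_means) if read_means else None
```

For example, save the output of `samtools view ns002_merged_realigned.bam | head -n 100000` and of the same command on `ns002_merged_realigned_recalibrated.bam`, then feed each file to `mean_quality_of_sam_lines`. A large drop in the recalibrated average would point at the recalibration step rather than at UnifiedGenotyper.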



      • #4
        That certainly sounds like a problem in recalibration, but the command you used above looks fine. What command are you using to generate the recaldata file?



        • #5
          The following is the command used to generate the recal file (CountCovariates):

          nohup java -Xmx4g -jar /data1/GenomeAnalysisTK-1.5-0-g04cafff/GenomeAnalysisTK.jar -R /data1/ref_genome/gatk_ref/hg19_kayotypically.fasta -knownSites /data1/ref_genome/gatk_ref/gatk_vqsr_recalibration_vcffiles/dbsnp_135.b37_FINAL.vcf -I nb005_merged_realigned_recalibrated.bam -T CountCovariates -cov ReadGroupCovariate -cov QualityScoreCovariate -cov DinucCovariate -cov CycleCovariate -recalFile nb005_countCovariates_AFTER_reclbrtn.recal_data.csv &

          Everything is fine for all the other samples!
          Please let me know how I should solve this.

          Thanks



          • #6
            That's the command line for counting covariates after recalibration (for verification purposes only). I'm assuming you used a similar command for the first count covariates step.

            The main problem I can see here is that you're using dbSNP for b37, but the genome you're aligning against is hg19. I know b37 and hg19 are quite similar, but I'm not sure of the exact differences between them, so it's possible there's a mismatch between dbSNP and your genome, which would cause recalibration to seriously reduce your quality scores. You could try getting dbSNP for hg19 from the Broad Institute's FTP server and rerunning CountCovariates and TableRecalibration with that.

            That's the only thing I can think of, unfortunately. Let me know if you still have problems!
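One concrete difference between the two builds is contig naming: hg19 uses names such as `chr1`, while b37 uses `1`. A minimal sketch for checking whether the reference and the known-sites VCF even share contig names, assuming the reference has a samtools-style `.fai` index next to it (GATK requires one) and the VCF is uncompressed; the helper names are hypothetical:

```python
def fai_contigs(fai_lines):
    """Contig names from a .fai index: the first tab-separated column."""
    return {line.split("\t", 1)[0] for line in fai_lines if line.strip()}


def vcf_contigs(vcf_lines):
    """Contig names used in VCF data records (CHROM column); header lines skipped."""
    names = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        names.add(line.split("\t", 1)[0])
    return names


def contig_report(ref_names, vcf_names):
    """Split the VCF's contigs into those present in and missing from the reference."""
    return {
        "shared": sorted(ref_names & vcf_names),
        "missing_from_reference": sorted(vcf_names - ref_names),
    }
```

For example, pass the lines of `/data1/ref_genome/gatk_ref/hg19_kayotypically.fasta.fai` to `fai_contigs` and the dbSNP file from post #5 to `vcf_contigs`. If `shared` is empty and `missing_from_reference` lists names like `1`, `2`, ..., the two files follow different naming conventions.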



            • #7
              OK, I'll try that.
              But why is it happening only with this one sample? Everything is fine for all the other samples!
              Also, the log file shows the number of callable bases, the number of confidently called bases, etc. If recalibration were downgrading the quality scores that much, it shouldn't even show these statistics! It's just NOT WRITING the calls to the file!


              INFO 04:59:00,484 UnifiedGenotyper - Visited bases 3137891381
              INFO 04:59:00,484 UnifiedGenotyper - Callable bases 2860850500
              INFO 04:59:00,485 UnifiedGenotyper - Confidently called bases 2674460611
              INFO 04:59:00,485 UnifiedGenotyper - % callable bases of all loci 91.171
              INFO 04:59:00,485 UnifiedGenotyper - % confidently called bases of all loci 85.231
              INFO 04:59:00,485 UnifiedGenotyper - % confidently called bases of callable loci 93.485
              INFO 04:59:00,485 UnifiedGenotyper - Actual calls made 0

              Could this be a bug in the program?
              I'll try what you suggested.

              Thanks
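The summary counters can be pulled out of each log mechanically, which makes it easy to see which samples show this symptom. A minimal sketch, assuming the GATK 1.5 log line format quoted above; the regular expression and helper names are assumptions. On the objection raised in this post, one possible reading (not confirmed in this thread) is that "Confidently called bases" also counts loci confidently called as reference, while "Actual calls made" counts variant records written to the VCF, so the first number can be large while the second is 0.

```python
import re

# Assumed format, e.g. "INFO 04:59:00,485 UnifiedGenotyper - Actual calls made 0":
# a label without digits followed by a single integer or decimal number.
SUMMARY_RE = re.compile(r"UnifiedGenotyper - (?P<key>[^\d]+?) (?P<value>[\d.]+)\s*$")


def parse_ug_summary(log_lines):
    """Collect UnifiedGenotyper summary counters from log lines into a dict."""
    stats = {}
    for line in log_lines:
        m = SUMMARY_RE.search(line)
        if m:
            value = m.group("value")
            stats[m.group("key")] = float(value) if "." in value else int(value)
    return stats


def zero_calls_despite_coverage(stats):
    """Flag runs that report confidently called bases but no emitted calls."""
    return stats.get("Confidently called bases", 0) > 0 and stats.get("Actual calls made") == 0
```

Running `parse_ug_summary` over each sample's `nohup.out` (or whatever file the log went to) and filtering with `zero_calls_despite_coverage` lists every affected sample at once.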



              • #8

                Hi, aan

                Have you found the problem? I came across the same situation: everything looks fine except for "Actual calls made 0".

                INFO 14:09:58,839 UnifiedGenotyper - Visited bases 3095677412
                INFO 14:09:58,846 UnifiedGenotyper - Callable bases 2861327131
                INFO 14:09:58,847 UnifiedGenotyper - Confidently called bases 2861327131
                INFO 14:09:58,847 UnifiedGenotyper - % callable bases of all loci 92.430
                INFO 14:09:58,847 UnifiedGenotyper - % confidently called bases of all loci 92.430
                INFO 14:09:58,848 UnifiedGenotyper - % confidently called bases of callable loci 100.000
                INFO 14:09:58,848 UnifiedGenotyper - Actual calls made 0
                INFO 14:09:58,848 TraversalEngine - Total runtime 8330.91 secs, 138.85 min, 2.31 hours



                • #9
                  Hi,

                  Yes, I was also facing the same problem. After a lot of troubleshooting I came to the conclusion that UnifiedGenotyper is running fine, but the data is so bad that no candidate variant passes the quality thresholds for calling. That is why the number of calls made is 0: no call actually passed the quality check.

                  To fix it, I reanalysed this sample from the alignment step onward. After realigning, this error did not show up, and variants were called after running UG (although I am now facing another problem with the same sample).

                  Let me know if this is not clear.
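The explanation above amounts to a threshold filter: in GATK versions of this era, UnifiedGenotyper only reports a site as a confident call when its Phred-scaled confidence reaches the calling threshold (the `-stand_call_conf` option, with a separate `-stand_emit_conf` for emitting lower-confidence sites). If every candidate in a damaged sample scores below it, the VCF ends up with a header and no records. A toy sketch of that logic, with hypothetical candidate data and an illustrative threshold of 30; it is not GATK's actual code:

```python
def phred_to_error_prob(q):
    """Probability that a call with Phred-scaled confidence q is wrong: 10^(-q/10)."""
    return 10 ** (-q / 10)


def passing_calls(candidates, call_threshold=30.0):
    """Keep candidate sites whose Phred-scaled confidence reaches the threshold.

    candidates: iterable of (position, phred_confidence) pairs (hypothetical input).
    call_threshold: 30.0 is an illustrative value, not one taken from this thread.
    """
    return [(pos, q) for pos, q in candidates if q >= call_threshold]


# Hypothetical candidates from a badly degraded sample: every confidence is
# below the threshold, so nothing survives and the output would be header-only.
degraded = [(10019, 12.5), (10177, 21.0), (10352, 29.9)]
print(passing_calls(degraded))  # []
```

A confidence of 30 corresponds to an error probability of about 0.001, so lowering base qualities across the board (for example through a bad recalibration) pulls every candidate's confidence down together.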



                  • #10
                    Hi, aan

                    Thanks for the information! I also checked the quality, but found hardly any abnormal signs. Now I am trying a newer version of GATK to work around this problem.
                    BTW, what version are you using?
