
  • GATK Haplotype Caller calls Indels in SOLID reads that IGV does not Display

    Hello everyone,

    I am Daniele and I'm a Junior Researcher in a private foundation in Rome.

    Before explaining my problem, let me say that I am pretty new to the SOLID technology and to variant calling in general, so forgive me if the question sounds dumb, but I couldn't find an answer anywhere!

    That said, I am analyzing SOLID reads for a targeted resequencing experiment. The files were given to me as BAM, already aligned to my reference genome using the LifeScope suite provided by AB.

    I used a classical approach for variant calling: I preprocessed the reads, marked duplicates with Picard, and ran the GATK pipeline following the Best Practices for Variant Calling (so I recalibrated the base quality scores and realigned around indels).

    I have then used HaplotypeCaller for the variant calling and outputted the VCF files for my experiments.

    The thing is that HaplotypeCaller calls several indels that are not present when I check my final BAM file (the one I give to HaplotypeCaller for calling variants).

    Specifically, none of the indels in my VCF is visible in IGV, but some of them appear as single nucleotide variants when I uncheck "Quality weight allele fraction" in the Alignment panel of the IGV preferences. I thought this was an IGV issue and played a bit with the options, but I found no solution. Notably, the Genotype Quality of these positions is always around 99. I checked around the web, but I cannot find any explanation for this behavior.

    Can someone provide some help?


    Thanks in advance!

    Daniele

  • #2
    Hello Daniele,

    To at least partly answer your question: HaplotypeCaller performs local realignment during the run, so the "final BAM" that the calls are based on is not the file you gave to HaplotypeCaller but an internal realignment that you don't see. So HC might realign a region and find an indel, while in your input BAM you see nothing, or one or more SNPs (which are likely caused by mismatches in an alignment that was wrong anyway).

    If you want to see what it looks like AFTER HaplotypeCaller has realigned the reads, you can rerun it with the flag:
    --bamOutput newbamfile.bam
    (takes quite a while to run).

    You can also make it print out all possible haplotypes to the bam with:
    --bamWriterType ALL_POSSIBLE_HAPLOTYPES
    and then see them in IGV (choose "color alignment by: tag" and then write "HC" in the box).

    Hope this helps at least a bit,
    Linnéa
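As a rough illustration of the advice above, here is a minimal Python sketch that assembles a HaplotypeCaller command line with `--bamOutput` and, optionally, `--bamWriterType ALL_POSSIBLE_HAPLOTYPES`. It assumes a GATK 3.x style invocation (`java -jar GenomeAnalysisTK.jar -T HaplotypeCaller`); the helper name `build_hc_command`, the jar path, all file names, and the example interval are hypothetical placeholders, not taken from the thread. Restricting the run to one region with `-L` is only a suggestion to shorten the runtime mentioned above.

```python
import shlex


def build_hc_command(reference, input_bam, output_vcf, realigned_bam,
                     interval=None, all_haplotypes=False,
                     jar="GenomeAnalysisTK.jar"):
    """Build (but do not run) a GATK 3.x style HaplotypeCaller command.

    All paths are placeholders; adapt them to your own setup.
    """
    cmd = [
        "java", "-jar", jar,
        "-T", "HaplotypeCaller",
        "-R", reference,
        "-I", input_bam,
        "-o", output_vcf,
        # Write the reads as HaplotypeCaller realigned them.
        "--bamOutput", realigned_bam,
    ]
    if interval:
        # Limit the run to one region of interest to save time.
        cmd += ["-L", interval]
    if all_haplotypes:
        # Also write every haplotype that was considered.
        cmd += ["--bamWriterType", "ALL_POSSIBLE_HAPLOTYPES"]
    return cmd


# Hypothetical file names and region, for illustration only.
cmd = build_hc_command("reference.fasta", "final.bam", "calls.vcf",
                       "newbamfile.bam", interval="chr1:1000-2000",
                       all_haplotypes=True)
print(" ".join(shlex.quote(arg) for arg in cmd))
```

The printed line can be pasted into a shell, or the list can be passed to `subprocess.run(cmd, check=True)`. The resulting `newbamfile.bam` is then loaded in IGV next to the input BAM to compare the two alignments.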



    • #3
      Hi Linnea, and thanks for the very quick reply!

      I actually found a similar answer on the GATK forum after I posted, but yours was concise and very explanatory, so thank you again!

      I'm now running the --bamOutput option on my samples in order to check how HaplotypeCaller realigned the reads.

      However, something still does not add up: why is the Genotype Quality (GQ) of these indels always 99 (checked multiple times on multiple samples)?


      Thanks in advance!

      Daniele
      Last edited by wariobrega; 07-01-2015, 01:30 AM.



      • #4
        No sorry, I can't explain that part, maybe someone else has an idea?

        But actually, why can't they just be real indels with a very high quality? (99 seems to be the highest quality you can get: "Because the most likely PL is always 0, GQ = second highest PL - 0. If the second most likely PL is greater than 99, we still assign a GQ of 99, so the highest value of GQ is 99." -from the GATK webpage). Maybe it will be clear after the realignment? (And sorry if I misunderstood something, I am really no indel expert..)
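The rule quoted from the GATK webpage can be written out in a few lines. This is only a sketch of that quoted rule (GQ is the second-lowest PL minus the lowest PL, capped at 99), not GATK's own code; the function name `genotype_quality` and the example PL values are made up for illustration.

```python
def genotype_quality(pls, cap=99):
    """Derive GQ from a list of normalized PL values.

    Follows the rule quoted above: the most likely genotype has PL 0,
    GQ is the gap to the second most likely genotype, capped at `cap`.
    """
    if len(pls) < 2:
        raise ValueError("need PLs for at least two genotypes")
    ordered = sorted(pls)
    return min(cap, ordered[1] - ordered[0])


# Made-up PLs in the order hom-ref, het, hom-alt.
print(genotype_quality([45, 0, 600]))    # 45
print(genotype_quality([1200, 0, 350]))  # 99 (350 is capped)
```

Under this rule a GQ of 99 only says that the second-best genotype is at least 99 Phred units less likely than the best one, which is why so many calls end up at exactly 99.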



        • #5
          GQ does not tell you anything about the variant quality. GQ tells you about how certain the HC is about the zygosity.

          About the deletions: do they disappear if you use this option in the variant calling step?

          --dontUseSoftClippedBases



          • #6
            Originally posted by Zaag
            GQ does not tell you anything about the variant quality. GQ tells you about how certain the HC is about the zygosity.

            About the deletions: do they disappear if you use this option in the variant calling step?

            --dontUseSoftClippedBases
            Hi Zaag, and apologies for the late reply (I happened to be on holiday these last couple of weeks!),

            Part of the deletions disappeared after I used your option, although new ones appeared. Cool thing though: many of the FP indels were among the ones that were removed.

            I am now also trying to use Picard CleanSam to filter them BEFORE the GATK pipeline and compare the differences (also to see why these new indels appear). Thanks a lot for your reply, it was very helpful!
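One way to make the comparison described above concrete is to diff the indel calls of two runs, for example with and without `--dontUseSoftClippedBases`. The following is a minimal sketch that assumes plain, tab-separated VCF text and treats any ALT allele whose length differs from REF as an indel; the helper names `indel_keys` and `compare_indels` and the example records are hypothetical.

```python
def indel_keys(vcf_lines):
    """Collect (chrom, pos, ref, alt) for every indel allele in VCF text lines."""
    keys = set()
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        for allele in alt.split(","):
            if allele.startswith("<") or allele in (".", "*"):
                continue  # skip symbolic, missing and spanning alleles
            if len(allele) != len(ref):
                keys.add((chrom, int(pos), ref, allele))
    return keys


def compare_indels(before_lines, after_lines):
    """Return indels lost, gained and shared between two call sets."""
    before, after = indel_keys(before_lines), indel_keys(after_lines)
    return {"lost": before - after,
            "gained": after - before,
            "shared": before & after}


# Made-up records: one deletion disappears, one insertion appears.
default_run = ["#CHROM\tPOS\tID\tREF\tALT",
               "chr1\t100\t.\tAT\tA",
               "chr1\t250\t.\tG\tC"]
no_softclip_run = ["#CHROM\tPOS\tID\tREF\tALT",
                   "chr1\t250\t.\tG\tC",
                   "chr1\t400\t.\tC\tCGG"]
print(compare_indels(default_run, no_softclip_run))
```

For real files the lines can come from `open("calls.vcf")`. The "lost" set can then be checked against the Sanger-validated regions, and the "gained" set inspected in IGV using the BAM written by `--bamOutput`.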

            Originally posted by Linnea
            No sorry, I can't explain that part, maybe someone else has an idea?

            But actually, why can't they just be real indels with a very high quality? (99 seems to be the highest quality you can get: "Because the most likely PL is always 0, GQ = second highest PL - 0. If the second most likely PL is greater than 99, we still assign a GQ of 99, so the highest value of GQ is 99." -from the GATK webpage). Maybe it will be clear after the realignment? (And sorry if I misunderstood something, I am really no indel expert..)
            Hi Linnea! Again, apologies for the late reply.

            I am quite confident these indels were not real because the same regions were validated with Sanger before the experiment. Also, seeing a lot of indels very close to each other (3-4 bp apart at most), and considering the nature of the disease as well as the conservation of these regions, makes me think these are FPs. Again, I'm not an indel expert either, so we're in the same boat! Thanks a lot for your contribution though, it was really helpful!

            Daniele
            Last edited by wariobrega; 08-12-2015, 06:50 AM. Reason: Forgot to reply to Linnea!
