  • Too many Variants called by HaplotypeCaller GATK

    Hi everybody,

    I am running an exome analysis on 8 individuals (one family) and got a result that looks wrong to me. My pipeline follows the GATK Best Practices, but when I checked my final VCF file after calling with the HaplotypeCaller walker, I found that it contains 2.1 million lines (I am assuming these are all variants). The VCF is a merged file for all 8 samples together. As far as I know, an exome pipeline should usually give around 20 thousand variants, so I am really confused about what is wrong in my pipeline. Any ideas would be really appreciated.
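A quick way to check the "all lines are variants" assumption is to count header lines and data records separately. The sketch below is only an illustration: the function name `count_vcf_records` and the tiny in-memory VCF are made up for the example, and it assumes an uncompressed, tab-delimited VCF text stream.

```python
import io


def count_vcf_records(handle):
    """Return (header_lines, variant_records) for an uncompressed VCF text stream."""
    header = 0
    records = 0
    for line in handle:
        if not line.strip():
            continue  # ignore blank lines
        if line.startswith("#"):
            header += 1  # '##' meta lines and the '#CHROM' column line
        else:
            records += 1  # one line per site, regardless of sample count
    return header, records


# Hypothetical miniature VCF, just to exercise the function.
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "1\t100\t.\tA\tG\t50\t.\tQD=12.3",
    "1\t200\t.\tC\tT\t30\t.\tQD=1.1",
    "2\t300\t.\tG\tGA\t99\t.\tQD=20.0",
]) + "\n"

header, records = count_vcf_records(io.StringIO(EXAMPLE_VCF))
print(f"{header} header lines, {records} variant records")
```

For a real file you would pass an open file handle instead of the `io.StringIO` object. In a merged multi-sample VCF each site is one line, so the record count is a count of sites across all 8 samples, not per sample.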

  • #2
    Originally posted by drmaly:
    when I checked my last vcf file after calling with HaplotypeCaller walker, I found the file contains 2.1 million lines
    Hi there,

    If you consider the HaplotypeCaller output to be your final file, you're not really following Best Practices... As indicated in the GATK documentation, the output of HaplotypeCaller is a set of raw variants that is likely to include a lot of false positives. You have to filter them (either with VQSR or hard filters) to generate a callset with the desired level of sensitivity and specificity.
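To make the idea of hard filtering concrete, here is a minimal sketch of flagging a raw call from its INFO annotations. It is not how GATK implements filtering (in a real pipeline you would use GATK's own filtering tools or VQSR); the helper names are hypothetical, and the thresholds (QD < 2.0, FS > 60.0, MQ < 40.0) are the commonly cited GATK starting points for SNPs, used here as an assumption that should be tuned for your data.

```python
def parse_info(info):
    """Parse a VCF INFO string such as 'QD=1.5;FS=70.2;DB' into a dict."""
    parsed = {}
    for item in info.split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            parsed[key] = value
        elif item and item != ".":
            parsed[item] = True  # flag annotation without a value
    return parsed


# Assumed SNP thresholds: a call fails a rule when the comparison is true.
SNP_RULES = {"QD": ("<", 2.0), "FS": (">", 60.0), "MQ": ("<", 40.0)}


def failed_filters(info_field, rules=SNP_RULES):
    """Return the list of hard-filter rules that a record's INFO field fails."""
    info = parse_info(info_field)
    failed = []
    for key, (op, threshold) in rules.items():
        value = info.get(key)
        if not isinstance(value, str):
            continue  # annotation missing (or a bare flag): rule not evaluated
        number = float(value)
        if (op == "<" and number < threshold) or (op == ">" and number > threshold):
            failed.append(f"{key}{op}{threshold}")
    return failed


print(failed_filters("QD=1.5;FS=70.2;MQ=60.0"))
print(failed_filters("QD=12.3;FS=1.0;MQ=59.8;DB"))
```

A record with an empty result would be kept as passing; anything else would be marked as filtered. Skipping rules for missing annotations is a design choice of this sketch, not a statement about what GATK does.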

    • #3
      vdauwera is right.
      The VCF output can differ if your commands are not correct from the beginning (BWA to HaplotypeCaller to V). Could you post your commands and the messages you got at the end of each job, to help us understand what happened?

      jp.