  • Jolin
    Member
    • Oct 2011
    • 10

    SNV calling using GATK with data from multiple lanes

    Hi,

    I am using exome sequencing data to call SNVs with GATK's UnifiedGenotyper. I have two lanes for each sample, so I merged the two BAM files into one with two read groups. But in the VCF file I got two sample columns, like GT:AD:DP:GQ:PL 0/1:20,3:23:56:56,0,576 0/1:23,9:32:99:153,0,676.
    My questions are:
    (1) Did GATK treat these as two samples because there are two read groups?
    (2) Did GATK call SNVs in the two lanes separately, or did it merge their reads?
    (3) When I calculate the minor allele frequency, should I use both GT:AD:DP:GQ:PL columns?

    Eager to know the answer.
    Thank you in advance.
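
For question (3), here is a minimal sketch of how the two genotype columns could be pooled, assuming the FORMAT string is the standard five-field GT:AD:DP:GQ:PL (which matches the five values in each column above) and assuming both columns really are the same biological sample. The function names are hypothetical, and the result is a per-site alternate-allele read fraction, not a population minor allele frequency.

```python
def parse_sample(format_str, sample_str):
    """Map FORMAT keys (e.g. GT, AD, DP) to one sample column's values."""
    return dict(zip(format_str.split(":"), sample_str.split(":")))


def pooled_alt_fraction(format_str, sample_columns):
    """Sum AD (ref,alt read counts) over columns; return alt / (ref + alt)."""
    ref_total = 0
    alt_total = 0
    for column in sample_columns:
        fields = parse_sample(format_str, column)
        ref, alt = (int(x) for x in fields["AD"].split(","))
        ref_total += ref
        alt_total += alt
    return alt_total / (ref_total + alt_total)


# Values copied from the VCF line quoted in the post.
FORMAT = "GT:AD:DP:GQ:PL"
columns = ["0/1:20,3:23:56:56,0,576", "0/1:23,9:32:99:153,0,676"]

fraction = pooled_alt_fraction(FORMAT, columns)
print(round(fraction, 3))  # 12 alt reads out of 55 total
```

If the two read groups are given the same sample name before calling, the caller would produce a single column and this pooling step would not be needed.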
  • N311V
    Member
    • Jul 2013
    • 34

    #2
    In regards to your first question, I do think GATK UnifiedGenotyper would have treated each different read group as a different sample (http://gatkforums.broadinstitute.org...bout-bam-files).

    Is there a particular reason you're using the UnifiedGenotyper? HaplotypeCaller is its successor (http://www.broadinstitute.org/gatk/g...-discovery-ovw).


    • Jolin
      Member
      • Oct 2011
      • 10

      #3
      Hi N311V,

      Thank you very much. If GATK treats different read groups as different samples, then the read groups of each lane are supposed to be the same, right? But this is not mentioned at all on the GATK website.

      I only called SNPs, not indels, so UnifiedGenotyper seems to be faster. Did HaplotypeCaller perform better than UnifiedGenotyper in your project?


      • westerman
        Rick Westerman
        • Jun 2008
        • 1104

        #4
        From the GATK web page:

        The HaplotypeCaller is a more recent and sophisticated tool than the UnifiedGenotyper. Its ability to call SNPs is equivalent to that of the UnifiedGenotyper, and its ability to call indels is far superior. We recommend using HaplotypeCaller in all cases, with only a few exceptions:

        If you want to analyze more than 100 samples at a time (for performance reasons)
        If you are working with non-diploid organisms (UG can handle different levels of ploidy while HC cannot)
        If you are working with pooled samples (also due to the HC’s limitation regarding ploidy)
        In those cases, we recommend using UnifiedGenotyper instead of HaplotypeCaller.

        Personally, I am not sure which is better. Getting different results bioinformatically is not proof of correctness.


        • athomson
          Junior Member
          • Feb 2013
          • 1

          #5
          Originally posted by N311V
          In regards to your first question, I do think GATK UnifiedGenotyper would have treated each different read group as a different sample (http://gatkforums.broadinstitute.org...bout-bam-files).

          If you look at the description of the SM tag on that page, it seems GATK treats all read groups with the same SM as coming from the same sample:

          GATK tools treat all read groups with the same SM value as containing sequencing data for the same sample. Therefore it's critical that the SM field be correctly specified, especially when using multi-sample tools like the Unified Genotyper.
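
To check how many samples a merged BAM would present under the rule quoted above, one could group the @RG header lines by their SM tag. This is a minimal sketch: the header lines below are made-up examples (in practice they might come from something like `samtools view -H merged.bam`), and `samples_from_header` is a hypothetical helper, not part of GATK.

```python
from collections import defaultdict


def samples_from_header(header_lines):
    """Group read-group IDs by SM tag, reading only SAM @RG header lines."""
    groups = defaultdict(list)
    for line in header_lines:
        if not line.startswith("@RG"):
            continue
        # Each tab-separated field after "@RG" looks like TAG:value.
        fields = line.rstrip("\n").split("\t")[1:]
        tags = dict(field.split(":", 1) for field in fields)
        groups[tags.get("SM", "<missing SM>")].append(tags["ID"])
    return dict(groups)


# Hypothetical header: each lane was given its own SM value.
distinct_sm = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:lane1\tSM:lane1\tPL:ILLUMINA",
    "@RG\tID:lane2\tSM:lane2\tPL:ILLUMINA",
]

# Hypothetical header: both lanes share one SM value.
shared_sm = [
    "@HD\tVN:1.4\tSO:coordinate",
    "@RG\tID:lane1\tSM:sampleA\tPL:ILLUMINA",
    "@RG\tID:lane2\tSM:sampleA\tPL:ILLUMINA",
]

print(samples_from_header(distinct_sm))  # two SM values, so two sample columns
print(samples_from_header(shared_sm))    # one SM value, so one sample column
```

Two distinct SM values in the header would be consistent with the two genotype columns seen in the original VCF.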


          • N311V
            Member
            • Jul 2013
            • 34

            #6
            Originally posted by Jolin
            If they treat different read groups as different samples, then the read groups of each lane are supposed to be the same, right? But this is not mentioned at all on the GATK website.

            I did read somewhere on the GATK website that each sample needs a unique read group; sorry, I don't have a link right now. To keep track of the lane, perhaps you could use Picard's AddOrReplaceReadGroups.jar and specify the library name as the lane.
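
A rough sketch of what that could look like, assuming the older one-jar-per-tool Picard release and its `KEY=value` arguments (INPUT, OUTPUT, RGID, RGLB, RGPL, RGPU, RGSM). The file names, sample name, and platform value are placeholders, and the helper only builds the command; it does not run it. Following the suggestion above, the lane goes into the library field, and each lane gets its own read-group ID while sharing one sample name.

```python
def add_rg_command(in_bam, out_bam, lane, sample,
                   jar="AddOrReplaceReadGroups.jar", platform="illumina"):
    """Build (not run) a Picard AddOrReplaceReadGroups command line."""
    return [
        "java", "-jar", jar,
        f"INPUT={in_bam}",
        f"OUTPUT={out_bam}",
        f"RGID={sample}.{lane}",  # unique per lane
        f"RGLB={lane}",           # lane recorded as library, per the post
        f"RGPL={platform}",
        f"RGPU={lane}",           # platform unit; assumed equal to the lane here
        f"RGSM={sample}",         # shared across lanes of the same sample
    ]


# Placeholder file and sample names, for illustration only.
cmd_lane1 = add_rg_command("lane1.bam", "lane1.rg.bam", "lane1", "sampleA")
cmd_lane2 = add_rg_command("lane2.bam", "lane2.rg.bam", "lane2", "sampleA")

print(" ".join(cmd_lane1))
print(" ".join(cmd_lane2))
```

The resulting lists could be passed to `subprocess.run` once the paths point at real files.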

            Originally posted by Jolin
            I just called SNPs not indels. So unified genotyper seems to be faster. Did HaplotypeCaller run better than Unified Genotyper in your project?

            I was interested in both SNPs and indels, which made HaplotypeCaller a great all-in-one solution. Also, I was only interested in a couple of genes, so speed was not a concern. I haven't compared the SNP results from HaplotypeCaller with those from UnifiedGenotyper, so I can't say whether they're the same. I assume so, but it's better to check.


            • Jolin
              Member
              • Oct 2011
              • 10

              #7
              Hi Westerman, Thank you. Actually our lab used Unified Genotyper all the time and did some PCR validation on the predicted SNVs. It seems that UG works well in SNV detection.


              • Jolin
                Member
                • Oct 2011
                • 10

                #8
                Thanks a lot, N311V
