  • newbietonextgen
    Member
    • Nov 2010
    • 56

    Question for GATK experts

    I called SNPs using GATK and have a question.

    When I call SNPs from a single BAM file, I get 850 SNPs.

    When I instead call SNPs from multiple BAM alignments (not merged BAM files, but listed one after another to produce a single VCF file), I get more SNPs for the same sample (2598 SNPs). How is this possible? What am I doing wrong? I am using the same filter conditions, etc.
  • d17
    Member
    • Sep 2008
    • 27

    #2
    What is probably happening is that the 1748 additional SNPs (2598 minus 850) in your sample (call it sample #1) that are called only in the multi-sample run did not have enough evidence to be called in sample #1 alone, but did show evidence of being SNPs in other samples. Seeing the site as a SNP in other samples affects the probability that it is called as a SNP in sample #1.

    Intuitively, the situation is as follows. If we run SNP calling on sample #1 alone and see a site with only a modest amount of evidence that it is a SNP, it will probably not pass the filtering thresholds. If we run SNP calling on a group of samples and see that the same site has strong evidence of being a SNP in a different sample, we can be more confident that the site is truly a SNP in sample #1. This is, I believe, one of the main advantages of multi-sample calling.
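
To make the intuition above concrete, here is a toy calculation in Python. It is a deliberately simplified sketch, not GATK's actual model: it assumes independent samples, a fixed per-read error rate, and made-up genotype priors, and the read counts are invented for illustration. All it shows is how pooling evidence across samples can push a site over a confidence threshold that sample #1 alone would not reach.

```python
import math

# Assumed (hypothetical) prior probabilities for one sample's genotype at a site.
PRIOR = {"ref": 0.9985, "het": 0.001, "alt": 0.0005}


def genotype_likelihoods(ref_reads, alt_reads, error=0.01):
    """Likelihood of the observed reads under hom-ref, het and hom-alt.

    The binomial coefficient is omitted because it is the same for all
    three genotypes and cancels when the values are normalised.
    """
    def lik(p_alt):
        return (p_alt ** alt_reads) * ((1 - p_alt) ** ref_reads)

    return {"ref": lik(error), "het": lik(0.5), "alt": lik(1 - error)}


def prob_hom_ref(ref_reads, alt_reads):
    """Posterior probability that a single sample is homozygous reference."""
    liks = genotype_likelihoods(ref_reads, alt_reads)
    weighted = {g: liks[g] * PRIOR[g] for g in liks}
    return weighted["ref"] / sum(weighted.values())


def site_quality(samples):
    """Phred-scaled confidence that at least one sample is non-reference.

    `samples` is a list of (ref_reads, alt_reads) tuples, one per sample.
    Samples are treated as independent, which is a toy simplification.
    """
    p_all_ref = 1.0
    for ref_reads, alt_reads in samples:
        p_all_ref *= prob_hom_ref(ref_reads, alt_reads)
    return -10 * math.log10(max(p_all_ref, 1e-300))


# Invented read counts: sample #1 has weak evidence, sample #2 strong evidence.
sample1 = (8, 2)
sample2 = (10, 10)
threshold = 30.0  # same value as a -stand_call_conf of 30

print("sample #1 alone passes:", site_quality([sample1]) >= threshold)
print("both samples pass:     ", site_quality([sample1, sample2]) >= threshold)
```

In this toy model the evidence simply adds up across samples. A real multi-sample caller couples the samples in a more involved way, so treat the numbers as illustrative only.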

    • newbietonextgen
      Member
      • Nov 2010
      • 56

      #3
      Thanks d17. As I understand it, the depth of coverage at a location increases when you use multi-sample calling, which increases the number of SNPs in the resulting VCF file. Now the question is which one is correct: single-sample or multi-sample? I know both are correct, but which would one use for ASE?

      • prisnirath
        Member
        • May 2011
        • 19

        #4
        Hi there,
        I am using GATK to call SNPs from my SAM files (from 454 data).
        I am using the following pipeline:
        SAM to BAM
        samtools import BRCA1_coding.fasta out1FR_bwasw.sam new_out1FR_bwasw.bam

        Sort BAM
        samtools sort new_out1FR_bwasw.bam new_out1FR_bwasw.sorted

        Index BAM
        samtools index new_out1FR_bwasw.sorted.bam new_out1FR_bwasw.sorted.bam.bai

        Identify target regions for realignment
        java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T RealignerTargetCreator -R BRCA1_coding.fasta -I new_out1FR_bwasw.sorted.bam -o new_out1FR_bwasw.intervals
        This gives me an interval file with 4 locations, which are a subset of the regions I identified using the Tablet viewer.
        The command line reports the following:
        14:53:16,100 TraversalEngine - 0 reads were filtered out during traversal out of 565 total (0.00%)
        Then,
        Realign BAM to get better Indel calling
        java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T IndelRealigner -R BRCA1_coding.fasta -I new_out1FR_bwasw.sorted.bam -targetIntervals new_out1FR_bwasw.intervals -o new_out1FR_bwasw.sorted.realigned.bam
        Add or Replace read group
        java -jar ~/bin/picard-tools-1.45/AddOrReplaceReadGroups.jar I=new_out1FR_bwasw.sorted.realigned.bam O=new_out1FR_bwasw_new.sorted.realigned.bam SORT_ORDER=coordinate RGID=foo RGLB=bar RGPL=illumina RGSM=DePristo RGPU=GGDP4G001BFFBZ CREATE_INDEX=True
        Reorder and reindex the realigned BAM
        java -jar ~/bin/picard-tools-1.45/ReorderSam.jar I=new_out1FR_bwasw_new.sorted.realigned.bam O=new_out1FR_bwasw.resorted.realigned.bam REFERENCE=BRCA1_coding.fasta
        samtools index new_out1FR_bwasw.resorted.realigned.bam new_out1FR_bwasw.resorted.realigned.bam.bai
        Call SNPs
        java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T UnifiedGenotyper -R BRCA1_coding.fasta -I new_out1FR_bwasw.resorted.realigned.bam -o new_out1FR_bwasw.vcf.calls -stand_call_conf 30.0 -stand_emit_conf 10.0
        I am getting only 1 SNP call, of very low quality, in a region with a read depth of 1.
        This region doesn't coincide with the intervals identified earlier.
        Also, the SNP called does not match any of my results from VarScan.

        Can anyone please suggest how to improve SNP calling?
        Or is GATK not suitable for SNP calling in long read data from 454?
