**d17** · 05-08-2011, 10:41 AM

What is probably happening is that the roughly 1,750 SNPs (2,600 minus 850) in your sample (call it sample #1) that are called only in the multi-sample run did not have enough evidence to be called as SNPs in sample #1 alone, but did show evidence of being SNPs in other samples. Seeing the site as a SNP in other samples increases the probability that it is called as a SNP in sample #1.

Intuitively, the situation is as follows. If we run SNP calling on sample #1 alone and see a site with only a modest amount of evidence that it is a SNP, it will probably not pass the filtering thresholds. If we run SNP calling on a group of samples and see that the same site has strong evidence of being a SNP in a different sample, we can be more confident that the site is truly a SNP in sample #1. This is, I believe, one of the main advantages of multi-sample calling.
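To put rough numbers on that intuition, here is a toy sketch. It is not GATK's actual model: it only applies Bayes' rule to a single site, and every number in it (the likelihood ratio of 2000, the priors of 0.001 and 0.5) is made up for illustration. The helper name `posterior_phred` is hypothetical.

```shell
#!/bin/sh
# Toy illustration only: NOT how GATK computes its confidence scores.
# posterior_phred PRIOR LR
#   PRIOR = assumed prior probability that the site is variant
#   LR    = assumed likelihood ratio P(reads | variant) / P(reads | reference)
# Prints the Phred-scaled probability that the "variant" call is wrong.
posterior_phred() {
  awk -v prior="$1" -v lr="$2" 'BEGIN {
    odds  = prior / (1 - prior) * lr   # posterior odds of variant vs. reference
    p_err = 1 / (1 + odds)             # posterior probability of reference
    printf "%.1f\n", -10 * log(p_err) / log(10)
  }'
}

# Same modest evidence from sample #1 reads in both cases (made-up value).
lr_sample1=2000

# Sample #1 alone: only a small generic prior that any site is variant.
q_alone=$(posterior_phred 0.001 "$lr_sample1")

# Multi-sample run: other samples already support the site, so the
# effective prior for sample #1 is much higher (assumed 0.5 here).
q_multi=$(posterior_phred 0.5 "$lr_sample1")

echo "single-sample quality: $q_alone"
echo "multi-sample quality:  $q_multi"
```

With these assumed inputs, the same reads give a quality well below a calling threshold of 30 when sample #1 is considered alone and a quality above 30 once the prior reflects the other samples, which is the effect described above.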

**newbietonextgen** · 05-08-2011, 02:18 PM

Thanks, d17. As I understand it, the depth of coverage at a location increases when you use multi-sample calling, which increases the number of SNPs in the resulting VCF file. Now the question is which one is correct, single-sample or multi-sample? I know both are correct, but which would one use for ASE?

**prisnirath** · 06-06-2011, 07:34 AM

Hi there,
I am using GATK to call SNPs from my SAM files (from 454 data).
I am using the following pipeline:

SAM to BAM
samtools import BRCA1_coding.fasta out1FR_bwasw.sam new_out1FR_bwasw.bam

Sort BAM
samtools sort new_out1FR_bwasw.bam new_out1FR_bwasw.sorted

Index BAM
samtools index new_out1FR_bwasw.sorted.bam new_out1FR_bwasw.sorted.bam.bai

Identify target regions for realignment
java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T RealignerTargetCreator -R BRCA1_coding.fasta -I new_out1FR_bwasw.sorted.bam -o new_out1FR_bwasw.intervals

I get an interval file that has 4 locations, which is a subset of the regions I identified using the Tablet viewer.
The following note appears on the command line:

14:53:16,100 TraversalEngine - 0 reads were filtered out during traversal out of 565 total (0.00%)

Then,

Realign BAM to get better Indel calling
java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T IndelRealigner -R BRCA1_coding.fasta -I new_out1FR_bwasw.sorted.bam -targetIntervals new_out1FR_bwasw.intervals -o new_out1FR_bwasw.sorted.realigned.bam

Add or Replace read group
java -jar ~/bin/picard-tools-1.45/AddOrReplaceReadGroups.jar I= new_out1FR_bwasw.sorted.realigned.bam O= new_out1FR_bwasw_new.sorted.realigned.bam SORT_ORDER=coordinate RGID=foo RGLB=bar RGPL=illumina RGSM=DePristo RGPU= GGDP4G001BFFBZ CREATE_INDEX=True

Reindex the realigned BAM
java -jar ~/bin/picard-tools-1.45/ReorderSam.jar I=new_out1FR_bwasw_new.sorted.realigned.bam O= new_out1FR_bwasw.resorted.realigned.bam REFERENCE= BRCA1_coding.fasta
samtools index new_out1FR_bwasw.resorted.realigned.bam new_out1FR_bwasw.resorted.realigned.bam.bai

Call SNPs
java -jar ~/bin/GenomeAnalysisTK-1.0.5777/GenomeAnalysisTK.jar -T UnifiedGenotyper -R BRCA1_coding.fasta -I new_out1FR_bwasw.resorted.realigned.bam -o new_out1FR_bwasw.vcf.calls -stand_call_conf 30.0 -stand_emit_conf 10.0

I am getting only 1 SNP called, of very low quality, in a region where the read depth is 1.
This region doesn't coincide with the intervals identified before.
Also, when I compare the called SNP with my results from VarScan, there is no overlap.

Can anyone please suggest how to improve SNP calling?
Or is GATK not suitable for SNP calling in long read data from 454?

Question for GATK experts.....
