I'd like to introduce BAM-matcher (https://bitbucket.org/sacgf/bam-matcher), a simple tool for determining whether two BAM files contain reads sequenced from the same sample or patient by counting genotype matches at common SNPs.
We wrote this tool for our sequencing facility to look for mislabelled samples. It checks whether two BAM files came from the same patient/individual by comparing genotypes at sites with global minor allele frequencies ~0.5 (using 1000 Genomes Project data).
It works best when there are multiple samples from the same patient/individual, and bypasses the need for independent SNP array data.
The tool is simple to use and fast (~2 minutes per sample pair, or ~1 second with cached data). For genotype calling, it uses external variant callers (currently GATK, Freebayes, or VarScan2 are supported).
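To make the idea of counting genotype matches concrete, here is a minimal Python sketch. It is not BAM-matcher's actual code: the function name, the dictionary-based genotype representation, and the example sites are all hypothetical, and it assumes genotypes have already been called by one of the variant callers.

```python
def genotype_concordance(calls_a, calls_b):
    """Compare two samples' genotype calls (illustrative only).

    calls_a / calls_b map a (chromosome, position) site to an
    unordered pair of alleles, e.g. ("A", "G"). This layout is an
    assumption made for the example, not BAM-matcher's data model.
    Returns (matching sites, sites compared, fraction matching).
    """
    # Only sites genotyped in both samples can be compared.
    shared = calls_a.keys() & calls_b.keys()
    # Sort alleles so that A/G and G/A count as the same genotype.
    matches = sum(
        1 for site in shared
        if sorted(calls_a[site]) == sorted(calls_b[site])
    )
    compared = len(shared)
    fraction = matches / compared if compared else 0.0
    return matches, compared, fraction


# Made-up genotype calls at a handful of common SNP sites.
sample_1 = {
    ("chr1", 1000): ("A", "G"),
    ("chr1", 2000): ("C", "C"),
    ("chr2", 500): ("T", "T"),
}
sample_2 = {
    ("chr1", 1000): ("G", "A"),
    ("chr1", 2000): ("C", "T"),
    ("chr2", 500): ("T", "T"),
    ("chr3", 42): ("A", "A"),  # not called in sample_1, so ignored
}

matches, compared, fraction = genotype_concordance(sample_1, sample_2)
print(f"{matches}/{compared} sites match ({fraction:.0%})")
```

A high fraction of matching genotypes across many sites suggests the two BAM files come from the same individual, while a low fraction points to a possible sample mix-up. Any threshold for making that call is up to the user and is not defined in this sketch.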
Download here: https://bitbucket.org/sacgf/bam-matcher