I'm looking to use whole genomes we already have available as part of a GWAS with genotyping data from a SNP array.
How best can I extract the relevant SNPs from my whole genomes? I have a .bed file set up with all my SNP locations, and have been Googling like mad trying to find ways to do this. Am I best to extract these from my aligned .bam file using something like:
or
Or to just restrict the final combined vcf for the whole genomes with something like this:
I'm not sure what the advantages/disadvantages of doing it each stage would be, and how best I can get as close to a SNP array output as possible (ie with the variant present at each location).
Googling usually answers all my questions but not today!
How best can I extract the relevant SNPs from my whole genomes? I have a .bed file set up with all my SNP locations, and have been Googling like mad trying to find ways to do this. Am I best to extract these from my aligned .bam file using something like:
Code:
$ bedmap --echo --fraction-map 1 <(bam2bed < reads.bam) intervals.bed > answer.bed
Code:
$ samtools mpileup -ugf ref.fa -l intervals.bed sample1.bam | bcftools call -vmO z -o answer.vcf.gz
Code:
$ java -jar GenomeAnalysisTK.jar -T SelectVariants -R ref.fa -V all_samples.vcf -L intervals.bed -sn Sample1 -sn Sample2 -sn Sample3
Googling usually answers all my questions but not today!
Comment