I recently downloaded the reference (hg19) chromosome 10 FASTA file from UCSC and naively assumed that it would match the reference alleles in a VCF from the 1000 Genomes Project. For instance, if the VCF lists the reference allele at position 4 as a C, then the FASTA file should have a C as its fourth base. This doesn't seem to be true: the first few million bases are all shifted by 9997 (yes, 9997). After about 5.7M bases, there is a long string of N's, and after that the VCF once again doesn't match the FASTA. I tried trimming off all the N's in the FASTA, but that didn't fix the problem.
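To make the comparison concrete, here is a minimal sketch of the check described above, using toy data rather than the real files. The function names `read_fasta_sequence` and `ref_matches` are hypothetical, made up for illustration. The sketch assumes a single-record FASTA, skips the `>` header line and line breaks when counting bases, treats the VCF POS as 1-based, and compares case-insensitively because UCSC FASTA files use lowercase for soft-masked repeats.

```python
import io


def read_fasta_sequence(handle):
    """Concatenate the sequence lines of a single-record FASTA.

    The '>' header line and blank lines are skipped, and line endings are
    stripped, so that string indices correspond to base positions.
    """
    parts = []
    for line in handle:
        line = line.strip()
        if not line or line.startswith(">"):
            continue
        parts.append(line)
    return "".join(parts)


def ref_matches(sequence, pos, ref):
    """Return True if `ref` is found in `sequence` at 1-based position `pos`.

    The comparison ignores case, since soft-masked bases are lowercase.
    """
    start = pos - 1  # convert the 1-based VCF POS to a 0-based index
    return sequence[start:start + len(ref)].upper() == ref.upper()


# Toy stand-ins for the real FASTA and VCF (illustrative data only).
fasta = io.StringIO(">chr10\nNNNC\nGTAc\n")
seq = read_fasta_sequence(fasta)

# (POS, REF) pairs as they would appear in the VCF columns.
vcf_records = [(4, "C"), (5, "GT"), (8, "C"), (1, "A")]
results = [ref_matches(seq, pos, ref) for pos, ref in vcf_records]
print(results)
```

On the real files, the same check would be run against the full chromosome 10 sequence and the POS and REF columns of the VCF.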
Is there a FASTA sequence out there that matches the VCFs provided by the 1000 Genomes Project? Is there a way of reliably mapping from one to the other?
Thanks!