Hello,
I'm trying to figure out the relation between the overall allele frequency and the allele frequency reported per ethnic group (i.e., AFR, EUR, ASN) in the 1000 Genomes Project.
I'm using the following frequency files, downloaded from ftp://ftp.1000genomes.ebi.ac.uk/vol1...lease/20100804 (or the 'supporting' subdirectory under that path):
ALL.2of4intersection.20100804.sites.vcf.gz
AFR.2of4intersection_allele_freq.20100804.sites.vcf.gz
ASN.2of4intersection_allele_freq.20100804.sites.vcf.gz
EUR.2of4intersection_allele_freq.20100804.sites.vcf.gz
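For anyone who wants to reproduce the lookups, here is a minimal sketch of how a per-rsID frequency could be pulled from one of these files. It assumes the standard tab-separated VCF layout (ID in column 3, INFO in column 8) and that the frequency is stored under an INFO key named `AF`; the function names `load_allele_freqs` and `lookup` are only illustrative, and the key name may differ in the actual files.

```python
import gzip


def load_allele_freqs(path, info_key="AF"):
    """Map rsID -> allele frequency from a gzipped sites VCF.

    Assumption: the frequency lives in the INFO column under `info_key`.
    """
    freqs = {}
    with gzip.open(path, "rt") as handle:
        for line in handle:
            if line.startswith("#"):
                continue  # skip meta lines and the column header
            fields = line.rstrip("\n").split("\t")
            if len(fields) < 8:
                continue  # not a complete VCF record
            rsid, info = fields[2], fields[7]
            for entry in info.split(";"):
                key, sep, value = entry.partition("=")
                if key == info_key and sep:
                    try:
                        # keep the first value if several alleles are listed
                        freqs[rsid] = float(value.split(",")[0])
                    except ValueError:
                        pass  # unparsable value: treat the site as absent
                    break
    return freqs


def lookup(freqs, rsid, missing=-1.0):
    """Return the frequency for rsid, or -1 when the rsID is not in the file."""
    return freqs.get(rsid, missing)


# Example call (file name taken from the list above):
# afr = load_allele_freqs("AFR.2of4intersection_allele_freq.20100804.sites.vcf.gz")
# print(lookup(afr, "rs117704637"))
```

Returning -1 for an absent rsID keeps "not in the file" distinct from a reported frequency of 0.00.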
There are several issues with the reported frequencies in these files that do not make much sense.
Below are some examples.
1. The frequencies for rs112872773 are: AF=0.120, AF_AFR=0.120, AF_ASN=0.120, AF_EUR=0.120 where each frequency is taken from the corresponding file. It is highly unlikely that the allele frequency is exactly the same across all groups.
2. For rs117704637, the frequencies are: AF=0.005, AF_AFR=0.00, AF_ASN=-1, AF_EUR=0.01, where '-1' indicates that the rsID does not appear in the file. Note that for AF_AFR the rsID does appear but with a frequency of 0.00. So, is there a reason for reporting zero frequency rather than not reporting anything?
3. For rs9269941 the frequencies are: AF=0.121, AF_AFR=0.06, AF_ASN=0.03, AF_EUR=0.06. Here a frequency is reported for each group, but the overall frequency is higher than any of the per-group frequencies.
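To make the expectation behind example 3 concrete: if the overall frequency were computed from the same samples as the three groups, it would be a sample-size-weighted average of the group frequencies and so could not fall outside their range. The sketch below checks that, under exactly that assumption. The function names, the rounding tolerance, and the equal group sizes in the usage lines are hypothetical, not taken from the project.

```python
def pooled_frequency(group_freqs, group_sizes):
    """Weighted average of group frequencies, weighted by chromosomes per group."""
    total = sum(group_sizes[g] for g in group_freqs)
    return sum(group_freqs[g] * group_sizes[g] for g in group_freqs) / total


def within_group_range(overall, group_freqs, tol=0.005):
    """Check that the overall AF lies between the smallest and largest group AF.

    Returns None when any group is missing (encoded as a negative value),
    because the bound cannot be evaluated in that case.
    `tol` allows for rounding in the reported values (an assumed tolerance).
    """
    values = list(group_freqs.values())
    if not values or any(v < 0 for v in values):
        return None
    return min(values) - tol <= overall <= max(values) + tol


# Values quoted in the examples above
print(within_group_range(0.121, {"AFR": 0.06, "ASN": 0.03, "EUR": 0.06}))
print(within_group_range(0.120, {"AFR": 0.120, "ASN": 0.120, "EUR": 0.120}))
print(within_group_range(0.005, {"AFR": 0.00, "ASN": -1, "EUR": 0.01}))

# Hypothetical equal group sizes, only to show the weighting
sizes = {"AFR": 100, "ASN": 100, "EUR": 100}
print(pooled_frequency({"AFR": 0.06, "ASN": 0.03, "EUR": 0.06}, sizes))
```

With any positive group sizes, the pooled value for rs9269941 cannot exceed 0.06, which is why an overall AF of 0.121 looks inconsistent unless the overall file was computed from a different set of samples or calls.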
I would appreciate any help in clarifying these issues or any other ways I could get that data.
Thank you very much,
Izhar