  • ashkot
    Member
    • Nov 2011
    • 59

    Interpreting 1000 Genomes data

    Hi all, I generated a vcf file from 1000 Genomes data using samtools etc. and then annotated it with annovar. Below are a few lines from the annotated file. I am trying to work out whether this person has a certain SNP or not. For example, the first line gives the bases as T and C and says het. Does this mean the person is TC at that location, and therefore het in the next column? If so, the second line has A and G but says hom; how can that be? Alternatively, does this mean that only het/hom should be used for interpretation, and T and C are the reference and risk alleles? If that is the case, then when someone is hom, how can I find out the genotype, given that hom could mean AA or GG?

    snp132 rs9276002 6 32694540 32694540 T C het 6.98 4 37
    snp132 rs9276003 6 32694550 32694550 A G hom 105 5 37
    snp132 rs9276004 6 32694564 32694564 C T hom 110 5 37
    snp132 rs9276005 6 32694567 32694567 T C hom 110 5 37
    snp132 rs9276006 6 32694582 32694582 G C hom 88.5 4 37
    snp132 rs9276007 6 32694604 32694604 A G hom 16.9 2 37
    snp132 rs9276008 6 32694633 32694633 C T hom 17.8 2 37
    snp132 rs9276009 6 32694641 32694641 T C hom 13.9 2 37
    snp132 rs9276010 6 32694686 32694686 A T het 8.65 3 37
    snp132 rs9276013 6 32694724 32694724 T A hom 16.9 2 37
    snp132 rs9276015 6 32694759 32694759 C G hom 10.2 2 37
    snp132 rs9276017 6 32695022 32695022 T C hom 34 4 60

    Appreciate any help.

    Thank you.
  • laura
    Senior Member
    • Sep 2008
    • 151

    #2
    It looks like the output you are reading gives you the reference allele and then the alternative allele, not the genotype.
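If the two base columns are reference and alternative alleles, the genotype can be reconstructed from them plus the het/hom column. The following is only a rough sketch under assumptions: the function names are made up for illustration, the column order is taken from the lines quoted in the first post, and it assumes "hom" means homozygous for the alternative allele and "het" means one reference plus one alternative allele. That convention should be checked against the annovar documentation before relying on it.

```python
from typing import Tuple


def genotype_from_zygosity(ref: str, alt: str, zygosity: str) -> str:
    """Hypothetical helper: combine ref/alt alleles with a het/hom label.

    Assumption: "het" = ref + alt, "hom" = two copies of the alt allele.
    """
    if zygosity == "het":
        return ref + alt
    if zygosity == "hom":
        return alt + alt
    raise ValueError(f"unexpected zygosity: {zygosity!r}")


def parse_annotated_line(line: str) -> Tuple[str, str]:
    """Return (rsID, genotype) from one whitespace-separated annotated line.

    Assumed column order, as in the quoted lines:
    db, rsID, chrom, start, end, ref, alt, zygosity, ...
    """
    fields = line.split()
    rsid, ref, alt, zygosity = fields[1], fields[5], fields[6], fields[7]
    return rsid, genotype_from_zygosity(ref, alt, zygosity)


if __name__ == "__main__":
    for line in [
        "snp132 rs9276002 6 32694540 32694540 T C het 6.98 4 37",
        "snp132 rs9276003 6 32694550 32694550 A G hom 105 5 37",
    ]:
        print(parse_annotated_line(line))
```

Under those assumptions the first quoted line would read as TC and the second as GG, which would answer the AA-versus-GG question for "hom" rows.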

    Which individual are you looking at?


    • ashkot
      Member
      • Nov 2011
      • 59

      #3
      I am looking at HG00096, the very first one in the FTP list. In the meantime I also looked at another sample I had, and this sample does have the genotypes; see the following lines.

      11 219398 . G A 45 . DP=9;AF1=0.5;AC1=1;DP4=3,2,4,0;MQ=43;FQ=47.9;PV4=0.44,1,0.097,1 GT:PL:GQ 0/1:75,0,92:78
      11 219452 . C G 72.3 . DP=5;AF1=1;AC1=2;DP4=0,0,4,1;MQ=46;FQ=-42 GT:PL:GQ 1/1:105,15,0:27
      11 220401 . C T 36 . DP=7;AF1=0.5;AC1=1;DP4=2,2,2,1;MQ=45;FQ=39;PV4=1,1,1,1 GT:PL:GQ 0/1:66,0,88:69
      11 220919 . T C 26 . DP=9;AF1=0.5;AC1=1;DP4=2,2,2,3;MQ=24;FQ=19.5;PV4=1,0.095,1,1 GT:PL:GQ 0/1:56,0,47:50
      11 221195 . T C 6.98 . DP=8;AF1=0.4999;AC1=1;DP4=0,5,2,1;MQ=35;FQ=9.53;PV4=0.11,0.46,1,1 GT:PL:GQ 0/1:36,0,77:37
      11 221322 . G T 26 . DP=8;AF1=0.5;AC1=1;DP4=1,3,2,1;MQ=51;FQ=28.8;PV4=0.49,1,1,0.31 GT:PL:GQ 0/1:56,0,69:59
      11 222620 . T C 23 . DP=7;AF1=0.5;AC1=1;DP4=2,2,3,0;MQ=41;FQ=26;PV4=0.43,1,0.22,0.35 GT:PL:GQ 0/1:53,0,82:56
      11 223119 . T C 40 . DP=3;AF1=1;AC1=2;DP4=0,0,2,1;MQ=37;FQ=-36 GT:PL:GQ 1/1:72,9,0:16
      11 225466 . T C 30 . DP=6;AF1=0.5;AC1=1;DP4=2,1,1,2;MQ=53;FQ=32.6;PV4=1,0.1,0.058,1 GT:PL:GQ 0/1:60,0,70:63
      11 230135 . T C 16.1 . DP=4;AF1=1;AC1=2;DP4=0,0,3,0;MQ=24;FQ=-36 GT:PL:GQ 1/1:48,9,0:15
      11 230368 . C T 20 . DP=6;AF1=0.5;AC1=1;DP4=2,1,2,1;MQ=46;FQ=22.8;PV4=1,0.089,1,0.44 GT:PL:GQ 0/1:50,0,63:53
      11 230751 . A G 6.21 . DP=3;AF1=0.5019;AC1=1;DP4=1,0,0,2;MQ=46;FQ=-7.1;PV4=0.33,0.35,1,0.066 GT:PL:GQ


      There is not a single location where the genotype is 0/0, i.e. REF/REF, across the entire file.
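For reference, the GT value in lines like the ones quoted in this post is a pair of indices into the allele list: 0 is REF, 1 is the first ALT, and so on. Below is a minimal Python sketch of decoding it, assuming a single-sample file whose fields can be split on whitespace; `decode_gt` is a hypothetical name, not part of any library. If the file was produced with a variants-only calling option (an assumption about how it was generated), sites matching the reference would simply not be listed, which would be consistent with seeing no 0/0 rows.

```python
def decode_gt(vcf_line: str) -> str:
    """Hypothetical helper: translate the GT field of a single-sample
    VCF data line into actual bases, e.g. "0/1" with REF=G, ALT=A -> "G/A".
    """
    fields = vcf_line.split()
    if len(fields) < 10:
        # A line without a sample column (e.g. truncated) cannot be decoded.
        raise ValueError("line has no sample column")

    ref, alt = fields[3], fields[4]
    # Index 0 is REF; indices 1..n are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")

    # FORMAT (column 9) names the keys; the sample column holds the values.
    sample = dict(zip(fields[8].split(":"), fields[9].split(":")))
    gt = sample["GT"]

    # "/" separates unphased alleles, "|" separates phased alleles.
    sep = "|" if "|" in gt else "/"
    bases = ["." if idx == "." else alleles[int(idx)] for idx in gt.split(sep)]
    return sep.join(bases)


if __name__ == "__main__":
    het = "11 219398 . G A 45 . DP=9;AF1=0.5;AC1=1 GT:PL:GQ 0/1:75,0,92:78"
    hom = "11 219452 . C G 72.3 . DP=5;AF1=1;AC1=2 GT:PL:GQ 1/1:105,15,0:27"
    print(decode_gt(het))
    print(decode_gt(hom))
```

With this reading, the 0/1 rows are heterozygous REF/ALT and the 1/1 rows are homozygous for the ALT allele.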

      Could you please let me know if I have missed something?

      Thanks,


      • laura
        Senior Member
        • Sep 2008
        • 151

        #4
        I would first check that you are definitely passing annovar the individual you think you are, and then I would contact the annovar developers to point out a possible bug.


        • ashkot
          Member
          • Nov 2011
          • 59

          #5
          The data shown above is from the vcf file BEFORE it is input into annovar. I need some help understanding the genotype info in the 1K Genomes files. Is there anywhere I can look?


          • laura
            Senior Member
            • Sep 2008
            • 151

            #6
            There is general documentation about vcf files on 1000genomes.org.


            The vcftools community has a couple of mailing lists which you might find helpful.
