  • yuanzhi
    Member
    • Aug 2010
    • 19

    1000 genomes VCF format?

    I am trying to figure out the SNP genotype from a file in the 1000 genomes VCF format.

    Code:
    #CHROM POS     ID        REF ALT QUAL FILTER INFO                              FORMAT      NA00001        NA00002        NA00003
    20     14370   rs6054257 G   A   29   PASS   NS=3;DP=14;AF=0.5;DB;H2           GT:GQ:DP:HQ 0|0:48:1:51,51 1|0:48:8:51,51 1/1:43:5:.,.
    20     17330   .         T   A   3    q10    NS=3;DP=11;AF=0.017               GT:GQ:DP:HQ 0|0:49:3:58,50 0|1:3:5:65,3   0/0:41:3
    20     1110696 rs6040355 A   G,T 67   PASS   NS=2;DP=10;AF=0.333,0.667;AA=T;DB GT:GQ:DP:HQ 1|2:21:6:23,27 2|1:2:0:18,2   2/2:35:4
    Is the SNP genotype just the column of "ALT"?

    Thanks
  • quinlana
    Senior Member
    • Sep 2008
    • 119

    #2
    There should be two files: one with the sites of polymorphism (the one you show here) and another with the genotypes at said sites. Look for *.genotypes.* or something like that.


    • yuanzhi
      Member
      • Aug 2010
      • 19

      #3
      Hi, Quinlana

      Thanks for your answer! I am trying to figure out if I used the wrong term. When I said "SNP genotype", I meant "SNP call". For example, if the reference is A/A and the SNP call is T/T, "T/T" is the "SNP genotype" that I am looking for.

      Your "genotype" is the genotyping SNP call (not the sequencing SNP call), right?

      I apologize for my lack of knowledge of these terms. I have been looking for a book or similar resource that gives the correct definitions of SNP call, SNP genotype, base call, polymorphism, etc.

      Thanks again


      • Jose Blanca
        Member
        • Aug 2009
        • 70

        #4
        You have plenty of information about the format at the 1000genomes vcf page.


        • laura
          Senior Member
          • Sep 2008
          • 151

          #5
          The ALT column defines the possible alternative alleles; columns 10 through n each define a specific individual's genotype.
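To make that concrete, here is a minimal Python sketch that turns the GT entry of each sample column into allele letters. It assumes tab-separated columns and the usual GT convention (0 = REF, 1..n = the ALT alleles in order, "|" for phased and "/" for unphased calls). The helper name `decode_genotype` is made up for illustration; the record is the third line of the example in post #1.

```python
def decode_genotype(ref, alt, fmt, sample):
    """Translate one sample column into allele letters using REF and ALT."""
    # Allele index 0 is REF; indices 1..n are the comma-separated ALT alleles.
    alleles = [ref] + alt.split(",")
    # Pair FORMAT keys with the sample's values (trailing values may be absent).
    fields = dict(zip(fmt.split(":"), sample.split(":")))
    gt = fields["GT"]
    # "|" marks a phased genotype, "/" an unphased one.
    sep = "|" if "|" in gt else "/"
    # "." means the call is missing for that allele.
    calls = [alleles[int(i)] if i != "." else "." for i in gt.split(sep)]
    return sep.join(calls)


line = ("20\t1110696\trs6040355\tA\tG,T\t67\tPASS\t"
        "NS=2;DP=10;AF=0.333,0.667;AA=T;DB\tGT:GQ:DP:HQ\t"
        "1|2:21:6:23,27\t2|1:2:0:18,2\t2/2:35:4")
cols = line.split("\t")
ref, alt, fmt = cols[3], cols[4], cols[8]
genotypes = [decode_genotype(ref, alt, fmt, s) for s in cols[9:]]
print(genotypes)  # ['G|T', 'T|G', 'T/T']
```

So for this site NA00001 is G|T, NA00002 is T|G and NA00003 is T/T: the ALT column alone only lists which alleles are possible, not which ones a given individual carries.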


          • laura
            Senior Member
            • Sep 2008
            • 151

            #6
            As an update, the 1000genomes website has recently changed its backend and therefore its URL structure.

            The spec can now be found at:

            http://www.1000genomes.org/wiki/Analysis/Variant Call Format/vcf-variant-call-format-version-40


            • johnadam33
              Member
              • Oct 2010
              • 26

              #7
              I am new to this field and I am trying to figure out how to use this data in VCF format myself. I want to open these files so that I can look for SNPs at a given location. Does anyone know how to access the wiki page? Do we need to log in, and if so, how do we register?
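For looking up SNPs at a given location once a VCF file has been downloaded, a minimal Python sketch could look like the following. It assumes tab-separated data lines and treats every line starting with "#" as a header; the helper names `open_vcf` and `records_in_region` are made up for illustration, and the example lines are the first eight columns of the records in post #1. For large files an indexed tool such as tabix is the usual choice; this is only a plain line scan.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip-compressed VCF file as text."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path)


def records_in_region(lines, chrom, start, end):
    """Yield data lines (as column lists) with CHROM == chrom and start <= POS <= end."""
    for line in lines:
        # Skip meta-information, the column header, and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        cols = line.rstrip("\n").split("\t")
        if cols[0] == chrom and start <= int(cols[1]) <= end:
            yield cols


example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n",
    "20\t14370\trs6054257\tG\tA\t29\tPASS\tNS=3;DP=14;AF=0.5;DB;H2\n",
    "20\t17330\t.\tT\tA\t3\tq10\tNS=3;DP=11;AF=0.017\n",
    "20\t1110696\trs6040355\tA\tG,T\t67\tPASS\tNS=2;DP=10;AF=0.333,0.667;AA=T;DB\n",
]
hits = list(records_in_region(example, "20", 14000, 18000))
for cols in hits:
    print(cols[1], cols[2], cols[3], cols[4])
```

With a real file, the same function can be fed from `open_vcf(path)` inside a `with` block instead of the in-memory list.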


              • laura
                Senior Member
                • Sep 2008
                • 151

                #8


                These are public pages within the wiki.

                You shouldn't need to log in to see them.

                Most of the 1000 genomes wiki is an internal project-tracking wiki, so logins are not provided to people outside the project.


                • johnadam33
                  Member
                  • Oct 2010
                  • 26

                  #9
                  Thanks a lot Laura. That helps. I guess I have to do some ground work in order to access them.


                  • johnadam33
                    Member
                    • Oct 2010
                    • 26

                    #10
                    Very urgent and important

                    Can anyone tell me why the human reference sequence in the 1000 genomes browser differs from the sequence in the NCBI, Ensembl, and UCSC browsers? I am looking at variant call data, and the same chromosome location has a different base in the 1000 genomes browser than in the other three (which all agree).
                    Thanks,


                    • laura
                      Senior Member
                      • Sep 2008
                      • 151

                      #11
                      The pilot analysis for 1000 genomes was done using the NCBI36 assembly, but the other browsers are all now using the GRCh37 assembly, which leads to different coordinates.

                      The 1000 genomes main project uses GRCh37, and there are SNPs available from the ftp site for these, but the 1000 genomes browser has yet to be updated.


                      • johnadam33
                        Member
                        • Oct 2010
                        • 26

                        #12
                        Thanks for the reply, Laura.
                        So how much difference is there? If I want to compare a location, say chr1:10041132 (on GRCh37), with the same location in the NCBI36 build, what should I do?


                        • laura
                          Senior Member
                          • Sep 2008
                          • 151

                          #13
                          For variants it is safest to use rs numbers, which dbSNP tracks from one assembly to another.

                          To map specific positions, though, Ensembl provides a tool.
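One way to follow the rs-number advice in practice is to key records by the ID column instead of by position, so a lookup does not depend on which assembly the coordinates refer to. The sketch below is only an illustration of that idea: the helper name `index_by_rsid` is made up, it assumes tab-separated data lines, and the example lines are the first five columns of the records in post #1.

```python
def index_by_rsid(lines):
    """Map each rs number in the ID column to (chrom, pos, ref, alt)."""
    index = {}
    for line in lines:
        # Skip header lines and blank lines.
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, ids, ref, alt = line.rstrip("\n").split("\t")[:5]
        # The ID column may hold several identifiers separated by ";",
        # or "." when the site has no identifier.
        for rsid in ids.split(";"):
            if rsid != ".":
                index[rsid] = (chrom, int(pos), ref, alt)
    return index


example = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "20\t14370\trs6054257\tG\tA\n",
    "20\t17330\t.\tT\tA\n",
    "20\t1110696\trs6040355\tA\tG,T\n",
]
by_rsid = index_by_rsid(example)
print(by_rsid["rs6040355"])  # ('20', 1110696, 'A', 'G,T')
```

Sites without an rs number (ID ".") cannot be matched this way and would still need a position-mapping tool.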


                          • vyellapa
                            Member
                            • Oct 2011
                            • 59

                            #14
                            I'm looking for a link to the genotypes VCF of the latest 1000 genomes release and the corresponding reference. I can see one at the link below, but it seems to be the older release (629 individuals and VCF 4.0). Where can I find the same file in the newer release?

                            ftp://ftp.1000genomes.ebi.ac.uk/vol1...notypes.vcf.gz

                            Thank you,
                            Teja


                            • laura
                              Senior Member
                              • Sep 2008
                              • 151

                              #15
                              The final call set for Phase 1 (1092 individuals) you can find here:

                              ftp://ftp.1000genomes.ebi.ac.uk/vol1...ted_call_sets/

                              Please have a look at the README:

                              ftp://ftp.1000genomes.ebi.ac.uk/vol1...l_set_20120621

                              All calls are relative to the GRCh37 / hg19 genome assembly.

