  • csoong
    Member
    • Jun 2009
    • 74

    dbSNP132 bug (human)

    Has anyone noticed a bug in how the VCF version of dbSNP132 reports indels?
    The reported position doesn't match the A/T/C/G sequence on the genome, although the SNP calls are fine. I thought I'd throw this out to see whether anyone knows of a fixed version, or whether dbSNP is working on a fix. Thanks.
  • csoong
    Member
    • Jun 2009
    • 74

    #2
    The above post may sound a little vague,
    so here is a record from the VCF file released by dbSNP for version 132:

    1 121071 rs60992425 T TTAC . . dbSNPBuildID=129;VP=050000000009000000000200;WGT=1;VC=INDEL;CFL


    If you go to the reference, position chr1:121071 is A, not T.
    This is not a 0-based vs. 1-based coordinate issue, since the SNP records (single-base variations) are correctly annotated.
    Any thoughts?

    • Richard Finney
      Senior Member
      • Feb 2009
      • 701

      #3
      It's annotated as -/TAC in UCSC tables.
      What's the URL for the VCF file?

      • csoong
        Member
        • Jun 2009
        • 74

        #4
        I followed the following URL:

        • Richard Finney
          Senior Member
          • Feb 2009
          • 701

          #5
          Looks like you ran into the first indel in the file. Indels are handled differently from other "SNPs".

          See documentation:

          From VCF version 4.0 documentation at 1000genomes.org

          Section 3. Data lines, "Fixed fields". subsection 4. REF reference base(s):

          # REF reference base(s): Each base must be one of A,C,G,T,N. Bases should be in uppercase. Multiple bases are permitted. The value in the POS field refers to the position of the first base in the String. For InDels, the reference String must include the base before the event (which must be reflected in the POS field). (String, Required).
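The rule quoted above can be checked mechanically. Here is a minimal Python sketch, assuming the chromosome sequence is already in memory as a plain string. The helper names and the six-base toy sequence are made up for illustration and are not real hg19 data; the second toy record shows one way a mismatch like the one reported in this thread could look, with POS pointing one base past the anchor base.

```python
def parse_fixed_fields(line):
    """Split a VCF data line into CHROM, POS (int), ID, REF, ALT."""
    chrom, pos, var_id, ref, alt = line.split()[:5]
    return chrom, int(pos), var_id, ref, alt


def ref_matches_reference(sequence, pos, ref):
    """True if `ref` equals the bases of `sequence` starting at 1-based `pos`."""
    start = pos - 1  # VCF POS is 1-based; Python strings are 0-based
    return sequence[start:start + len(ref)].upper() == ref.upper()


# Hypothetical toy chromosome: T is at position 3, A is at position 4.
TOY_SEQUENCE = "GCTAGG"

# Insertion of TAC after the T. The first record anchors POS on the T,
# the second puts POS on the following A while REF still reads T.
anchored = parse_fixed_fields("toy 3 . T TTAC . . VC=INDEL")
shifted = parse_fixed_fields("toy 4 . T TTAC . . VC=INDEL")

for chrom, pos, var_id, ref, alt in (anchored, shifted):
    ok = ref_matches_reference(TOY_SEQUENCE, pos, ref)
    print(f"{chrom}:{pos} REF={ref} ALT={alt} matches reference: {ok}")
```

Running the same check over the indel records of a real VCF would need the matching reference build loaded from FASTA, which is left out of this sketch.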

          • csoong
            Member
            • Jun 2009
            • 74

            #6
            Yes. That's why the VCF record is incorrect: the base at POS is A, but the reference String does not start with it; it reads T instead. Right?

            • Richard Finney
              Senior Member
              • Feb 2009
              • 701

              #7
              I think the VCF is right. REF is the base before the insertion, which in this case is a T.

              • csoong
                Member
                • Jun 2009
                • 74

                #8
                I can't agree.

                "The value in the POS field refers to the position of the first base in the String:"
                In the record, the String's first base is not the base at position POS.

                "For InDels, the reference String must include the base before the event (which must be reflected in the POS field)"
                Again, the reference base before the event should be included. At the same time, the previous rule should not be violated, yet here POS refers to an A while the String's first base is something else.

                • csoong
                  Member
                  • Jun 2009
                  • 74

                  #9
                  The reason I am bringing this up is that calling indels can be tricky. The field is therefore converging on a consensus to report each indel at its leftmost possible position. That is what programs like DINDEL do, and what dbSNP132 intended to follow (since it is published in VCF 4.0). But dbSNP132 and DINDEL report different positions: I've run DINDEL on some of the indel records, and it prints (correctly, I assume) the record with the right position.
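To illustrate the leftmost-position convention, here is a rough Python sketch of the commonly described left-align-and-trim procedure for a single REF/ALT pair. It is not DINDEL's or dbSNP's actual code; the function name and the toy repeat sequence are assumptions made for the example.

```python
def left_align(sequence, pos, ref, alt):
    """Left-align and trim one REF/ALT pair; `pos` is 1-based."""
    ref, alt = ref.upper(), alt.upper()
    if ref == alt:
        raise ValueError("REF and ALT are identical; nothing to align")
    changed = True
    while changed:
        changed = False
        # Drop a shared trailing base.
        if ref and alt and ref[-1] == alt[-1]:
            ref, alt = ref[:-1], alt[:-1]
            changed = True
        # If an allele became empty, extend both one base to the left.
        if not ref or not alt:
            if pos == 1:
                raise ValueError("cannot extend past the start of the sequence")
            pos -= 1
            base = sequence[pos - 1].upper()
            ref, alt = base + ref, base + alt
            changed = True
    # Trim shared leading bases while both alleles keep at least one base.
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Hypothetical toy sequence with a CA repeat: G C A C A C A T
TOY_SEQUENCE = "GCACACAT"

# Deleting the last CA unit and deleting the first CA unit give the same
# haplotype, so the record written at position 5 moves to position 1.
print(left_align(TOY_SEQUENCE, 5, "ACA", "A"))
# Same idea for an insertion of CA written at the right end of the repeat.
print(left_align(TOY_SEQUENCE, 7, "A", "ACA"))
```

Two callers that both normalize this way should report the same POS, REF and ALT for the same indel, which makes records from different sources directly comparable.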

                  • mard
                    Member
                    • Jan 2010
                    • 21

                    #10
                    I've just run into this issue and found your post, csoong. For SNVs, the REF bases in my sample's VCF file match those in the dbSNP132 VCF, but the indels seem to be 1 base off (for the ones I've checked so far). E.g. the dbSNP132 file gives REF/ALT as C/CC:
                    10 101478000 rs68137778 C CC . . dbSNPBuildID=130;VP=050000080001000000000200;WGT=1;VC=INDEL;INT

                    but for my sample it's T/TC (called using GATK):
                    10_101478000 . T TC

                    And if you check that position in both the Ensembl and UCSC genome browsers, it's also TC.

                    Did you find out anything more about this issue csoong?

                    • csoong
                      Member
                      • Jun 2009
                      • 74

                      #11
                      Luckily, dbSNP released a fixed version in March that addresses the issue (after I had gone through all the hassle of working around the problem). I checked a few indel entries and they seem to be fixed. Let me know what you find. Thanks.

                      • mard
                        Member
                        • Jan 2010
                        • 21

                        #12
                        Ok, thanks for the info.

                        Where did you get the fixed VCF file from?
                        I followed your link above, but it brings me to the same 00-All.vcf.gz file I already have (dated Nov 2010):
                        ftp://ftp.ncbi.nih.gov/snp/organisms...9606/VCF/v4.0/
