  • swNGS
    Member
    • Nov 2011
    • 83

    Ideas on how to annotate a VCF with local variant frequency info?

    I'm hoping for some inspiration on something that I suspect is not too tricky, but I would like some pointers.

    I have a targeted sequencing panel for a particular phenotypic group of human disorders, which our group runs on a regular basis. For each sample, we then filter a functionally annotated variant dataset in Excel (it's a small set, numbering in the hundreds of variants per sample). I'm using Annovar for the annotation step.

    What I want to do is annotate either the original VCF for each sample, or the Annovar-annotated variant list, with the frequency at which a given variant has been identified by our group FROM ALL PREVIOUSLY SEQUENCED SAMPLES. I envisage that this could be achieved either by holding all variants locally in some sort of database and using that to annotate the pre-Annovar VCF, or by looping through all VCFs from previous samples to generate allele frequency info and then using this to annotate the pre-Annovar VCF.
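A minimal sketch of the second option (looping over previous VCFs), in Python with only the standard library. It assumes one VCF per previously sequenced sample and treats a variant as an exact (CHROM, POS, REF, ALT) match, splitting multi-allelic ALT columns. It does not normalise or left-align indels, so the same indel written two different ways will not match. The INFO tag name `LOCAL_SEEN` and all function names are hypothetical.

```python
import gzip
from collections import defaultdict


def read_vcf_lines(path):
    """Yield text lines from a plain or gzip-compressed VCF."""
    opener = gzip.open if str(path).endswith(".gz") else open
    with opener(path, "rt") as fh:
        yield from fh


def variant_keys(lines):
    """Yield one (chrom, pos, ref, alt) key per ALT allele on each data line."""
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue  # skip header and blank lines
        chrom, pos, _id, ref, alts = line.rstrip("\n").split("\t")[:5]
        for alt in alts.split(","):
            yield (chrom, int(pos), ref, alt)


def count_samples_per_variant(samples):
    """Count how many samples carry each variant.

    `samples` is an iterable with one iterable of VCF lines per sample.
    Returns (counts, number_of_samples).
    """
    counts = defaultdict(int)
    n_samples = 0
    for lines in samples:
        n_samples += 1
        # a set, so a variant repeated within one file is counted once
        for key in set(variant_keys(lines)):
            counts[key] += 1
    return counts, n_samples


def annotate(lines, counts, n_samples, tag="LOCAL_SEEN"):
    """Yield VCF lines with a 'k/N' count per ALT allele appended to INFO."""
    for line in lines:
        if line.startswith("#CHROM"):
            # declare the new INFO field just before the column header line
            yield (f'##INFO=<ID={tag},Number=A,Type=String,Description='
                   f'"Samples carrying this allele out of {n_samples} previous samples">\n')
            yield line
            continue
        if line.startswith("#") or not line.strip():
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        chrom, pos, ref = fields[0], int(fields[1]), fields[3]
        seen = ",".join(
            f"{counts.get((chrom, pos, ref, alt), 0)}/{n_samples}"
            for alt in fields[4].split(",")
        )
        info = fields[7]
        fields[7] = f"{tag}={seen}" if info in (".", "") else f"{info};{tag}={seen}"
        yield "\t".join(fields) + "\n"
```

For example, `counts, n = count_samples_per_variant(read_vcf_lines(p) for p in glob.glob("previous/*.vcf"))`, then write `annotate(read_vcf_lines("new.vcf"), counts, n)` to a new file; the `previous/` directory and file names are placeholders.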

    Any ideas would be appreciated!

    Thanks
  • jhersheson
    Junior Member
    • May 2013
    • 3

    #2
    I am trying to do something similar with our exome files to filter out sequencing artifacts. It should be pretty easy with your targeted sequencing data.
    What I would do is generate a unique signature for each variant, something like Chr_Start_Ref_Obs, in the Annovar-annotated file. Then create a concatenated list of all of these signatures and use awk (or Excel) to count how many times each variant appears in that list, and hence its frequency. You can then keep just the unique signatures and use that list to annotate the Annovar files with the variant frequency in your dataset.
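The signature-and-count idea maps directly onto Python's `collections.Counter`. This sketch assumes each Annovar output file is tab-delimited with a single header line and that its first five columns are Chr, Start, End, Ref, Alt (older Annovar versions label the last one Obs); the function names are hypothetical. It counts each signature at most once per sample file, giving a "seen in k of N samples" frequency.

```python
import csv
from collections import Counter


def signature(row):
    """Chr_Start_Ref_Alt key from the first five Annovar columns (Chr, Start, End, Ref, Alt)."""
    chrom, start, _end, ref, alt = row[:5]
    return f"{chrom}_{start}_{ref}_{alt}"


def read_annovar_rows(path):
    """Yield data rows from a tab-delimited Annovar output file, skipping its header."""
    with open(path, newline="") as fh:
        reader = csv.reader(fh, delimiter="\t")
        next(reader, None)  # drop the header line
        yield from reader


def signature_counts(per_sample_rows):
    """Count in how many samples each signature occurs; returns (Counter, n_samples)."""
    counts = Counter()
    n_samples = 0
    for rows in per_sample_rows:
        n_samples += 1
        counts.update({signature(r) for r in rows})  # set: once per sample
    return counts, n_samples


def add_frequency_column(rows, counts, n_samples):
    """Append a 'k/N' column to each row; unseen signatures get 0/N."""
    for r in rows:
        yield r + [f"{counts[signature(r)]}/{n_samples}"]
```

The extended rows can be written back out with `csv.writer(out_fh, delimiter="\t")`. Indexing a `Counter` with a missing key returns 0 without adding it, so new variants simply show up as `0/N`.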

    If anyone has any suggestions for doing something similar with exome files I'm all ears!


    • swNGS
      Member
      • Nov 2011
      • 83

      #3
      That sounds promising, but I think the better long-term option is to put all variant results in a database. At this stage it would hold just the bare necessities from the VCF, i.e. chr, pos, ref, alt plus sample ID. In addition to being able to annotate a VCF that is new to the assay with an extra column saying e.g. 2/150 (seen 2x out of 150 samples), I could then also work out which other samples the variant had featured in.
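The database option can be prototyped with SQLite, which ships with Python as the `sqlite3` module. This is a sketch under assumptions: the table, column, and function names are made up, variants are matched exactly on (chrom, pos, ref, alt) with no indel normalisation, and a separate `samples` table records every loaded sample so that a sample with no calls still counts towards the denominator.

```python
import sqlite3

SCHEMA = """
CREATE TABLE IF NOT EXISTS samples (
    sample_id TEXT PRIMARY KEY
);
CREATE TABLE IF NOT EXISTS calls (
    sample_id TEXT NOT NULL REFERENCES samples(sample_id),
    chrom     TEXT NOT NULL,
    pos       INTEGER NOT NULL,
    ref       TEXT NOT NULL,
    alt       TEXT NOT NULL,
    PRIMARY KEY (sample_id, chrom, pos, ref, alt)
);
CREATE INDEX IF NOT EXISTS calls_by_variant ON calls (chrom, pos, ref, alt);
"""


def open_db(path=":memory:"):
    """Open (or create) the variant database and make sure the tables exist."""
    con = sqlite3.connect(path)
    con.executescript(SCHEMA)
    return con


def load_sample(con, sample_id, variants):
    """Store one sample's variants, given as (chrom, pos, ref, alt) tuples."""
    with con:  # commits on success, rolls back on error
        con.execute("INSERT OR IGNORE INTO samples VALUES (?)", (sample_id,))
        con.executemany(
            "INSERT OR IGNORE INTO calls VALUES (?, ?, ?, ?, ?)",
            ((sample_id, c, p, r, a) for c, p, r, a in variants),
        )


def seen_in(con, chrom, pos, ref, alt):
    """Return ('k/N', sorted sample ids) for one variant."""
    (n_total,) = con.execute("SELECT COUNT(*) FROM samples").fetchone()
    carriers = [row[0] for row in con.execute(
        "SELECT sample_id FROM calls WHERE chrom = ? AND pos = ? AND ref = ? AND alt = ? "
        "ORDER BY sample_id",
        (chrom, pos, ref, alt),
    )]
    return f"{len(carriers)}/{n_total}", carriers
```

Annotating a new VCF then amounts to calling `seen_in` for each of its variants before loading that sample with `load_sample`, so the new sample is not counted against itself. Passing a file path to `open_db` instead of the in-memory default keeps the data between runs.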

      I'm inclined not to use annotated data to store variant frequencies, since the format, content, and application used to annotate may change over time.

      I think the solution could be the same for exome, since it's essentially just another flavour of targeted sequencing, albeit a bit bigger than the 250 kb I'm dealing with!


      • jhersheson
        Junior Member
        • May 2013
        • 3

        #4
        My grasp of scripting and databases is pretty limited, so I tend to go for the easiest, although not always the most efficient, method. If you keep one file with all of your sample data in it, you should be able to just use filters in Excel to show which samples each variant is in. I guess it depends on how many variants are in each file: Excel is fine for about 1 million rows, but you could always filter the text file on the command line if it were any bigger.
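For a combined file too big for Excel (a worksheet tops out at 1,048,576 rows), a streaming filter reads one line at a time instead of loading everything. This is a hypothetical sketch: it assumes a single tab-delimited file whose first five columns are sample_id, chrom, pos, ref, alt, and the function name is made up. An awk one-liner would do the same job from the command line.

```python
import csv


def samples_with_variant(lines, chrom, pos, ref, alt):
    """Return the sample ids carrying one variant in a combined table.

    `lines` is any iterable of text lines (e.g. an open file) laid out as
    sample_id, chrom, pos, ref, alt, ... ; rows are streamed, not stored.
    """
    target = (chrom, str(pos), ref, alt)
    hits = []
    for row in csv.reader(lines, delimiter="\t"):
        # compare the chrom/pos/ref/alt columns; a header row never matches
        if len(row) >= 5 and tuple(row[1:5]) == target:
            hits.append(row[0])
    return hits
```

Usage might look like `with open("all_samples.tsv", newline="") as fh: print(samples_with_variant(fh, "chr1", 100, "A", "G"))`, where the file name and coordinates are placeholders.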

        Re using annotated data, there isn't really much difference between using the Annovar file and the VCF file, as you are only going to be using the chr_start_ref_obs fields, which are the same in both files (at least for SNVs; Annovar represents indels differently from VCF). Annovar just takes the variant calls from the VCF file as input, albeit converted to a different format.
        Last edited by jhersheson; 05-18-2013, 01:30 PM.


        • Nino
          Member
          • Mar 2013
          • 27

          #5
          I seem to have trouble posting a new thread, so I'm posting my question here, since this is the most relevant place. I am trying to create a VCF file of the SPOP gene in human from hg19. If anyone has some insight on how to do this, and on how to create a thread on SEQanswers, please let me know.

          Thanks,
          Nino


          • GenoMax
            Senior Member
            • Feb 2008
            • 7142

            #6
            Originally posted by Nino
            If anyone has some insight on how to do this and how to create a thread on seqanswers please let me know.

            Thanks,
            Nino
            Select the "Forums" link (on the left) from the main SEQanswers page.
            Choose the forum you want to post in and click on its name (e.g. General).
            At the top left of the forum page, choose "New Thread".
