  • intersect VCF files

    Hi:

    I like bedtools, but now that GATK pipelines are spewing out VCF files, I want to intersect three of them.

    For instance I have 3 VCF files F1, F2 and F3.

    I want to intersect
    newfile = intersect(F1, F2)

    then I want to :

    newsecondfile = intersect(newfile, F3)

    How can I do this on VCF files? I tried vcftools, but it is not handy with all the gz files and Perl.

    I would like something BEDtools-like.

    Any suggestions?

    thanks
    Adrian

  • #2
    IntersectBed claims to accept .vcf files. I'm pretty sure I've used it myself to do just that.

    What I've also done is use mpileup in samtools to take in multiple .bam files together. The downside is that it doesn't keep all the information for each sample together, but it at least gives you GT, PL and GQ values for each sample. You can filter by the GT or the PL to find SNPs that are or aren't in whatever combination of samples you want.
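
As a rough illustration of the GT filtering described above, here is a minimal Python sketch. It assumes a multi-sample VCF whose FORMAT column contains GT; the function names (`sample_genotypes`, `carries_alt`, `shared_sites`) and the sample names S1 to S3 are made up for the example and are not part of samtools.

```python
def sample_genotypes(vcf_line, sample_names):
    """Map each sample name to its GT string for one VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")  # position of GT in FORMAT
    genotypes = {}
    for name, sample_field in zip(sample_names, fields[9:]):
        values = sample_field.split(":")
        genotypes[name] = values[gt_index] if gt_index < len(values) else "./."
    return genotypes


def carries_alt(gt):
    """True if the genotype contains at least one non-reference allele."""
    alleles = gt.replace("|", "/").split("/")
    return any(allele not in ("0", ".") for allele in alleles)


def shared_sites(vcf_lines, required_samples):
    """Yield (chrom, pos) where every required sample carries an ALT allele."""
    sample_names = []
    for line in vcf_lines:
        if line.startswith("##") or not line.strip():
            continue
        if line.startswith("#CHROM"):
            # Sample names start at the 10th column of the header line.
            sample_names = line.rstrip("\n").split("\t")[9:]
            continue
        genotypes = sample_genotypes(line, sample_names)
        if all(carries_alt(genotypes[s]) for s in required_samples):
            fields = line.split("\t")
            yield fields[0], int(fields[1])


# Hypothetical three-sample input, made up for the example.
demo = [
    "##fileformat=VCFv4.1\n",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS3\n",
    "chr1\t100\t.\tA\tG\t50\t.\t.\tGT:GQ:PL\t0/1:99:50,0,60\t1/1:99:80,20,0\t0/0:99:0,30,90\n",
    "chr1\t200\t.\tC\tT\t50\t.\t.\tGT:GQ:PL\t0/1:99:50,0,60\t0/1:99:50,0,60\t1/1:99:80,20,0\n",
]
print(list(shared_sites(demo, ["S1", "S2", "S3"])))  # [('chr1', 200)]
```

Swapping `all` for a different test (for example "S1 and S2 carry it, S3 does not") gives the other sample combinations; filtering on PL instead would need its own threshold logic.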

    • #3
      There is also vcf-isec from the VCFtools Perl utilities:

      • #4
        Has anyone got vcf-isec to work?

        I bgzipped my VCFs and tabix-indexed them.

        Here I try it on the same VCF twice:

        vcf-isec -c 26530.snv.vcf.gz 26530.snv.vcf.gz

        but I get:
        Can't use string ("silent") as a HASH ref while "strict refs" in use at /net/home/leparc/bin/VCFtools/perl/Vcf.pm line 542.

        Also, why all the trouble with bgzipping and tabix indexing... it's a lot of hassle just to do something so simple.

        • #5
          I've successfully run vcf-isec to compare two related individuals:

          vcf-isec -n +2 -f file1.vcf.gz file2.vcf.gz | bgzip -c > file3.vcf.gz

          • #6
            Have you tried vcftools?

            • #7
              There is also vcfintersect: https://github.com/ekg/vcflib#vcfintersect

              It works with both BED files and VCF files, and can generate inverse intersections (allowing you to find things that are not in one file).
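
To make the idea of an intersection and its inverse concrete, here is a small Python sketch. It is not how vcfintersect is implemented; it assumes that two records are "the same variant" when CHROM, POS, REF and ALT match exactly, and the function names are invented for illustration.

```python
def variant_key(vcf_line):
    """Return (CHROM, POS, REF, ALT) for one VCF data line."""
    chrom, pos, _id, ref, alt = vcf_line.rstrip("\n").split("\t")[:5]
    return chrom, int(pos), ref, alt


def variant_keys(vcf_lines):
    """Collect the keys of all data lines, skipping headers and blanks."""
    return {
        variant_key(line)
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    }


def filter_records(vcf_lines, other_keys, invert=False):
    """Keep data lines whose variant is in other_keys.

    With invert=True, keep the lines whose variant is NOT in other_keys.
    """
    kept = []
    for line in vcf_lines:
        if not line.strip() or line.startswith("#"):
            continue
        present = variant_key(line) in other_keys
        if present != invert:
            kept.append(line.rstrip("\n"))
    return kept


# Hypothetical records, made up for the example.
file_a = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t100\t.\tA\tG\n",
    "chr1\t200\t.\tC\tT\n",
]
file_b = [
    "#CHROM\tPOS\tID\tREF\tALT\n",
    "chr1\t200\t.\tC\tT\n",
    "chr2\t300\t.\tG\tA\n",
]
keys_b = variant_keys(file_b)
print(filter_records(file_a, keys_b))               # in A and in B
print(filter_records(file_a, keys_b, invert=True))  # in A but not in B
```

Matching on exact REF/ALT strings is the simplest choice; multi-allelic sites or differently normalized indels would need extra handling that this sketch leaves out.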

              • #8
                Hello,

                Would anyone happen to know how to merge a set of VCF files into one new file, keeping only the candidates reported in at least 20% and at most 90% of the files?

                Thanks,
                Nino
                Last edited by Nino; 02-20-2014, 11:24 AM. Reason: forgot a word
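
One way to think about this kind of frequency filter is to count, for each variant, how many files report it and keep those whose fraction falls in the wanted range. The Python sketch below does that under a few assumptions: each file is already available as a list of lines, a variant is identified by CHROM, POS, REF and ALT, and the name `variants_in_fraction_range` and its default thresholds are chosen only for this example.

```python
from collections import Counter


def variants_in_fraction_range(files, min_frac=0.2, max_frac=0.9):
    """Return variants reported in a fraction of files within [min_frac, max_frac].

    `files` is a list of VCF files, each given as an iterable of lines.
    """
    counts = Counter()
    for lines in files:
        seen = set()  # count each variant at most once per file
        for line in lines:
            if not line.strip() or line.startswith("#"):
                continue
            chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
            seen.add((chrom, int(pos), ref, alt))
        counts.update(seen)
    total = len(files)
    return sorted(
        key for key, n in counts.items() if min_frac <= n / total <= max_frac
    )


# Hypothetical input: five files, made up for the example.
everywhere = "chr1\t100\t.\tA\tG\n"  # in 5 of 5 files (100%)
in_two = "chr1\t200\t.\tC\tT\n"      # in 2 of 5 files (40%)
in_one = "chr2\t50\t.\tG\tA\n"       # in 1 of 5 files (20%)
files = [
    [everywhere, in_two, in_one],
    [everywhere, in_two],
    [everywhere],
    [everywhere],
    [everywhere],
]
print(variants_in_fraction_range(files))
```

This only returns the variant keys; writing a merged VCF with the original per-file annotations would still need a decision on which file's record to keep.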

                • #9
                  I can also suggest R. Convert your VCFs to tab-delimited files, and then intersect the positions where variants are called.
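
The same position-based approach can be sketched outside R as well. The Python example below is only an illustration: it assumes tab-delimited input with the chromosome in column 1 and the position in column 2 (which a VCF already satisfies), treats lines starting with `#` as headers, and uses the made-up helper names `read_positions` and `intersect_positions`.

```python
import gzip
from functools import reduce


def read_positions(path):
    """Read (chrom, pos) pairs from a tab-delimited file, plain or .gz."""
    opener = gzip.open if path.endswith(".gz") else open
    positions = set()
    with opener(path, "rt") as handle:
        for line in handle:
            if not line.strip() or line.startswith("#"):
                continue
            chrom, pos = line.rstrip("\n").split("\t")[:2]
            positions.add((chrom, int(pos)))
    return positions


def intersect_positions(paths):
    """Positions present in every file; needs at least one path."""
    position_sets = (read_positions(path) for path in paths)
    # Equivalent to intersect(intersect(F1, F2), F3) for three files.
    return reduce(set.intersection, position_sets)
```

A call such as `intersect_positions(["F1.vcf", "F2.vcf", "F3.vcf.gz"])` (file names hypothetical) returns the set of shared positions. Because only positions are compared, two files count as agreeing at a site even if they report different ALT alleles there.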
