  • adrian
    Member
    • Oct 2009
    • 90

    intersect VCF files

    Hi:

    I like bedtools, but now GATK pipelines are spewing out VCF files and I want to intersect 3 of them.

    For instance I have 3 VCF files F1, F2 and F3.

    I want to intersect
    newfile = intersect(F1, F2)

    then I want to :

    newsecondfile = intersect(newfile, F3)

    How can I do this on VCF files? I tried vcftools, but it is not handy with all the gz files and Perl.

    I would like something BEDtools-like.

    Any suggestions?

    thanks
    Adrian
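
The two-step intersection asked for above can be sketched in plain Python. This is only an illustration, not a replacement for a dedicated tool. It assumes two records are the same variant when CHROM, POS, REF and ALT match exactly, it keeps the full record from the first file, and the function names (`open_vcf`, `site_key`, `read_keys`, `intersect`) are made up for the example. Compressed inputs are read through the standard `gzip` module, so no index is needed.

```python
import gzip


def open_vcf(path):
    """Open a plain or gzip/bgzip-compressed VCF in text mode."""
    if str(path).endswith(".gz"):
        return gzip.open(path, "rt")
    return open(path, "r")


def site_key(line):
    """Key a record on CHROM, POS, REF, ALT (an assumption about what 'same variant' means)."""
    f = line.rstrip("\n").split("\t")
    return (f[0], f[1], f[3], f[4])


def read_keys(path):
    """Collect the site keys of every data record in a VCF file."""
    with open_vcf(path) as handle:
        return {
            site_key(line)
            for line in handle
            if line.strip() and not line.startswith("#")
        }


def intersect(path_a, path_b, out_path):
    """Write the header and the records of path_a whose site also occurs in path_b.

    Returns the number of records written. The output is uncompressed.
    """
    keys_b = read_keys(path_b)
    kept = 0
    with open_vcf(path_a) as src, open(out_path, "w") as out:
        for line in src:
            if not line.strip():
                continue
            if line.startswith("#"):
                out.write(line)  # keep the header of the first file
            elif site_key(line) in keys_b:
                out.write(line)
                kept += 1
    return kept


# Hypothetical usage, mirroring the question:
#   intersect("F1.vcf.gz", "F2.vcf.gz", "newfile.vcf")
#   intersect("newfile.vcf", "F3.vcf.gz", "newsecondfile.vcf")
```

Holding one file's keys in memory is fine for typical call sets; for very large files an indexed tool is likely the better choice.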
  • swbarnes2
    Senior Member
    • May 2008
    • 910

    #2
    IntersectBed claims to accept .vcf files. I'm pretty sure I've used it myself to do just that.

    What I've also done is use mpileup in samtools to take in multiple .bam files together. The downside is that it doesn't keep all the information for each sample together, but it at least gives you GT, PL and GQ values for each sample. You can filter by the GT or the PL to find SNPs that are or aren't in whatever combination of samples you want.
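
Filtering a multi-sample VCF by GT, as described above, might look roughly like the sketch below. It assumes a standard multi-sample VCF with GT listed in the FORMAT column. The function names are hypothetical, and a sample is counted as carrying the variant if its genotype contains any non-reference, non-missing allele, which is one possible rule among several.

```python
import re


def genotypes(header_line, record_line):
    """Map sample name -> GT string for one VCF record."""
    samples = header_line.rstrip("\n").split("\t")[9:]
    fields = record_line.rstrip("\n").split("\t")
    gt_index = fields[8].split(":").index("GT")
    return {
        name: value.split(":")[gt_index]
        for name, value in zip(samples, fields[9:])
    }


def carries_variant(gt):
    """True if the genotype has at least one non-reference, non-missing allele."""
    alleles = re.split(r"[/|]", gt)
    return any(a not in ("0", ".") for a in alleles)


def in_exactly(header_line, record_line, wanted):
    """True if the variant is carried by the samples in `wanted` and by no others."""
    gts = genotypes(header_line, record_line)
    carriers = {name for name, gt in gts.items() if carries_variant(gt)}
    return carriers == set(wanted)
```

Here `header_line` is the `#CHROM` line of the file, which carries the sample names from column 10 onward.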


    • n00c
      Member
      • Nov 2009
      • 12

      #3
      There is also vcf-isec, one of the Perl utilities that ship with VCFtools.


      • NGSfan
        Senior Member
        • Apr 2009
        • 181

        #4
        Has anyone got vcf-isec to work?

        I bgzipped my VCFs and indexed them with tabix.

        Here I try it on the same VCF twice:

        vcf-isec -c 26530.snv.vcf.gz 26530.snv.vcf.gz

        but I get:
        Can't use string ("silent") as a HASH ref while "strict refs" in use at /net/home/leparc/bin/VCFtools/perl/Vcf.pm line 542.

        Also, why all the trouble with bgzipping and tabix indexing... it's a lot of hassle just to do something so simple.


        • dwmohr
          Junior Member
          • Aug 2008
          • 6

          #5
          I've successfully run vcf-isec to compare two related individuals:

          vcf-isec -n +2 -f file1.vcf.gz file2.vcf.gz | bgzip -c > file3.vcf.gz


          • GW_OK
            Senior Member
            • Sep 2009
            • 411

            #6
            Have you tried vcftools?


            • ekg
              Member
              • Apr 2010
              • 36

              #7
              There is also vcfintersect: https://github.com/ekg/vcflib#vcfintersect

              It works with both BED files and VCF files, and can generate inverse intersections (allowing you to find things that are not in one file).
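
Conceptually, the regular and inverse intersections mentioned above are a membership test and its negation. The sketch below is plain Python and is not vcfintersect's implementation. It assumes sites are compared on CHROM, POS, REF and ALT, and the function name `filter_sites` and its `invert` parameter are invented for the illustration.

```python
def filter_sites(lines_a, lines_b, invert=False):
    """Yield records of A found in B, or (invert=True) records of A absent from B."""

    def key(line):
        f = line.rstrip("\n").split("\t")
        return (f[0], f[1], f[3], f[4])

    def is_record(line):
        return bool(line.strip()) and not line.startswith("#")

    sites_b = {key(line) for line in lines_b if is_record(line)}
    for line in lines_a:
        if is_record(line) and (key(line) in sites_b) != invert:
            yield line
```

Both arguments can be any iterable of VCF lines, such as open file handles.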


              • Nino
                Member
                • Mar 2013
                • 27

                #8
                Hello,

                Would anyone happen to know how to merge a set of VCF files into one new file, keeping only the candidates that are reported in at least 20% and at most 90% of all the files?

                Thanks,
                Nino
                Last edited by Nino; 02-20-2014, 11:24 AM. Reason: forgot a word
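
One way to read this request is: keep a site if the fraction of input files reporting it lies between 20% and 90%, both inclusive. The sketch below works under that reading only. It takes one collection of site keys per file (for instance tuples of CHROM, POS, REF, ALT), and the function name `sites_by_support` and its parameters are hypothetical.

```python
from collections import Counter


def sites_by_support(site_sets, min_frac=0.2, max_frac=0.9):
    """Return sites reported by a fraction of inputs within [min_frac, max_frac]."""
    n = len(site_sets)
    if n == 0:
        return set()
    # set(sites) makes sure a site is counted at most once per file
    counts = Counter(site for sites in site_sets for site in set(sites))
    return {s for s, c in counts.items() if min_frac <= c / n <= max_frac}
```

Writing the surviving sites back out as a merged VCF would still need a rule for which file's record to keep, which the question leaves open.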


                • AdrianP
                  Senior Member
                  • Apr 2011
                  • 130

                  #9
                  I can also suggest R. Convert your VCF to tab files, and then intersect the positions where variants are called.
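
The same convert-then-intersect idea carries over to other languages. A rough Python version is sketched below, assuming tab-delimited tables with CHROM, POS, REF and ALT columns are enough for the comparison. The function names are made up for the example. Note that it matches on position only, as suggested above, so two different alleles at the same position still count as shared.

```python
import csv
import io


def vcf_to_tab(vcf_text):
    """Convert VCF text to tab-delimited text with CHROM, POS, REF, ALT columns."""
    out = io.StringIO()
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    writer.writerow(["CHROM", "POS", "REF", "ALT"])
    for line in vcf_text.splitlines():
        if line and not line.startswith("#"):
            f = line.split("\t")
            writer.writerow([f[0], f[1], f[3], f[4]])
    return out.getvalue()


def shared_positions(tab_a, tab_b):
    """Return sorted (CHROM, POS) pairs present in both tab-delimited tables."""

    def positions(tab_text):
        reader = csv.DictReader(io.StringIO(tab_text), delimiter="\t")
        return {(row["CHROM"], int(row["POS"])) for row in reader}

    return sorted(positions(tab_a) & positions(tab_b))
```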

