  • d_chall
    Junior Member
    • Jan 2010
    • 3

    Separate multi-allelic VCF lines to multiple rows

    The latest VCF formats (4.1+) allow a single locus to span multiple rows in the file when there are multiple alleles. The older convention of listing multiple alleles on the same line is also valid. Unfortunately, some analyses require one convention and some the other. Are there any tools or scripts that can take a VCF file with multiple alleles on one line and split them onto separate lines, including the genotypes in the sample columns?
    Thanks!
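
To make the request concrete, here is a minimal Python sketch of the kind of split being asked about. It is not any of the tools mentioned in this thread. The function names `split_multiallelic` and `remap_gt` are hypothetical, and the choice to turn "other" ALT alleles into missing calls (`.`) is an assumption; different tools handle that case differently. Only the GT sub-field is rewritten. INFO and the remaining FORMAT values (AD, PL, and so on) are copied unchanged, which a real tool would need to handle.

```python
import re


def remap_gt(gt, k):
    """Rewrite a GT string for the row that keeps only the k-th ALT allele.

    0 stays 0, allele k becomes 1, any other allele becomes '.' (assumption).
    """
    tokens = re.split(r"([/|])", gt)  # keep the phasing separators
    mapped = []
    for tok in tokens:
        if tok in ("/", "|") or tok == "0":
            mapped.append(tok)
        elif tok == str(k):
            mapped.append("1")
        else:
            mapped.append(".")
    return "".join(mapped)


def split_multiallelic(line):
    """Split one tab-delimited VCF data line into one line per ALT allele."""
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return ["\t".join(fields)]

    # Locate GT inside the FORMAT column, if there is one.
    gt_index = None
    if len(fields) > 8:
        keys = fields[8].split(":")
        if "GT" in keys:
            gt_index = keys.index("GT")

    rows = []
    for k, alt in enumerate(alts, start=1):
        new = list(fields)
        new[4] = alt
        if gt_index is not None:
            for i in range(9, len(fields)):
                parts = fields[i].split(":")
                if gt_index < len(parts):
                    parts[gt_index] = remap_gt(parts[gt_index], k)
                new[i] = ":".join(parts)
        rows.append("\t".join(new))
    return rows


if __name__ == "__main__":
    record = "1\t100\t.\tG\tA,C\t50\tPASS\tDP=10\tGT:DP\t1/2:8\t0|2:5"
    for row in split_multiallelic(record):
        print(row)
```

With this mapping, a `1/2` genotype becomes `1/.` on the row for the first ALT allele and `./1` on the row for the second.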
  • mebbert
    Junior Member
    • Jul 2012
    • 7

    #2
    Did you find a tool for this? I'm looking too.


    • d_chall
      Junior Member
      • Jan 2010
      • 3

      #3
      No, I didn't. I ended up having to write my own custom Ruby script to do it.


      • ekg
        Member
        • Apr 2010
        • 36

        #4
        I wrote something in C++ (https://github.com/ekg/vcflib/blob/m...eakmulti.cpp):

        % vcfbreakmulti --help
        usage: vcfbreakmulti [options] [file]

        If multiple alleles are specified in a single record, break the record into
        multiple lines, preserving allele-specific INFO fields.
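
As a rough illustration of what "preserving allele-specific INFO fields" can mean, the Python sketch below splits an INFO string so that each output record keeps only its own value for per-allele keys. This is not vcfbreakmulti's implementation. The function name `split_info` is hypothetical, and the set of per-allele keys is assumed to be supplied by the caller (in a real file it would come from the `##INFO` header lines declared with `Number=A`).

```python
def split_info(info, n_alts, per_alt_keys):
    """Return one INFO string per ALT allele.

    Keys in per_alt_keys that carry exactly n_alts comma-separated values
    are reduced to the value for that allele; everything else (site-level
    values and flags) is copied unchanged.
    """
    results = []
    for k in range(n_alts):
        items = []
        for item in info.split(";"):
            key, sep, value = item.partition("=")
            values = value.split(",")
            if sep and key in per_alt_keys and len(values) == n_alts:
                items.append(f"{key}={values[k]}")
            else:
                items.append(item)
        results.append(";".join(items))
    return results


if __name__ == "__main__":
    for info in split_info("AF=0.2,0.1;DP=30;DB;AC=4,2", 2, {"AF", "AC"}):
        print(info)
```

Site-level values such as DP are simply repeated on every output record here; whether that is appropriate depends on the downstream analysis.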


        • Elmaffo
          Junior Member
          • Nov 2011
          • 3

          #5
          @ekg

          I tried to compile vcflib but got some errors. The output of the make command is below; it was produced under an Italian locale, but I can rerun it in English if needed.

          bw
          Andrea

          ---------------------------

          elmaffo@arc-HP8200i7 ~/Scaricati/vcflib $ make
          cd tabixpp && make
          make[1]: Entering directory "/home/elmaffo/Scaricati/vcflib/tabixpp"
          make[2]: Entering directory "/home/elmaffo/Scaricati/vcflib/tabixpp"
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE bgzf.c -o bgzf.o
          bgzf.c: In function ‘bgzf_close’:
          bgzf.c:630:8: warning: variable ‘count’ set but not used [-Wunused-but-set-variable]
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE kstring.c -o kstring.o
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE knetfile.c -o knetfile.o
          knetfile.c: In function ‘khttp_connect_file’:
          knetfile.c:418:2: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
          knetfile.c: In function ‘kftp_send_cmd’:
          knetfile.c:239:2: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE index.c -o index.o
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE bedidx.c -o bedidx.o
          ar -cru libtabix.a bgzf.o kstring.o knetfile.o index.o bedidx.o
          ranlib libtabix.a
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE main.c -o main.o
          gcc -g -Wall -O2 -fPIC -o tabix main.o -lm -lz -L. -ltabix
          ./libtabix.a(bgzf.o): In function "deflate_block":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:311: undefined reference to "deflate"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:313: undefined reference to "deflateEnd"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:305: undefined reference to "deflateInit2_"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:329: undefined reference to "deflateEnd"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:345: undefined reference to "crc32"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:346: undefined reference to "crc32"
          ./libtabix.a(bgzf.o): In function "inflate_block":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:380: undefined reference to "inflateInit2_"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:385: undefined reference to "inflate"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:391: undefined reference to "inflateEnd"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:387: undefined reference to "inflateEnd"
          ./libtabix.a(bedidx.o): In function "ks_getuntil":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bedidx.c:11: undefined reference to "gzread"
          ./libtabix.a(bedidx.o): In function "bed_read":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bedidx.c:103: undefined reference to "gzdopen"
          ./libtabix.a(bedidx.o): In function "ks_getc":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bedidx.c:11: undefined reference to "gzread"
          ./libtabix.a(bedidx.o): In function "bed_read":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bedidx.c:138: undefined reference to "gzclose"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bedidx.c:103: undefined reference to "gzopen64"
          collect2: error: ld returned 1 exit status
          make[2]: *** [tabix] Error 1
          make[2]: Leaving directory "/home/elmaffo/Scaricati/vcflib/tabixpp"
          make[1]: *** [all-recur] Error 1
          make[1]: Leaving directory "/home/elmaffo/Scaricati/vcflib/tabixpp"
          make: *** [tabixpp/tabix.o] Error 2


          • ekg
            Member
            • Apr 2010
            • 36

            #6
            @Andrea

            Don't worry, I speak Italian.

            It looks like zlib is missing: http://stackoverflow.com/questions/1...late-with-zlib

            Is zlib installed on your system?


            • Elmaffo
              Junior Member
              • Nov 2011
              • 3

              #7
              @ekg

              Here is the list of zlib-related packages on my Ubuntu 12.10 box:

              Installed: zlib1g, zlib1g-dev, zlib1g:i386
              Not installed: zlib-bin, zlib-gst, zlib1g-dbg, zlibc

              Do I need any of the "not installed" ones?

              thanks
              Andrea


              • Elmaffo
                Junior Member
                • Nov 2011
                • 3

                #8
                @ekg

                I found out the issue was related to tabixpp, as described by guillermo-carrasco in this thread (see the last messages in the thread):



                I edited the Makefile in the tabixpp folder as suggested in the thread. Everything compiled.
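
The post does not say what the Makefile edit was, so the following is only a guess at the kind of change involved. A common cause of "undefined reference" errors for zlib symbols when zlib1g-dev is already installed is link order: GNU ld resolves libraries left to right, so `-lz` placed before `-ltabix` can leave the symbols needed by libtabix.a unresolved. The Python sketch below just rewrites a link command so that `-lz` comes last. The helper name `move_libs_last` is hypothetical, and the real fix would be made by hand in the Makefile.

```python
import shlex


def move_libs_last(command, libs=("-lz",)):
    """Move the given -l flags to the end of a linker command line.

    Assumption: the failure is a link-order problem, so libraries must
    appear after the objects and archives that use them.
    """
    tokens = shlex.split(command)
    kept = [t for t in tokens if t not in libs]
    moved = [t for t in tokens if t in libs]
    return " ".join(kept + moved)


if __name__ == "__main__":
    failing = "gcc -g -Wall -O2 -fPIC -o tabix main.o -lm -lz -L. -ltabix"
    print(move_libs_last(failing))
```

Applied to the failing command from the build log, this places `-lz` after `-ltabix`.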

                Bye


                • d_chall
                  Junior Member
                  • Jan 2010
                  • 3

                  #9
                  Here you go

                  For anyone who is interested, I ended up writing a couple of scripts for splitting and merging multi-allelic lines.
                  They are available in the "utils" directory of the Atlas2 trunk.
                  http://sourceforge.net/projects/atlas2/


                  • mamons
                    Member
                    • Nov 2011
                    • 10

                    #10
                    Here is another small tool to do the same thing, written in Python:

                    moonso/vcf_parser on GitHub: a simple VCF parser, based on PyVCF.
