  • Separate multi-allelic VCF lines to multiple rows

    The latest VCF formats (4.1+) allow a single locus to span multiple rows in the file when there are multiple alleles. The older convention of listing multiple alleles on the same line is also valid. Unfortunately, some analyses require one convention and some the other. Are there any tools or scripts that can take a VCF file with multiple alleles on one line and split them out onto separate lines, including the genotypes in the sample columns?
    Thanks!
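
To make the requested transformation concrete, here is a minimal Python sketch of splitting one multi-allelic data line into one line per ALT allele. It is not taken from any tool mentioned in this thread, and the function names `remap_gt` and `split_multiallelic` are hypothetical. It assumes tab-delimited VCF data lines and adopts one possible genotype convention: in the line for the k-th ALT allele, allele index k becomes 1, the reference stays 0, and any other ALT allele becomes missing (`.`). Other conventions exist, and INFO values and other per-sample fields (for example AD or PL) are left untouched here.

```python
import re


def remap_gt(gt, k):
    """Recode a GT string for the line that keeps only the k-th ALT allele.

    Assumed convention: index k -> 1, reference 0 -> 0, other ALT -> '.'.
    Phasing separators ('/' or '|') are preserved.
    """
    out = []
    for tok in re.split(r"([/|])", gt):
        if tok in ("/", "|") or not tok.isdigit():
            out.append(tok)          # separator or missing value
        elif int(tok) == 0:
            out.append("0")
        elif int(tok) == k:
            out.append("1")
        else:
            out.append(".")          # a different ALT allele
    return "".join(out)


def split_multiallelic(line):
    """Return one VCF data line per ALT allele of the input line."""
    fields = line.rstrip("\n").split("\t")
    alts = fields[4].split(",")
    if len(alts) == 1:
        return ["\t".join(fields)]

    # Locate GT in the FORMAT column, if there is one.
    gt_index = None
    if len(fields) > 8:
        fmt = fields[8].split(":")
        if "GT" in fmt:
            gt_index = fmt.index("GT")

    rows = []
    for k, alt in enumerate(alts, start=1):
        new = fields[:]
        new[4] = alt
        if gt_index is not None:
            for s in range(9, len(fields)):
                parts = fields[s].split(":")
                parts[gt_index] = remap_gt(parts[gt_index], k)
                new[s] = ":".join(parts)
        rows.append("\t".join(new))
    return rows


if __name__ == "__main__":
    example = "1\t100\t.\tA\tC,G\t50\tPASS\t.\tGT:DP\t0/1:10\t1/2:12\t2|2:7"
    for row in split_multiallelic(example):
        print(row)
```

Recoding other ALT alleles as missing keeps each output line strictly biallelic. Whether that or recoding them as reference is preferable depends on the downstream analysis.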

  • #2
    Did you find a tool for this? I'm looking too.


    • #3
      No, I didn't. I ended up having to write my own custom Ruby script to do it.


      • #4
        I wrote something in C++ (https://github.com/ekg/vcflib/blob/m...eakmulti.cpp):

        % vcfbreakmulti --help
        usage: vcfbreakmulti [options] [file]

        If multiple alleles are specified in a single record, break the record into
        multiple lines, preserving allele-specific INFO fields.
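
As an illustration of what "preserving allele-specific INFO fields" can involve, the sketch below splits an INFO string so that fields with one value per ALT allele (declared `Number=A` in the VCF header) keep only the value for the allele on each output line. This is not the vcfbreakmulti implementation. The function names are hypothetical, and the header regex assumes the usual `##INFO=<ID=...,Number=...,` key order.

```python
import re


def per_alt_keys_from_header(header_lines):
    """Collect INFO IDs declared with Number=A (one value per ALT allele)."""
    pattern = re.compile(r"##INFO=<ID=([^,>]+),Number=A[,>]")
    keys = set()
    for line in header_lines:
        match = pattern.match(line)
        if match:
            keys.add(match.group(1))
    return keys


def split_info(info, n_alts, per_alt_keys):
    """Return n_alts INFO strings, one for each ALT allele.

    Fields in per_alt_keys are reduced to the value for that allele;
    flags and all other fields are copied unchanged.
    """
    if info == ".":
        return ["."] * n_alts
    entries = [entry.split("=", 1) for entry in info.split(";")]
    result = []
    for k in range(n_alts):
        parts = []
        for entry in entries:
            if len(entry) == 1:              # flag such as DB
                parts.append(entry[0])
                continue
            key, value = entry
            values = value.split(",")
            if key in per_alt_keys and len(values) == n_alts:
                value = values[k]
            parts.append(f"{key}={value}")
        result.append(";".join(parts))
    return result


if __name__ == "__main__":
    header = [
        '##INFO=<ID=AC,Number=A,Type=Integer,Description="Allele count">',
        '##INFO=<ID=DP,Number=1,Type=Integer,Description="Depth">',
    ]
    keys = per_alt_keys_from_header(header)
    print(split_info("AC=3,5;DP=40;DB", 2, keys))
```

Fields with other cardinalities, such as one value per allele including the reference, would need their own handling and are copied unchanged in this sketch.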


        • #5
          @ekg

          I tried to compile vcflib but got some errors. The output of the make command is below. Sorry that it is in Italian; I can repeat it in English if needed.

          bw
          Andrea

          ---------------------------

          elmaffo@arc-HP8200i7 ~/Scaricati/vcflib $ make
          cd tabixpp && make
          make[1]: ingresso nella directory "/home/elmaffo/Scaricati/vcflib/tabixpp"
          make[2]: ingresso nella directory "/home/elmaffo/Scaricati/vcflib/tabixpp"
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE bgzf.c -o bgzf.o
          bgzf.c: In function ‘bgzf_close’:
          bgzf.c:630:8: warning: variable ‘count’ set but not used [-Wunused-but-set-variable]
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE kstring.c -o kstring.o
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE knetfile.c -o knetfile.o
          knetfile.c: In function ‘khttp_connect_file’:
          knetfile.c:418:2: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
          knetfile.c: In function ‘kftp_send_cmd’:
          knetfile.c:239:2: warning: ignoring return value of ‘write’, declared with attribute warn_unused_result [-Wunused-result]
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE index.c -o index.o
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE bedidx.c -o bedidx.o
          ar -cru libtabix.a bgzf.o kstring.o knetfile.o index.o bedidx.o
          ranlib libtabix.a
          gcc -c -g -Wall -O2 -fPIC -D_FILE_OFFSET_BITS=64 -D_USE_KNETFILE main.c -o main.o
          gcc -g -Wall -O2 -fPIC -o tabix main.o -lm -lz -L. -ltabix
          ./libtabix.a(bgzf.o): nella funzione "deflate_block":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:311: riferimento non definito a "deflate"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:313: riferimento non definito a "deflateEnd"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:305: riferimento non definito a "deflateInit2_"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:329: riferimento non definito a "deflateEnd"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:345: riferimento non definito a "crc32"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:346: riferimento non definito a "crc32"
          ./libtabix.a(bgzf.o): nella funzione "inflate_block":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:380: riferimento non definito a "inflateInit2_"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:385: riferimento non definito a "inflate"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:391: riferimento non definito a "inflateEnd"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bgzf.c:387: riferimento non definito a "inflateEnd"
          ./libtabix.a(bedidx.o): nella funzione "ks_getuntil":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bedidx.c:11: riferimento non definito a "gzread"
          ./libtabix.a(bedidx.o): nella funzione "bed_read":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bedidx.c:103: riferimento non definito a "gzdopen"
          ./libtabix.a(bedidx.o): nella funzione "ks_getc":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bedidx.c:11: riferimento non definito a "gzread"
          ./libtabix.a(bedidx.o): nella funzione "bed_read":
          /home/elmaffo/Scaricati/vcflib/tabixpp/bedidx.c:138: riferimento non definito a "gzclose"
          /home/elmaffo/Scaricati/vcflib/tabixpp/bedidx.c:103: riferimento non definito a "gzopen64"
          collect2: error: ld returned 1 exit status
          make[2]: *** [tabix] Errore 1
          make[2]: uscita dalla directory "/home/elmaffo/Scaricati/vcflib/tabixpp"
          make[1]: *** [all-recur] Errore 1
          make[1]: uscita dalla directory "/home/elmaffo/Scaricati/vcflib/tabixpp"
          make: *** [tabixpp/tabix.o] Errore 2


          • #6
            @Andrea

            Don't worry, I speak Italian.

            It looks like zlib is missing: http://stackoverflow.com/questions/1...late-with-zlib

            Is zlib installed on your system?


            • #7
              @ekg

              Here is the list of zlib-related packages on my Ubuntu 12.10 box:

              Installed: zlib1g, zlib1g-dev, zlib1g:i386
              Not installed: zlib-bin, zlib-gst, zlib1g-dbg, zlibc

              Do I need any of the "not installed" ones?

              thanks
              Andrea


              • #8
                @ekg

                I found out the issue was related to tabixpp, as described by guillermo-carrasco in this thread (check out the last messages in the thread):

                Hi, I'm trying to compile tabixpp on ubuntu , but am getting a lot of "undefined reference to xxxxx" errors (https://gist.github.com/2309326) Am I missing a step? Is this a known issue? rob-> make ...


                I edited the Makefile in the tabixpp folder as suggested in the thread. Everything compiled.

                Bye


                • #9
                  Here you go

                  For anyone who is interested, I ended up writing a couple of scripts for splitting and merging multi-allelic lines.
                  They are available in the "utils" directory of the Atlas2 trunk.
                  http://sourceforge.net/projects/atlas2/
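
For the reverse direction, a rough Python sketch of merging split lines back into one multi-allelic record is shown below. It is not the Atlas2 code, and `merge_biallelic` is a hypothetical name. It assumes sites-only input (no sample columns are carried over), that records for the same site are adjacent, and that keeping the ID, QUAL, FILTER and INFO of the first record in each group is acceptable.

```python
from itertools import groupby


def merge_biallelic(lines):
    """Merge adjacent records sharing CHROM, POS and REF into one record.

    Only the first eight columns are emitted; ALT alleles are combined
    in order of first appearance.
    """
    records = [line.rstrip("\n").split("\t") for line in lines]
    merged = []
    for _, group in groupby(records, key=lambda r: (r[0], r[1], r[3])):
        group = list(group)
        first = group[0][:]
        alts = []
        for rec in group:
            for alt in rec[4].split(","):
                if alt not in alts:
                    alts.append(alt)
        first[4] = ",".join(alts)
        merged.append("\t".join(first[:8]))
    return merged


if __name__ == "__main__":
    split_lines = [
        "1\t100\t.\tA\tC\t50\tPASS\tDP=40",
        "1\t100\t.\tA\tG\t50\tPASS\tDP=40",
        "1\t200\t.\tT\tA\t30\tPASS\tDP=12",
    ]
    for row in merge_biallelic(split_lines):
        print(row)
```

Merging genotypes and per-allele INFO values back together requires knowing the convention used when the lines were split, so it is left out of this sketch.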


                  • #10
                    Here is another small tool to do the same thing, written in Python:

                    moonso/vcf_parser on GitHub: a simple VCF parser, based on PyVCF.
