  • francy
    Member
    • Jun 2011
    • 19

    Merge individual vcf files

    Dear SEQanswers,

    I cannot get the Perl script in VCFtools called "vcf-merge" to work and would like to ask for your help.

    This is my problem:
    I have 5 VCF files I would like to merge, and I have converted each one to VCFv4.0 using the Perl script in VCFtools/perl called "vcf-convert".
    Then I ran bgzip and tabix -p vcf on each file, as explained on the website.
    I also set PERL5LIB to where the Perl scripts are, like this:
    "export PERL5LIB=/ugi/home/claudiagiambartolomei/shared/vcftools_0.1.5"

    Finally, when I type "perl vcf-merge file1.vcf.gz.tbi file2.vcf.gz.tbi file3.vcf.gz.tbi file4.vcf.gz.tbi file5.vcf.gz.tbi | gzip -c > out_merged.vcf.gz"

    First a lot of odd-looking signs and question marks come up, and then an error comes up which says:

    ], assuming VCFv4.0
    Broken VCF header, no column names?
    at /vcftools_0.1.5/perl//Vcf.pm line 169
    Vcf::throw('Vcf4_0=HASH(0x25672d8)', 'Broken VCF header, no column names?') called at /vcftools_0.1.5/perl//Vcf.pm line 808
    VcfReader::_read_column_names('Vcf4_0=HASH(0x25672d8)') called at /vcftools_0.1.5/perl//Vcf.pm line 583
    VcfReader::parse_header('Vcf4_0=HASH(0x25672d8)') called at /vcftools_0.1.5/perl/vcf-merge line 128
    main::init_cols('HASH(0x28a5b20)', 'Vcf4_0=HASH(0x2566b40)') called at /vcftools_0.1.5/perl/vcf-merge line 221
    main::merge_vcf_files('HASH(0x28a5b20)') called at /vcftools_0.1.5/perl/vcf-merge line 12

    Can you please help me figure out what I am doing wrong?

    Thank you very much for any advice you could give me,
    -f
  • iansealy
    Member
    • Oct 2010
    • 15

    #2
    It looks like you're trying to merge the tabix index files rather than the VCF files. Try something like:

    perl vcf-merge file1.vcf.gz file2.vcf.gz file3.vcf.gz file4.vcf.gz file5.vcf.gz > out_merged.vcf

    Cheers,
    Ian
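A minimal sketch of the fix described above, in shell. The helper names `data_file` and `merge_args` are hypothetical and not part of VCFtools; they only illustrate turning a `.tbi` index name back into the data file name that vcf-merge should be given.

```shell
#!/bin/sh
# Hypothetical helpers (not part of VCFtools): map index file names
# back to the data files that should be passed to vcf-merge.

# data_file NAME: print NAME with a trailing ".tbi" removed.
# Names that do not end in ".tbi" are printed unchanged.
data_file() {
    printf '%s\n' "${1%.tbi}"
}

# merge_args NAME...: print one vcf-merge input per line.
merge_args() {
    for name in "$@"; do
        data_file "$name"
    done
}
```

Assuming file names without whitespace, the output can be used as `perl vcf-merge $(merge_args file1.vcf.gz.tbi file2.vcf.gz.tbi) > out_merged.vcf`, which passes `file1.vcf.gz file2.vcf.gz` to the script.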


    • pbluescript
      Senior Member
      • Nov 2009
      • 224

      #3
      I can't help with vcftools, but bedtools supports vcf files if you want to give that a shot.


      • francy
        Member
        • Jun 2011
        • 19

        #4
        Thank you very much to both for your replies!
        iansealy, it worked: I just needed to run the command using the files not indexed using tabix! Although I am not sure why, since the Perl script clearly states "Merge the bgzipped and tabix indexed VCF files. (E.g. bgzip file.vcf; tabix -p vcf file.vcf.gz)\n"
        ...Anyway using only the .gz works!!

        pbluescript, I didn't know about bedtools and since I'll do much of this in the future I am sure I'll need that too so thanks for mentioning it.

        Cheers,
        -f


        • ulz_peter
          Senior Member
          • Feb 2010
          • 219

          #5
          When you index a file using tabix (or whatever tool), it creates an index which is used together with the file itself. The .tbi files are not the indexed VCF files but extra index files (every .vcf.gz file needs an accompanying .vcf.gz.tbi file). You need to create the indexes, but on the command line you specify the VCF files (in this case the gzipped .vcf.gz files). The program finds the index files by just adding .tbi to the file name you specified.
          That's a general rule for tools which need indexed files: you create the index file, but you specify the normal file and the program looks for the index file itself...

          I hope this clarifies it.
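The naming convention described above (index name = data file name plus `.tbi`) can be checked before merging. The following is a sketch only; `has_index` and `check_inputs` are hypothetical helper names, and the check looks for the files on disk without validating their contents.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of VCFtools or tabix).

# has_index FILE: succeed only if FILE and FILE.tbi both exist.
has_index() {
    [ -f "$1" ] && [ -f "$1.tbi" ]
}

# check_inputs FILE...: report every input whose data file or index
# is missing; return non-zero if any input fails the check.
check_inputs() {
    rc=0
    for f in "$@"; do
        if ! has_index "$f"; then
            echo "missing data file or index: $f (expected $f.tbi)" >&2
            rc=1
        fi
    done
    return "$rc"
}
```

For example, `check_inputs file1.vcf.gz file2.vcf.gz && perl vcf-merge file1.vcf.gz file2.vcf.gz > out_merged.vcf` would only start the merge when both inputs have their index next to them.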


          • francy
            Member
            • Jun 2011
            • 19

            #6
            Dear ulz peter,

            Thank you for explaining this, it finally makes sense now!

            cheers,
            -f
