  • LeonDK
    Member
    • Sep 2014
    • 69

    VCF to BCF conversion error: [E::bcf_hdr_add_sample] Duplicated sample name

    When converting a .vcf file to a .bcf file, I get the following error message:
    Code:
    [E::bcf_hdr_add_sample] Duplicated sample name
    I am running the conversion using bcftools like so:
    Code:
    bcftools convert -O u -o /path/to/output/file.bcf /path/to/input/file.vcf.gz
    How do I fix this?
  • jmarshall
    Samtools maintainer
    • Jul 2009
    • 39

    #2
    Sample names are parsed from the #CHROM POS ID REF line, and can't appear more than once (at least for bcftools's implementation; not sure if this is clear in the VCF spec).

    The actual "Duplicated sample name" error message also tells you which sample name it's complaining about.

    • LeonDK
      Member
      • Sep 2014
      • 69

      #3
      Originally posted by jmarshall
      Sample names are parsed from the #CHROM POS ID REF line, and can't appear more than once (at least for bcftools's implementation; not sure if this is clear in the VCF spec).

      The actual "Duplicated sample name" error message also tells you which sample name it's complaining about.

      Yes, but I was looking for a tool that could identify and remove duplicate samples.

      • jmarshall
        Samtools maintainer
        • Jul 2009
        • 39

        #4
        That error message identifies duplicate sample names, or to find them all at once:

        Code:
        bcftools view -h file.bcf | grep '^#CHROM' | cut -f10- | tr '\t' '\n' | sort | uniq -c | grep -v '^ *1 '
        Then you could e.g. hack the #CHROM line in a text editor and use bcftools view -s to remove the duplicates you've marked via the text editor. Or use awk or similar.
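
As a complement to the one-liner above, here is a minimal sketch that also reports which columns each duplicated name occupies, which may help when marking duplicates by hand. It reads the plain-text VCF directly rather than going through bcftools, on the assumption that the header may not parse while duplicates are present. The function name `list_duplicate_samples` and the demo data are made up for illustration.

```shell
#!/bin/sh
# Print one line per duplicated sample name: name, number of occurrences,
# and the 1-based column numbers where it appears on the #CHROM line.
# Reads an uncompressed VCF on stdin (for a .vcf.gz, pipe through `gzip -dc`).
list_duplicate_samples() {
    awk -F '\t' '
        /^#CHROM/ {
            # Sample names start at column 10, after the FORMAT column.
            for (i = 10; i <= NF; i++) {
                count[$i]++
                cols[$i] = (cols[$i] == "" ? i : cols[$i] "," i)
                # Remember each name the first time it is seen twice.
                if (count[$i] == 2) order[++n] = $i
            }
            for (j = 1; j <= n; j++)
                print order[j] "\t" count[order[j]] "\t" cols[order[j]]
            exit
        }
    '
}

# Hypothetical demo input: sample "S1" appears in columns 10 and 12.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS1\n' \
    | list_duplicate_samples
```

For the demo header this reports `S1` with two occurrences, in columns 10 and 12.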

        • LeonDK
          Member
          • Sep 2014
          • 69

          #5
          Originally posted by jmarshall
          That error message identifies duplicate sample names, or to find them all at once:

          Code:
          bcftools view -h file.bcf | grep '^#CHROM' | cut -f10- | tr '\t' '\n' | sort | uniq -c | grep -v '^ *1 '
          Then you could e.g. hack the #CHROM line in a text editor and use bcftools view -s to remove the duplicates you've marked via the text editor. Or use awk or similar.

          I wrote a small script that finds duplicated sample names and excludes the extras, so that only one sample is kept for each duplicated name.

          Thanks for the input.
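
The script itself was not posted. A minimal sketch of the "keep only one per duplicated name" idea, using awk on the plain-text VCF, might look like the following. It is not the poster's script: the function name `drop_duplicate_samples` and the demo data are hypothetical, and it assumes that keeping the first column for each name is acceptable and that nothing in INFO (such as allele counts) needs recomputing afterwards.

```shell
#!/bin/sh
# Keep only the first column for each sample name; drop later columns whose
# name was already seen. Reads an uncompressed VCF on stdin, writes to stdout.
drop_duplicate_samples() {
    awk 'BEGIN { FS = OFS = "\t" }
        /^##/ { print; next }              # meta lines pass through unchanged
        /^#CHROM/ {
            nkeep = 0
            for (i = 1; i <= NF; i++) {
                # Columns 1-9 are fixed fields; sample names start at column 10.
                if (i >= 10 && ($i in seen)) continue
                if (i >= 10) seen[$i] = 1
                keep[++nkeep] = i
            }
            have_header = 1
        }
        {
            if (!have_header) { print; next }
            # Rebuild the line from the kept columns (header and data alike).
            line = $(keep[1])
            for (j = 2; j <= nkeep; j++) line = line OFS $(keep[j])
            print line
        }
    '
}

# Hypothetical demo input: the second "S1" column (column 12) is dropped.
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\tS1\n1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t0/0\t1/1\n' \
    | drop_duplicate_samples
```

The cleaned stream could then be piped into the original conversion, for example `gzip -dc file.vcf.gz | drop_duplicate_samples | bcftools convert -O u -o file.bcf -`, assuming bcftools is reading from standard input via `-`. Whether the duplicated columns really hold the same data is worth checking first, since this sketch silently discards everything but the first.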
