syndip dataset for benchmark variant

  • syndip dataset for benchmark variant

    I have a question about the syndip dataset (https://github.com/lh3/CHM-eval): I'm struggling to find the syndip VCF.

    The releases page (https://github.com/lh3/CHM-eval/releases) has a file named rep2.37.broad.hc.raw.vcf.gz, and I don't know what it is. There is also an archive named CHM-evalkit-20180222.tar, which contains full.37m.vcf and other files (BED, eval, ...). I searched around, and according to this document, full.37m.vcf is the truth dataset (https://www.biorxiv.org/content/bior...1/456103-1.pdf, page 16).

    The problem is that rep2.37.broad.hc.raw.vcf.gz contains variants annotated with MQ, DP, GQ, ... that I need to extract, but full.37m.vcf doesn't contain this information (just CHROM, POS, REF, ALT and QUAL).

    So I tried intersecting rep2.37.broad.hc.raw.vcf.gz with full.37m.vcf, keeping the variants present in both files together with the DP, MQ and GQ values from rep2.37.broad.hc.raw.vcf.gz. Is that okay, given that I don't know what rep2.37.broad.hc.raw.vcf.gz is?
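
    The intersection described above can be sketched roughly as follows. This is only an illustration of the idea, not the procedure actually used: the helper names (`open_vcf`, `variant_keys`, `load_truth_keys`, `shared_records`) are hypothetical, and it assumes that both files use the same contig names and that a variant counts as shared only when CHROM, POS, REF and ALT match exactly. Indels written differently in the two files would be missed without prior normalization.

```python
import gzip
from typing import Iterator, List, Set, Tuple

# A variant is identified by (CHROM, POS, REF, ALT); this exact-match key is an assumption.
Key = Tuple[str, int, str, str]


def open_vcf(path: str):
    """Open a plain or gzip-compressed VCF in text mode."""
    return gzip.open(path, "rt") if path.endswith(".gz") else open(path, "rt")


def variant_keys(fields: List[str]) -> Iterator[Key]:
    """Yield one key per ALT allele of a tab-split VCF record."""
    chrom, pos, ref = fields[0], int(fields[1]), fields[3]
    for alt in fields[4].split(","):
        yield (chrom, pos, ref, alt)


def load_truth_keys(path: str) -> Set[Key]:
    """Collect the keys of every record in the truth VCF (e.g. full.37m.vcf)."""
    keys: Set[Key] = set()
    with open_vcf(path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue  # skip header and blank lines
            keys.update(variant_keys(line.rstrip("\n").split("\t")))
    return keys


def shared_records(raw_path: str, truth_keys: Set[Key]) -> Iterator[str]:
    """Yield full record lines from the raw VCF that have an allele in the truth set.

    The whole line is returned, so whatever INFO and FORMAT annotations
    the raw record carries are kept unchanged.
    """
    with open_vcf(raw_path) as handle:
        for line in handle:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.rstrip("\n").split("\t")
            if any(key in truth_keys for key in variant_keys(fields)):
                yield line.rstrip("\n")
```

    Usage would look like `keys = load_truth_keys("full.37m.vcf")` followed by iterating over `shared_records("rep2.37.broad.hc.raw.vcf.gz", keys)`. Loading the truth keys into a set keeps memory use proportional to the truth file while the larger raw file is streamed once.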

    I also noticed that the QUAL in full.37m.vcf is always 30. Is that normal? Thanks.
    Last edited by maryem life; 05-24-2022, 08:02 AM. Reason: add something