Hey, anyone here experienced with HTSJDK?
I'm currently using it to read VCF files (bi-allelic, microarray-derived), grab the genotypes for all individuals, and transform them into a byte array, storing each genotype as the sum of its allele indices (so ./. -> -2, 0/0 -> 0, 0/1 -> 1, 1/1 -> 2, etc.). Libraries in other languages (cyvcf2, for example) often provide genotypes as indices natively. With HTSJDK, however, calling getGenotypes() on a VariantContext returns the genotypes in base form (something like [A*, C], [A*, A*], [C, C], where each allele is its own object). Currently I'm using getAlleleIndices() from VariantContext to convert these Genotype objects back into an array of indices, which I can then sum. This is a bit slower than I'd like. Is there a better or faster way of doing this than using getAlleleIndices() to transform each Genotype into its indices?
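To make the target encoding concrete, here is a minimal sketch of it in plain Java. It works on raw GT strings rather than HTSJDK objects, so it only illustrates the mapping, not a faster HTSJDK route. The class and method names (GenotypeEncoder, encode, encodeAll) are made up for illustration, and it assumes each missing allele counts as -1.

```java
import java.util.Arrays;

// Hypothetical helper, not part of HTSJDK: encodes raw GT strings as the sum of allele indices.
final class GenotypeEncoder {

    /** Encodes one GT string such as "0/1" or "1|1"; each missing allele (".") counts as -1. */
    static byte encode(String gt) {
        int sum = 0;
        int current = 0;          // allele index currently being parsed
        boolean missing = false;  // true if the current allele is "."
        for (int i = 0; i <= gt.length(); i++) {
            // Treat the end of the string as a final separator so the last allele is flushed.
            char c = i < gt.length() ? gt.charAt(i) : '/';
            if (c == '/' || c == '|') {
                sum += missing ? -1 : current;
                current = 0;
                missing = false;
            } else if (c == '.') {
                missing = true;
            } else {
                current = current * 10 + (c - '0');
            }
        }
        return (byte) sum;
    }

    /** Encodes one GT string per individual into a byte array. */
    static byte[] encodeAll(String[] gts) {
        byte[] out = new byte[gts.length];
        for (int i = 0; i < gts.length; i++) {
            out[i] = encode(gts[i]);
        }
        return out;
    }

    public static void main(String[] args) {
        byte[] encoded = encodeAll(new String[] {"./.", "0/0", "0/1", "1/1"});
        System.out.println(Arrays.toString(encoded)); // prints [-2, 0, 1, 2]
    }
}
```

One caveat with this scheme: a half-missing call such as ./1 would sum to 0 and look like 0/0, so it is only unambiguous if calls are either fully missing or fully present.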