Hi all,
I want to split my VCF file into two files, one with SNPs and another with INDELs. To do that I am using VCFtools with the following commands:
# to keep only SNPs
vcftools --vcf myvariants.vcf --remove-indels --recode-INFO-all --out only_SNPs --recode
# to keep only INDELs
vcftools --vcf myvariants.vcf --keep-only-indels --recode-INFO-all --out only_INDELs --recode
But when I check the output files, I get this:
INDELs:
Code:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
CM003279.1 1274 C A 999 .
CM003279.1 3637 A C 157 .
CM003279.1 3788 GCCCC GCCCCC 130 .
CM003279.1 3879 A C 999 .
. . .
SNPs:
Code:
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT
CM003279.1 25370 TAAA TAA 999 .
CM003279.1 75537 TACAC TAC 999 .
CM003279.1 77780 ACATCA ACA 999 .
CM003279.1 3177577 CTTT CTT 999 .
. . .
The split doesn't make any sense: I have SNPs and INDELs in both files. (I left out the genotype data here because it would make the records very hard to read.)
Attached are the first lines of my original VCF file.
I am pretty sure that the problem comes from my VCF file, not from vcftools, but I can't see what it is.
Is there a tool to check whether a VCF file is malformed?
Thanks in advance
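As a rough illustration of what such a check could look like, here is a minimal Python sketch. It is not a full validator and does not replace a dedicated tool. It only assumes what the VCF specification states about the fixed columns: records are tab-separated, the header line starts with the eight columns #CHROM to INFO, missing values are written as ".", POS is an integer, and REF/ALT are allele strings. The function name check_vcf_lines and the two regular expressions are made up for this example.

```python
import re

# The eight fixed columns that every VCF header line must start with.
FIXED_COLUMNS = ["#CHROM", "POS", "ID", "REF", "ALT", "QUAL", "FILTER", "INFO"]

# REF must be a string of bases. ALT may also be '.', '*' or a symbolic
# allele such as <DEL>. Breakend notation is NOT handled in this sketch.
REF_PATTERN = re.compile(r"^[ACGTNacgtn]+$")
ALT_PATTERN = re.compile(r"^([ACGTNacgtn]+|\*|\.|<[^>]+>)$")


def check_vcf_lines(lines):
    """Return (line_number, message) pairs for lines that look malformed."""
    problems = []
    n_columns = None  # number of columns announced by the #CHROM header line
    for number, line in enumerate(lines, start=1):
        line = line.rstrip("\n")
        if not line or line.startswith("##"):
            continue  # skip blank lines and meta-information lines
        fields = line.split("\t")
        if line.startswith("#"):
            if fields[:8] != FIXED_COLUMNS:
                problems.append((number, "header does not start with the 8 fixed columns"))
            n_columns = len(fields)
            continue
        if len(fields) < 8:
            problems.append((number, f"only {len(fields)} tab-separated fields, need at least 8"))
            continue
        if n_columns is not None and len(fields) != n_columns:
            problems.append((number, f"expected {n_columns} fields, found {len(fields)}"))
            continue
        if any(field == "" for field in fields):
            problems.append((number, "empty field (missing values must be written as '.')"))
            continue
        pos, ref, alt = fields[1], fields[3], fields[4]
        if not pos.isdigit():
            problems.append((number, f"POS is not an integer: {pos!r}"))
        if not REF_PATTERN.match(ref):
            problems.append((number, f"REF is not a base string: {ref!r}"))
        for allele in alt.split(","):
            if not ALT_PATTERN.match(allele):
                problems.append((number, f"ALT allele looks wrong: {allele!r}"))
    return problems
```

Passing an open file handle for myvariants.vcf to check_vcf_lines and printing the returned pairs would flag, for example, records where a column is missing or empty, so that a quality value ends up in the ALT position.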