Hi everyone!
I'm a newbie in NGS data analysis, so I'm asking for help with something that may well turn out to be simple.
I have a gzipped VCF file containing human chromosome 20 sequences for 269 individuals. I want to filter out singletons and doubletons for subsequent analysis. I am using vcftools v0.1.15 installed on a server.
Here is what I do:
Code:
vcftools --gzvcf chr20_269ind.vcf.gz --mac 1 --max-mac 1 --recode --stdout | gzip -c > output_test.vcf.gz
However, I get an empty output file (only sample names, no sequence information) and the following message:
Outputting VCF file...
After filtering, kept 0 out of a possible 991704 Sites
No data left for analysis!
Run Time = 24.00 seconds
I've tried playing around with the --mac and --max-mac flags. First I ran the following line:
Code:
vcftools --gzvcf chr20_269ind.vcf.gz --max-mac n --recode --stdout | gzip -c > output_tesmaxnt.vcf.gz
where I tried n = 1, 10, and 100.
All three attempts gave me the same output file (not empty this time), and the log file says:
Outputting VCF file...
After filtering, kept 991704 out of a possible 991704 Sites
Run Time = 95.00 seconds
In fact, I get the same output if I run
Code:
vcftools --gzvcf chr20_269ind.vcf.gz --recode --stdout | gzip -c > output_tesmaxnt.vcf.gz
Then I tried running
Code:
vcftools --gzvcf chr20_269ind.vcf.gz --mac 1 --recode --stdout | gzip -c > output_test.vcf.gz
and got an empty output again.
Then I ran
Code:
vcftools --gzvcf chr20_269ind.vcf.gz --mac 0 --recode --stdout | gzip -c > output_test.vcf.gz
and got the same file as with --max-mac n.
It seems to me that these flags 'see' my file as if it contained only zeros, which is not the case (I've inspected the contents of the file manually).
If I try to filter by minor allele frequency instead of allele count (which is not what I want to do; I was just experimenting to better understand what's going on), I get this:
Outputting VCF file...
Error: Require Genotypes in variant file to filter by frequency and/or call rate
I've also tried vcftools version 0.1.13, with no difference.
Any help is greatly appreciated.
Best,
Vasili