Hi,
I created a multi-sample mpileup file via the following:
samtools mpileup -P -D -S -uf L.genome.fa LScatALignment.sorted.bam WScatALignment.sorted.bam WSGcatALignment.sorted.bam | bcftools view -bvcg - > AllPOP.ALignment.mpileup
I then attempt to run the results through VarScan via:
java -jar VarScan.v2.3.2.jar mpileup2snp AllPOP.ALignment.vcf --min-coverage 20 --min-var-freq 0.8 --p-value 0.05 -output-vcf-1
but I get the following error:
"mgpl@ubuntu:~/Desktop/varscan$ java -jar VarScan.v2.3.2.jar mpileup2snp AllPOP.ALignment.vcf --min-coverage 20 --min-var-freq 0.8 --p-value 0.05 -output-vcf-1
Only SNPs will be reported
Min coverage: 20
Min reads2: 2
Min var freq: 0.8
Min avg qual: 15
P-value thresh: 0.05
Reading input from AllPOP.ALignment.vcf
Chrom Position Ref Var Cons:Cov:Reads1:Reads2:Freq:P-value StrandFilter:R1+:R1-:R2+:R2-val SamplesRef SamplesHet SamplesHom SamplesNC Cons:Cov:Reads1:Reads2:Freq:P-value
Error: Invalid format for pileup at line 1
##fileformat=VCFv4.1"
Can anybody tell me what is wrong with my VCF?
My VCF does not contain ID or FILTER values.
A sample is below.
Many thanks in advance.
##fileformat=VCFv4.1
##samtoolsVersion=0.1.18 (r982:295)
##INFO=<ID=DP,Number=1,Type=Integer,Description="Raw read depth">
##INFO=<ID=DP4,Number=4,Type=Integer,Description="# high-quality ref-forward bases, ref-reverse, alt-forward and alt-reverse bases">
##INFO=<ID=MQ,Number=1,Type=Integer,Description="Root-mean-square mapping quality of covering reads">
##INFO=<ID=FQ,Number=1,Type=Float,Description="Phred probability of all samples being the same">
##INFO=<ID=AF1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele frequency (assuming HWE)">
##INFO=<ID=AC1,Number=1,Type=Float,Description="Max-likelihood estimate of the first ALT allele count (no HWE assumption)">
##INFO=<ID=G3,Number=3,Type=Float,Description="ML estimate of genotype frequencies">
##INFO=<ID=HWE,Number=1,Type=Float,Description="Chi^2 based HWE test P-value based on G3">
##INFO=<ID=CLR,Number=1,Type=Integer,Description="Log ratio of genotype likelihoods with and without the constraint">
##INFO=<ID=UGT,Number=1,Type=String,Description="The most probable unconstrained genotype configuration in the trio">
##INFO=<ID=CGT,Number=1,Type=String,Description="The most probable constrained genotype configuration in the trio">
##INFO=<ID=PV4,Number=4,Type=Float,Description="P-values for strand bias, baseQ bias, mapQ bias and tail distance bias">
##INFO=<ID=INDEL,Number=0,Type=Flag,Description="Indicates that the variant is an INDEL.">
##INFO=<ID=PC2,Number=2,Type=Integer,Description="Phred probability of the nonRef allele frequency in group1 samples being larger (,smaller) than in group2.">
##INFO=<ID=PCHI2,Number=1,Type=Float,Description="Posterior weighted chi^2 P-value for testing the association between group1 and group2 samples.">
##INFO=<ID=QCHI2,Number=1,Type=Integer,Description="Phred scaled PCHI2.">
##INFO=<ID=PR,Number=1,Type=Integer,Description="# permutations yielding a smaller PCHI2.">
##INFO=<ID=VDB,Number=1,Type=Float,Description="Variant Distance Bias">
##FORMAT=<ID=GT,Number=1,Type=String,Description="Genotype">
##FORMAT=<ID=GQ,Number=1,Type=Integer,Description="Genotype Quality">
##FORMAT=<ID=GL,Number=3,Type=Float,Description="Likelihoods for RR,RA,AA genotypes (R=ref,A=alt)">
##FORMAT=<ID=DP,Number=1,Type=Integer,Description="# high-quality bases">
##FORMAT=<ID=SP,Number=1,Type=Integer,Description="Phred-scaled strand bias P-value">
##FORMAT=<ID=PL,Number=G,Type=Integer,Description="List of Phred-scaled genotype likelihoods">
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT LScatALignment.sorted.bam WScatALignment.sorted.bam WSGcatALignment.sorted.bam
LER_WGS_1_CONTIG_13 176 . T G 21.3 . DP=8;AF1=0.4992;AC1=3;DP4=4,2,0,2;MQ=44;FQ=24.3;PV4=0.43,1,1,1 GT:PLP:SP:GQ 0/1:0,0,0:0:0:3 0/1:22,0,156:6:0:24 0/1:34,0,34:2:0:34
LER_WGS_1_CONTIG_71 179 . A G 10.4 . DP=1;AF1=1;AC1=6;DP4=0,0,1,0;MQ=44;FQ=-26.4 GT:PLP:SP:GQ 1/1:40,3,0:1:0:3 0/1:0,0,0:0:0:3 0/1:0,0,0:0:0:3
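One thing that may be worth checking: the error message quotes the first line of the input, `##fileformat=VCFv4.1`, which is a VCF header, whereas `mpileup2snp` reports that it wants pileup rows. Below is a minimal sketch for guessing which of the two formats a line belongs to. The function name `sniff_format` is hypothetical, and the pileup check is only a heuristic that assumes the default samtools text layout (chromosome, position, reference base, then depth, bases and qualities for each sample, tab-separated).

```python
def sniff_format(first_line: str) -> str:
    """Guess whether a line looks like VCF or samtools pileup (heuristic only)."""
    line = first_line.rstrip("\n")
    # VCF files start with a '##fileformat=VCF...' meta line.
    if line.startswith("##fileformat=VCF"):
        return "vcf"
    fields = line.split("\t")
    # Assumed default pileup layout: 3 leading columns plus 3 columns per sample,
    # with a numeric position (column 2) and a numeric depth (column 4).
    looks_like_pileup = (
        len(fields) >= 6
        and len(fields) % 3 == 0
        and fields[1].isdigit()
        and fields[3].isdigit()
    )
    return "pileup" if looks_like_pileup else "unknown"


def sniff_file(path: str) -> str:
    """Apply sniff_format to the first line of the file at 'path'."""
    with open(path) as handle:
        return sniff_format(handle.readline())
```

Calling `sniff_file("AllPOP.ALignment.vcf")` on the file from the command above would classify it from its first line only; it does not validate the rest of the file.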