We've recently had a 2GS carried out by an external facility, and they ran some basic variation analysis on it. I'd like to interpret the results, but I don't know which program generated them (and therefore don't know what all the columns mean). The facility provided documentation for one output format (SOAPsnp), but not for the others shown below:
Any ideas what program(s) were used to generate these results?
Code:
==> CNV/alts_M_C_L.anno.variant_function <==
intergenic TUBB8(dist=23823),ZMYND11(dist=57205) chr10 119001 123200 0 0 Copyratio:0.51412818168789 CNVlength:4200 BinNumber:6
intronic DIP2C chr10 522901 525000 0 0 Copyratio:0.0681283935965505 CNVlength:2100 BinNumber:3

==> CNV/alts_M_C_L.dat <==
chr10 119001 123200 6 0.51412818168789
chr10 522901 525000 3 0.0681283935965505

==> INDEL/1T.filter.vcf <==
##fileformat=VCFv4.1
##samtoolsVersion=0.1.16 (r963:234)

==> INDEL/1T.format.variant_function <==
intergenic NONE(dist=NONE),TUBB8(dist=19917) chr10 72911 72911 - AAAA hom 55.9 15 INDEL;DP=15;AF1=1;CI95=0.5,1;DP4=0,0,4,3;MQ=37;FQ=-47.5 GT:PL:GQ 1/1:133,50,37,95,0,89,122,31,79,119:23
upstream TUBB8 chr10 95429 95429 - A het 191 38 INDEL;DP=38;AF1=0.5;CI95=0.5,0.5;DP4=6,9,14,7;MQ=48;FQ=194;PV4=0.18,1,2.5e-09,1 GT:PL:GQ 0/1:229,0,245:99

==> SV/1T.filter.gff <==
chr:10 INS 352698 352751 intron NM_014974 334362 355969
chr:10 INS 490952 491074 intron NM_014974 486937 518377

==> SV/1T.filter.sv <==
chr10 INS 365 44 155372 155373 2
chr10 INS 367 37 352698 352751 2
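As a quick sanity check on the CNV columns, this Python sketch parses the two CNV rows shown above and compares the two files. It assumes the fields are whitespace-separated, that coordinates are 1-based and inclusive, and that the two "0" columns are placeholder ref/alt alleles (ANNOVAR input expects chr, start, end, ref, alt). parse_anno, parse_dat and check are hypothetical helper names, and nothing here identifies the CNV caller itself.

```python
# Sample rows copied from the head output above (whitespace-separated).
ANNO_ROWS = [
    "intergenic TUBB8(dist=23823),ZMYND11(dist=57205) chr10 119001 123200 0 0 "
    "Copyratio:0.51412818168789 CNVlength:4200 BinNumber:6",
    "intronic DIP2C chr10 522901 525000 0 0 "
    "Copyratio:0.0681283935965505 CNVlength:2100 BinNumber:3",
]
DAT_ROWS = [
    "chr10 119001 123200 6 0.51412818168789",
    "chr10 522901 525000 3 0.0681283935965505",
]


def parse_anno(line):
    """Assumed layout: region, gene(s), chrom, start, end, then extra columns."""
    fields = line.split()
    region, genes, chrom, start, end = fields[:5]
    # Keep only the trailing 'Key:value' tags; the bare "0" placeholders are skipped.
    tags = dict(f.split(":", 1) for f in fields[5:] if ":" in f)
    return {
        "region": region,
        "genes": genes,
        "chrom": chrom,
        "start": int(start),
        "end": int(end),
        "copy_ratio": float(tags["Copyratio"]),
        "length": int(tags["CNVlength"]),
        "bins": int(tags["BinNumber"]),
    }


def parse_dat(line):
    """Hypothesis being tested: chrom, start, end, bin count, copy ratio."""
    chrom, start, end, bins, ratio = line.split()
    return {"chrom": chrom, "start": int(start), "end": int(end),
            "bins": int(bins), "copy_ratio": float(ratio)}


def check(anno_rows, dat_rows):
    """Return (length_ok, bins_ok, ratio_ok, bp_per_bin) for each row pair."""
    results = []
    for a_line, d_line in zip(anno_rows, dat_rows):
        a, d = parse_anno(a_line), parse_dat(d_line)
        # With 1-based inclusive coordinates, length = end - start + 1.
        length_ok = a["end"] - a["start"] + 1 == a["length"]
        bins_ok = a["bins"] == d["bins"]
        ratio_ok = a["copy_ratio"] == d["copy_ratio"]
        results.append((length_ok, bins_ok, ratio_ok, a["length"] / a["bins"]))
    return results


if __name__ == "__main__":
    for row in check(ANNO_ROWS, DAT_ROWS):
        print(row)
```

On these two rows every check passes and both regions work out to 700 bp per bin, which hints at a fixed-window caller, though two rows are far too few to be sure.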
Here's my current guess at columns:
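- CNV/alts_M_C_L.anno.variant_function: ?possibly ANNOVAR (the .variant_function suffix and the leading region and gene(dist=...) columns look like ANNOVAR gene-based annotation, followed by the original input columns)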
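- CNV/alts_M_C_L.dat: chromosome, start position, end position, ?bin count, ?copy ratio (columns 4 and 5 match the BinNumber and Copyratio values for the same regions in the .anno.variant_function file)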
- INDEL/1T.filter.vcf: VCF v4.1, apparently produced by samtools 0.1.16 (according to the header)
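- INDEL/1T.format.variant_function: ?possibly ANNOVAR; after the region, gene, chromosome, start, end, ref and alt columns it seems to carry over ?zygosity, ?quality, ?depth and the original VCF INFO, FORMAT and sample fields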
- SV/1T.filter.gff: nominally GFF, although the columns don't follow the standard 9-column GFF layout, and it doesn't seem to provide as much information as the .sv file
- SV/1T.filter.sv: chromosome, variant type, ?, ?, start position, end position, ?variant count
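To test the INDEL guess, this sketch splits each .variant_function row into the two ANNOVAR annotation columns, the five ANNOVAR input columns (chr, start, end, ref, alt), three extra columns, and the original VCF INFO, FORMAT and sample fields. The labels zygosity, qual and depth for the extra columns are my own guesses, and parse_info/parse_row are hypothetical helpers. DP4 is decoded using the samtools definition (high-quality ref-forward, ref-reverse, alt-forward and alt-reverse bases).

```python
# Rows copied from INDEL/1T.format.variant_function above.
ROWS = [
    "intergenic NONE(dist=NONE),TUBB8(dist=19917) chr10 72911 72911 - AAAA hom 55.9 15 "
    "INDEL;DP=15;AF1=1;CI95=0.5,1;DP4=0,0,4,3;MQ=37;FQ=-47.5 GT:PL:GQ "
    "1/1:133,50,37,95,0,89,122,31,79,119:23",
    "upstream TUBB8 chr10 95429 95429 - A het 191 38 "
    "INDEL;DP=38;AF1=0.5;CI95=0.5,0.5;DP4=6,9,14,7;MQ=48;FQ=194;PV4=0.18,1,2.5e-09,1 "
    "GT:PL:GQ 0/1:229,0,245:99",
]

# Guessed names for the first ten columns (the last three are assumptions).
COLUMNS = ["region", "genes", "chrom", "start", "end", "ref", "alt",
           "zygosity", "qual", "depth"]


def parse_info(info):
    """Parse a VCF INFO string into a dict; flags such as INDEL map to True."""
    out = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        out[key] = value if sep else True
    return out


def parse_row(line):
    f = line.split()
    row = dict(zip(COLUMNS, f[:10]))
    row["info"] = parse_info(f[10])
    # Pair FORMAT keys (GT:PL:GQ) with the sample's values.
    row["sample"] = dict(zip(f[11].split(":"), f[12].split(":")))
    # DP4 = ref-forward, ref-reverse, alt-forward, alt-reverse high-quality bases.
    ref_f, ref_r, alt_f, alt_r = map(int, row["info"]["DP4"].split(","))
    row["alt_fraction"] = (alt_f + alt_r) / (ref_f + ref_r + alt_f + alt_r)
    return row


if __name__ == "__main__":
    for r in map(parse_row, ROWS):
        print(r["zygosity"], r["sample"]["GT"], r["depth"], r["info"]["DP"],
              round(r["alt_fraction"], 3))
```

For both rows the zygosity column agrees with GT and the depth column equals DP, which supports the idea that those columns were copied from the VCF when it was converted to ANNOVAR input.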