Since I could not find any documentation on how Bedtools intersectBed handles VCF files, I'm posting my question here:
I'm intersecting a VCF file from mpileup and bcftools with dbSNP132, which is also in VCF format:
intersectBed -a mutations_chr8.vcf -b dbSNP132_chr8.vcf -wao > mutations_chr8_dbSNP132.txt
In my example below I expected to see only dbSNP entries with exactly the same coordinates, i.e. chr8 112454855, but I also get hits for chr8 112454865 and chr8 112454876:
Code:
chr8 112454855 . Tacacacacacacacacacacaca TACacacacacacacacacacacaca 35.5 . INDEL;DP=24;AF1=0.5;CI95=0.5,0.5;DP4=0,3,2,2;MQ=46;PV4=0.43,0.3,0.34,1 PL:GT:GQ 73,0,78:0/1:75 chr8 112454855 rs76365113 T A . . dbSNPBuildID=131;VP=050000000001070008000100;WGT=1;VC=SNP;VLD;G5A;G5;KGPilot1 1
chr8 112454855 . Tacacacacacacacacacacaca TACacacacacacacacacacacaca 35.5 . INDEL;DP=24;AF1=0.5;CI95=0.5,0.5;DP4=0,3,2,2;MQ=46;PV4=0.43,0.3,0.34,1 PL:GT:GQ 73,0,78:0/1:75 chr8 112454855 rs67322031 A AAC . . dbSNPBuildID=130;VP=050000000001000000000200;WGT=1;VC=INDEL 1
chr8 112454855 . Tacacacacacacacacacacaca TACacacacacacacacacacacaca 35.5 . INDEL;DP=24;AF1=0.5;CI95=0.5,0.5;DP4=0,3,2,2;MQ=46;PV4=0.43,0.3,0.34,1 PL:GT:GQ 73,0,78:0/1:75 chr8 112454855 rs71851070 TAC T . . dbSNPBuildID=130;VP=050000000001000000000200;WGT=1;VC=INDEL 3
chr8 112454855 . Tacacacacacacacacacacaca TACacacacacacacacacacacaca 35.5 . INDEL;DP=24;AF1=0.5;CI95=0.5,0.5;DP4=0,3,2,2;MQ=46;PV4=0.43,0.3,0.34,1 PL:GT:GQ 73,0,78:0/1:75 chr8 112454865 rs34424884 CAC C . . dbSNPBuildID=126;VP=050000000001000000000200;WGT=1;VC=INDEL 3
chr8 112454855 . Tacacacacacacacacacacaca TACacacacacacacacacacacaca 35.5 . INDEL;DP=24;AF1=0.5;CI95=0.5,0.5;DP4=0,3,2,2;MQ=46;PV4=0.43,0.3,0.34,1 PL:GT:GQ 73,0,78:0/1:75 chr8 112454876 rs36066513 ACA A . . dbSNPBuildID=126;VP=050100000001030100000200;WGT=1;VC=INDEL;SLO;G5A;G5;GNO 3
This is correct and useful, considering that indel coordinates are often slightly shifted. However, I'd like to know how Bedtools determines the interval in which it looks for an intersection. I guess it uses the length of the reference allele in the -a file, but does it do the same for the -b file? And how could I restrict the intersection to the exact position instead of the interval?
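To check the guess about the interval, here is a minimal Python sketch. It assumes (not confirmed by the Bedtools docs quoted here) that every VCF record, in both -a and -b, becomes the 0-based, half-open interval [POS - 1, POS - 1 + len(REF)). Under that assumption the overlap counts come out as 1, 1, 3, 3, 3, matching the last column of the -wao output above. The helper names (vcf_interval, overlap_bp, same_position) are hypothetical. same_position shows one possible post-filter that keeps only hits at the exact same CHROM and POS.

```python
# Sketch only. ASSUMPTION: each VCF record (in -a and in -b) is treated as the
# 0-based, half-open interval [POS - 1, POS - 1 + len(REF)).

def vcf_interval(chrom, pos, ref):
    """Return (chrom, start, end) for a VCF record under the assumption above."""
    start = pos - 1
    return chrom, start, start + len(ref)


def overlap_bp(a, b):
    """Number of overlapping bases between two (chrom, start, end) intervals."""
    if a[0] != b[0]:
        return 0
    return max(0, min(a[2], b[2]) - max(a[1], b[1]))


def same_position(chrom_a, pos_a, chrom_b, pos_b):
    """Hypothetical exact-position filter: same CHROM and same POS only."""
    return chrom_a == chrom_b and pos_a == pos_b


# (CHROM, POS, REF) of the mpileup/bcftools indel call from the example.
call = ("chr8", 112454855, "Tacacacacacacacacacacaca")

# (CHROM, POS, REF, ID) of the dbSNP132 records reported by -wao.
dbsnp = [
    ("chr8", 112454855, "T", "rs76365113"),
    ("chr8", 112454855, "A", "rs67322031"),
    ("chr8", 112454855, "TAC", "rs71851070"),
    ("chr8", 112454865, "CAC", "rs34424884"),
    ("chr8", 112454876, "ACA", "rs36066513"),
]

a_iv = vcf_interval(*call)
results = []
for chrom, pos, ref, rsid in dbsnp:
    b_iv = vcf_interval(chrom, pos, ref)
    exact = same_position(call[0], call[1], chrom, pos)
    results.append((rsid, overlap_bp(a_iv, b_iv), exact))

for row in results:
    print(*row)
```

Because the 24 bp reference allele of the call spans 112454855 to 112454878, the records at 112454865 and 112454876 fall inside it, which would explain the extra hits. Keeping only rows where same_position is True leaves the three dbSNP entries at 112454855.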
Thanks in advance for any help - and for Bedtools in general

Barbara