I have been playing around with VCFtools and have noticed something strange. When I used vcf-isec to create a complement VCF file containing only the variants found in ALL my input files, the result was a list of variants in components (comps) that fall within a very narrow window (comps 22,450 to 69,079), whereas the full assembly contains 3,113,715 comps.
The VCF files from which I created the complement do contain a greater range of comps, but the vast majority of their variants also come from this window. Obviously, reducing the variants to those common to all files loses the comps at the edges.
I can't understand why all the common variants would come from such a narrow range of comps in the reference assembly. Is the assembly (which I built de novo with Trinity) arranged so that similar contigs sit closer to one another? Is there another explanation for why my variants seem to be restricted to a narrow range within the assembly?
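In case it helps anyone reproduce this, one way to check the distribution would be to tally variants per comp number in each VCF. This is only a minimal sketch: it assumes the CHROM column holds Trinity-style IDs such as comp22450_c0_seq1, and the function names, bin size, and example records are made up for illustration rather than taken from my real files.

```python
import re
from collections import Counter

# Assumption: contig IDs look like "comp<number>_c<number>_seq<number>".
COMP_ID = re.compile(r"^comp(\d+)_")


def comp_numbers(vcf_lines):
    """Yield the comp number of every variant record in a VCF."""
    for line in vcf_lines:
        # Skip blank lines and VCF header lines ("##..." and "#CHROM...").
        if not line.strip() or line.startswith("#"):
            continue
        chrom = line.split("\t", 1)[0]  # first tab-separated column is CHROM
        match = COMP_ID.match(chrom)
        if match:
            yield int(match.group(1))


def summarize(vcf_lines, bin_size=10000):
    """Return the variant count, comp range, and counts per comp-number bin."""
    nums = list(comp_numbers(vcf_lines))
    if not nums:
        return None
    bins = Counter((n // bin_size) * bin_size for n in nums)
    return {
        "n_variants": len(nums),
        "min_comp": min(nums),
        "max_comp": max(nums),
        "bins": dict(sorted(bins.items())),
    }


# Hypothetical records, for illustration only.
example = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "comp22450_c0_seq1\t101\t.\tA\tG\t50\tPASS\t.",
    "comp22450_c0_seq1\t250\t.\tC\tT\t50\tPASS\t.",
    "comp69079_c1_seq2\t12\t.\tG\tA\t50\tPASS\t.",
]
print(summarize(example))
```

For a real file, the same function could be given an open file handle instead of the list, and running it on each input VCF as well as on the vcf-isec output would show whether the narrow window is already present before the files are combined.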
Any advice or suggestions are very gratefully received!!