Hi All,
I'm new to the community and just beginning to understand many of the programs used for SNP selection. In my current exome seq project, I am looking for homozygous SNPs in a consanguineous pedigree with multiple affected siblings. My pipeline is as follows:
Previously aligned input.bam files were obtained from the sequencing facility.
$ samtools sort input.bam input.sorted
$ samtools rmdup -s input.sorted.bam input.sorted.rmdup.bam
$ samtools index input.sorted.rmdup.bam
$ samtools faidx HG19.fa
$ samtools mpileup -uf HG19.fa input.sorted.rmdup.bam > variants.raw
$ bcftools view -bvcg variants.raw > variants.raw.bcf
$ bcftools view variants.raw.bcf | vcfutils.pl varFilter -d 3 -D 1000 -G 20 > variants.flt.vcf
After this, I discard common variants using 1000 Genomes and/or dbSNP.
I then grep for variants that are homozygous and shared among the affected individuals, and that are heterozygous in the unaffected parents.
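For reference, the homozygous selection step could be sketched roughly as below. This is only an illustration, not the exact command I ran: it assumes a single-sample VCF (genotype in column 10, with GT as the first FORMAT subfield), and the function name homozygous_alt is made up for this example.

```shell
# Hypothetical helper: keep header lines plus records whose genotype is
# homozygous for the alternate allele (1/1, or 1|1 if phased).
# Assumes one sample per VCF, so the genotype lives in column 10.
homozygous_alt() {
  awk -F'\t' '
    /^#/ { print; next }                 # pass header lines through unchanged
    {
      split($10, fields, ":")            # GT is the first colon-separated subfield
      if (fields[1] == "1/1" || fields[1] == "1|1") print
    }' "$1"
}
```

It would be called as `homozygous_alt variants.flt.vcf > variants.hom.vcf`, where the output file name is arbitrary.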
This protocol seems to be generating a large number of INDELs (>50%) compared to SNPs. Is this unusual? Should I have more stringent filters in place?
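To put a number on the INDEL fraction, a count along these lines may help. It is a sketch that relies on samtools mpileup/bcftools marking indel records with an INDEL flag in the INFO column (column 8); the function name count_variant_types is just a placeholder.

```shell
# Hypothetical helper: tally SNP and INDEL records in a VCF produced by
# samtools mpileup / bcftools, which tags indels with INDEL in INFO.
count_variant_types() {
  awk -F'\t' '
    !/^#/ {
      if ($8 ~ /(^|;)INDEL(;|$)/) indel++   # INFO carries the INDEL flag
      else snp++                            # everything else is counted as a SNP
    }
    END { printf "SNPs=%d INDELs=%d\n", snp, indel }' "$1"
}
```

Running `count_variant_types variants.flt.vcf` before and after each filtering step shows at which stage the proportion shifts.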
Also, once I have obtained a final list of variants, what programs are recommended currently for functional analysis?