I'm confused about which workflow I should build to get a list of SNPs I can be confident in. I have WGS data and a custom-built reference genome, and I'm using Galaxy Main for now.
So far my workflow consists of mapping my reads, removing reads that didn't align, removing duplicates (with rmdup; I can't get MarkDuplicates to work for me), and converting my SAM file into a BAM file.
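To be concrete about the "removing reads that didn't align" step: as far as I understand it, it amounts to dropping SAM records whose FLAG field has the 0x4 (segment unmapped) bit set. Here is a minimal Python sketch of that idea. The function names and the toy records are made up for illustration, and this is not what the Galaxy tool runs internally.

```python
# Sketch only: keep header lines and mapped reads from SAM text.
# Assumes plain-text SAM input; is_mapped and drop_unmapped are hypothetical names.

UNMAPPED_BIT = 0x4  # SAM FLAG bit meaning "segment unmapped"


def is_mapped(sam_line: str) -> bool:
    """Return True if the alignment record does not have the unmapped bit set."""
    fields = sam_line.rstrip("\n").split("\t")
    flag = int(fields[1])  # FLAG is the second column of a SAM record
    return (flag & UNMAPPED_BIT) == 0


def drop_unmapped(sam_lines):
    """Yield header lines (starting with '@') and mapped records only."""
    for line in sam_lines:
        if line.startswith("@") or is_mapped(line):
            yield line


# Toy records, not real data.
toy_sam = [
    "@HD\tVN:1.6",
    "r1\t0\tchr1\t100\t60\t4M\t*\t0\t0\tACGT\tIIII",  # mapped
    "r2\t4\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",          # unmapped (flag 4)
    "r3\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",         # paired, unmapped (77 includes 0x4)
]
kept = list(drop_unmapped(toy_sam))
```

With the toy input above, `kept` holds the header line and the `r1` record only.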
From what I have read, I should then realign my reads and recalibrate the quality scores. I'm not sure why I need to do that, and what should I use for it?
What tool should I use to call SNPs? It seems that pileup is good enough for SNPs on a small number of samples, but it doesn't generate the VCF file that I need for SNP calling.
What criteria do you use to filter SNPs?
Sorry for this newbie question... and thanks for any advice