I am testing JointSNVMix2 and used the joint_snv_mix_two workflow:
It writes an output line for every single genomic coordinate, so I tried to screen for somatic mutation calls by keeping lines where p_AA_AB + p_AA_BB > 0.99:
And I get a bunch of things like this:
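All of those have a p_AA_AB (i.e., the probability of an AA genotype in the normal --> AB genotype in the tumor) of 1, but when you look at the counts, they hardly look like sites with a somatic probability of 1. For example, at 1:16893070 the B allele is seen in 34 of 500 normal reads and 39 of 496 tumor reads, which is nearly the same fraction in both samples.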
Is there a model in JointSNVMix (there are a few) that works better than this one?
If the model assigns a probability of 1 to many sites with minimal evidence for a somatic mutation, then it is useless for somatic mutation calling on our sample.
Code:
python jsm.py classify joint_snv_mix_two \
    /PATH/references/human_g1k_v37_decoy.fasta \
    /PATH/Normal_Sample.bam \
    /PATH/Tumor_Sample.bam \
    /PATH/trained.exome.parameter.cfg \
    /PATH/exome.jointsnvmix2.tsv
Code:
cat exome.jointsnvmix2.tsv | awk -F "\t" '$10+$11>.99'
Code:
chrom position ref_base var_base normal_counts_a normal_counts_b tumour_counts_a tumour_counts_b p_AA_AA p_AA_AB p_AA_BB p_AB_AA p_AB_AB p_AB_BB p_BB_AA p_BB_AB p_BB_BB
1 12907379 T C 156 12 125 24 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 12907380 C A 156 10 128 24 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 16890771 G A 237 12 267 25 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 16890800 A T 53 2 44 8 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
1 16893070 C T 466 34 457 39 0.0000 1.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000 0.0000
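To put numbers on why these calls look weak, here is a small sketch that computes the B-allele fraction in the normal and in the tumor for the five rows above. It is not part of JointSNVMix; the helper name `variant_fraction` is made up for illustration, and it assumes the `_a` columns count reads supporting the reference base and the `_b` columns count reads supporting the variant base.

```python
# Counts copied from the output rows in this post:
# (position, normal_counts_a, normal_counts_b, tumour_counts_a, tumour_counts_b)
rows = [
    (12907379, 156, 12, 125, 24),
    (12907380, 156, 10, 128, 24),
    (16890771, 237, 12, 267, 25),
    (16890800, 53, 2, 44, 8),
    (16893070, 466, 34, 457, 39),
]


def variant_fraction(count_a, count_b):
    """Fraction of reads supporting the B allele (hypothetical helper).

    Assumes count_a is the reference-supporting count and count_b the
    variant-supporting count. Returns 0.0 when there is no coverage.
    """
    total = count_a + count_b
    return count_b / total if total else 0.0


for position, n_a, n_b, t_a, t_b in rows:
    normal_frac = variant_fraction(n_a, n_b)
    tumour_frac = variant_fraction(t_a, t_b)
    print(f"{position}\tnormal={normal_frac:.3f}\ttumour={tumour_frac:.3f}")
```

A true AA --> AB somatic site would be expected to show a B-allele fraction near zero in the normal, so a clearly non-zero normal fraction at every one of these positions is what makes the p_AA_AB of 1 look suspicious.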