Hi all,
I ran GATK SomaticIndelDetector on (1) exome data from a biopsy vs. normal (blood) and (2) exome data from a cell line derived from the same biopsy vs. the same blood sample. The overlap between the somatic indels from the biopsy and those from the cell line is quite low: Jaccard similarity coefficient = 0.11. By contrast, the overlap is much higher for somatic SNVs (identified by MuTect): Jaccard similarity coefficient = 0.63.
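For anyone wanting to reproduce this kind of comparison, here is a minimal sketch of one way to compute the Jaccard coefficient |A ∩ B| / |A ∪ B| between two VCF call sets. It assumes a call is considered shared only when (CHROM, POS, REF, ALT) match exactly; the function names are hypothetical, and indels that are left/right-shifted differently in the two runs would count as different calls under this assumption, which can itself lower the overlap.

```python
def load_calls(vcf_lines):
    """Key each non-header VCF record by (CHROM, POS, REF, ALT).

    Assumption: exact allele matching; no indel normalization is done.
    """
    calls = set()
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        # Multi-allelic records contribute one key per ALT allele.
        for allele in alt.split(","):
            calls.add((chrom, int(pos), ref, allele))
    return calls


def jaccard(a, b):
    """Jaccard similarity |A & B| / |A | B|; 0.0 if both sets are empty."""
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


if __name__ == "__main__":
    biopsy = load_calls(["chr1\t100\t.\tA\tAT", "chr2\t200\t.\tGC\tG"])
    cell_line = load_calls(["chr1\t100\t.\tA\tAT", "chr3\t300\t.\tT\tTA"])
    print(jaccard(biopsy, cell_line))
```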
This may indicate that there are still many false positives among the indels identified by SomaticIndelDetector. Has anyone applied heuristic rules to filter the SomaticIndelDetector results? Does anyone have advice on filtering on the different per-sample annotations (FORMAT field) of the VCF file (DP:MM:MQS:NQSBQ:NQSMM:REnd:RStart:SC)?
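As a starting point, a minimal sketch of reading those colon-separated per-sample annotations into a dictionary, so that filters can be applied on them. The helper names and the minimum-depth threshold are hypothetical; only DP is used here, interpreted as read depth, and no particular meaning is assumed for the other keys. Which sample column is tumor and which is normal depends on how the run was set up, so the indices are passed in explicitly.

```python
def sample_fields(record, sample_index):
    """Map FORMAT keys to the values of one sample column (0 = first sample)."""
    cols = record.rstrip("\n").split("\t")
    keys = cols[8].split(":")
    values = cols[9 + sample_index].split(":")
    return dict(zip(keys, values))


def min_depth_filter(record, sample_indices, min_dp):
    """Keep a record only if every listed sample has DP >= min_dp.

    min_dp is a hypothetical, user-chosen threshold; a missing DP ('.' or
    absent) fails the filter.
    """
    for i in sample_indices:
        dp = sample_fields(record, i).get("DP", ".")
        if dp == "." or int(dp) < min_dp:
            return False
    return True


if __name__ == "__main__":
    rec = "chr1\t100\t.\tA\tAT\t.\t.\tSOMATIC\tGT:DP:MM\t0/0:25:0,0\t0/1:30:1,2"
    print(sample_fields(rec, 1))
    print(min_depth_filter(rec, [0, 1], min_dp=20))
```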
Thanks in advance,
Sylvain