Hi all, I've sequenced the mitochondrial genome of an individual using Illumina (100 bp PE, insert size 150-180 nt). So far, I've been able to determine that it is heteroplasmic for a ~230 bp deletion relative to the reference, at an estimated frequency of 30-40%.
I've yet to find an indel discovery program that also quantifies the relative allele frequency. The usual methods of indel/CNV quantification do not work here, because a heteroplasmic mitochondrial indel/CNV can be present at any frequency and does not show up as the simple 50% increase or decrease in coverage expected for a deletion/CNV in the nuclear genome.
What I've done so far is map the reads to the reference while allowing large gaps (so that reads are split at the deletion), and then calculate the number of split reads as a percentage of coverage. The problem is that many reads overlapping the deletion are not split: they are left with unaligned free ends instead, because the part of the read on the far side of the deletion is too short for the mapping software to place it. As a result, the deletion frequency is underestimated.
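To make the size of that bias concrete, here is a minimal arithmetic sketch. The function name and the read counts are hypothetical, and it assumes the free-end reads are counted in coverage at the breakpoint but not in the split-read numerator.

```python
def deletion_fraction(split, clipped, reference):
    """Return (naive, adjusted) deletion fractions at one breakpoint.

    split     -- reads aligned as split across the deletion
    clipped   -- reads with an unaligned free end at the breakpoint
                 (assumed here to be deletion reads that failed to split)
    reference -- reads aligned continuously across the breakpoint
    """
    total = split + clipped + reference
    if total == 0:
        raise ValueError("no reads cover the breakpoint")
    naive = split / total                # only split reads count as deletion
    adjusted = (split + clipped) / total  # free-end reads also count as deletion
    return naive, adjusted


# Hypothetical counts, for illustration only.
naive, adjusted = deletion_fraction(split=1200, clipped=800, reference=5000)
print(f"naive: {naive:.1%}, adjusted: {adjusted:.1%}")
# prints "naive: 17.1%, adjusted: 28.6%"
```

The adjusted figure is only as good as the assumption that every free-end read really comes from a deleted molecule, which would need checking against the sequence of the unaligned ends.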
Here's an example of the mapping output. It shows reads mapping to the reference, split reads, and reads with unaligned free ends on either side of the deletion (these should actually be contributing to the deletion read pool):

I've also considered simply counting the raw reads that match either the deletion junction or the reference sequence at the breakpoint, and calculating the fraction that carry the deletion, e.g.
reference: AAAAABBBBBCCCCC
deletion read: AAAAACCCCC
non-deletion read: AAAAABBBBB
So, for example, if I counted 2000 reads containing AAAAACCCCC (deletion read) and 5000 containing AAAAABBBBB (non-deletion read), the overall frequency of the deletion would be 2000/7000 ≈ 29%. I'm not sure whether this is an oversimplification, but it would remove the bias caused by unaligned free ends.
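A minimal sketch of that counting idea, assuming the breakpoint coordinates are already known and using exact string matching on raw reads. All function names, the flank length, and the toy sequences are hypothetical. Both junction strings have the same length, so a read needs the same amount of sequence on each side of the breakpoint to be counted in either class.

```python
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def junction_kmers(reference, del_start, del_end, flank):
    """Build equal-length junction strings for one breakpoint.

    del_start, del_end -- 0-based, half-open interval of the deleted bases
    Returns (deletion_junction, reference_junction), each 2 * flank long.
    """
    left = reference[del_start - flank:del_start]
    right = reference[del_end:del_end + flank]
    inside = reference[del_start:del_start + flank]
    return left + right, left + inside


def count_junction_reads(reads, del_kmer, ref_kmer):
    """Count reads containing either junction on either strand."""
    del_n = ref_n = 0
    for read in reads:
        for seq in (read, revcomp(read)):
            if del_kmer in seq:
                del_n += 1
                break
            if ref_kmer in seq:
                ref_n += 1
                break
    total = del_n + ref_n
    fraction = del_n / total if total else float("nan")
    return del_n, ref_n, fraction


# Toy data: a 30 bp "reference" with bases 10-19 deleted.
reference = "GATTACAGGC" + "TTTTTCCCCC" + "AGCTAGCTAA"
del_kmer, ref_kmer = junction_kmers(reference, 10, 20, flank=5)

reads = [
    "TTACAGGCAGCTAGC",           # deletion read
    revcomp("TTACAGGCAGCTAGC"),  # deletion read, opposite strand
    "TTACAGGCTTTTTCC",           # reference read
    "TTACAGGCTTTTTCC",           # reference read
    "TTACAGGCTTTTTCC",           # reference read
    "GATTACAGGC",                # stops at the breakpoint: uninformative
]
print(count_junction_reads(reads, del_kmer, ref_kmer))
# prints (2, 3, 0.4)
```

Exact matching drops any read with a sequencing error inside the junction string, so a real implementation would probably want to tolerate mismatches. The flank length also has to be long enough that each junction string is unique in the mitochondrial genome.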
This topic has been visited before but I don't think a satisfactory answer has been found.
So my overall question is: is it possible to accurately quantify deletions larger than your read length/insert size in a population without a fixed ploidy, such as mitochondria?