(I'm very new to this particular field.)
From what I've read, it seems that all variant-calling workflows follow the same steps: alignment -> pileup -> variant calling -> filtering, etc., regardless of whether the data come from whole-genome or exome sequencing. I understand that this approach is valid for some applications and diseases. But for cancer applications, where a tumor (or tumors) may be heterogeneous and carry multiple mutation profiles (e.g. within an exon), would it not be more valid to call variants on each read (cluster) first, and then do a 'pileup' on the calls?
If anyone can point me to any papers, etc. that discuss this, it would be much appreciated.
For example:
    ref:  ...AACGTG...
          ...AACGTG...   800x *clusters* had this sequence
          ...AACGAG...   100x *clusters* had this sequence
          ...ATCGTG...   100x *clusters* had this sequence

The above data (assume 100% confidence in base call) will be concluded as:

          ...AACGTG...   90% wildtype
          ...ATCGAG...   10% mutant with two mutations

... when, in fact, it is two separate mutations at 10% each.
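To make the difference concrete, here is a minimal Python sketch that tallies the toy data both ways: per position (roughly what a pileup sees) and per read (which combination of variants each cluster carries). The function names, the 0-based positions within the six-base window, and the `(position, ref_base, alt_base)` tuples are all made up for illustration. This is not how any particular variant caller works, and it ignores alignment, base quality and indels.

```python
from collections import Counter

REF = "AACGTG"

# Toy data from the example: (read sequence, number of clusters).
READS = [("AACGTG", 800), ("AACGAG", 100), ("ATCGTG", 100)]


def pileup_frequencies(ref, reads):
    """Per-position alternate allele frequency, ignoring which read a base came from."""
    total = sum(count for _, count in reads)
    alt_counts = Counter()
    for seq, count in reads:
        for pos, (ref_base, base) in enumerate(zip(ref, seq)):
            if base != ref_base:
                alt_counts[(pos, ref_base, base)] += count
    return {variant: n / total for variant, n in alt_counts.items()}


def per_read_frequencies(ref, reads):
    """Frequency of each combination of variants seen together on a single read."""
    total = sum(count for _, count in reads)
    combos = Counter()
    for seq, count in reads:
        variants = tuple(
            (pos, ref_base, base)
            for pos, (ref_base, base) in enumerate(zip(ref, seq))
            if base != ref_base
        )
        combos[variants] += count  # the empty tuple means "matches the reference"
    return {combo: n / total for combo, n in combos.items()}


pileup = pileup_frequencies(REF, READS)
per_read = per_read_frequencies(REF, READS)

print("per position:", pileup)
print("per read:    ", per_read)
```

On this data the per-position tally reports each variant at 10% but says nothing about whether they sit on the same reads. The per-read tally keeps that linkage: no read carries both variants, and 80% of the reads match the reference.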