We've run exome-capture DNA sequencing on about 40 cancer cell lines. I'm finding some very common differences from the reference genome that aren't in dbSNP or 1000 Genomes, and I was wondering if anyone could help explain what they are and whether there is a reasonable way to filter them out.
One example is A1BG T109G (ACC->CCC at position 325 of transcript ENST00000453054, reverse strand), i.e. chr19:58862835 T->G on hg19 (chr19:63554647 on hg18). That mutation shows up in 37 of our 42 samples.
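Because A1BG is on the reverse strand, the genomic T->G change reads as A->C on the transcript, which is what turns ACC into CCC. The minimal sketch below checks that bookkeeping. It assumes that "position 325" is a 1-based coding-sequence (CDS) position; the helper names are hypothetical and not from any annotation tool.

```python
# Complement table for single DNA bases.
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}


def to_transcript_allele(genomic_ref, genomic_alt, strand):
    """Convert a plus-strand genomic SNV to transcript orientation (hypothetical helper)."""
    if strand == "-":
        # Reverse-strand genes are read from the complementary strand.
        return COMPLEMENT[genomic_ref], COMPLEMENT[genomic_alt]
    return genomic_ref, genomic_alt


def codon_position(cds_pos):
    """Return (1-based codon number, 0-based offset in codon) for a 1-based CDS position."""
    return (cds_pos - 1) // 3 + 1, (cds_pos - 1) % 3


# chr19:58862835 T->G, A1BG on the reverse strand; 325 assumed to be a CDS position.
ref, alt = to_transcript_allele("T", "G", "-")
codon_num, offset = codon_position(325)
ref_codon = "ACC"
assert ref_codon[offset] == ref  # the transcript reference base must match the codon
alt_codon = ref_codon[:offset] + alt + ref_codon[offset + 1:]
print(ref, alt, codon_num, offset, alt_codon)
```

Under that assumption, position 325 is the first base of codon 109, and the complemented change A->C gives CCC, consistent with the annotation.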
The QC metrics all look good, and the reads cluster nicely around exonic regions as expected. As a sanity check I looked at a known gene (p53), and the mutations we call there match up with what's known.
The fact that this variant appears in so many samples and isn't in either dbSNP or 1000 Genomes worries me a little. We have over 100 genes with unknown mutations in more than 30 samples, so I'd love to have a decent filter for them.
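One simple filter of the kind asked about is a recurrence filter: flag any variant that is not in a known-variant set but is called in at least some fraction of the samples. The sketch below assumes the calls have already been parsed into per-sample sets of (chrom, pos, ref, alt) tuples on a single build. The function names, the 0.5 threshold, and the toy data are hypothetical and chosen only for illustration.

```python
from collections import Counter


def recurrent_variants(calls_by_sample, known, min_fraction=0.5):
    """Return {variant: n_samples} for variants absent from `known`
    that are called in at least `min_fraction` of samples."""
    n_samples = len(calls_by_sample)
    counts = Counter()
    for variants in calls_by_sample.values():
        counts.update(set(variants))  # count each variant at most once per sample
    return {
        v: c
        for v, c in counts.items()
        if v not in known and c / n_samples >= min_fraction
    }


def filter_calls(calls_by_sample, flagged):
    """Drop flagged variants from every sample's call set."""
    flagged = set(flagged)
    return {s: set(v) - flagged for s, v in calls_by_sample.items()}


# Toy, hypothetical calls for four cell lines.
calls = {
    "line1": {("chr19", 58862835, "T", "G"), ("chr1", 1000, "A", "G")},
    "line2": {("chr19", 58862835, "T", "G")},
    "line3": {("chr19", 58862835, "T", "G")},
    "line4": set(),
}
flagged = recurrent_variants(calls, known=set(), min_fraction=0.5)
filtered = filter_calls(calls, flagged)
print(flagged)
print(filtered)
```

A recurrence filter only flags candidates; whether a recurrent call is an artifact or a real, unlisted polymorphism still has to be checked separately, so it may be safer to report flagged variants than to discard them silently.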