    Hi everyone,

    I've recently been "promoted" from the microarray analysis world, now that the previous guy babysitting our DNA sequencing projects has left for greener pastures.

    You all come very highly recommended, so I'm hoping you can help me go from asking lots of dumb questions to being able to contribute at least a bit.

    With the caveat that I'm not sure whether this has already been discussed ad nauseam, or whether I've given enough detail to be clear, here's my little problem:

    We've got exome-capture DNA sequencing runs for about 40 cancer cell lines. I'm finding some very common differences from the reference genome that aren't in dbSNP or 1000 Genomes, and I was wondering if anyone could help explain what they are and whether there is a reasonable way to filter these features out.

    One example is A1BG with a T109G (ACC->CCC at position 325 [transcript ENST00000453054], reverse strand), chr19:58862835 T->G (hg19; chr19:63554647 in hg18). I see that mutation in 37 of 42 samples.

    The various QC metrics are good, and the reads cluster nicely around exonic regions, as expected. I checked a known gene (p53), and the mutations called there match up as well.

    The fact that it appears in so many samples and isn't in either dbSNP or 1000 Genomes worries me a little. We have over 100 genes with unknown mutations in over 30 samples, so I'd love to have a decent filter for them.
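
    To make the question concrete, here is a rough Python sketch of the naive filter I have in mind: count how many samples carry each variant, and flag anything that is not in the known-variant set but shows up in more than some fraction of samples. The function name, the 0.5 cutoff, and the toy data (everything except the A1BG site above) are made up for illustration, and I don't know whether a simple recurrence cutoff like this is the right approach.

```python
from collections import defaultdict


def recurrent_unknown_variants(calls, known, n_samples, max_fraction=0.5):
    """Flag variants absent from `known` that occur in more than
    `max_fraction` of samples.

    calls: iterable of (sample_id, variant) pairs, where variant is a
           (chrom, pos, ref, alt) tuple.
    known: set of variants already catalogued (e.g. dbSNP / 1000 Genomes).
    Returns a dict mapping each flagged variant to its sample fraction.
    """
    samples_by_variant = defaultdict(set)
    for sample, variant in calls:
        samples_by_variant[variant].add(sample)

    flagged = {}
    for variant, samples in samples_by_variant.items():
        if variant in known:
            continue  # catalogued variants are handled elsewhere
        fraction = len(samples) / n_samples
        if fraction > max_fraction:
            flagged[variant] = fraction
    return flagged


# Toy data: the A1BG site from this post, plus two made-up variants.
a1bg = ("chr19", 58862835, "T", "G")
common_known = ("chr1", 1000, "A", "C")  # hypothetical, pretend it is in dbSNP
private = ("chr2", 2000, "G", "A")       # hypothetical, seen in one sample

samples = [f"s{i}" for i in range(1, 43)]            # 42 samples
calls = [(s, a1bg) for s in samples[:37]]            # 37 of 42 carry it
calls += [(s, common_known) for s in samples[:40]]
calls.append(("s1", private))

flagged = recurrent_unknown_variants(
    calls, known={common_known}, n_samples=len(samples)
)
for variant, fraction in flagged.items():
    print(variant, f"{fraction:.2f}")
```

    On this toy input only the A1BG site is flagged; the catalogued variant and the single-sample variant are left alone.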
