Hi, we're doing ChIP-seq in several cancer cell lines. Of course, translocations and polyploidy are common features of cancers. We know that one cell line we use, K562, has (I think) 68 chromosomes and numerous translocations.
While we were discussing some recent ChIP-seq data, a colleague in the lab asked me about the reference genome and whether it factors in the translocations. I said "of course not", but it brought up a point that I (embarrassingly) hadn't thought of, and I'm wondering what the conventional wisdom is regarding ChIP-seq in cancer cells/cell lines?
The way I see it, the reference genome will have all the sequences, but not necessarily in the same place as in the cancer line in question. If we find some major peaks upstream of some gene in our ChIP, we don't know a priori whether the downstream (or upstream) sequences are arranged in our cell line as they are in the reference genome - they could be very far away.
For that matter, if the sequences are on a rearrangement boundary, any reads that span it will not map because they are composed of sequences from different regions. In fact, in this one ChIP, there were an unusually large number of reads that didn't map. BLAST searches of many of these unmapped reads resulted in multiple partial matches in the genome. This is of course what we see when deep-sequencing 3C libraries, which by design consist of chimeric sequences.
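To make the "partial matches" idea concrete, here is a minimal toy sketch of what I mean by a read that only maps when split in two. Everything in it is made up for illustration: the two short "chromosomes", the helper names `locate` and `split_read_hits`, and the `min_anchor` cutoff are my own assumptions, and it only uses exact string matching, which is not how a real aligner or BLAST works.

```python
def locate(seq, reference):
    """Return (chromosome, position) of the first exact match, or None."""
    for name, chrom in reference.items():
        pos = chrom.find(seq)
        if pos != -1:
            return name, pos
    return None


def split_read_hits(read, reference, min_anchor=8):
    """Find a cut point where both halves of the read match somewhere.

    Returns (left_hit, right_hit, cut), or None if the read maps
    end to end or cannot be explained by a single split.
    """
    if locate(read, reference) is not None:
        return None  # the whole read maps, so it is not chimeric
    # Try every cut that leaves at least min_anchor bases on each side.
    for cut in range(min_anchor, len(read) - min_anchor + 1):
        left = locate(read[:cut], reference)
        right = locate(read[cut:], reference)
        if left is not None and right is not None:
            return left, right, cut
    return None


# Hypothetical toy reference; real chromosomes are obviously far longer.
reference = {
    "chrA": "ACGTTGCAAGGCTTAACCGGATCGATTACG",
    "chrB": "TTTGGGCCCAAATATAGCGCGTACTGACTG",
}

# A read spanning an imaginary chrA/chrB junction.
chimeric = reference["chrA"][5:15] + reference["chrB"][10:20]
hit = split_read_hits(chimeric, reference)
print(hit)
```

The reported cut point need not be the true junction: if the bases on either side of the breakpoint happen to be identical, more than one cut is consistent with the read, so the position is only approximate.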
It seems to me that in this case it is imperative to generate a reference genome for that cell line; otherwise the peak coordinates might be dramatically off. What's the accepted minimum coverage for sequencing a genome?
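For anyone wanting to put numbers on the coverage question, mean coverage is usually estimated as (number of reads x read length) / genome size. Below is a small sketch of that arithmetic. The function names, the 3.1 Gb genome size, the 100 bp read length and the 30x target are placeholder assumptions of mine, not an accepted minimum, and the calculation ignores unmappable reads, duplicates and the extra copies in a polyploid line.

```python
import math


def mean_coverage(n_reads, read_length, genome_size):
    """Expected mean depth: total sequenced bases divided by genome size."""
    return n_reads * read_length / genome_size


def reads_needed(target_coverage, read_length, genome_size):
    """Number of reads required to reach a target mean depth."""
    return math.ceil(target_coverage * genome_size / read_length)


# Assumed values, for illustration only.
GENOME_SIZE = 3_100_000_000  # approximate haploid human genome, in bp
READ_LENGTH = 100            # bp per read
TARGET = 30                  # example target depth, not a recommendation

n = reads_needed(TARGET, READ_LENGTH, GENOME_SIZE)
print(f"{n:,} reads of {READ_LENGTH} bp for about {TARGET}x mean coverage")
print(f"check: {mean_coverage(n, READ_LENGTH, GENOME_SIZE):.1f}x")
```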
In the case of K562, the translocations haven't been mapped at high enough resolution for us to be confident in many regions. Of course, peaks located in a region with no known rearrangements are probably fine.
Any thoughts on this?