Greetings,
I'm trying to come up with a brief but fair description of the relationship of Sanger (CE) sequencing to Massively Parallel sequencing.
Would you give me your take on this topic?
Here's what I have so far:
Briefly, a rule of thumb is that if the region of interest is under ~60,000 bp, then it is still cheaper to use the CE (Sanger) sequencing approach. Full "exome" capture and sequencing currently runs ~$2,200 at the MPS Core. For that money you could Sanger-sequence 88,000 bp at 5X coverage, with ~800 nucleotide read lengths. That makes assembly of the data against a reference sequence fairly easy, at very close to 99.99% accuracy. MPS, on the other hand, yields ~75 nt read lengths, so assembly and mapping to a reference are more difficult, and accuracy, the last I've read, is ~99.9% at >20X coverage. That is 1 error in 1,000 bases instead of 1 error in 10,000 bases.
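The arithmetic implied by the figures above can be laid out explicitly. This is just a sketch using the post's own estimates (the $2,200 cost, 88,000 bp at 5X, 800 nt reads, and the two accuracy figures); none of the prices are authoritative.

```python
# Back-of-the-envelope comparison using the post's own numbers.
budget = 2200        # ~cost of one exome capture + MPS run, USD (estimate)
target_bp = 88_000   # region Sanger could cover for the same money
coverage = 5         # desired fold coverage
read_len = 800       # typical Sanger (CE) read length, nt

reads_needed = target_bp * coverage / read_len   # 550 reads
cost_per_read = budget / reads_needed            # implied ~$4 per read

# Per-base error rates from the quoted accuracies.
sanger_err = 1 - 0.9999   # 1 error in 10,000 bases
mps_err = 1 - 0.999       # 1 error in 1,000 bases

print(f"Sanger reads needed: {reads_needed:.0f}")
print(f"Implied cost per read: ${cost_per_read:.2f}")
print(f"Error-rate ratio (MPS/Sanger): {mps_err / sanger_err:.0f}x")
```

Working backwards like this (550 reads at an implied ~$4/read) is one way to sanity-check the 60,000 bp break-even rule of thumb against the local core facility's actual per-reaction pricing.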
It is for the latter reason that Sanger sequencing is used to validate sequence variants discovered with MPS.
In addition, closing gaps between contigs (shorter reads assembled into longer contiguous regions) is usually accomplished with Sanger sequencing.
And because of the higher error rate of individual MPS reads, variants that are not "seen" in multiple independent reads are filtered out. This can make it more difficult to find variants that occur at low frequency, when not all cells carry the variant, e.g., when looking for causative mutations in cancer research.
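The low-frequency point can be made concrete with a simple binomial model. This is only an illustration under assumed numbers (20X depth, a filter requiring at least 3 supporting reads, and a hypothetical 5% subclonal variant), not a description of any particular pipeline's filter.

```python
from math import comb

def p_seen_at_least(k, depth, freq):
    """Probability that a variant present in a fraction `freq` of the
    sampled molecules is supported by at least `k` reads at a given
    `depth`, under a simple binomial sampling model (illustrative only)."""
    return sum(comb(depth, i) * freq**i * (1 - freq)**(depth - i)
               for i in range(k, depth + 1))

# A heterozygous (clonal) variant at 20X almost always clears a
# "3 supporting reads" filter (~0.9998), but a 5% subclonal variant
# rarely does (~0.075), so it is usually filtered out as noise.
print(p_seen_at_least(3, 20, 0.5))
print(p_seen_at_least(3, 20, 0.05))
```

Deeper coverage shifts these probabilities, which is why low-frequency variant detection in tumors typically demands much more than 20X.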
Your help with this would be greatly appreciated.