To continue this topic, since it's been pretty popular: another group at Cold Spring Harbor just released a paper using Nimblegen arrays in combination with Solexa sequencing.
Link to Nature is here:
http://www.nature.com/ng/journal/v39...g.2007.42.html
Some interesting highlights below. I'll try to expand on the paper tomorrow...it's getting late for me!
1. They show what we've already seen from the Baylor/Emory groups (discussed over in our 454 forum), that it's possible to capture a majority of human exons on 6-7 Nimblegen arrays with ~350k features each. Seems to be the same probe design, 60-mers tiled every 20bp.
2. Some data suggesting that WGA samples miss quite a bit: ~60% of exons are represented with WGA, versus ~98% without. WGA also biases slightly against AT-rich exons.
3. Showed that shorter captured fragments (150-200 bp, the range Illumina recommends...) work much better than capturing 500-600 bp fragments. This is not surprising, as Illumina relies on fragment length for efficient bridging and cluster generation. Seems they tried the same protocol that the 454/Nimblegen groups used.
4. A remarkable lack of commentary on sequencing accuracy, with only one comment stating that they detected 60% of the HapMap SNPs expected in this CEPH individual.
They show some actual read data in the Supplementary Info from five SNP loci, which in reality is quite disappointing from an accuracy standpoint. There are at least 3 random differences from consensus in some of the reads that they fail to address, and in the same breath they claim to be calling heterozygous SNPs simply because differences are seen at an rs locus. I don't have the paper in front of me now, but there were quite a few low-quality bases as well. Up to 17 bases out of 26 in one run were quite poor.
I also wonder why the authors, in the intro to the paper, describe Solexa sequencing as generating "35-50" bp reads, and then use 26bp read lengths throughout their study. Comes across like an industrial application note from Illumina/Nimblegen, rather than a Nature paper.
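As an aside on point 1, here is a rough back-of-the-envelope sketch of what "60-mers tiled every 20 bp" works out to per array. This is only my own illustration, not anything from the paper: the function names are made up, and I'm assuming one probe per feature, a single strand, no replicate probes, and that probes must fit entirely inside the target interval.

```python
def tile_probes(start, end, probe_len=60, step=20):
    """Return half-open (probe_start, probe_end) intervals tiled across [start, end).

    Assumption: a probe is only placed if it fits fully inside the interval.
    """
    probes = []
    pos = start
    while pos + probe_len <= end:
        probes.append((pos, pos + probe_len))
        pos += step
    return probes


def array_capacity_bp(features=350_000, step=20):
    """Approximate target bases one array can tile.

    Assumption: every feature holds a distinct probe, spaced `step` bp apart.
    """
    return features * step


# A hypothetical 200 bp exon gets probes starting at 0, 20, ..., 140.
print(len(tile_probes(0, 200)))   # 8
# ~350k features at 20 bp spacing is about 7 Mb of target per array.
print(array_capacity_bp())        # 7000000
```

Under those assumptions, each array tiles about 7 Mb of target, so 6-7 arrays come to roughly 42-49 Mb. Real designs will differ (short exons, repeat masking, overhang at exon edges), so treat this as an order-of-magnitude check only.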
Anyway, very interesting paper, although not the best presentation of actual sequence data, nor of any biology. It reminds me of the early days of microarrays, when all you had to do was demonstrate a run on someone's array platform to get a Nature paper. In all actuality I'm just jealous that I don't get to work with this type of technology, yet.