
  • Nimblegen arrays for targeted genome enrichment

    Two papers were recently published in Nature Methods detailing an astonishing new technique which combines the flexibility of on-demand array technology with the power of next gen sequencing instrumentation.

    These papers demonstrate the capture and sequencing of targeted genomic regions using a programmable microarray platform (from Nimblegen):

    Direct selection of human genomic loci by microarray hybridization (Albert, et al)


    Microarray-based genomic selection for high-throughput resequencing (Okou, et al)

    Both studies capture defined regions of the genome (either ~6,700 exons or up to a 5Mb contiguous region) using Nimblegen arrays, and present 454 sequencing data of the enriched regions. ~65-75% of reads were reported to be on target, with median single-base coverage of 5-7 fold. The Albert study validated its technique against four samples from the HapMap collection and identified 79-94% of known variants in the target regions.

    A third paper in the same issue of Nature Methods from the Church lab ("Multiplex amplification of large sets of human exons", Porreca et al) demonstrates another technique for capture. Rather than capturing the targets using hybridization directly on the array, this study uses a programmable array to synthesize "molecular inversion" capture probes that are cleaved from the array and used to fish out small regions of interest (~60-191 bp exon fragments). Enriched fractions were then sequenced with the Solexa Genome Analyzer.

    The results reported in this study were less than impressive, with only 28% on target hits. There was also a significant problem with calling heterozygous polymorphisms, although the authors hope this can be optimized at the reaction level. This technique, which relies on a dual hybridization event surrounding the region of interest followed by gap-filling/ligation, is much more complicated and seems likely to require intense optimization to approach the success of direct capture.

    In any event, this enrichment technology will make a significant impact on any study examining a defined subset of the genome, such as candidate region sequencing workflows. What once was a laborious process of PCR primer design, optimization, amplification, and normalization, has become a simple one-pot hybridization event.

  • #2
    There was just a fourth Nimblegen paper released...I put down some thoughts about it here:



    • #3
      I'd love to hear some comments from people about how this is affecting experiments in their labs.

      Completely obsoleting huge PCR-driven resequencing studies?

      Anyone talked to Nimblegen to get an idea of availability? At GME2007 a 454 rep told me they will be launching the exon platform soon (6 mo.?), but perhaps on the higher density chip so they can get it all on one.

      Anyway, if you're reading this, don't be shy...


      • #4
        Target enrichment methods

        Two points about the recently published target enrichment methods.

        1. Comparing the methods from the different papers is not easy. First, Okou et al. used a resequencing array, which gives no measure of how many “off target” sequences were captured. Second, there is no set method of measuring recovery (how many of the targeted sequences were found) or specificity (what fraction of the reads mapped to the targeted regions). Table 1 in the Hodges et al. Nature Genetics paper shows how their numbers for recovery can vary from 54% to 67% depending on whether flanking regions are included. Also, to clarify ECO’s statement, the Church lab recovered only 28% of their targets across 2 experiments, while the run numbers from the methods give the % of on-target reads as 2694087/4794590 = 56%. The Hodges paper had a comparable raw specificity of 1993201 on-target reads out of 4198280 total reads = 47%.

        2. There are 3 important workflow considerations for the average lab which the papers gloss over: Time, sample required, and money.
        Nimblegen: 65 hour hyb of 20 ug DNA to a custom array
        Church lab: 1.5 hour hyb of 1.5ug DNA to pool of oligos (admittedly, with more molecular biology steps).
        Depending on the sequencing platform speed and experimental design, the time may not matter. But 20ug of DNA is hard to get from human samples without amplification, while 1.5ug is feasible for humans and easily available for HapMap cell lines.
        Considering cost, Nimblegen claims the arrays can be reused at least once (but how can you be sure that a variant sequence came from your sample, and is not carryover from the previous sample?) However, the Church lab oligo pool was amplified and presumably could be aliquoted for dozens or even hundreds of individual samples. Clearly George Church is interested in driving the cost of sequencing way down, which may be in part why he uses this method.

        In the end we will probably have to wait for the next round of papers to see how well the different methods can be optimized.
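The raw specificity figures quoted in point 1 are simply on-target reads divided by total reads. A minimal sketch of that arithmetic, assuming the read counts are exactly as quoted in the post (the function and variable names are illustrative, not from any of the papers):

```python
def on_target_fraction(on_target_reads: int, total_reads: int) -> float:
    """Raw specificity: share of all reads that map to the targeted regions."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return on_target_reads / total_reads


# Read counts as quoted in the post above (on-target reads, total reads).
runs = {
    "Porreca et al.": (2_694_087, 4_794_590),
    "Hodges et al.": (1_993_201, 4_198_280),
}

for name, (on_target, total) in runs.items():
    frac = on_target_fraction(on_target, total)
    print(f"{name}: {frac:.1%} of reads on target")
```

This reproduces the roughly 56% and 47% figures. Note that it measures specificity only; recovery (what fraction of targets were seen at all) needs per-target counts, which is why the 28% and 56% numbers for the same study are not in conflict.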


        • #5
          My two cents...

          I'm a postdoc employed to sequence a candidate region of interest via next-gen sequencing methods.

          Here are my thoughts on Nimblegen capture as bullet points ripped from a presentation I gave to my group in November. Sorry, you don't get to enjoy the accompanying figures. I will compare BAC pulldown, chromosome pulldown, the Porreca et al. and Albert et al. Nimblegen capture methods and LR-PCR using a hypothetical 10 Mb contiguous region as an example. I will also use the Porreca, et al. (2007) criteria for "front-end" enrichment technologies.
          Multiplexity: The number of independent capture reactions performed simultaneously in a single reaction.
          Specificity: The fraction of captured nucleic acids that derive from targeted regions.
          Uniformity: The relative abundances of targeted sequences after selective capture.

          BAC Pulldown (Bashiardes et al. (2005))
          • 150 kb Chromosome 17 region.
          • Two rounds of BAC hybridisation results in x10,000 enrichment.
          • 52% of captured clones are from the targeted region.
          • While BAC selection may be OK for a 150 kb region, selecting 10 Mb would require 60 – 100 BACs.
          • Invitrogen offers a BAC clone service
          • Oligo selection of appropriate BAC clones from their libraries.
          • Need oligos 40-100 bp with at least 50% CG content.
          • Oligo must be 1 μmol scale, PAGE purified.
          • About $450 AUD per oligo.
          • One oligo per BAC clone (~100 kb).
          • $45,000 in just oligos for a 10 Mb hypothetical region.
          • Selectivity OK.
          • Multiplexity on multi-megabase regions is low.
          • Too pricey!!!

          LR-PCR
          • Synthesis of 2104 PCR primers to cover 10 Mb with 10 kb PCRs at 9.5 kb tiling.
          • Oligo synthesis: $7.84 x 2104 primers = $16.5K AUD.
          • PCR cost $2.2 / rxn (5 PRIME). 1052 rxns = $2.3K AUD per patient.
          • Cost for 6 patients is ~$30K plus sequencing costs.
          • A tiling LR-PCR strategy has been used by Perlegen but a detailed method has not been disclosed.
          • No multiplexity.
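The tiling cost estimate in the bullets above can be reproduced with a few lines of arithmetic. This is only a sketch under the post's stated prices (AUD); the function name and the returned keys are made up for illustration, and the amplicon count uses floor division because that is what gives the post's figure of 1052 reactions.

```python
def lr_pcr_cost(region_bp: int, tiling_bp: int, primer_cost: float,
                rxn_cost: float, n_patients: int) -> dict:
    """Estimate the cost of tiling a contiguous region with long-range PCR."""
    n_amplicons = region_bp // tiling_bp      # one PCR per tiling step
    n_primers = 2 * n_amplicons               # forward + reverse primer each
    primer_total = n_primers * primer_cost    # one-off synthesis cost
    per_patient = n_amplicons * rxn_cost      # reagent cost per sample
    return {
        "n_amplicons": n_amplicons,
        "n_primers": n_primers,
        "primer_total": primer_total,
        "per_patient": per_patient,
        "total": primer_total + n_patients * per_patient,
    }


# Figures from the post: 10 Mb region, 9.5 kb tiling, $7.84 per primer,
# $2.2 per reaction, 6 patients.
estimate = lr_pcr_cost(10_000_000, 9_500, 7.84, 2.2, 6)
for key, value in estimate.items():
    print(f"{key}: {value:,.0f}")
```

The primer synthesis is a fixed cost, so the per-patient figure is what dominates as the cohort grows; sequencing costs are not included.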

          Harvard Exon Capture Method (Porreca, et al.,2007)
          • Designed an Agilent array
          • Agilent fabricates with SurePrint ‘Inkjet style’ deposition and standard phosphoramidite chemistry to give 60-mer oligos.
          • Up to 244,000 features on a 1” x 3” glass slide.
          • This study specified 55,000 100mer probes.
          • Novel, non-spec array design. $12,000 USD.
          • Observed 28% of targets over two experiments
          • At least one Solexa read mapping to 15,380 exons from 55,000.
          • A low rate of capture is associated with:
          • Less than 40%, or greater than 70% GC content in captured exon.
          • Shorter targets.
          • Evidence suggests bias in initial exon capture step.
          • Once the probe library is synthesised this method is relatively cheap as 200 reactions can be created per microarray synthesis.
          • Uniformity is an issue. This method is not yet suitable for enrichment of a contiguous region.

          Nimblegen CGH Capture (Okou, et al. 2007)
          • A collaboration of Zwick lab at the Emory University School of Medicine, Atlanta, Georgia and Nimblegen Inc.
          • Shear DNA to 300 bp.
          • End repair (add A)
          • Ligate T overhang adaptors.
          • Capture on array
          • Elute and amplify with adaptor-based primers.
          • Targeting either 50 kb (FMR1 locus) or 304 kb.
          • 50 kb chip had four pairs of probes for every targeted base.
          • 304 kb chip had one pair of probes for every ~1.5 targeted bases.
          • Probes were 50 - 94mers. Designed for isothermal hybridisation.
          • DNA was Qiagen RepliG WGA from 100 ng of starting DNA.
          • Used 20 μg of patient DNA on the capture chip.
          • Five-fold Cot-1 DNA added (100 μg).
          • Yields of 700 ng to 1.2 μg after a 60 h hybridisation.
          • Could reuse capture chips at least once with no apparent contamination or effect on data quality!
          • Employed an Affymetrix resequencing chip on captured DNA
          • Basecalling rate: 99.1%
          • Inter-sample basecalling agreement: 99.98%
          • Accuracy at HapMap SNP sites: 99.81%
          • 10 Ct’s enrichment of DNA. (2^10 ~ 1000).
          • 50 kb region: 3x10^9 genome / 5x10^4 region = 60,000 enrichment for pure capture.
          • A 1000x fold enrichment of 50 kb target DNA means 1 in 60 captured DNA sequences are from the target region.
          • Considering a large overabundance of non-targeted regions this protocol is OK for downstream chip based sequencing, but not massively parallel sequencing using Roche 454 FLX, Solexa 1G or ABI SOLiD.
          • Poor selectivity.
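The enrichment reasoning in the bullets above (qPCR Ct shift, then fold enrichment, then share of captured DNA that is on target) can be sketched as follows. Assumptions to flag: each Ct is treated as a perfect doubling (100% PCR efficiency), the genome is taken as 3x10^9 bp as in the post, and the helper names are hypothetical.

```python
GENOME_BP = 3_000_000_000  # genome size used in the post


def fold_from_delta_ct(delta_ct: float) -> float:
    """Fold enrichment implied by a qPCR Ct shift, assuming perfect doubling."""
    return 2.0 ** delta_ct


def max_fold_enrichment(target_bp: int, genome_bp: int = GENOME_BP) -> float:
    """Enrichment that would correspond to 100% pure capture of the target."""
    return genome_bp / target_bp


def on_target_share(observed_fold: float, target_bp: int,
                    genome_bp: int = GENOME_BP) -> float:
    """Approximate fraction of captured DNA that comes from the target."""
    return min(1.0, observed_fold / max_fold_enrichment(target_bp, genome_bp))


fold = fold_from_delta_ct(10)       # 10 Ct's of enrichment
share = on_target_share(fold, 50_000)
print(f"fold enrichment: {fold:.0f}")
print(f"max possible for 50 kb: {max_fold_enrichment(50_000):.0f}")
print(f"on-target share: about 1 in {1 / share:.0f}")
```

The result depends heavily on which target size the Ct measurement actually applies to, so the `target_bp` argument is the number to double-check before drawing conclusions about selectivity.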

          Nimblegen CGH Capture (Albert, et al. 2007)
          • A collaboration of the Gibbs group from the Human Genome Sequencing Centre, TX and Nimblegen Inc.
          • Sonicated DNA polished with Klenow and PO4 added with T4PNK.
          • Blunt-end ligated with adaptors. Hybridisation was at 42°C for 65 h.
          • Eight qPCRs from different loci were used to score enrichment.
          • DNA representing 6,726 genomic regions (minimum size 500 bp with 5 Mb total) from 660 genes dispersed throughout the genome.
          • Capture array series targeted areas of 200 kb, 500 kb, 1 Mb, 2 Mb and 5 Mb surrounding the human BRCA1 gene locus.
          • Only ~3x fold variability in FLX read density over a 2Mb contiguous region
          • Minimum coverage still relatively high.
          • Capture is specific.
          • Repeat sequences almost absent.
          • Increasing probe tiling distance decreases repeats and increases reads mapping to selected area.
          • Does dense tiling pulldown BRCA family member DNA or other non-targeted DNA?
          • Enrichment with Nimblegen arrays via the Albert et al. (2007) protocol looks economical, practical and feasible.
          • The introduction of the CGH2 array onto the market means we could target at least 15 Mb of the genome.
          Last edited by sci_guy; 01-24-2008, 06:51 PM.


          • #6
            I've been in contact with Roche and I am led to believe that a Nimblegen sequence capture service will be available around March, not long after human custom HD2 chips hit the market.

            I have some open-ended questions regarding this "new and improved" formula

            To go from 385K probes to 2.1M, the HD2 array takes up much more real estate on the slide and also incorporates a feature shrink, from 16 to 13 micron. Presumably, there are fewer total oligos in this smaller 13 micron region. Will the feature shrink and increased numbers of probes make HD2 "noisier" than the 385K chip? Will we still see reasonable specificity?
            Why have Hodges et al. (2007) used a custom sequencing primer approach to their downstream Solexa sequencing? Will the 1G instrument handle a longer primer OK? Why do they not include the Solexa adaptor as an adaptor in the Nimblegen protocol?
            I ask these questions as I would like to capture 500 bp fragments on the Nimblegen chip and then alter the protocol to reduce the fragment sizes down before a SOLiD or Solexa/Illumina run.

            I'm in the process of emailing Solexa and the corresponding authors of these papers to get some answers. If you can inform me on any of this, please post!
            Last edited by sci_guy; 01-24-2008, 06:53 PM.


            • #7
              Great posts sci_guy! If you are willing to share your full presentation I would be happy to host it for the community (either just images or a powerpoint slideshow). Send me a PM or email if you are interested.

              I have heard from Agilent that they are about to release an improved capture method (specifically NOT the Church circularization method) at the AGBT conference next month. This will supposedly be in a kitted format including all the reagents for capture, combined with a custom array.

              Great stuff...hopefully we can pull some of the Nimblegen folks in here to respond


              • #8
                For future reference, sci_guy's powerpoint deck is in this thread:


                • #9
                  Does anyone know if the HD2 chips will be available for species other than human/mouse? Or if Nimblegen will make custom chips to order?


                  • #10
                    Custom CGH

                    pmcget - It's my belief that Nimblegen will offer custom HD2 CGH chips around March.


                    • #11
                      reply to my two cents

                      SCI guy,
                      What a great job comparing information from the various papers. However, I would like to offer some corrections to the info provided on the Nimblegen capture chip.

                      For Okou, et al. 2007:
                      Their enrichment derivation (10 Ct’s enrichment of DNA, 2^10 ~ 1000) was applied to the 300 kb region, not the 50 kb region. The calculation should have been 3x10^9 genome / 300x10^3 region = 10,000-fold enrichment for pure capture. Therefore a 1000-fold enrichment of the 300 kb target DNA means that 1 in 10 (not 1 in 60) captured DNA sequences are from the target region.

                      for Albert, et al. 2007.
                      Their enrichment calculation was based on the percentage of reads that mapped back to the target in the triplicate results for the 5 Mb region. It was calculated as follows. In theory, 3x10^9 genome / 5x10^6 region = 600-fold enrichment = 100% enrichment. But with an average of 72% of reads mapping to the target [(75% + 65% + 77%)/3 = 72.33% =~ 72%], the enrichment represents 432-fold [600 x 72/100].

                      If Okou, et al. had derived their enrichment the same way as Albert et al. (I think they found 99.1% of their reads mapped back to the target), then this represents 9910-fold enrichment [(10000 x 99.1)/100].

                      If that is true, whose estimation of enrichment or selectivity is accurate?
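For anyone who wants to check the read-based enrichment arithmetic above, here is a small sketch. It assumes a 3x10^9 bp genome as in the post and treats fold enrichment as the on-target read fraction scaled by genome size over target size; the function names are illustrative only.

```python
from statistics import mean

GENOME_BP = 3_000_000_000  # genome size used in the post


def fold_enrichment_from_reads(on_target_fraction: float, target_bp: int,
                               genome_bp: int = GENOME_BP) -> float:
    """Fold enrichment implied by the fraction of reads mapping to the target."""
    return on_target_fraction * genome_bp / target_bp


def on_target_fraction_from_fold(fold: float, target_bp: int,
                                 genome_bp: int = GENOME_BP) -> float:
    """Inverse: read fraction implied by a measured fold enrichment."""
    return fold * target_bp / genome_bp


# Albert et al., 5 Mb region, triplicate on-target fractions quoted above.
albert = mean([0.75, 0.65, 0.77])
print(f"Albert et al.: {fold_enrichment_from_reads(albert, 5_000_000):.0f}-fold")

# Okou et al., ~1000-fold by qPCR applied to a 300 kb region.
okou = on_target_fraction_from_fold(1000, 300_000)
print(f"Okou et al.: about 1 in {1 / okou:.0f} fragments on target")
```

Using the unrounded mean of 72.33% gives about 434-fold rather than 432-fold; the difference is only rounding. The two functions are inverses, which makes it easier to compare a qPCR-derived figure with a read-derived one on the same scale.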
                      Last edited by ivoire; 01-28-2008, 03:02 PM.


                      • #12
                        With the Nimblegen capture array, the larger the target region, the more economical it is. The price of the chip should be the same regardless of the target size (of course, the larger the region, the more widely spaced the probes on the chip will be). I also do not understand why the Nimblegen capture chip requires a 3-day hyb. This is just hybridization of oligos, and it should in theory happen overnight (~14 - 18 hrs). Additionally, what's up with the 42 degree hyb?
                        Last edited by ivoire; 01-28-2008, 03:03 PM.


                        • #13
                          slow hybs in array capture methods

                          My guess for the reason for the long hybs to Nimblegen arrays is that at 42 degrees, it takes a LONG time for the hybridization reaction to approach equilibrium. While short oligos can hyb quickly in solution, the oligo probes on the array are fixed and immobile, so the reaction relies on diffusion of the target genomic DNA fragments. These target DNAs are at least several hundred nt, and it will take each one a lot longer than overnight to find its complement on the array surface, esp. if the concentration is limiting (note the large sample requirement in these protocols). It would be much faster if the targets were short, like microRNAs. The combination of large sample+slow hyb times for the array based methods may give an edge to solution-based capture methods if the bugs get worked out (the solution capture probes could find their targets more quickly than fixed capture probes).
                          I would bet that the 42 degree hyb temperature was determined not by biochemistry, but by engineering. For Nimblegen probe lengths, the hyb could be more stringent (and diffusion faster) at 65 degrees, but it's harder to contain a small volume of liquid at a higher temperature for a long time.


                          • #14
                            Okou et al.

                            Originally posted by ivoire
                            SCI guy,
                            for Okou, et al, 2007. Their enrichment derivation (10 Ct’s enrichment of DNA. (2^10 ~ 1000)) was applied on 300k region not 50k region.
                            Yes, you are quite right. Good spotting. I had become confused by the header on supplementary figure 1. 1 in 10 enrichment makes more sense and is in the same ballpark as Albert et al. Perhaps the slightly reduced enrichment is a consequence of shearing DNA to 300mers?

                            Originally posted by ivoire
                            If Okou, et al had derived their enrichment the same as that Albert et al. I think they found 99.1% of their read to map back to the target, then this represents 9910 fold enrichment [(10000 x 99.1)/100].
                            I'm not sure this determination is valid. Okou et al. used an Affy resequencing chip, which requires a second hybridisation to generate sequencing data.

                            My presentation is biased towards our intended application, where we need high enrichment across a large genomic region in a small number of samples for, most likely, SNP discovery in exons.

                            Affymetrix sequencing chips cost around $23K AUD for mask design. The 49-format chip has a capacity to sequence 300 kb and one must order one wafer of chips minimum (49 chips @ ~$600 AUD). Given that:
                            • chip-based sequencing is not entirely reliable for SNP discovery (T can form 3-H bonds with G making G-T mismatches troublesome with chips)
                            • you cannot even sequence the CCDS set exons over our region of interest
                            • we must order 49 chips minimum

                            ...we decided against chip based resequencing. Okou et al. wanted to detect INDELs or repeat length polymorphisms in a small genomic region in a large number of people. In this instance, Affymetrix chip based resequencing is a good choice.


                            • #15
                              Originally posted by ivoire
                              With the Nimblegen capture array, the larger the size the more economical it is. the price of the chip should be the same regardless of the size (of course the larger the size the more spaced the probes on the chip will be).
                              The HD2 array physically takes up 3.84-fold the area of the first-gen chip and has a feature shrink down to 2/3 size:
                              3.84 / 0.666 =~ 5.8x potential probes by area
                              By actual probes calculation:
                              2.1M probes / 385,000 probes = 5.45x more probes
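A quick way to sanity-check these ratios is sketched below. One assumption to flag loudly: the "2/3 size" is interpreted here as the ratio of feature areas for the 16 to 13 micron shrink mentioned earlier in the thread, i.e. (13/16)^2. The names are illustrative.

```python
def feature_area_ratio(old_um: float, new_um: float) -> float:
    """Area of a new (square) feature relative to the old one."""
    return (new_um / old_um) ** 2


# Assumption: "2/3 size" refers to feature area after the 16 -> 13 micron shrink.
area_ratio = feature_area_ratio(16, 13)

# Potential probe capacity: more slide area divided by smaller feature area.
potential_gain = 3.84 / area_ratio

# Actual probe counts quoted for the two array formats.
actual_gain = 2_100_000 / 385_000

print(f"feature area ratio: {area_ratio:.3f}")
print(f"potential probe gain by area: {potential_gain:.2f}x")
print(f"actual probe gain: {actual_gain:.2f}x")
```

The potential and actual gains land close together (both between 5x and 6x), which is consistent with the array using most, but not quite all, of the extra capacity for probes.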

                              Probe spacing (as I have understood your point) seems relatively the same. The feature shrink means the amount of probe per feature is even more limiting, so hyb kinetics will differ. The same goes for the addition of 5.45x more probes. I would love to see some empirical data on enrichment with HD2.

                              HD2 chips are twice the price of 385,000 probe chips. See costs and array size data in my presentation here:

