  • snapper
    Junior Member
    • Jun 2009
    • 5

    Barcoding vs anonymous pooling

    Hi, new to here, but hope someone may be able to offer some advice.

    I'm currently thinking of designing an experiment in which we sequence ~1000 human samples across a 1.7 Mb custom region using Agilent SureSelect.

    2 strategies suggested are:

    1) Pool DNA from 25 samples, make 1 library from the pooled DNA, analyse for variants using Syzygy, then follow up called variants through the original 25 samples to identify which sample each variant originates in.

    2) Barcode the samples from scratch, which would require an individual library for each sample and possibly more lanes of sequencing.

    Obviously option 2 makes follow up easier but significantly increases the costs and time required.

    Does anyone have any experience of either method?
  • henry.wood
    Member
    • Apr 2010
    • 63

    #2
    One of the problems with anonymous pooling is that a heterozygous SNP in one of your 25 samples (1 of 50 chromosomes) is only going to be present in 2% of your reads for that region, which isn't much higher than the error rate. However, if the samples are barcoded, then all the reads with the SNP will be identified as coming from one person, where they make up roughly half of that person's reads, so it is statistically a lot easier to spot.
    I do understand why you wouldn't want to make 1000 libraries though.
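The 2% figure can be checked with a few lines of arithmetic. This is only a sketch: the 1000x depth and the 1% rate at which errors mimic the variant allele are assumed values chosen for illustration, not numbers from this thread.

```python
def het_allele_fraction(pool_size):
    """Expected read fraction for one heterozygous carrier among diploid samples."""
    return 1.0 / (2 * pool_size)


def expected_reads(depth, pool_size, error_rate):
    """Expected variant-supporting reads from a real singleton het vs. errors alone."""
    signal = depth * het_allele_fraction(pool_size)
    noise = depth * error_rate
    return signal, noise


# Assumed for illustration: 1000x depth at the site, 1% of reads
# showing the variant allele purely through sequencing error.
signal, noise = expected_reads(depth=1000, pool_size=25, error_rate=0.01)
print(f"allele fraction in pool of 25: {het_allele_fraction(25):.3f}")
print(f"expected true-variant reads: {signal:.0f}, expected error reads: {noise:.0f}")
```

Under these assumptions the real variant contributes only about twice as many reads as the background error, whereas a barcoded sample (pool size 1) would show the allele in about half of its reads.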

    Comment

    • andymay
      Junior Member
      • Jun 2010
      • 2

      #3
      Have you considered doing this by PCR rather than capture? We have developed a system that allows simple preparation of PCR products in which the library preparation and barcoding takes place during amplification. It scales well with large numbers of samples and amplicons, and works with both 454 and Illumina sequencing.

      Comment

      • snapper
        Junior Member
        • Jun 2009
        • 5

        #4
        Thanks both - we are committed to the pulldown capture following a pilot project so PCR not currently an option. In the pilot, the analysis did seem to work reasonably at identifying even a single het call within the pooled system although clearly the false positive rate will be higher than if we barcoded. However, follow up through the pools is potentially fairly substantial.

        Comment

        • Loris
          Junior Member
          • Dec 2009
          • 6

          #5
          If you are willing to make 40 pooled libraries, would you be willing to make 80?
          If you put each sample into two internally anonymous libraries - and sequence at sufficient depth - then you will be able to determine which sample near-unique variations came from. It'll probably work okay for rare mutations, although the more common they are the more follow-up work required.
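A minimal sketch of the two-pools-per-sample idea, assuming a simple row/column grid (one possible layout, not necessarily the design intended here). The sample names and the 5x5 grid of 25 samples are hypothetical. It shows why a singleton resolves to one sample while a variant with several carriers leaves ambiguous candidates to follow up.

```python
import math


def grid_pools(samples):
    """Assign each sample to one row pool and one column pool of a near-square grid."""
    n_cols = math.ceil(math.sqrt(len(samples)))
    return {sample: (i // n_cols, i % n_cols) for i, sample in enumerate(samples)}


def candidates(layout, positive_rows, positive_cols):
    """Samples whose row pool AND column pool both showed the variant."""
    return sorted(
        sample
        for sample, (row, col) in layout.items()
        if row in positive_rows and col in positive_cols
    )


samples = [f"S{i:02d}" for i in range(25)]  # hypothetical sample names
layout = grid_pools(samples)

# Singleton: seen only in row pool 1 and column pool 3, so one candidate.
print(candidates(layout, {1}, {3}))        # ['S08']

# Two carriers (S08 and S21) light up two rows and two columns,
# leaving four candidates, two of which are false leads.
print(candidates(layout, {1, 4}, {1, 3}))  # ['S06', 'S08', 'S21', 'S23']
```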

          Comment

          • westerman
            Rick Westerman
            • Jun 2008
            • 1104

            #6
            I agree with Loris. A good way is to make twice as many libraries in a row/column pooling fashion. We used to do this with 'overgos' (ref: https://www.ncbi.nlm.nih.gov/project...chOvergo.shtml) and the same idea should be applicable to any sequencing project.

            Also I agree with henry.wood in that 2% is getting very close to the noise level. In theory with enough sequencing depth we should be able to detect variants below 1% but in practice I find this hard to accomplish as per the spiked controls we have used.

            Comment

            • krobison
              Senior Member
              • Nov 2007
              • 734

              #7
              WRT pooling, you might also look at DNA Sudoku.

              As per the comments above, you could also see this as an optimization problem: what is the smallest number of pooled libraries that gives acceptable sensitivity, at the cost of some ability to precisely localize a variant in the first run (i.e. instead of 1 pooled anonymous library, what about 2 each with half the samples, 4 each with 1/4, etc.)?
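One way to frame that optimization, as a rough sketch only: model the variant read count for a single heterozygous carrier as binomial, ignore sequencing error and uneven capture, and search for the fewest equal-sized pools that reach a target detection probability. The depth, read threshold and target used below are assumed values for illustration.

```python
from math import ceil, comb


def detection_power(depth, pool_size, min_alt_reads):
    """P(at least min_alt_reads variant reads) for one het carrier in the pool.

    Simple binomial sampling model; sequencing error is ignored.
    """
    p = 1.0 / (2 * pool_size)
    miss = sum(
        comb(depth, k) * p**k * (1 - p) ** (depth - k)
        for k in range(min_alt_reads)
    )
    return 1.0 - miss


def smallest_pool_count(n_samples, depth, min_alt_reads, target_power):
    """Fewest equal-sized pools whose singleton detection power meets the target."""
    for n_pools in range(1, n_samples + 1):
        pool_size = ceil(n_samples / n_pools)
        if detection_power(depth, pool_size, min_alt_reads) >= target_power:
            return n_pools, pool_size
    return None


# Assumed for illustration: 1000 samples, 500x depth per pool at the site,
# a call requires at least 8 variant reads, and 95% sensitivity is wanted.
print(smallest_pool_count(n_samples=1000, depth=500, min_alt_reads=8, target_power=0.95))
```

Raising the read threshold to suppress false positives, or lowering the depth, pushes the answer toward more and smaller pools, which is the trade-off described above.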

              If you haven't run this SureSelect design yet, beware that you may get uneven coverage -- so some regions will capture much more than others, which further complicates trying to design in the right sensitivity. Also, I believe Agilent still recommends capturing each library separately, though certainly here you will find folks discussing capturing pooled libraries

              Comment

              • snapper
                Junior Member
                • Jun 2009
                • 5

                #8
                Thanks all - this is extremely helpful.

                Comment

                • mrivas
                  Junior Member
                  • Nov 2010
                  • 4

                  #9
                  Hello Snapper,

                  I developed Syzygy while at the Broad Institute. Syzygy performs well with 25 individuals per pool. In fact, we have several small targeted experiments that we designed with 50 individuals per pool (100 chromosomes) across 10 pools. We observe a high validation rate (~90%) for all variants, singletons and above. You can get more information about Syzygy from


                  We are currently optimizing Syzygy to deal with larger target sizes. The target size it was originally intended for was approximately 60-100 kb.

                  Best Regards,
                  Manuel Rivas

                  Comment

                  • gfmgfm
                    Member
                    • Jun 2010
                    • 64

                    #10
                    Hello Manuel Rivas,

                    I have a pooled experiment with target size of ~803 kb.
                    Can I use Syzygy?

                    If not, does anyone have suggestions for a tool to call SNPs from a pooled run (10 individuals in one Illumina run)?

                    Comment

                    • james hadfield
                      Moderator
                      Cambridge, UK
                      Community Forum
                      • Feb 2008
                      • 224

                      #11
                      check out http://genomebiology.com/2011/12/1/R1/abstract in the latest Genome Biology.

                      A scalable, fully automated process for construction of sequence-ready human exome targeted capture libraries

                      Comment

                      • mrivas
                        Junior Member
                        • Nov 2010
                        • 4

                        #12
                        Originally posted by gfmgfm
                        Hello Manuel Rivas,

                        I have a pooled experiment with target size of ~803 kb.
                        Can I use Syzygy?

                        If not, does anyone have suggestions for a tool to call SNPs from a pooled run (10 individuals in one Illumina run)?

                        Yes, you can use Syzygy. I am uploading an optimized version of Syzygy in the next couple of days; it should handle an 800 kb target without a problem. You can send an e-mail to [email protected]



                        Is the Software's website.

                        Best Regards,
                        Manuel

                        Comment

                        • mrivas
                          Junior Member
                          • Nov 2010
                          • 4

                          #13
                          The current version handles 800 kb target size without a problem.

                          Comment

                          • gfmgfm
                            Member
                            • Jun 2010
                            • 64

                            #14
                            Great.
                            Thanks!

                            Comment

                            • vbansal
                              Junior Member
                              • Jan 2011
                              • 1

                              #15
                              For calling variants from pooled sequencing data, you can also try CRISP, a method specifically designed to detect variants using sequence reads from multiple pools (each with a moderate number of individuals). The statistical model behind CRISP is described in this Bioinformatics article http://bioinformatics.oxfordjournals...i318.full?etoc

                              A python implementation of CRISP is available here: http://polymorphism.scripps.edu/~vba...oftware/CRISP/
                              A faster and more accurate C implementation is under development and is available on request. We have used CRISP to call variants (both SNPs and short indels) from pooled sequencing of ~600 kb of DNA (captured using Agilent SureSelect) from 100 individuals, using 5 pools of 20 each. The false discovery rate for detecting SNPs on this dataset was ~1%.

                              Comment
