  • TKC
    Member
    • Jul 2013
    • 10

    Novice help: RADseq strategy for population level study

    Hi all,

    I am new to NGS and was hoping for some advice on the experimental design of a project we are currently putting together. We want to ID informative SNPs for detecting introgression among multiple species with a convoluted history of extensive hybridization/reticulation. Genome size is approx. 2.2 Gb, with an estimated 38% GC content. The “closest” genomic resource available is within the same family but from a different genus. Other (potentially pertinent?) details: we think 20X coverage would be an appropriate minimum, and we are looking to multiplex at least 96 individuals per lane.

    Question: Does it make sense to genotype all (1000+) of our samples using RAD (or ddRAD?), or should we use RADseq on a subset of individuals (96 individuals total from “pure” populations?), ID informative SNPs, and then screen the rest of our samples using some genotyping assay (e.g. Sequenom)? Many projects seem to go this direction, but the problem I see is that it requires that there be enough flanking sequence (to the SNP) to develop oligos for Sequenom.

    We will be running this on an Illumina HiSeq 2000, which yields reads that I think may be too short (~100 bp) for us to develop flanking oligos for use with Sequenom, unless the SNP happens to be smack in the center of the read, right? Does anyone have experience with the Sequenom MassArray platform for SNP genotyping who could shed light on this issue?
    Alternatively, we may have access to a MiSeq to get longer reads, or we could use an overlapping paired-end method to try to get longer fragments as well (Hohenlohe et al. 2013; Molecular Ecology).

    Any help/ direction is much appreciated.
  • JackieBadger
    Senior Member
    • Mar 2009
    • 385

    #2
    "Does it make sense to genotype all (1000+) of our samples using RAD (or ddRAD?)"
    Not only does it not make sense, but it will be VERY expensive!

    You can get a good estimate of genome wide diversity in a population with 20-30 individuals



    "or should we use RADseq on a subset of individuals (96 individuals total from “pure” populations?), ID informative SNPs, and then screen the rest of our samples using some genotyping assay (e.g. Sequenom)?"

    This is counter to the rationale behind RADseq. Radseq negates the need for costly, timely, expensive SNP assays. RADseq is genotyping by sequencing. You ID the SNPs and genotype at the same time. Assays not needed

    "Many projects seem to go this direction, but the problem I see is that it requires that there be enough flanking sequence (to the SNP) to develop oligos for Sequenom. "

    Unless you want to develop an assay that will be used a lot, then it's not worth it. You need to think carefully about the costs of each. For what you want to do (detect introgression between species) you could do this probably with a minimum of 15 samples from each species. We are !

    I would advise doing RAD on a fraction of your samples, and then looking at neutral markers (microsatellites, mtDNA) in the remainder to get the whole picture.

    • TKC
      Member
      • Jul 2013
      • 10

      #3
      Hi,

      Thank you for the response, I'll take all the help I can get!

      "You can get a good estimate of genome wide diversity in a population with 20-30 individuals"
      -I guess I should clarify: this is a species complex for which we have samples representing multiple populations per species, and the 1000+ samples amount to 20-30 individuals per population per species. So we were looking for the most affordable way of looking at every population (as work with msats and mtDNA has already shown that they should be managed as distinct units)... We also know from previous work that hybridization is likely much more extensive in some populations than in others.

      As to cost, what would be a good estimate (rough, ballpark, etc) of cost per individual for RADseq? Would you recommend attempting the library prep in house, or farming that out with the sequencing? We have talked with Floragenex to some extent, and still don't have any concrete quote for total cost per individual...

      "This is counter to the rationale behind RADseq. Radseq negates the need for costly, timely, expensive SNP assays. RADseq is genotyping by sequencing. You ID the SNPs and genotype at the same time. Assays not needed"
      -I'm not 100% sure how much RADseq will cost us per individual, but Sequenom should cost us (after some start up costs) $4-5 per ~30 SNPs per individual (quoted from a commercial outfit- does anyone else know better??). So assuming we need 150 SNPs to have good diagnostic power that should cost us $20-25 per individual to assay. Since RADseq will give us more information than we really need, and if Sequenom is cheaper, it made sense to me that we could identify the SNPs (via RAD) that are useful for diagnosing hybrids, and then run the rest of our samples through the assay for genotyping.
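The arithmetic behind that estimate can be sketched as follows. This is only an illustration using the figures quoted above ($4-5 per ~30-SNP multiplex, 150 SNPs); the function name is hypothetical, start-up costs are ignored, and it assumes the SNPs split evenly into full multiplexes.

```python
import math

def assay_cost_per_individual(n_snps, snps_per_plex, cost_per_plex):
    """Cost of genotyping one individual when SNPs are run in multiplexed batches."""
    n_plexes = math.ceil(n_snps / snps_per_plex)  # number of multiplex reactions
    return n_plexes * cost_per_plex

# Figures from the post: ~30 SNPs per multiplex at $4-5, 150 diagnostic SNPs.
low = assay_cost_per_individual(150, 30, 4.0)
high = assay_cost_per_individual(150, 30, 5.0)
print(f"Sequenom assay cost per individual: ${low:.0f}-{high:.0f}")
```

Multiplying the per-individual range by the number of samples left after the RADseq subset gives the assay budget to compare against a RADseq quote.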

      Again, thanks for taking the time to respond!

      • SNPsaurus
        Registered Vendor
        • May 2013
        • 525

        #4
        Hi TKC,

        In the "old" days of a few years ago, lots of people did what you suggest here: use RAD-Seq to identify SNPs in a subset of a population, then convert a subset of those SNPs to a high-throughput genotyping platform. People used RAD PE contigs to get 300-500 bp, or overlapping PE reads. And as you mention, a MiSeq run would now serve the same purpose.

        But as sequencing costs drop, the population size where that strategy is appropriate keeps getting higher. It is an investment to set up the genotyping, and you are then dealing with only previously known SNPs so lose the ability to discern new alleles that may be of interest.

        It sounds like the first question is if you need to get information on all 1000 individuals. When someone contacts SNPsaurus or my lab about a genotyping project (disclosure: my academic lab developed RAD-Seq, I have equity in Floragenex which offers RAD-Seq, I founded SNPsaurus which offers nextRAD), we ask how many markers are needed, do you need perfect information at each locus (this is a little tricky, some applications such as a genetic map prefer to have high quality genotypes that reliably call each allele of a heterozygote, others want good quality calls but missing alleles are OK), and what is the SNP rate in the population if known (i.e. how many sequenced loci will have a SNP in the population).

        From that, you can design the experiment. In your case, since cost is a factor for a large population, if you can get away with it you'd hope to get by with fewer markers and lower coverage. So a project assaying 30,000 tags per genome (10,000 markers if 1/3 of tags have SNPs) at 5x coverage (you will have good quality calls but miss a portion of heterozygous alleles) can fit >600 samples per lane. Usually the project is constrained by index availability at that kind of multiplexing (for nextRAD we use dual indexing and typically multiplex at 192 samples per lane), in which case you need 6 lanes of sequencing for 1000 samples, and will get higher coverage than planned because you aren't multiplexing as much as possible.
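A rough sketch of that lane budgeting, for anyone who wants to plug in their own numbers. The tag count, coverage, 192-sample index limit and 1000 samples come from this thread; the reads-per-lane figure is an assumption (the ">600 samples per lane" statement only implies at least ~90 million usable reads), and the sketch assumes reads are spread evenly over samples and tags.

```python
import math

READS_PER_LANE = 150_000_000  # ASSUMPTION: usable reads per lane, not stated in the thread

def max_samples_per_lane(reads_per_lane, n_tags, coverage):
    """Samples that fit in one lane if every tag gets `coverage` reads."""
    reads_per_sample = n_tags * coverage
    return reads_per_lane // reads_per_sample

def lanes_needed(n_samples, samples_per_lane):
    """Whole lanes required for n_samples at a given multiplexing level."""
    return math.ceil(n_samples / samples_per_lane)

n_tags = 30_000
expected_markers = n_tags // 3                        # if 1/3 of tags carry a SNP
by_reads = max_samples_per_lane(READS_PER_LANE, n_tags, 5)
by_index = 192                                        # dual-index limit quoted above
plex = min(by_reads, by_index)                        # the tighter constraint wins
lanes = lanes_needed(1000, plex)

print(f"markers: {expected_markers}, read-limited plex: {by_reads}, "
      f"actual plex: {plex}, lanes: {lanes}")
```

Because the index limit (192) is well below the read-limited figure, the number of lanes is set by indexing rather than by read depth, which is why the achieved coverage ends up higher than the 5x target.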

        For this kind of low-coverage sequencing, the library cost will then dominate. Most people peg the cost of materials at $15-20 per sample. I think the biggest unplanned-for cost is labor (again, I'm an outsourcing service provider so I'll make that argument, but in my academic lab we help people with RAD projects and sometimes it drags on for months with run failure after run failure, so we see the ugly side as well!).

        Oops, I went back and saw your 20X coverage... it actually still fits in 6 lanes, so the project would be the same.
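As a quick check on the 20X point, here is a minimal sketch. It assumes the 30,000 tags and 192 samples per lane mentioned above; the lane yield is an assumed figure, not something stated in this thread, and reads are treated as evenly distributed across samples and tags.

```python
READS_PER_LANE = 150_000_000  # ASSUMPTION: usable reads per lane

def reads_required(n_samples, n_tags, coverage):
    """Reads needed in one lane to hit the target coverage at every tag."""
    return n_samples * n_tags * coverage

def coverage_at_plex(reads_per_lane, n_samples, n_tags):
    """Mean coverage per tag when a lane is shared by n_samples."""
    return reads_per_lane / (n_samples * n_tags)

needed = reads_required(192, 30_000, 20)
achieved = coverage_at_plex(READS_PER_LANE, 192, 30_000)
fits = needed <= READS_PER_LANE

print(f"reads needed for 20X: {needed:,}")
print(f"mean coverage at 192-plex: {achieved:.1f}X, fits: {fits}")
```

In practice coverage varies between samples and loci, so a mean only slightly above the target would leave many genotypes below it.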
        Last edited by SNPsaurus; 07-25-2013, 09:20 AM.
        Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com

        • JackieBadger
          Senior Member
          • Mar 2009
          • 385

          #5
          5x coverage is way too low in my opinion. 20-30x minimum.

          If cost is no issue, go with one of the Oregon providers (they ain't cheap, but they will deliver SNPs hassle-free). If you do not have tens or hundreds of thousands of dollars to spend, I would suggest collaborating with a lab that has the expertise.

          • SNPsaurus
            Registered Vendor
            • May 2013
            • 525

            #6
            JackieBadger, 5X may be low, but it really depends on the application. GBS (the Elshire method) is designed to sample many loci at sub-1X coverage, for example. If TKC just wants to see if introgression is happening, and there are hundreds of strain-specific SNPs, then having some missing data won't be a problem. But, you are right that it is better to be conservative about it!
            Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com
