  • Designing ddRAD-seq in a pest arthropod

    I am designing a project to first look at population diversity of a mite species at various spatial scales and then to relate these genotypes to pathogenicity phenotypes by GWAS. The mites are very inbred in wild populations, so GWAS signatures should be strong.

    I have previously carried out an RNAseq project on this species of mite and there is a genome available, though it's not great. The genome is 500 Mb and 40% GC.

    I have spoken with a number of people and they recommended ddRADseq as appropriate and the most efficient in terms of cost. I have read multiple papers on designing the experiment but would like some guidance and advice!

    I have access to thousands of mites from multiple locations globally and so as usual I could get greedy in my sample numbers if necessary and appropriate.

    Where to begin?! I aim to use a HiSeq 2500 and have read that a coverage of 20x is recommended. I would like advice or pointers on determining the best target seq range / plexity and what other variables I need to consider.

    Thanks for reading and cheers for any help you can give.

  • #2
    My usual advice is to calculate a perfect multiplexing level of # sites X read depth and then go much higher. Say you choose to look at 10,000 loci, so the first calculation is 10,000 x 20 = 200,000 reads per sample.

    But now assume a 4-fold to 10-fold change in read number from sample to sample. So if the average read number is 200,000 there will be samples at 50,000 reads and samples at 1,000,000 reads.
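As a rough sketch of the arithmetic above, the snippet below computes the ideal reads per sample and how many samples fit in one lane once a safety factor is applied. The lane yield and the safety factor are placeholder assumptions, not figures from this thread; substitute the numbers from your own sequencing provider.

```python
def reads_per_sample(n_loci, depth):
    """Ideal read count: number of loci times target read depth."""
    return n_loci * depth


def samples_per_lane(lane_reads, n_loci, depth, safety_factor=1):
    """Samples that fit in one lane when each sample is budgeted
    safety_factor times the ideal read count."""
    budget = reads_per_sample(n_loci, depth) * safety_factor
    return int(lane_reads // budget)


ideal = reads_per_sample(10_000, 20)  # 10,000 loci x 20X, as in the post

# Hypothetical lane yield (150 million reads) and a 4x safety margin;
# both values are assumptions for illustration only.
n_samples = samples_per_lane(150_000_000, 10_000, 20, safety_factor=4)

print(ideal, n_samples)
```

Raising the safety factor trades samples per lane for a better chance that the low-end samples still reach a usable depth.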

    Because ddRAD has loci defined by two cut sites, each locus has a particular size. If you take a fragment size range of 150-300 bp, the 150 bp fragments will PCR much more than the 300 bp fragments. GC content of the fragment will also affect amplification. So you will have another 10-fold range in locus read depth. Some loci in the low-end samples that got just 50,000 reads will get 1 or 2X read depth instead of the already low 5X.
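To see where the "1 or 2X" figure comes from, here is a minimal sketch. It assumes the 10-fold locus-to-locus range is spread symmetrically on a log scale around the sample mean, which is a simplification and not a measured distribution.

```python
import math


def locus_depth_range(sample_reads, n_loci, fold_range):
    """Return (low, mean, high) per-locus depth for one sample.

    Assumes locus depths span `fold_range` symmetrically on a log scale
    around the sample mean (a simplifying assumption).
    """
    mean = sample_reads / n_loci
    half = math.sqrt(fold_range)
    return mean / half, mean, mean * half


# A low-end sample with 50,000 reads spread over 10,000 loci.
low, mean, high = locus_depth_range(50_000, 10_000, 10)
print(f"{low:.1f}X to {high:.1f}X around a mean of {mean:.0f}X")
```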

    If you have lots of samples, you will have to make many libraries. So some loci will be in some libraries and not others since the size selection is not perfectly consistent. If you are selecting a wide size range you may just have variable presence of 1000 or fewer loci, but with a tight size range this could be a larger problem.

    If your ddRAD fragment has a rare-cutting enzyme and a frequently cutting enzyme, and the population is genetically diverse, then you also have to worry about missing data from locus drop-out (see Table 1 of http://www.ncbi.nlm.nih.gov/pubmed/23551379). The issue is that a polymorphism in either cut site will drop the locus from the library in that sample. A polymorphism can also create a new frequent-cutter site at one of the dozen or so "almost cut sites" in a fragment, which also removes it from the library.
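A toy model can give a feel for how drop-out scales with diversity. The sketch below only counts loss of the two existing recognition sites and ignores new cut sites created inside the fragment; the per-base divergence value is hypothetical, and the function name is made up for illustration.

```python
def retention_probability(site_lengths, divergence):
    """Probability that every recognition site of a locus is intact.

    site_lengths: lengths (bp) of the recognition sites flanking the locus.
    divergence:   assumed per-base probability that a sample differs from
                  the sequence the enzyme recognizes (hypothetical value).
    New cut sites arising inside the fragment are not modelled.
    """
    total_bases = sum(site_lengths)
    return (1.0 - divergence) ** total_bases


# Hypothetical example: a 6-cutter plus a 4-cutter at 1% divergence.
p_keep = retention_probability([6, 4], 0.01)
print(f"retained: {p_keep:.3f}, dropped: {1 - p_keep:.3f}")
```

Because the "almost cut sites" are left out, this should be read as an optimistic lower bound on drop-out rather than a prediction.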

    So if you are wanting to assay 10,000 sites consistently across the population, it is a good idea to accept some missing data in 10% or more of the samples, and you'll want to think about your size selection step and if the genetic diversity of the population will lead to unacceptable levels of drop-out. You can add more loci to partly compensate, but they will also be susceptible to locus drop out, and you may want to avoid 4-cutters or degenerate 5-cutters if the population diversity is very high.
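One way to judge how many loci an enzyme pair and size window would yield is an in silico double digest of the available genome. The sketch below is a simplified version: it works on a plain sequence string, treats the start of each recognition site as the cut position, and keeps only fragments flanked by one site of each enzyme with no cut site in between. The EcoRI (GAATTC) and MspI (CCGG) motifs and the 150-300 bp window are illustrative choices, not a recommendation from this thread.

```python
import re


def find_sites(seq, motif):
    """Start positions of all (possibly overlapping) motif occurrences."""
    return [m.start() for m in re.finditer(f"(?={motif})", seq)]


def ddrad_fragments(seq, motif_a, motif_b, min_len, max_len):
    """Fragments bounded by adjacent cut sites of two different enzymes
    whose length falls within the size-selection window."""
    sites = sorted(
        [(p, "A") for p in find_sites(seq, motif_a)]
        + [(p, "B") for p in find_sites(seq, motif_b)]
    )
    fragments = []
    for (p1, e1), (p2, e2) in zip(sites, sites[1:]):
        length = p2 - p1
        if e1 != e2 and min_len <= length <= max_len:
            fragments.append((p1, p2))
    return fragments


# Tiny synthetic sequence: one A-B fragment of 206 bp, one B-B fragment
# (discarded), and one B-A fragment of 504 bp (outside the window).
toy = "GAATTC" + "A" * 200 + "CCGG" + "T" * 50 + "CCGG" + "A" * 500 + "GAATTC"
frags = ddrad_fragments(toy, "GAATTC", "CCGG", 150, 300)
print(len(frags), frags)
```

On a real assembly you would run this per contig and sum the counts; a fragmented draft genome will undercount loci that span contig breaks.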

    Sounds like a neat project!
    Providing nextRAD genotyping and PacBio sequencing services. http://snpsaurus.com


    • #3
      Thanks for the quick and helpful reply.

      A few further questions! You give approximately 10,000 loci as an example. Is this a good starting point? How do I figure out the number of loci I should aim for? Can I get an idea by looking at the existing genome? I also have RNAseq data between some populations under pesticide selection and could maybe estimate SNPs from neutral genes in that data?

      The population diversity is likely to be low; the mites have been invasive in these regions for only 30-100 years and a high percentage of matings are between full siblings. This might limit locus drop-out?


      • #4
        It sounds like you'll want to assay more than 10,000 sites. If the total population genetic diversity is low, then you'll have a low rate of converting loci to polymorphic SNPs. But it sounds like you might have low diversity within a sample (inbred local population) and unknown total diversity (some inbred populations might be very different from each other). The RNA-Seq data should be a big help in determining what the structure looks like. For a GWAS study you may want more than 10,000 SNPs depending on the size of the blocks of linkage disequilibrium. The shorter they are the more SNPs you'll need. The RNA-Seq data may help there as well since you can find SNPs that are 1kb, 10kb apart and see how often linkage breaks.
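The link between LD block size and marker number can be put as a back-of-the-envelope lower bound: at least one informative SNP per block. The sketch below uses the 500 Mb genome size from the first post; the block sizes are hypothetical, and real RAD loci are neither evenly spaced nor all polymorphic, so the true requirement will be higher.

```python
import math


def min_markers_for_ld(genome_size_bp, ld_block_bp):
    """Lower bound on markers: one per LD block, assuming even spacing."""
    return math.ceil(genome_size_bp / ld_block_bp)


genome_size = 500_000_000  # 500 Mb, from the first post

# Hypothetical LD block sizes in bp, for illustration only.
for block in (100_000, 50_000, 10_000):
    print(block, min_markers_for_ld(genome_size, block))
```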

        If the local population is inbred and you need lots of SNPs and have lots of samples and a limited budget (who doesn't?), then a slightly riskier approach would be to sample lots of loci at low coverage. 10,000 loci at 20X coverage takes roughly the same # of reads as 60,000 at 3X coverage. Just assume the loci are homozygous and that there are very few heterozygous alleles to be missed by using such low coverage anyway. I'm not totally sure how that might affect the GWAS analysis though... again, it probably depends on how inbred they are.
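The trade-off can be sketched in two small functions: how many loci a fixed read budget covers at a given depth, and how likely a heterozygous site is to look homozygous at that depth. The second function assumes both alleles are sampled with equal probability and that a single read of the second allele is enough to notice it, which is more generous than most genotype callers.

```python
def loci_for_budget(reads_per_sample, depth):
    """Number of loci a per-sample read budget covers at a target depth."""
    return reads_per_sample // depth


def p_miss_heterozygote(depth):
    """Chance that all reads at a heterozygous site show the same allele,
    assuming each read samples either allele with probability 0.5."""
    if depth < 1:
        raise ValueError("depth must be at least 1")
    return 0.5 ** (depth - 1)


budget = 10_000 * 20  # 200,000 reads per sample
print(loci_for_budget(budget, 3))   # loci affordable at 3X
print(p_miss_heterozygote(3))       # 0.25
print(p_miss_heterozygote(20))      # about 1.9e-06
```

At 3X a quarter of true heterozygotes would be scored as homozygous under these assumptions, which is why the approach only makes sense if heterozygosity really is rare.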

        How much DNA can you get from a mite? Do they have endosymbionts like Wolbachia? How much of the DNA is mite versus DNA from whatever they eat? We see small insects tend to have a lot more gut-contents DNA in a sample than larger organisms. If you can determine this then you can give more reads to overcome the wasted reads going to other genomes.
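Once the mite fraction is known, scaling the read budget is a one-line correction. The fractions below are hypothetical and the function name is only illustrative.

```python
import math


def reads_to_sequence(target_reads, host_fraction):
    """Total reads needed so that `target_reads` come from the target genome,
    given the fraction of reads expected to be from the target."""
    if not 0 < host_fraction <= 1:
        raise ValueError("host_fraction must be in (0, 1]")
    return math.ceil(target_reads / host_fraction)


# Hypothetical: half of the reads come from the mite, the rest from
# endosymbionts and gut contents.
print(reads_to_sequence(200_000, 0.5))
```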


        • #5
          I keep forgetting to mention, but if you are in Scotland like your Location says, then some of the most experienced people in using RAD were in the Gene Pool sequencing facility, which is now Edinburgh Genomics. You might check in with them since a local source can be a big help. I'm always happy to discuss projects, though!


