  • "ideal" insert size

    Has anyone come across a study or formal recommendation that gives a reason for choosing one ideal insert size for paired-end sequencing of human samples? Our laboratory staff have asked me this, and all I can tell them is that a really narrow distribution would be good; as for the insert size itself, I have little information to go on.
    We do both alignment and assembly on our data.

    Any help appreciated.

  • #2
    You don't mention the platform you're using, but I'd imagine the major constraint is going to be the technical limitations of your sequencer. On Illumina systems longer insert lengths will result in larger, dimmer spots, reducing both the amount and quality of data you can obtain. We've run libraries with insert sizes up to about 1 kb, but I'm not sure I'd want to go much higher than that. There's often no point in having really short inserts either, since you'll end up reading through the insert and into the adapter in a significant proportion of your reads.
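The short-insert problem above can be put in numbers. This is a minimal sketch, assuming "insert size" means the length of the DNA fragment between the adapters and that both mates have the same length; the function names are hypothetical and not part of any library.

```python
def adapter_readthrough(insert_size: int, read_length: int) -> int:
    """Bases of each read that run past the end of the insert into the adapter."""
    return max(0, read_length - insert_size)


def mate_overlap(insert_size: int, read_length: int) -> int:
    """Insert bases that are sequenced by both mates of a pair."""
    covered_twice = 2 * read_length - insert_size
    # The overlap can never exceed the insert itself.
    return max(0, min(covered_twice, insert_size))


# Illustrative values only: 100 bp reads against a few insert sizes.
for insert in (80, 150, 300):
    print(insert, adapter_readthrough(insert, 100), mate_overlap(insert, 100))
```

Under these assumptions, any insert shorter than the read length produces adapter sequence in the reads, and any insert shorter than twice the read length means part of the fragment is sequenced twice.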

    The other big issue which may or may not be a factor for you is the amount of material you have. If you perform a very tight size selection then you're reducing the amount of material you have to create your library and you run the risk of getting a big pile of PCR artefacts if you start amplifying from too little material.

    I'm sure there are other considerations specific to your biological application. If you're doing assemblies you might want to look at mate pair libraries which allow the generation of paired sequences separated by much longer distances (2-5kb) whilst still keeping to the insert size limitations of the sequencing platform.



    • #3
      Thanks for your input,
      Specifically, I've been asked this by our group who are responsible for Illumina sequencing.

      They have cited the trade-off between tight distribution and yield, which makes sense to me.

      What befuddles me is that when I'm asked the question "if you could have any insert size, what would it be?" I don't have much to go on other than we don't want to sequence through the fragment twice. We have restrictions from WTSS, etc. which are driven by the sample, but for WGSS I'm looking for a bioinformatic reason to choose one size over another.

      Shouldn't there be some feature of hg18/hg19, such as SINEs or LINEs, that would call for a larger or smaller insert size for WGSS libraries, so that we can make more use of them bioinformatically (alignment and assembly)?



      • #4
        This will eventually come down to your use case. If you're doing some kind of ChIP experiment then you won't want to increase your insert size too much, since you'll lose resolution in your feature detection. I don't do much assembly, but my recollection from those who do is that it's useful to have a range of insert sizes (though maybe in separate experiments?) to allow for spanning of short and long repeats.
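A rough way to see why a range of insert sizes helps with repeats: a read pair can only bridge a repeat if the fragment is long enough to leave some unique sequence on both sides. The sketch below is a simplification, not an assembler's actual logic. It assumes "insert size" is the full fragment length and that each mate needs a hypothetical `min_anchor` of unique bases outside the repeat to be placed confidently; the repeat lengths are approximate illustrative values.

```python
def max_spannable_repeat(insert_size: int, min_anchor: int = 30) -> int:
    """Longest repeat a fragment can bridge while keeping `min_anchor`
    unique bases on each side (assumed requirement, not a standard)."""
    return max(0, insert_size - 2 * min_anchor)


def spans_repeat(insert_size: int, repeat_length: int, min_anchor: int = 30) -> bool:
    """True if a fragment of this size can bridge a repeat of this length."""
    return repeat_length <= max_spannable_repeat(insert_size, min_anchor)


# Approximate lengths: an Alu element (~300 bp) and a full-length L1 (~6 kb).
for insert in (200, 2000):
    for name, repeat in (("Alu", 300), ("full-length L1", 6000)):
        print(insert, name, spans_repeat(insert, repeat))
```

In this model a 200 bp fragment cannot bridge an Alu-sized repeat while a 2 kb fragment can, and neither bridges a full-length L1, which is where mate pair libraries with longer spans would come in.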

        Our experience has been that longer read lengths are negating many of the problems of duplicated alignments in remapping experiments. Once you're up to 50bp or so (either paired or single end) then a surprisingly high proportion of 'repeat' sequence is actually mappable. We work in backcrossed strains with no SNPs though, so maybe this is more of an issue if you have more diversity. These days most of the sequences we can't map come from regions not present in the genome assembly (telomeres and centromeres mostly), so there's not much we can do about that.



        • #5
          I think your ideal insert size would be somewhere along the lines of the maximum insert and read length that allows you to maximize the throughput of your sequencing platform without saturating your data.



          • #6
            I think a lot of these answers are good.

            The optimal insert size depends on your experiment and goals.

            I'm assuming you're not talking about ChIP-seq (which is often best done single-end).

            For exome-seq, an insert of around 200-350 bp is more than adequate for hitting >99% of the targets and assessing variants. Based on what I've seen, you can probably run more than 4 exomes per HiSeq lane this way.

            For whole genome, a combination of tightly distributed 200- and 2,000-base inserts is optimal for human (for the sake of SV detection). The 2 kb insert reads can be fairly low depth; they'll make up for the issues with mapping over LINEs that you alluded to.
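One way to see why the 2 kb library can be sequenced at low depth is to compare sequence coverage (bases actually read) with physical coverage (bases spanned by fragments). This is a back-of-the-envelope sketch; the function names are hypothetical, the genome size is rounded to 3 Gb, and the pair count and read length are made-up example values.

```python
def sequence_coverage(n_pairs: int, read_length: int, genome_size: int) -> float:
    """Average number of times each base is read (two mates per pair)."""
    return n_pairs * 2 * read_length / genome_size


def physical_coverage(n_pairs: int, insert_size: int, genome_size: int) -> float:
    """Average number of fragments spanning each base, read or not."""
    return n_pairs * insert_size / genome_size


GENOME = 3_000_000_000   # human genome, rounded (assumption)
PAIRS = 30_000_000       # example number of read pairs (assumption)
READ_LEN = 100           # example read length in bp (assumption)

for insert in (200, 2000):
    print(insert,
          sequence_coverage(PAIRS, READ_LEN, GENOME),
          physical_coverage(PAIRS, insert, GENOME))
```

For the same number of read pairs, physical coverage scales with insert size, so with these example numbers the 2,000-base library spans each position ten times more often than the 200-base library even though the sequenced depth is identical.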

            If you don't care about having the optimal SV detection rate, you can go with 200-350bp whole genome similar to exome without much issue (though the cost may be an issue).

            For the sake of phasing, a less tightly distributed library with a mean insert of 2,000-3,000 bases would be great (expecting about 1 SNV per kb).

