  • luc
    Senior Member
    • Dec 2010
    • 469

    insert sizes for RNA-seq

    Has anybody come across studies that have looked into the optimal insert sizes for RNA-seq libraries?

    Would you have recommendations? I assume the optimal size ranges might change with library prep protocols. I am especially interested in recommendations for protocols using RNA fragmentation, random-hexamer-primed 1st strand synthesis, and dUTP incorporation for strand specificity.

    Thanks a lot in advance!
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    The optimal insert size depends on various factors...

    1) Read length and sequencing platform
    2) Gene and exon length distribution in the target organism
    3) Use of data - assembly vs quantification

    I don't think you can derive a useful number without specifying these things. I like long insert sizes, particularly in organisms with differential splicing, as they are more informative about the source isoform. But it's really experiment-specific.

    • luc
      Senior Member
      • Dec 2010
      • 469

      #3
      Thanks, Brian.

      Yes, I should have specified that. I was thinking about Illumina HiSeq systems, with transcript quantification as the purpose (e.g. typically single-end 50 bp reads).

      I imagine random priming will cause some bias against smaller fragments. Illumina flowcell clustering, on the other hand, is more efficient for smaller fragments. The chemical fragmentation is very likely approximately random; nevertheless, there is likely some bias as to which transcripts of specific size ranges (let's say about 400 bp transcripts compared to 3 kb transcripts) show up as fragments of a specific size range (e.g. 150 bp inserts or 300 bp inserts)?
      Very likely it would be best to look at some ERCC spike-in data.
      Last edited by luc; 08-11-2014, 02:57 PM.

      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        With 50bp single-end reads, there is no reason to shoot for a long insert size, and for quantification, short inserts will be less biased anyway. I don't know what kind of biases are introduced by the different fragmentation methods, though I understand that "random hexamer priming" is actually pretty non-random, so it seems like something to avoid for accurate quantification of small transcripts.

        Also, the shorter your insert sizes, the less genetic material or amplification you will need. So it seems like you should go as short as possible; maybe 100bp.
        Last edited by Brian Bushnell; 08-11-2014, 04:04 PM.

        • turnersd
          Senior Member
          • May 2011
          • 115

          #5
          Don't mean to side-track this discussion too much, but I'm noticing very poor coverage of a relatively small transcript (1200 bp) after rRNA reduction and 2x100 sequencing; I still need to check the insert size. Which of the upstream library prep steps discussed here could result in this poor coverage? That is, could you help me understand why random hexamer priming biases against coverage of small transcripts? And how does the insert size affect this small-transcript coverage?

          Thanks.

          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            If the random hexamers are not completely random (in terms of their concentration or binding affinity), then transcripts rich in the more concentrated/better-binding hexamers will be overrepresented and those poor in them will be underrepresented. The shorter a transcript is, down to a limit of 6bp, the more highly skewed the abundance distribution of its hexamers is likely to be. 1200bp is probably fairly long for that to play a major role.
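
A rough way to see the sampling argument above: a transcript of length L contains only L - 5 hexamer windows, so a short transcript can represent only a small slice of the 4096 possible hexamers. The sketch below is illustrative only; the function names are made up for this example, and it says nothing about the actual priming efficiency of any particular hexamer.

```python
from collections import Counter


def hexamer_counts(seq, k=6):
    """Count every overlapping k-mer (hexamers by default) in a sequence."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def max_distinct_fraction(transcript_len, k=6):
    """Upper bound on the fraction of all 4**k possible k-mers that a
    transcript of the given length could contain."""
    n_windows = max(0, transcript_len - k + 1)
    return min(n_windows, 4 ** k) / 4 ** k


# Toy sequence (hypothetical, not a real transcript).
counts = hexamer_counts("ACGTACGTAC")
print(counts.most_common(1))
print(max_distinct_fraction(1200))
```

Even at 1200 bp, a transcript can contain at most 1195 of the 4096 possible hexamers, and a much shorter transcript far fewer, so its priming behaviour depends on a smaller and less averaged set of hexamers.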

            Also, the longer the insert relative to the transcript, the fewer available start/stop positions there are. Considering a 600bp transcript, there's no longer any place an 800bp insert fragment can originate. But assuming you kept 600bp and smaller fragments, the majority of fragments from that transcript would be expected to be the whole unsheared transcript, starting at one end and ending at the other with no coverage in the middle (since only the 2 outermost 100bp sections would be sequenced).
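
The start/stop argument above can be sketched numerically. This is a simplified model, assuming every possible fragment of exactly one insert length is sampled once and that a fixed number of bases is read from each end; the function names are hypothetical and real libraries have a distribution of insert sizes.

```python
def start_positions(transcript_len, insert_len):
    """Number of distinct positions where a fragment of insert_len can start."""
    return max(0, transcript_len - insert_len + 1)


def paired_end_coverage(transcript_len, insert_len, read_len):
    """Per-base coverage when each possible fragment is sampled exactly once
    and read_len bases are sequenced from each end of the fragment."""
    coverage = [0] * transcript_len
    for start in range(start_positions(transcript_len, insert_len)):
        end = start + insert_len
        # Bases covered by the forward read and by the reverse read.
        covered = set(range(start, min(start + read_len, end)))
        covered |= set(range(max(end - read_len, start), end))
        for pos in covered:
            coverage[pos] += 1
    return coverage


# 600 bp transcript, 2x100 reads: an 800 bp insert cannot originate anywhere,
# and a 600 bp insert covers only the two outermost 100 bp sections.
print(start_positions(600, 800))
cov = paired_end_coverage(600, 600, 100)
print(cov[0], cov[300], cov[599])
```

Trying shorter values of `insert_len` in this model shows the middle of the transcript filling in as more start positions become available.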

            • turnersd
              Senior Member
              • May 2011
              • 115

              #7
              Thanks for the helpful explanation, Brian.
