  • luc
    Senior Member
    • Dec 2010
    • 469

    insert sizes for RNA-seq

    Has anybody come across studies that have looked into the optimal insert sizes for RNA-seq libraries?

    Would you have recommendations? I assume the optimal size ranges might change with library prep protocols. I am especially interested in recommendations for protocols using RNA fragmentation, random-hexamer-primed 1st strand synthesis, and dUTP incorporation for strand specificity.

    Thanks a lot in advance!
  • Brian Bushnell
    Super Moderator
    • Jan 2014
    • 2709

    #2
    The optimal insert size depends on various factors...

    1) Read length and sequencing platform
    2) Gene and exon length distribution in the target organism
    3) Use of data - assembly vs quantification

    I don't think you can derive a useful number without specifying these things. I like long insert sizes, particularly in organisms with differential splicing, as they are more informative about the source isoform. But it's really experiment-specific.


    • luc
      Senior Member
      • Dec 2010
      • 469

      #3
      Thanks, Brian.

      Yes, I should have specified that. I was thinking of Illumina HiSeq systems with transcript quantification as the purpose (usually single-end 50 bp reads).

      I imagine random priming will cause some bias against smaller fragments. Illumina flowcell clustering, on the other hand, is more efficient for smaller fragments. The chemical fragmentation is very likely approximately random; nevertheless, is there likely to be some bias in how transcripts of different sizes (let's say about 400 bp transcripts compared to 3 kb transcripts) show up as fragments of a specific size range (e.g. 150 bp or 300 bp inserts)?
      Very likely it would be best to look at some ERCC spike-in data.
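
      Short of spike-in data, the question can at least be explored with a toy model. The sketch below assumes fragmentation cuts after every base independently with the same probability, which is a simplification and not a description of any real protocol. The function names, the break probability, and the size window are all made up for illustration.

```python
import random

def fragment_lengths(transcript_len, break_prob, rng):
    """Cut after each base independently with probability break_prob
    (toy assumption) and return the resulting fragment lengths."""
    lengths, current = [], 0
    for pos in range(transcript_len):
        current += 1
        # No cut is possible after the last base.
        if pos < transcript_len - 1 and rng.random() < break_prob:
            lengths.append(current)
            current = 0
    lengths.append(current)
    return lengths

def fraction_in_window(transcript_len, lo, hi, break_prob, rng, copies=300):
    """Fraction of fragments whose length falls in [lo, hi] after
    fragmenting `copies` molecules of one transcript."""
    kept = total = 0
    for _ in range(copies):
        for n in fragment_lengths(transcript_len, break_prob, rng):
            total += 1
            kept += lo <= n <= hi
    return kept / total

rng = random.Random(0)  # fixed seed so the run is repeatable
for t_len in (400, 3000):
    frac = fraction_in_window(t_len, 150, 300, break_prob=1 / 200, rng=rng)
    print(f"{t_len} bp transcript: {frac:.3f} of fragments are 150-300 bp")
```

      Changing the window or the break probability shows how strongly the answer depends on those choices; real libraries add priming and size-selection effects that this model ignores.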
      Last edited by luc; 08-11-2014, 02:57 PM.


      • Brian Bushnell
        Super Moderator
        • Jan 2014
        • 2709

        #4
        With 50bp single-end reads, there is no reason to shoot for a long insert size, and for quantification, short inserts will be less biased anyway. I don't know what kind of biases are introduced by the different fragmentation methods, though I understand that "random hexamer priming" is actually pretty non-random, so it seems like something to avoid for accurate quantification of small transcripts.

        Also, the shorter your insert sizes, the less genetic material or amplification you will need. So it seems like you should go as short as possible; maybe 100bp.
        Last edited by Brian Bushnell; 08-11-2014, 04:04 PM.


        • turnersd
          Senior Member
          • May 2011
          • 115

          #5
          I don't mean to side-track this discussion too much, but I'm noticing very poor coverage of a relatively small transcript (1200bp) after rRNA reduction and 2x100 sequencing; I still need to check the insert size. Which of the upstream library prep steps discussed here could result in this poor coverage? That is, could you help me understand why random hexamer priming biases against coverage of small transcripts? How does the insert size affect this small-transcript coverage?

          Thanks.


          • Brian Bushnell
            Super Moderator
            • Jan 2014
            • 2709

            #6
            If the random hexamers are not completely random (in terms of their concentration or binding affinity), then transcripts rich in the more concentrated/better-binding hexamers will be overrepresented and those poor in them will be underrepresented. The shorter a transcript is, down to a limit of 6bp, the more highly skewed the abundance distribution of its hexamers is likely to be. 1200bp is probably too long for that to play a major role.
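
            A rough way to see the length effect is to give every hexamer a made-up priming weight and check how much the average weight varies between random transcripts of a given length. This is only a sketch: the log-normal weights, the uniform random sequences, and all names below are assumptions for illustration, not measured affinities.

```python
import random
import statistics
from itertools import product

def hexamer_weights(rng):
    """Hypothetical priming weight for each of the 4096 hexamers."""
    return {"".join(k): rng.lognormvariate(0.0, 0.5)
            for k in product("ACGT", repeat=6)}

def mean_weight(seq, weights):
    """Average priming weight over all hexamer sites in seq."""
    sites = [seq[i:i + 6] for i in range(len(seq) - 5)]
    return sum(weights[s] for s in sites) / len(sites)

def weight_spread(length, weights, rng, n=200):
    """Coefficient of variation of the mean priming weight across
    n random transcripts of the given length."""
    vals = []
    for _ in range(n):
        seq = "".join(rng.choice("ACGT") for _ in range(length))
        vals.append(mean_weight(seq, weights))
    return statistics.stdev(vals) / statistics.mean(vals)

rng = random.Random(0)  # fixed seed so the run is repeatable
weights = hexamer_weights(rng)
spread = {n: weight_spread(n, weights, rng) for n in (50, 300, 1200)}
for n, cv in spread.items():
    print(f"{n:>5} bp: spread of mean priming weight = {cv:.3f}")
```

            In this toy model the spread shrinks as the transcript gets longer, because more hexamer sites average out the differences between individual hexamers.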

            Also, the longer the insert relative to the transcript, the fewer available start/stop positions there are. Considering a 600bp transcript, there's no longer any place an 800bp insert fragment can originate. But assuming you kept 600bp and smaller fragments, the majority of fragments from that transcript would be expected to be the whole unsheared transcript, starting at one end and ending at the other with no coverage in the middle (since only the 2 outermost 100bp sections would be sequenced).
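
            The arithmetic behind that can be sketched in a few lines. The helper names are made up for illustration, and the model assumes fragments are cut cleanly and each read covers exactly read_len bases from its end of the fragment.

```python
def fragment_start_positions(transcript_len, insert_len):
    """Number of places a fragment of insert_len can start on a transcript."""
    return max(transcript_len - insert_len + 1, 0)

def paired_end_coverage(transcript_len, frag_start, insert_len, read_len):
    """Per-base coverage from one paired-end fragment: one read from each
    end of the fragment, each read_len bases long (or the whole fragment
    if it is shorter than a read)."""
    frag_end = frag_start + insert_len
    if frag_start < 0 or frag_end > transcript_len:
        raise ValueError("fragment does not fit on the transcript")
    cov = [0] * transcript_len
    for i in range(frag_start, min(frag_start + read_len, frag_end)):
        cov[i] += 1  # forward read
    for i in range(max(frag_end - read_len, frag_start), frag_end):
        cov[i] += 1  # reverse read
    return cov

# 600 bp transcript: an 800 bp insert has nowhere to start,
# a 600 bp insert has exactly one position.
print(fragment_start_positions(600, 800))  # 0
print(fragment_start_positions(600, 600))  # 1

# Whole unsheared 600 bp transcript sequenced 2x100:
cov = paired_end_coverage(600, 0, 600, 100)
print(sum(1 for c in cov if c == 0))  # 400 uncovered bases in the middle
```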


            • turnersd
              Senior Member
              • May 2011
              • 115

              #7
              Thanks for the helpful explanation, Brian.
