
Wacky large PE insert results from MiSeq


  • Wacky large PE insert results from MiSeq

    In response to an off-hand comment by our Illumina FAS we decided to try 1.5 kb insert paired-end libraries made using the TruSeq DNA Prep kit. I did not expect it to work, but it seemed worth a shot because paired-end libraries are much easier to make than mate-pair libraries.

    Instead of failing completely we got final results indicating that our inserts were ~500 bp shorter than we thought they would be.

    We made 6 libraries of this sort, all with similar results. We fragmented the DNA using the 1.5 kb Covaris protocol. Here is the size distribution from an Agilent DNA 7500 chip:

    Some short fragments there, so we used 0.5:1 AmPure:Sample volume clean-ups at any place in the protocol where AmPure was called for. In addition we did a double 0.5:1 AmPure clean-up prior to ligation. After library construction, but before enrichment PCR, our size distribution looked like this on a DNA High Sensitivity chip:

    Story so far: we started with DNA fragmented to a modal length of 1.7 kb, added adapters, and did several 0.5:1 AmPure clean-ups; our modal length shifted to 2.25 kb. The adapters add 125 bp and, because they are forked, tend to appear to add more than that. Arguably a consistent result.

    [Section added a few hours after initial post.]

    Here is the chip of the sample after 4 cycles of enrichment PCR:


    Okay, then we fired up a MiSeq run at 1/2 normal density (4 pM instead of 8 pM) as calculated using a Kapa qPCR kit and adjusting for relative size of the reference library (phiX) and the large-insert libraries.

    Cluster density is right where we expect it, around 400K clusters/mm².
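    The size adjustment mentioned above (scaling the qPCR concentration by the relative sizes of the quantification standard and the library) can be sketched as below. The 452 bp standard length and the example numbers are assumptions for illustration, not values from this run.

    ```python
    # Sketch of a size-adjusted qPCR quantification. The 452 bp standard
    # length and the example concentrations are hypothetical.

    def size_adjusted_conc(qpcr_pM, std_len_bp, lib_len_bp):
        """qPCR reports molarity calibrated against the standard's length;
        a longer library under-reports, so scale by the length ratio."""
        return qpcr_pM * (std_len_bp / lib_len_bp)

    # e.g. a 2.25 kb library quantified against a 452 bp standard:
    print(round(size_adjusted_conc(20.0, 452, 2250), 2))
    ```

    The same idea underlies loading at half the normal density: the target molarity is set after correcting for library length.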

    We do a 2x150 + index run on the MiSeq. I grab the FASTQ file for the above sample and map it back to a previous ABySS assembly of the ~50 million base fungal genome from which the DNA derives. I get high mapping rates. However, the size distribution, as determined from the TLEN (column 9) values of all records with TLEN > 0, is this:

    BWA also estimated the pair distances:

    [infer_isize] (25, 50, 75) percentile: (883, 1045, 1199)
    [infer_isize] low and high boundaries: 251 and 1831 for estimating avg and std
    [infer_isize] inferred external isize from 82848 pairs: 1032.148 +/- 249.468
    [infer_isize] skewness: -0.414; kurtosis: 0.376; ap_prior: 1.00e-05
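    The TLEN-based distribution described above can be sketched in a few lines; the filename is hypothetical, and this assumes a plain-text SAM file rather than BAM.

    ```python
    # Minimal sketch: collect insert sizes from the TLEN (column 9) field
    # of a SAM file, keeping only TLEN > 0 so each pair counts once.
    import statistics

    def insert_sizes(sam_lines):
        sizes = []
        for line in sam_lines:
            if line.startswith('@'):          # skip header lines
                continue
            fields = line.rstrip('\n').split('\t')
            tlen = int(fields[8])             # column 9 is TLEN (0-based index 8)
            if tlen > 0:
                sizes.append(tlen)
        return sizes

    # Usage (hypothetical file name):
    # with open('sample.sam') as f:
    #     sizes = insert_sizes(f)
    # print(statistics.mean(sizes), statistics.stdev(sizes))
    ```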

    Any ideas where we lost the longer inserts? There is the obvious:

    (1) During enrichment PCR there would have been some bias toward shorter fragments. But we only did 4 cycles and used long extension times.

    (2) During clustering, short amplicons may predominate over long amplicons. But to this extent?

    (3) Less obvious: Agilent chip size distributions are mass-based, not count-based, so a molar adjustment would tend to shift the modes of the peaks we see to the left.

    We did not do any gel cuts, nor any reverse AmPures.


    Last edited by pmiguel; 06-14-2012, 08:40 AM. Reason: Added post enrichment chip image

  • #2
    Probably a combination of the three you suggested. For fun I once took the raw Bioanalyzer data for a trace (time and fluorescence value) and held the cursor over different parts of the trace to see what bp corresponded to different times, giving me time/bp/fluorescence values. I then did a transformation under the assumption that the fluorescence given off by a fragment is linearly proportional to the length of the fragment: I plotted (fluorescence units / bp) vs. bp. In my mind this should transform the "mass" plot into a "molar" plot. Keep in mind your traces are plotted with a non-linear x axis; this transformation also converts the x axis to a linear scale.

    I have no idea if the fluorescence being linearly proportional to the length of the fragment is correct; it would be easy to test if you had, say, 200 bp, 600 bp, and 1000 bp PCR products all purified to the same concentration, then loaded them on the chip and observed whether the fluorescence values correspond linearly. But I've never bothered to do that.

    Anyway, you can try that and see what it looks like. The annoying part for me was using the cursor to get the bp vs. time measurements; I never found a way to export those automatically.
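    The mass-to-molar transform described above can be sketched as follows; the (bp, fluorescence) points are invented for illustration, and the linearity of fluorescence with length is the assumption stated in the post.

    ```python
    # Sketch of the mass-to-molar transform: dividing each trace point's
    # fluorescence by its fragment length converts a mass-weighted signal
    # into a relative molecule count. The (bp, FU) points are made up.
    trace = [(200, 10.0), (500, 30.0), (1000, 45.0), (1500, 40.0)]

    # FU per bp ~ relative number of molecules at that size:
    molar = [(bp, fu / bp) for bp, fu in trace]

    for bp, rel_count in molar:
        print(bp, round(rel_count, 4))
    ```

    On a molar scale the mode shifts left relative to the mass plot, which is hypothesis (3) from the original post.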


    • #3
      Originally posted by Heisman View Post
      The annoying part for me was using the cursor to get the bp vs. time measurement; I never found a way to export those automatically.
      That part is easy -- just do a File > Export and select what you want exported. You can export as .csv and pull the file into Excel. One caveat: the migration rate of the ladder is not a very good log-linear fit:

      But the Agilent software uses a "point to point" fit to the ladder to convert to size...



      • #4
        Okay, I did a limited transformation of my migration times into sizes using an exponential equation generated by Excel. I also divided the intensity by the estimated length to convert from a mass scale to a counting scale. Finally, I subtracted the 121 bp of adapter that is added to the fragments but not included in the calculations based on BWA alignments. Here is what I get:

        It should correspond to the graph I duplicate from above:

        So, either the High Sensitivity chip is giving me the wrong size by quite a bit, or there are other factors strongly biasing towards the shorter fragments in the library.
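        The calibration steps above (exponential fit of ladder migration time to size, mass-to-molar conversion, adapter subtraction) can be sketched like this. The ladder points are illustrative, not real chip data.

        ```python
        # Sketch of the transformation described above: fit ladder migration
        # times to sizes with an exponential (log-linear) curve, convert a
        # trace point's time to bp, divide intensity by length (mass -> molar),
        # and subtract the 121 bp of adapter. Ladder values are hypothetical.
        import math

        # (migration time in s, known size in bp) ladder points:
        ladder = [(40.0, 100), (55.0, 500), (70.0, 2000)]

        # Least-squares fit of ln(size) = a + b*time (Excel's "exponential" fit):
        n = len(ladder)
        sx = sum(t for t, _ in ladder)
        sy = sum(math.log(s) for _, s in ladder)
        sxx = sum(t * t for t, _ in ladder)
        sxy = sum(t * math.log(s) for t, s in ladder)
        b = (n * sxy - sx * sy) / (n * sxx - sx * sx)
        a = (sy - b * sx) / n

        def time_to_bp(t):
            return math.exp(a + b * t)

        def to_molar_insert(t, intensity, adapter_bp=121):
            size = time_to_bp(t)
            # (estimated insert length, relative molecule count)
            return size - adapter_bp, intensity / size
        ```

        A point-to-point interpolation between ladder bands, as the Agilent software uses, would track the ladder more closely than this single exponential.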



        • #5
          I love this thread...sorry I missed it earlier!

          Without going into the more complex biophysics of clustering (which I'm clearly not qualified to do), a 1500 bp cluster would have (very) roughly ~56x the area of a 200 bp cluster, assuming the same extension efficiency and likelihood of priming across fragment sizes, and thus the same yield of molecules per cluster. So, per unit of flowcell area, the large clusters would be extremely low intensity.
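          A back-of-envelope check of the ~56x figure, under the stated assumption that cluster radius grows linearly with fragment length so area grows with its square:

          ```python
          # If cluster radius scales linearly with fragment length,
          # cluster area scales with the square of the length ratio.
          def area_ratio(long_bp, short_bp):
              return (long_bp / short_bp) ** 2

          print(area_ratio(1500, 200))   # -> 56.25
          ```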

          More cycles in clustering would potentially fix this...but unless your size distribution is very tight (as yours is) I would guess that more cycles would cause the short fragments to not only grow large but be very intense (unless they exhaust the flow cell primers).

          Maybe there is hope in the future for a "large fragment" clustering protocol and lower recommended cluster densities. Time to tinker with clustering recipes! For science!


          • #6
            Empirically it does not seem to be even close to a length² difference. The full width at half maximum (FWHM) is the diameter of the circle defined by the ring of pixels at half the intensity of a cluster's central (maximum) pixel. For the 1.5 kb amplicon MiSeq run, it looks like this:

            Larger, but not 9x larger than a more normal 0.5 kb amplicon run:



            • #7
              I wonder if the relatively minor increase in FWHM with larger fragments is because the image-processing software doesn't recognize larger clusters (possibly because the signal is less intense away from the cluster center), because the clusters really are smaller, or something else.

              Either way, we are going to try clustering and sequencing some large fragments on the MiSeq.

              Does anyone know if increasing the cycle count is possible and straightforward in the MiSeq interface?


              • #8
                One thing to keep in mind is that increasing the cycle count will increase cluster FWHM. See above for the FWHM increase after turn-around; turn-around involves doing some extra cycles of bridge PCR, I am told.



                • #9
                  On a related note, we've also seen the High Sensitivity chip overestimate library size. We've seen libraries that run at ~330 bp on the DNA 1000 chip run closer to 400 bp when diluted 1:10 and run on a HS chip.


                  • #10
                    Any chance you would share the .xad's of those?

                    Just the lanes of interest and the ladders, using the "save selected sample" option under the 2100 Expert software's "File" drop-down menu.

                    Probably not an issue, but I have seen lots of cases where the software calls a ladder band, or a spike-in MW control incorrectly. Also, the shapes of the library peaks are of some significance.



                    • #11
                      We did one run on an "old" MiSeq with an average fragment size of 1.2 kb (including TruSeq-style adapters) at normal clustering density and got high-quality results (2 x 6.26 million reads) from our core service. We will try a 1.5 kb library next.