
  • P1 value vs read quality

    We're sequencing a mammalian genome on the RSII to ~70x coverage, so we expect to need approximately 300 (or more) SMRT cells.
    We've run 80 SMRT cells so far and are now recalculating how many SMRT cells we need.

    Looking at the finished SMRT cells, we found that P1 correlates with yield quite well in the 25-45% range, but read length (average length and N50) doesn't seem to be affected by P1. A P1 of 45% gives ~750 Mb per cell, while a P1 of 37% gives a bit less than 600 Mb per cell.

    We were told that 37% would be the optimal P1 value for the best read quality, but from a cost-saving perspective, a higher P1 (e.g. 45%) would give higher output per cell and substantially reduce the number of SMRT cells needed.
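    The cell-count trade-off is simple arithmetic. A minimal sketch, assuming a genome size of ~3,000 Mb (a hypothetical figure; the post does not state one) and the per-cell yields quoted above (~600 Mb at P1 ~37%, ~750 Mb at P1 ~45%). `smrt_cells_needed` is a hypothetical helper, and it ignores filtering losses and cell-to-cell variation.

```python
import math


def smrt_cells_needed(genome_mb: int, coverage: int, yield_mb_per_cell: int) -> int:
    """Total bases required divided by bases per cell, rounded up to whole cells."""
    total_mb = genome_mb * coverage
    return math.ceil(total_mb / yield_mb_per_cell)


if __name__ == "__main__":
    # Hypothetical ~3,000 Mb genome (assumption); per-cell yields are from the post.
    genome_mb = 3000
    for label, per_cell_mb in [("P1 ~37%", 600), ("P1 ~45%", 750)]:
        print(label, smrt_cells_needed(genome_mb, 70, per_cell_mb), "cells")
```

    Under these assumptions, 70x needs about 350 cells at 600 Mb/cell versus about 280 at 750 Mb/cell, in line with the "300 (or more)" estimate.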

    Does anyone have experience finding an optimal P1 value that gives both high read quality and high yield per SMRT cell? Any comments would be appreciated.
    Attached Files

  • #2
    The length should be loosely coupled with the loading, since small molecules can outcompete large ones... or so I've been told. It looks like that's not the case in your tests. It probably depends strongly on your size distribution; maybe you don't have many small molecules.

    However, I see no reason why the quality should be in any way related to the loading. Where did you hear that from?

    Also - what kind of movie length are you running? If you're constrained by disposable costs rather than platform time, you can always run longer and generate a bit more data...



    • #3
      In reply to Brian Bushnell (#2):
      Thanks for the reply.

      When the machine arrived at our sequencing centre, PacBio gave us some bioinformatics training on sequence analysis. At that time we were told that overloading leads to more indels, as the ZMWs might be clogged, so the optimal loading would be a P1 of ~37%. However, when PacBio technicians gave the wet-lab training to our sequencing centre, we did hear from some of them that the optimal P1 would be 44%.

      And yes, we're not worrying much about machine time, so the cost saving will come from using the fewest SMRT cells. We're already running 6 h movies, so I don't think we can go beyond that?



      • #4
        37% P1 vs. 45% is open for debate. 37% is the optimal value for P1 (minimal P0 and P2) given a perfect Poisson distribution, but the complexities of loading mean this is rarely the reality. The parameter to watch is P2: an increasing percentage of P2 will result in lower-quality data. Increasing yield at the cost of a significant increase in P2 is not a good approach for de novo assembly. For the plots you show, I would expect plotting P2 instead of P1 to be more informative. I'm guessing that cells 7-15, which show a reduced N50 with higher P1, have higher P2 than the later high-P1 cells that don't show any effect on N50.
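        The 37% figure follows from Poisson statistics: if polymerases land in ZMWs independently with a mean of λ per ZMW, the fraction holding exactly one is λe^(-λ), which peaks at λ = 1 with a value of 1/e ≈ 36.8%. The sketch below assumes ideal Poisson loading and treats every multiply-loaded ZMW as P2; the instrument's P2 category may also include other non-productive ZMWs, so this is only an approximation. `poisson_loading` is a hypothetical helper.

```python
import math


def poisson_loading(lam: float) -> tuple[float, float, float]:
    """Return (P0, P1, P2+) for ideal Poisson loading with mean `lam` polymerases per ZMW."""
    p0 = math.exp(-lam)          # empty ZMWs
    p1 = lam * math.exp(-lam)    # exactly one polymerase
    p2_plus = 1.0 - p0 - p1      # two or more polymerases (approximates P2)
    return p0, p1, p2_plus


if __name__ == "__main__":
    for lam in (0.5, 1.0, 1.5):
        p0, p1, p2 = poisson_loading(lam)
        print(f"lambda={lam:.1f}  P0={p0:.1%}  P1={p1:.1%}  P2+={p2:.1%}")
    # Largest P1 achievable under pure Poisson loading, reached at lambda = 1.
    print(f"max P1 = {1 / math.e:.1%}")
```

        Under this model P1 can never exceed ~36.8%, and at that peak P2+ is already ~26%. P1 values of 45% with low P2, as reported in this thread, therefore imply loading that is not purely Poisson, consistent with the point that ideal Poisson loading is rarely the reality.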
        Last edited by rhall; 11-12-2015, 12:24 PM.



        • #5
          In reply to rhall (#4):
          I've now updated the plots to include both P1 and P2.
          But judging from the plots, P2 doesn't seem to correlate with read length or yield stats? I don't have a good reference to compare against, so it's hard to check the sequencing quality at this stage. Just out of curiosity, when you say "increasing percentage of P2 will result in lower quality data", what kind of quality measure do you use? Number of indels? Number of erroneous bases?
          Attached Files



          • #6
            Interesting: looking at the P2%, it does not appear ever to get high enough to affect N50, so it likely isn't having much effect on overall quality in any of the runs. I was probably over-interpreting stochastic noise. If you can keep P1 in the 40-55% range, with P2 at ~10-15% as in that plot, you shouldn't have any problems with data quality for de novo assembly.
            The issue with P2 and data quality is twofold. First, read length is reduced. P2 is reported as a single number, but in reality a ZMW can go from P2 to P1 during a run: imagine a ZMW with two polymerases, one of which stops sequencing partway through. You then start generating sequence from the remaining polymerase, but with less of the movie left, so the read lengths are shorter. Second, the detection of multiple loading is not perfect, so it is possible to generate sequence from cross-talking polymerases, which reduces accuracy across all error modes. Depending on the experiment this may not be much of an issue, but in extreme cases it can be very deleterious. It mostly manifests as needing higher coverage to generate a consensus of the same quality, or as Quiver convergence problems.
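            The rule of thumb in this reply (P1 roughly 40-55%, P2 no higher than ~10-15%) could be applied as a simple per-cell check. A sketch only: the function name is hypothetical, the thresholds come from the reply rather than from any PacBio specification, and treating 15% as a hard P2 ceiling is an assumption.

```python
def loading_in_target(p1_pct: float, p2_pct: float,
                      p1_range: tuple[float, float] = (40.0, 55.0),
                      p2_max: float = 15.0) -> bool:
    """True if a cell's P1 is inside the target range and its P2 is at or below the ceiling.

    Defaults follow the rule of thumb from this thread; they are not official limits.
    """
    p1_low, p1_high = p1_range
    return p1_low <= p1_pct <= p1_high and p2_pct <= p2_max
```

            For example, a cell at P1 45% / P2 12% passes, while a cell at P1 37% would be flagged as under-loaded under these thresholds.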



            • #7
              Thanks for the comments.

              Bringing P1 up to ~50% and keeping P2 at ~10-15% sounds like a good plan. In practice it will be hard to hit that perfectly every time, but we'll see what we can do.
