  • P1 value vs read quality

    We're sequencing a mammalian genome on the RSII to ~70x coverage, so we expect to need approximately 300 (or more) SMRT cells.
    We've run 80 SMRT cells so far and we're trying to recalculate the number of SMRT cells we need.

    Looking at the finished SMRT cells, we found that P1 correlates quite well with yield across the 25-45% range, but read length (average length and N50) doesn't seem to be affected by P1. A P1 of 45% gives ~750 Mb per cell, while a P1 of 37% gives a bit less than 600 Mb per cell.

    We were told 37% would be the optimal P1 for the best read quality, but from a cost-saving perspective, a higher P1 (e.g. 45%) would give more output per cell and substantially reduce the number of SMRT cells needed.

    Does anyone have experience with finding a P1 value that achieves both high read quality and higher yield per SMRT cell? Any comments will be appreciated.
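
For the recalculation itself, a rough back-of-the-envelope sketch in Python may help. It assumes a 3 Gb genome (the post only says "mammalian", so that size is a placeholder) and takes the per-cell yields quoted above at face value; `cells_needed` is a hypothetical helper, not part of any PacBio tool.

```python
import math

def cells_needed(genome_size_gb, coverage, yield_per_cell_mb):
    """Number of SMRT cells needed to reach the target coverage."""
    target_mb = genome_size_gb * 1000 * coverage  # total bases required, in Mb
    return math.ceil(target_mb / yield_per_cell_mb)

GENOME_SIZE_GB = 3.0  # assumption: typical mammalian genome, not stated in the post
COVERAGE = 70         # target coverage from the post

# (P1 %, approximate yield per cell in Mb) as quoted in the post
for p1, yield_mb in [(37, 600), (45, 750)]:
    total = cells_needed(GENOME_SIZE_GB, COVERAGE, yield_mb)
    print(f"P1 ~{p1}%: about {total} cells for {COVERAGE}x")
```

Under those assumptions the 37% loading works out to 350 cells and the 45% loading to 280, so roughly a 20% saving in cells. Plugging in the real genome size and the measured yield of the 80 finished cells would give a better estimate.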

  • #2
    The length should be loosely coupled with the loading, since small molecules can outcompete large ones... or so I've been told. Looks like in your tests that's not the case. It probably depends strongly on your size distribution; maybe you don't have many small molecules.

    However, I see no reason why the quality should be in any way related to the loading. Where did you hear that from?

    Also - what kind of movie length are you running? If you're constrained by disposable costs rather than platform time, you can always run longer and generate a bit more data...


    • #3
      Originally posted by Brian Bushnell
      The length should be loosely coupled with the loading, since small molecules can outcompete large ones... or so I've been told. Looks like in your tests that's not the case. It probably depends strongly on your size distribution; maybe you don't have many small molecules.

      However, I see no reason why the quality should be in any way related to the loading. Where did you hear that from?

      Also - what kind of movie length are you running? If you're constrained by disposable costs rather than platform time, you can always run longer and generate a bit more data...
      Thanks for the reply.

      When the machine arrived at our sequencing centre, PacBio gave us some bioinformatics training on sequence analysis. We were told at the time that overloading leads to more indels because the ZMWs might be clogged, so the optimal loading would be a P1 of ~37%. However, we did hear from some PacBio technicians that the optimal P1 would be 44% when they gave the wet-lab training to our sequencing centre.

      And yes, we're not too worried about machine time, so for cost saving we're aiming for the smallest number of SMRT cells. We're already running 6h movies, so I don't think we can go beyond that?


      • #4
        37% P1 vs. 45% is open for debate. 37% is the optimal value for P1 (minimal P0 and P2) given a perfect Poisson distribution, but the complexities of loading mean this is rarely the reality. The parameter to watch is P2: an increasing percentage of P2 will result in lower quality data. An increase in yield that comes with a significant increase in P2 is not a good approach for de novo assembly. For the plots that you show, I would expect plotting P2 instead of P1 to be more informative. I'm guessing that cells 7-15, which show a reduced N50 with higher P1, have higher P2 than the later high-P1 cells that don't show any effect on N50.
        Last edited by rhall; 11-12-2015, 12:24 PM.
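
The 37% figure can be checked with a small Python sketch of the idealized Poisson model. It assumes the number of polymerases per ZMW is Poisson-distributed with mean `lam` and treats P2 as "two or more polymerases"; both are simplifications, and the function name is made up for illustration.

```python
import math

def loading_fractions(lam):
    """Return (P0, P1, P2) for a Poisson loading model with mean lam."""
    p0 = math.exp(-lam)        # empty ZMWs
    p1 = lam * math.exp(-lam)  # exactly one polymerase
    p2 = 1.0 - p0 - p1         # two or more (simplifying assumption)
    return p0, p1, p2

# Scan mean loading values to find where P1 is largest in this model.
lams = [i / 100 for i in range(1, 301)]
best_lam = max(lams, key=lambda lam: loading_fractions(lam)[1])
p0, p1, p2 = loading_fractions(best_lam)
print(f"best lam={best_lam:.2f}  P0={p0:.1%}  P1={p1:.1%}  P2={p2:.1%}")
```

In this model P1 peaks at lam = 1, where P1 = 1/e, about 36.8%, with P0 also at about 36.8% and P2 at about 26.4%. A measured P1 of 45% sits above that ceiling, which fits the point that real loading is not purely Poisson.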


        • #5
          Originally posted by rhall
          37% P1 vs. 45% is open for debate. 37% is the optimal value for P1 (minimal P0 and P2) given a perfect Poisson distribution, but the complexities of loading mean this is rarely the reality. The parameter to watch is P2: an increasing percentage of P2 will result in lower quality data. An increase in yield that comes with a significant increase in P2 is not a good approach for de novo assembly. For the plots that you show, I would expect plotting P2 instead of P1 to be more informative. I'm guessing that cells 7-15, which show a reduced N50 with higher P1, have higher P2 than the later high-P1 cells that don't show any effect on N50.
          I've now got updated plots with both P1 and P2.
          Judging from the plots, though, P2 doesn't seem to correlate with read length or yield stats? I don't have a good reference to compare against, so it's hard to check the sequencing quality at this stage. Just out of curiosity, when you say "increasing percentage of P2 will result in lower quality data", what kind of quality measure do you use? Number of indels? Number of erroneous bases?

          • #6
            Interesting. Looking at the P2%, it never appears to get high enough to affect N50, so it likely isn't having much effect on overall quality in any of the runs. I was probably over-interpreting stochastic noise. If you can keep P1 in the 40-55% range, with P2 at ~10-15% as in that plot, you shouldn't have any problems with data quality for de novo assembly.
            The issue with P2 and data quality is twofold. First, read length is reduced. P2 is a single number, but in reality a ZMW can go from P2 to P1 during a run. Imagine a ZMW with two polymerases: partway through the run, one stops sequencing. You then start generating sequence from the remaining polymerase, but less time is left to sequence, so the read lengths are shorter. Second, detection of multiple loading is not perfect, so it is possible to generate sequence from cross-talking polymerases, which reduces accuracy across all error modes. Depending on the experiment this may not be much of an issue, but in extreme situations it can be very deleterious. It mostly manifests as needing higher coverage to reach the same consensus quality, or as Quiver convergence problems.


            • #7
              Thanks for the comments.

              Bringing P1 up to ~50% and keeping P2 at ~10-15% sounds like a good plan. In practice it will be hard to hit that perfectly every time, but we’ll see what we can do.
