Metrics that Matter: Important Metrics for Long-Read Sequencing Experiments—Part 1




    Long-read technologies have repeatedly demonstrated their value in genomics research. They’ve been used for improving genome assemblies [1], haplotype phasing [2], structural variant detection [3], and epigenetic analysis [4], among many other significant applications. When performing long-read sequencing experiments, it’s important to evaluate the success of the sequencing run using several metrics. Some metrics may be more relevant than others depending on the scope of the study, and no single metric covers every aspect of a sequencing run. This two-part article series focuses on the important long-read sequencing metrics for the two current long-read technology providers: Oxford Nanopore Technologies (ONT) and Pacific Biosciences (PacBio).

    Oxford Nanopore Technologies

    When asked how to evaluate the success of an ONT sequencing run, a spokesperson for the group replied succinctly, “A successful run is where the user gets enough of the data they need, at a suitable quality, in the best timeline, to answer their biological question.” They added that, before asking whether a run was successful, it is important to plan and design the experiment around the biological question first, and then to monitor the experiment's success.

    In that regard, they clarified that most metrics depend on the application. This article therefore focuses on metrics related to several important applications, along with some more universal metrics for evaluating an experiment.

    Genetic variation
    One important application of ONT sequencing is utilizing long and ultra-long reads to investigate genetic variation. Ultra-long sequencing involves sequencing very long fragments of native DNA/RNA which allows users to “unlock more comprehensive insights,” the spokesperson said. They also explained that many Oxford Nanopore users are interested in using these reads for, “Achieving a fuller picture of genetic variation, including not only SNPs and indels, but also larger scale variation such as repetitive regions, SVs, and complex CNVs in real time and covering the entirety of the genome or region of interest.”

    When evaluating ultra-long reads, they recommended using the read N50 and read length distribution to “ensure you are sequencing long fragments and ultimately getting long reads.” Both are reported by an onboard facility in the operating software, MinKNOW™. The N50 is also a commonly used statistic in computational biology for assessing sequence assemblies. “The read N50 indicates that 50% of the total data you have generated is contained in reads of this size or larger,” the spokesperson explained. “For example, if there is a read N50 of 20kb, this means that 50% of the total base pairs sequenced are contained in reads of this size or larger.”
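
    To make the definition concrete, here is a minimal Python sketch of the read N50 calculation as the spokesperson describes it. The function name and the example read lengths are illustrative assumptions, not part of MinKNOW™ or any ONT tool.

```python
def read_n50(read_lengths):
    """Return the read N50: the largest length L such that reads of
    length >= L together contain at least half of all sequenced bases."""
    if not read_lengths:
        raise ValueError("read_lengths must not be empty")
    total_bases = sum(read_lengths)
    running_bases = 0
    # Walk from the longest read down, accumulating bases until half
    # of the total yield has been reached.
    for length in sorted(read_lengths, reverse=True):
        running_bases += length
        if running_bases * 2 >= total_bases:
            return length


# Hypothetical read lengths in bases (100,000 bases in total).
example_lengths = [5_000, 8_000, 12_000, 20_000, 25_000, 30_000]
print(read_n50(example_lengths))  # 25000
```

    In this made-up run, the 30 kb and 25 kb reads together hold 55,000 of the 100,000 bases, so the read N50 is 25 kb.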

    They believe that while the read N50 can be useful in many applications, it can additionally be utilized to confirm that the read lengths match the fragment distribution (for ligation sequencing), and ultimately, that the library prep was successful. “From this, you can see the read length distribution including the longest reads in the sequencing run, which can be important for applications where longer reads will be beneficial, such as phasing or whole genome assembly,” they noted.

    The next metric, read lengths, is fairly intuitive but still valuable for understanding the sequencing run. The ONT spokesperson indicated that the expected range of read lengths is specific to the application and depends on what you would expect from the extraction or library preparation of the sample. Furthermore, they stated that ONT instruments stream data in real time, which allows users to monitor these read lengths continuously during the run.

    This type of dynamic monitoring can also be particularly useful in other applications with fixed read lengths, such as amplicon sequencing. The spokesperson explained, “During sequencing, it is possible to see the read lengths, and if they correspond to the amplicon size, you can tell that the run is successful in the context of the expected read length. Additionally, when looking at long read lengths it is possible to use the real-time feedback to decide whether to continue the run or prepare a new sample and reload that.”
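
    As a rough illustration of the amplicon check described here, the sketch below computes the fraction of reads whose length falls near an expected amplicon size. The function name, the 10% tolerance, and the example lengths are assumptions made for illustration; MinKNOW™ presents this information as a read length histogram rather than through this code.

```python
def fraction_near_expected(read_lengths, expected_length, tolerance=0.10):
    """Fraction of reads whose length is within +/- tolerance of the
    expected amplicon length (tolerance of 0.10 means +/- 10%)."""
    if not read_lengths:
        return 0.0
    low = expected_length * (1 - tolerance)
    high = expected_length * (1 + tolerance)
    in_range = sum(1 for length in read_lengths if low <= length <= high)
    return in_range / len(read_lengths)


# Hypothetical lengths (in bases) from a run targeting a 1 kb amplicon.
example_lengths = [980, 1_010, 1_050, 400, 995, 2_100, 1_000, 1_020]
print(fraction_near_expected(example_lengths, expected_length=1_000))  # 0.75
```

    A high fraction suggests the reads correspond to the amplicon; what counts as "high enough" depends on the protocol and is not specified in the source.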

    Epigenetics and methylation analysis
    In addition to investigating genetic variation, another popular ONT application is base-modification analysis. This type of analysis can “be performed alongside nucleotide sequencing on the same single read without the need to run multiple sequencing experiments,” said the spokesperson. “Unlike traditional technologies (for example, bisulfite sequencing for methylation), no additional complex library preparation is required, and epigenetic modification analysis can be performed across the whole genome during the experiment.”

    Users can perform their methylation analysis with “Remora,” a real-time analysis tool integrated into MinKNOW™ which runs parallel to standard basecalling. From there, users can obtain the critical information they need to assess their epigenetic experiments.

    Whole genome sequencing
    Whole genome sequencing, like genetic variation studies, can be performed with ultra-long sequencing and assessed with the read N50 and read length distribution. However, the ONT spokesperson added, “For whole genome sequencing of human samples for variant detection, read coverage will be important, and assessing the total amount of data generated will correspond to the coverage. This is also possible through MinKNOW™.” Users can then monitor their runs and ensure they obtain the read coverage they need to complete their analysis.
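
    The link between total data generated and coverage can be sketched with simple arithmetic: mean coverage is the total number of sequenced bases divided by the genome size. The code below is an illustrative sketch, not ONT's method; the human genome size of roughly 3.1 Gb and the yield figure are assumptions, and the estimate ignores unmapped or filtered reads.

```python
HUMAN_GENOME_SIZE = 3_100_000_000  # assumed approximate size in bases


def mean_coverage(total_bases, genome_size=HUMAN_GENOME_SIZE):
    """Estimated mean depth of coverage: total sequenced bases / genome size."""
    if genome_size <= 0:
        raise ValueError("genome_size must be positive")
    return total_bases / genome_size


def bases_needed(target_coverage, genome_size=HUMAN_GENOME_SIZE):
    """Total yield (in bases) required to reach a target mean coverage."""
    return target_coverage * genome_size


# Hypothetical run that has produced 93 Gb so far.
print(mean_coverage(93_000_000_000))  # 30.0
print(bases_needed(30))               # 93000000000
```

    Watching the cumulative yield during a run therefore gives a running estimate of whether the target coverage has been reached.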

    Pore occupancy
    Pore occupancy is a metric specific to ONT sequencers that is used to assess a variety of applications. The spokesperson specified that pore occupancy refers to when, “a nanopore in a single well is actively sequencing DNA/RNA, rather than ‘sitting empty.’” They added, “In most sequencing runs, pore occupancy will be a good indicator of the efficiency of your library preparation and sequencing run.”

    Additionally, pore occupancy characterizes the number of pores that are actively sequencing against the total number of pores available. In general, the higher the better for the pore occupancy; however, the spokesperson clarified, “For some applications, this will be naturally lower, such as cas9 enrichment or adaptive sampling in silico enrichment/depletion of target genomic regions, given that only a subset of reads present in the pool will be of interest. Hence, [they are] selectively sequenced and the system will be automatically rejecting reads in some pores that do not overlap with the target(s) of interest.”

    Because pore occupancy is reported as the percentage of available pores that are actively sequencing, it also works as an early sanity check. “This can be useful when someone accidentally loads a blank or unprepared library as the occupancy will be close to zero,” said the spokesperson. “If after a few moments, the occupancy is near zero, then the run can be stopped and checked to ensure that the correct sample has been loaded.”
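
    The percentage described here can be written out as a short sketch. The function names, the pore counts, and the 5% cutoff for "near zero" are hypothetical choices for illustration; the source gives no numeric threshold, and MinKNOW™ may compute occupancy from more detailed pore states than the two counts used below.

```python
def pore_occupancy_percent(sequencing_pores, available_pores):
    """Percentage of available pores that are actively sequencing."""
    if available_pores <= 0:
        raise ValueError("available_pores must be positive")
    return 100 * sequencing_pores / available_pores


def looks_like_blank_library(occupancy_percent, threshold_percent=5.0):
    """Flag a run whose occupancy is near zero (threshold is an assumption)."""
    return occupancy_percent < threshold_percent


# Hypothetical pore counts shortly after loading.
healthy = pore_occupancy_percent(sequencing_pores=1_200, available_pores=1_500)
suspect = pore_occupancy_percent(sequencing_pores=6, available_pores=1_500)
print(healthy, looks_like_blank_library(healthy))  # 80.0 False
print(suspect, looks_like_blank_library(suspect))  # 0.4 True
```

    For enrichment applications such as adaptive sampling, a lower occupancy is expected, so any cutoff would need to be set with the application in mind.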

    When asked about the acceptable range of this metric, the spokesperson responded that it is slightly different from many other metrics. There is normally an approximation of the total library loaded onto the flow cell, which can be used to determine the expected number of pores to be actively sequencing. Furthermore, they noted, “In some applications, like cas9 targeted sequencing, [the pore occupancy] is expected to be lower due to the application but similar to other applications. If the protocol is followed, there will be an approximation of the amount of the library loaded and this will then correspond to an expected percentage of pores that will be actively sequencing.”

    Translocation speed
    Translocation speed is another helpful metric that can be used across many applications. “The translocation speed represents the speed at which a molecule is passaged through a pore by the sequencing adapter and should correspond to the expected speed for the pore and kit,” the ONT spokesperson specified. Again, this metric can be monitored in real time using MinKNOW™, which plots the optimal range of the translocation speed alongside the average translocation speed for the run over time. The spokesperson added, “You get similar real-time plots for the estimated read q-score and all the other metrics mentioned, allowing for real-time monitoring of all parameters; for more detailed q-scores, alignment is recommended to a reference genome.”
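
    For readers unfamiliar with q-scores, the sketch below shows the standard Phred relationship between a quality score and an error probability, Q = -10 * log10(p), and one common way to summarize a read by averaging error probabilities rather than the scores themselves. The function names and example values are illustrative, and this is not necessarily how MinKNOW™ derives its estimated read q-score.

```python
import math


def phred_to_error_prob(q):
    """Convert a Phred quality score to an error probability."""
    return 10 ** (-q / 10)


def mean_read_qscore(base_qscores):
    """Summarize per-base q-scores as one read-level q-score by
    averaging the error probabilities, then converting back to Phred."""
    if not base_qscores:
        raise ValueError("base_qscores must not be empty")
    mean_error = sum(phred_to_error_prob(q) for q in base_qscores) / len(base_qscores)
    return -10 * math.log10(mean_error)


# Hypothetical per-base scores: one low-quality base drags the read down
# far more than a plain average of the scores (20) would suggest.
print(round(mean_read_qscore([10, 30]), 2))  # 12.97
```

    Averaging in probability space means a few poor bases weigh heavily on the read-level score, which is why it can sit well below the arithmetic mean of the per-base values.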

    Run time/time to result
    Finally, ONT sequencing doesn’t have fixed run times, so users can access and analyze the data as it is generated. As emphasized by the ONT spokesperson, “You can assess the performance of a run in the context of your biological question as data is generated.” This is valuable across the many applications where rapid analysis is critical, in “situations from pathogen identification to cancer,” they said.

    Citing a relevant example where a researcher may need to classify a particular pathogen species, the spokesperson concluded, “This means that [the researcher] can start sequencing, analyze the data as it is generated and, when they reach sufficient confidence in species identification, the run can be stopped. Therefore, the success of the run can be judged by the classification and results obtained rather than waiting for the run to finish.”

    1. Wenger AM, Peluso P, Rowell WJ, et al. Accurate circular consensus long-read sequencing improves variant detection and assembly of a human genome. Nature Biotechnology. 2019;37(10):1155-1162. doi:https://doi.org/10.1038/s41587-019-0217-9
    2. Roe D, Williams J, Ivery K, et al. Efficient Sequencing, Assembly, and Annotation of Human KIR Haplotypes. Frontiers in Immunology. 2020;11. doi:https://doi.org/10.3389/fimmu.2020.582927
    3. Cretu Stancu M, van Roosmalen MJ, Renkens I, et al. Mapping and phasing of structural variation in patient genomes using nanopore sequencing. Nature Communications. 2017;8(1). doi:https://doi.org/10.1038/s41467-017-01343-4
    4. Liu Q, Fang L, Yu G, Wang D, Xiao CL, Wang K. Detection of DNA base modifications by deep recurrent neural network on Oxford Nanopore sequencing data. Nature Communications. 2019;10(1). doi:https://doi.org/10.1038/s41467-019-10168-2