A team of researchers from BGI has unveiled CycloneSEQ, a new nanopore sequencing platform intended to improve long-read sequencing across a range of genomic applications. The device is described in a new preprint. The platform combines several technical improvements, including new motor and pore proteins, a new chip design, and a revised basecalling algorithm, which together are reported to raise sequencing accuracy and throughput.
Enhancing Motor and Pore Protein Performance
Nanopore sequencing relies heavily on motor and pore proteins to translocate and detect nucleic acids with precision. The team at BGI identified novel motor proteins, specifically helicases, through a comprehensive search of deep-sea metagenomic databases. These helicases exhibit unique sequences and structures, with low homology to known helicases, offering potential for new functionality. One such protein, BCH-X, demonstrated strong DNA binding and unwinding activity, essential for maintaining sequencing progression. This led to a high sequencing speed of approximately 380 bp/s under experimental conditions.
Similarly, the researchers developed new pore proteins by mining the same metagenomic databases. These proteins, such as BCP-Y, were engineered to form nanoscale channels with low-noise currents, crucial for high signal complexity and a good signal-to-noise ratio. The combination of BCH-X and BCP-Y resulted in significantly improved sequencing accuracy.
Optimizing Basecalling with Pre-Training
The CycloneSEQ platform also features an improved basecalling algorithm, which leverages a pre-training and fine-tuning approach inspired by advanced speech recognition models. This method uses a dual loss function to enhance the model’s ability to predict DNA bases with greater accuracy. By pre-training the model on a large dataset of unlabeled sequences, followed by fine-tuning with labeled data, the researchers achieved marked improvements in error rates and faster convergence compared to traditional basecalling methods. This approach demonstrated superior performance across multiple species, highlighting the model’s ability to generalize well.
Nanopore Local Chemistry Sequencing
One of the key developments of the CycloneSEQ platform is the nanopore local chemistry (NLC) sequencing method. This technique manipulates the local chemical environment at the nanopore to optimize sequencing conditions. By creating an asymmetric chemical environment around the nanopore, the researchers were able to control the concentration of magnesium ions (Mg²⁺), which are essential for DNA helicase activity. This control led to enhanced sequencing performance, with current signals that closely matched those of conventional nanopore sequencing methods but with improved stability.
Chip Design for Increased Throughput and Accuracy
The CycloneSEQ platform’s chip design also represents a major advancement in nanopore sequencing technology. The chip is designed with high-density nanopore arrays, allowing for greater parallel processing of nucleic acid strands, thus increasing sequencing throughput. Additionally, the chip’s microwell structure was engineered to maximize electrolyte buffer volume and reduce noise, leading to prolonged stability and enhanced sequencing accuracy. In a 107-hour sequencing run, the CycloneSEQ platform demonstrated the ability to yield over 50 Gb of data with consistent performance, underscoring its potential for long-duration sequencing applications.
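As a rough back-of-envelope check, the reported run can be converted into an average output rate. The sketch below uses only figures quoted in this article (over 50 Gb in 107 hours, and roughly 380 bp/s per pore). Treating the per-pore speed as applicable to this particular run is an assumption made here for illustration, not something the preprint is known to state, and the function and variable names are hypothetical.

```python
# Figures quoted in the article; "over 50 Gb" makes the results lower bounds.
TOTAL_BASES = 50e9          # 50 Gb
RUN_HOURS = 107
# Assumption: the ~380 bp/s speed reported for the motor protein holds for this run.
PER_PORE_SPEED_BPS = 380


def average_rate_bases_per_second(total_bases, hours):
    """Mean output rate over the whole run, in bases per second."""
    return total_bases / (hours * 3600)


def equivalent_active_pores(rate_bps, per_pore_speed_bps):
    """Number of pores that would need to sequence continuously to reach rate_bps."""
    return rate_bps / per_pore_speed_bps


rate = average_rate_bases_per_second(TOTAL_BASES, RUN_HOURS)
pores = equivalent_active_pores(rate, PER_PORE_SPEED_BPS)
print(f"average rate: {rate:,.0f} bases/s")
print(f"equivalent continuously active pores: {pores:.0f}")
```

Under these assumptions the run averages roughly 130 kb per second, equivalent to about 340 pores sequencing without interruption. Real pores are not active continuously, so the number of pores on the chip would have to be higher than this estimate.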
Applications in Whole-Genome Sequencing and Beyond
The CycloneSEQ platform was tested on a variety of genomic tasks, including whole-genome sequencing, variant calling, and metagenomic sequencing. For instance, sequencing of the HG002 cell line produced long reads with an N50 value of 33.6 kb and a modal identity of 97.0%, highlighting the platform’s capability for high-throughput, long-read sequencing. The platform also showed promise in metagenomic applications, accurately quantifying microbial abundances in a mock metagenome sample.
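For readers unfamiliar with the two metrics quoted above, the following minimal sketch shows how they are commonly computed. N50 is the read length at which reads of that length or longer contain at least half of all sequenced bases. Modal identity is taken here as the most frequent per-read identity after rounding to 0.1%, which is one common convention and may differ from the binning used in the preprint. The input values are made up for illustration and are not CycloneSEQ data.

```python
from collections import Counter


def n50(read_lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    if not read_lengths:
        raise ValueError("need at least one read length")
    total = sum(read_lengths)
    running = 0
    # Walk from the longest read down until half of the bases are covered.
    for length in sorted(read_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length


def modal_identity(identities_percent):
    """Most frequent per-read identity (in %), after rounding to one decimal."""
    if not identities_percent:
        raise ValueError("need at least one identity value")
    bins = Counter(round(value, 1) for value in identities_percent)
    return bins.most_common(1)[0][0]


# Toy inputs (hypothetical): read lengths in kb and per-read identities in %.
toy_lengths_kb = [2, 3, 4, 5, 6, 7, 8, 9, 10]
toy_identities = [96.2, 97.0, 97.01, 97.04, 98.1]

print("N50 (kb):", n50(toy_lengths_kb))
print("modal identity (%):", modal_identity(toy_identities))
```

Because N50 weights reads by the bases they contain, a small number of very long reads can raise it substantially, which is why it is usually reported alongside total yield.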