Like all molecular biology applications, next-generation sequencing (NGS) workflows require diligent quality control (QC) measures to ensure accurate and reproducible results. Proper QC begins at nucleic acid extraction and continues all the way through to data analysis. This article outlines the key QC steps in an NGS workflow, along with the commonly used tools and techniques.
Nucleic Acid Quality Control
Preparing for NGS starts with isolating the target nucleic acids. Once extracted, the DNA or RNA must be assessed to confirm its suitability for library construction. For some applications, certain levels of impurities may be tolerated. Additionally, when samples are limited or valuable (e.g., FFPE or clinical samples), researchers may need to utilize their extractions regardless of quality concerns. However, for most NGS workflows, proper QC of isolated nucleic acids is essential to ensure sufficient concentration, purity, and integrity for reliable library preparation and downstream sequencing.
During this stage, researchers should first assess sample concentration to confirm extraction success and ensure compatibility with the concentration range required for library preparation. This is commonly done using spectrophotometric or fluorometric methods. Spectrophotometers offer a convenient way to measure concentration and detect impurities (for example, through the A260/A280 and A260/A230 absorbance ratios), though contaminants such as proteins can reduce their accuracy. Fluorometric instruments, which rely on dyes that bind the target nucleic acid, quantify more accurately but cannot detect impurities.
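As a rough illustration of the spectrophotometric check, the sketch below converts an A260 reading into a concentration and flags low purity ratios. The conversion factors (an A260 of 1.0 corresponding to roughly 50 ng/µL of double-stranded DNA or 40 ng/µL of RNA at a 1 cm path length) are standard, but the function name and the purity thresholds are assumptions chosen for illustration and should be adapted to the library kit's requirements.

```python
# Standard conversion factors for a 1 cm path length:
# A260 of 1.0 is roughly 50 ng/uL dsDNA or 40 ng/uL RNA.
NG_PER_UL_PER_A260 = {"dsDNA": 50.0, "RNA": 40.0}


def assess_absorbance(a260, a280, a230, kind="dsDNA", dilution=1.0):
    """Estimate concentration and purity from absorbance readings.

    The thresholds below are rule-of-thumb assumptions, not kit specifications.
    """
    conc = a260 * NG_PER_UL_PER_A260[kind] * dilution
    ratio_280 = a260 / a280
    ratio_230 = a260 / a230

    # Pure DNA is usually quoted near 1.8 and pure RNA near 2.0 for A260/A280.
    expected_280 = 1.8 if kind == "dsDNA" else 2.0
    flags = []
    if ratio_280 < expected_280 - 0.1:
        flags.append("low A260/A280: possible protein or phenol carryover")
    if ratio_230 < 1.8:
        flags.append("low A260/A230: possible salt or solvent carryover")

    return {
        "conc_ng_per_ul": conc,
        "a260_a280": ratio_280,
        "a260_a230": ratio_230,
        "flags": flags,
    }


clean = assess_absorbance(0.5, 0.25, 0.25)
dirty = assess_absorbance(0.4, 0.3, 0.4, kind="RNA")
```

Because absorbance reflects everything that absorbs at 260 nm, a clean-looking ratio does not replace a fluorometric measurement when an accurate concentration is needed.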
The next step is to assess nucleic acid size and fragment distribution. While gel electrophoresis can be used for this purpose, microfluidic capillary systems such as Agilent’s TapeStation and Femto Pulse or QIAGEN’s QIAxcel are now preferred for their speed, accuracy, and high throughput. Microfluidic capillary electrophoresis also gives a fair estimate of nucleic acid concentration and can reveal problems such as degradation or, in RNA samples, unwanted DNA contamination.
When a sample shows impurities, fragmentation, or a low nucleic acid concentration, researchers can use several tools to improve yield and clean up the sample. This is most often done with commercial spin columns or magnetic beads (e.g., AMPure XP beads) that capture the nucleic acid of interest while impurities and nucleic acids outside the desired size range are washed away. Other options include size selection instruments for target enrichment, such as Ranger® Technology from YourGene Health or Pippin HT from Sage Science.
Once the target nucleic acids have been isolated and purified, the next step is library preparation.
Post-Library Prep QC
Preparing an NGS library involves converting the target nucleic acids into a compatible form for the respective sequencing platform. While some QC steps may be included during longer library preparation workflows, most QC is performed afterward to ensure that the libraries are properly constructed and sequencing-ready. The type of QC required at this stage varies depending on the sequencing platform, but it typically includes verifying library concentration, fragment size distribution, and purity, as well as detecting any contaminants that could interfere with sequencing.
In general, the same QC tools and techniques used after nucleic acid extraction are also applied post-library prep. However, there is a key difference when it comes to measuring concentration: at this stage, researchers may opt for qPCR to obtain more accurate quantification. These qPCR assays provide precise concentration measurements by using specific primers that bind to adapter regions unique to functional library molecules. Accurate quantification helps ensure that the sequencer is not over- or underloaded, either of which can affect the quality and quantity of data.
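qPCR quantification is typically read off a standard curve: Cq values from a dilution series of known standards are fitted against log10 concentration, and the library's Cq is converted back to a concentration. The sketch below shows that arithmetic in plain Python. The standard concentrations and Cq values are hypothetical, the function names are illustrative, and the size correction that some commercial kits apply between the standard and the library is deliberately left out.

```python
import math


def fit_standard_curve(concs, cqs):
    """Least-squares fit of Cq = slope * log10(conc) + intercept."""
    xs = [math.log10(c) for c in concs]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(cqs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, cqs))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # A slope near -3.32 corresponds to ~100% amplification efficiency.
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, intercept, efficiency


def quantify(cq, slope, intercept, dilution_factor=1.0):
    """Convert a sample Cq to concentration in the units of the standards."""
    return 10 ** ((cq - intercept) / slope) * dilution_factor


# Hypothetical 10-fold dilution series (pM) and measured Cq values.
standards_pm = [20.0, 2.0, 0.2, 0.02]
standard_cqs = [10.0, 13.32, 16.64, 19.96]

slope, intercept, efficiency = fit_standard_curve(standards_pm, standard_cqs)
# A library diluted 1:10,000 before qPCR that gave a Cq of 13.32.
library_pm = quantify(13.32, slope, intercept, dilution_factor=10_000)
```

Checking the fitted efficiency before trusting the result is a cheap sanity test: values far from 100% usually indicate pipetting problems or inhibitors in the reaction.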
A common concern during this stage is the presence of adapter dimers and other unwanted byproducts from library preparation, especially in workflows involving amplification or adapter ligation. If residual adapters or primer dimers are detected, additional cleanup steps, such as AMPure XP bead purification or size selection techniques (e.g., Sage Science’s Pippin Prep), can be employed to remove them.
The final QC step at this stage is ensuring that libraries are normalized before sequencing. Sequencing platforms like Illumina require libraries to be pooled at specific molar concentrations to achieve balanced sequencing coverage across samples. This normalization can be done through manual dilution or by using bead-based normalization methods or enzymatic approaches designed to equalize input concentrations before sequencing.
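Manual normalization comes down to two small calculations: converting a mass concentration to molarity using the average fragment length, and working out the dilution that brings each library to a common target. The sketch below assumes double-stranded DNA at an average of about 660 g/mol per base pair, which is the usual approximation. The function names and example numbers are illustrative only.

```python
def library_molarity_nm(conc_ng_per_ul, mean_fragment_bp):
    """Convert ng/uL to nM for dsDNA, assuming ~660 g/mol per base pair."""
    return conc_ng_per_ul * 1e6 / (660.0 * mean_fragment_bp)


def dilution_volumes(stock_nm, target_nm, final_volume_ul):
    """Return (library_ul, diluent_ul) needed to reach target_nm."""
    if stock_nm < target_nm:
        raise ValueError("library is already below the target concentration")
    library_ul = target_nm * final_volume_ul / stock_nm
    return library_ul, final_volume_ul - library_ul


# Example: a 6.6 ng/uL library with a 500 bp average fragment size,
# diluted to 4 nM in a final volume of 10 uL.
stock = library_molarity_nm(6.6, 500)
library_ul, diluent_ul = dilution_volumes(stock, 4.0, 10.0)
```

The average fragment length should come from the fragment analysis described earlier, since an error in size translates directly into an error in molarity.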
Post-Sequencing QC
QC doesn’t stop after sequencing. In fact, one of the most critical QC steps is evaluating the raw sequence data to identify potential issues before starting the analysis. FastQC is one of the most popular tools for this purpose [1]. It provides important metrics such as base quality scores, GC content, overrepresented sequences, and sequence duplication levels. Another valuable tool is MultiQC, which aggregates QC reports from multiple sources (including FastQC) into a single, comprehensive summary [2]. While MultiQC does not perform the analysis itself, it saves time by compiling QC reports and visualizing trends across multiple datasets or samples.
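To make these metrics concrete, the sketch below computes simplified versions of three of them (mean base quality, GC fraction, and the fraction of duplicate reads) from FASTQ records. It is an illustration only, not how FastQC is implemented. It assumes well-formed four-line records with Phred+33 quality encoding, and the function names and example reads are made up.

```python
from collections import Counter


def read_fastq(lines):
    """Yield (read_id, sequence, quality_string) from 4-line FASTQ records."""
    it = iter(lines)
    for header in it:
        header = header.strip()
        if not header:
            continue  # tolerate blank lines between records
        seq = next(it).strip()
        next(it)  # skip the '+' separator line
        qual = next(it).strip()
        yield header[1:], seq, qual


def summarize(records, phred_offset=33):
    """Compute simple whole-file QC metrics (assumes Phred+33 by default)."""
    n_reads = n_bases = gc_bases = qual_sum = 0
    seen = Counter()
    for _, seq, qual in records:
        n_reads += 1
        n_bases += len(seq)
        gc_bases += sum(seq.upper().count(base) for base in "GC")
        qual_sum += sum(ord(ch) - phred_offset for ch in qual)
        seen[seq] += 1
    if n_bases == 0:
        return {"reads": 0}
    return {
        "reads": n_reads,
        "mean_quality": qual_sum / n_bases,
        "gc_fraction": gc_bases / n_bases,
        "duplicate_fraction": 1 - len(seen) / n_reads,
    }


# Three toy reads: two identical high-quality reads and one low-quality read.
example = """@r1
GGCC
+
IIII
@r2
ATAT
+
!!!!
@r3
GGCC
+
IIII
"""
stats = summarize(read_fastq(example.splitlines()))
```

Dedicated tools report these values per position and per sequence rather than as single file-wide numbers, which is what makes patterns such as quality decay toward the 3' end visible.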
After the initial assessment, the next step is trimming low-quality bases and removing any adapter sequences. This improves overall read quality and prevents adapter contamination in downstream analysis. Trimming can be performed using tools like Trimmomatic, Skewer, Cutadapt, and Fastp, which can also provide quality profiling [3,4,5,6].
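The two operations can be sketched in a few lines. The example below removes a 3' adapter by exact matching (including a partial adapter at the end of the read) and then applies a basic sliding-window quality trim. It is a deliberately simplified illustration: the tools named above use error-tolerant alignment and more refined trimming rules, and the function names, window size, and thresholds here are assumptions.

```python
def trim_adapter(seq, qual, adapter, min_overlap=3):
    """Remove a 3' adapter by exact match, including partial matches at the end."""
    idx = seq.find(adapter)
    if idx != -1:
        return seq[:idx], qual[:idx]
    # Look for the longest adapter prefix that ends the read.
    for k in range(min(len(adapter) - 1, len(seq)), min_overlap - 1, -1):
        if seq.endswith(adapter[:k]):
            return seq[:-k], qual[:-k]
    return seq, qual


def sliding_window_trim(seq, qual, window=4, min_mean_q=20, phred_offset=33):
    """Cut the read at the first window whose mean quality drops below min_mean_q."""
    scores = [ord(ch) - phred_offset for ch in qual]
    for start in range(len(scores) - window + 1):
        if sum(scores[start:start + window]) / window < min_mean_q:
            return seq[:start], qual[:start]
    return seq, qual


# A read ending in the first 10 bases of an adapter sequence.
adapter = "AGATCGGAAGAGC"
trimmed = trim_adapter("ACGTACGTAGATCGGAAG", "I" * 18, adapter)

# A read whose quality collapses near the 3' end ('I' = Q40, '#' = Q2).
quality_trimmed = sliding_window_trim("ACGTACGTAC", "IIIIII####")
```

Adapter removal is usually run before quality trimming so that adapter bases, which are often high quality, do not survive the quality filter.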
With the expansion of long-read sequencing platforms, several QC tools have been developed specifically for long-read data. Tools specific to Oxford Nanopore Technologies (ONT) include PycoQC and Porechop [7,8]. PycoQC computes metrics and generates interactive QC plots for ONT data, while Porechop is used for adapter trimming and barcode demultiplexing. Note, however, that these tools are no longer supported.
For broader long-read QC needs, NanoPack provides visualization and processing tools for ONT and PacBio long-read data, while Filtlong filters out short or low-quality reads to retain a smaller, better subset [9,10]. LongQC provides QC for both PacBio and ONT long reads, offering sample QC to assess data readiness and platform QC for sequencing performance evaluation [11]. Finally, LongReadSum is a multi-threaded QC tool that delivers fast, comprehensive metrics and basecalling signal analysis for long-read sequencing data across major platforms, including ONT, PacBio, and Illumina Complete Long Reads [12].
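Two summary statistics appear in nearly every long-read QC report: read length N50 and mean read quality. The sketch below shows one common way to compute each, along with a simple length filter. It is not taken from any of the tools above. The function names are illustrative, and the quality calculation averages error probabilities rather than Phred scores, which is a widely used convention but an assumption here.

```python
import math


def n50(lengths):
    """Length L such that reads of length >= L hold at least half of all bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if running >= total / 2:
            return length
    return 0


def mean_read_quality(phred_scores):
    """Average per-base error probabilities, then convert back to Phred scale."""
    probs = [10 ** (-q / 10) for q in phred_scores]
    return -10 * math.log10(sum(probs) / len(probs))


def filter_by_length(lengths, min_len, max_len=None):
    """Keep read lengths within [min_len, max_len]; max_len=None means no cap."""
    return [n for n in lengths if n >= min_len and (max_len is None or n <= max_len)]


read_lengths = [2, 3, 4, 5, 6, 7, 8, 9, 10]  # toy values, in kb
half_mass_length = n50(read_lengths)
kept = filter_by_length(read_lengths, min_len=5)
```

Averaging error probabilities gives a lower and more realistic value than averaging Phred scores directly, because a few very poor bases dominate the error rate of a read.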
References
1. Andrews, S. (2010). FastQC: A quality control tool for high throughput sequence data [Online]. Retrieved from http://www.bioinformatics.babraham.a...ojects/fastqc/
2. Ewels, P., Magnusson, M., Lundin, S., & Käller, M. (2016). MultiQC: Summarize analysis results for multiple tools and samples in a single report. Bioinformatics, 32(19), 3047–3048. https://doi.org/10.1093/bioinformatics/btw354
3. Bolger, A. M., Lohse, M., & Usadel, B. (2014). Trimmomatic: A flexible trimmer for Illumina sequence data. Bioinformatics, 30(15), 2114–2120. https://doi.org/10.1093/bioinformatics/btu170
4. Jiang, H., Lei, R., & Ding, S. W. (2014). Skewer: A fast and accurate adapter trimmer for next-generation sequencing paired-end reads. BMC Bioinformatics, 15, 182. https://doi.org/10.1186/1471-2105-15-182
5. Martin, M. (2011). Cutadapt removes adapter sequences from high-throughput sequencing reads. EMBnet Journal, 17(1), 10–12.
6. Chen, S., Zhou, Y., Chen, Y., & Gu, J. (2018). fastp: An ultra-fast all-in-one FASTQ preprocessor. Bioinformatics, 34(17), i884–i890. https://doi.org/10.1093/bioinformatics/bty560
7. Leger, A., & Leonardi, T. (2019). pycoQC, interactive quality control for Oxford Nanopore sequencing. Journal of Open Source Software, 4(34), 1236. https://doi.org/10.21105/joss.01236
8. Wick, R. R., Judd, L. M., Gorrie, C. L., & Holt, K. E. (2017). Completing bacterial genome assemblies with multiplex MinION sequencing. Microbial Genomics, 3(10), e000132. https://doi.org/10.1099/mgen.0.000132
9. De Coster, W., D’Hert, S., Schultz, D. T., Cruts, M., & Van Broeckhoven, C. (2018). NanoPack: Visualizing and processing long-read sequencing data. Bioinformatics, 34(15), 2666–2669. https://doi.org/10.1093/bioinformatics/bty149
10. Wick, R. R. (2018). Filtlong [Internet]. GitHub. Retrieved from https://github.com/rrwick/Filtlong