

From Algorithms to Assemblies: An Interview with Sequencing Analysis Experts—Part 1





    As sequencing technologies and data analysis tools continue to advance, it is more important than ever to ensure your sequencing data is handled appropriately. Analyzing sequencing data is a complex process, and current platforms can perform diverse tasks such as assembling genomes, quantifying transcripts, interpreting detailed experiments, and much more.

    In this Q&A article series, we’re interviewing top sequencing analysis providers to understand the workings of their platforms and learn how they handle different aspects of the analysis process.

    This first installment of the series will focus on quality control measures and ensuring subsequent analyses are set up for success.

    What is your approach to quality control of sequencing data, and how do you ensure the data is high enough quality for downstream analyses?

    Richard Moir, Director of Product and Technology, Geneious
    Geneious Prime provides a broad set of tools for pre-processing and quality control that can be used flexibly across a wide range of sequencing use cases. Our approach is to give scientists an intuitive interface for selecting and configuring the right tools for their data, and then, through powerful visualizations, empower them to explore the results and make their own assessment of the data's quality.

    In many cases, Geneious Prime uses trusted open-source solutions for QC and preprocessing, including several from the BBTools package: bbmerge for merging paired reads, bbnorm for error correction and normalization, and bbduk for trimming. Tools are also provided for demultiplexing, sub-sampling, and chimera detection.

    As each preprocessing step is performed, a new result is saved that can be inspected using summary statistics such as base call quality, read length, GC content, and ambiguities. This makes it easy to compare different approaches and to roll back when one doesn't work well.
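The kind of per-read summary statistics described here can be sketched directly from a sequence and its Phred-encoded quality string. This is a minimal, hypothetical illustration of the metrics, not Geneious Prime's implementation; the function name and defaults are assumptions.

```python
# Illustrative per-read summary statistics: read length, GC content,
# ambiguous (non-ACGT) bases, and mean Phred base quality.

def read_stats(seq, qual_string, phred_offset=33):
    """Return (length, gc_fraction, n_ambiguous, mean_quality) for one read."""
    seq = seq.upper()
    length = len(seq)
    gc_fraction = sum(1 for b in seq if b in "GC") / length
    ambiguous = sum(1 for b in seq if b not in "ACGT")
    mean_q = sum(ord(c) - phred_offset for c in qual_string) / length
    return length, gc_fraction, ambiguous, mean_q

print(read_stats("ACGTNGGCC", "IIIIIIIII"))  # 'I' encodes Phred Q40
```

Computing these per result, as described above, is what lets different preprocessing approaches be compared side by side.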

    When the process needs to be more standardized and repeatable, the Workflow system in Geneious Prime can be used to create a pre-configured pipeline using a visual editor that can then be run with one click. These workflows can then be exported or shared using the built-in shared database functionality to create a standard operating procedure for a wider group of scientists.

    Dr. Ni Ming, Senior Vice-President, MGI

    NGS is a bit different from other businesses: all products and analysis services are built on sequencing data, so the quality of that data is vital for most applications. MGI and Complete Genomics apply quality control at almost every step of sequencing.

    First, our sequencing strategy is based on DNA nanoballs, which minimizes the number of PCR cycles and so avoids errors introduced during DNA copying.

    Following that, we use automation, which significantly reduces manual operations and ensures minimal manual intervention during lab work.

    During data analysis, we apply our in-house software SOAPnuke [1], which is publicly available on GitHub and integrated into MegaBOLT for customer use and testing. We use a number of key criteria, such as Q30, GC content, and adapter contamination rate, to evaluate sequencing data quality before and after processing, so that only data of good quality is fed into subsequent analyses. This effectively elevates the accuracy and reliability of the insights derived from the data. In addition, all parameters can be modified to cater to different customer needs.

    Finally, we have in place management systems covering the whole sequencing workflow—from sample submission to sample management, laboratory management, data analysis, and reporting.

    Our fully automated sequencing and analysis process is an integrated, one-stop analysis with:

    a) Comprehensive data filtering and detailed visualization
    Once raw data is generated, it is processed with our self-developed, full-featured SOAPnuke software, which filters out reads containing adapter sequence, reads of low quality, and reads with high N content to obtain high-quality data. At the same time, statistics on the preprocessed data are presented with visualizations—for example, the fraction of bases at Q30 or above (i.e. error rate ≤ 0.1%) and the base and quality status of each sequencing cycle—for a better understanding of the data.
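The filtering criteria described here can be sketched as follows. This is a hedged illustration of the general idea, not SOAPnuke's code: the adapter sequence, threshold values, and function names below are placeholders, not SOAPnuke's actual defaults.

```python
# Hypothetical read filter in the spirit of the criteria above: discard
# reads containing adapter sequence, reads with too high an N fraction,
# or reads with too many low-quality bases; report Q30 for the rest.

ADAPTER = "AGATCGGAAGAG"  # placeholder adapter sequence

def passes_filters(seq, quals, max_n_frac=0.05, low_q=5, max_low_q_frac=0.5):
    seq = seq.upper()
    if ADAPTER in seq:
        return False  # adapter contamination
    if seq.count("N") / len(seq) > max_n_frac:
        return False  # high-N read
    low = sum(1 for q in quals if q < low_q)
    return low / len(quals) <= max_low_q_frac

def q30_fraction(quals):
    """Fraction of bases at or above Q30, i.e. error rate <= 0.1%."""
    return sum(1 for q in quals if q >= 30) / len(quals)

print(passes_filters("ACGTACGT", [40] * 8))  # True
print(q30_fraction([40, 40, 10, 30]))        # 0.75
```

In a real pipeline, the Q30 fraction would be reported both before and after filtering, as described above.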

    b) Flexible parameters adjustment

    Relevant parameters can be flexibly adjusted, and because datasets from the same kind of library and sequencer often share data characteristics, users can apply the same parameters to similar data and build processes around platform characteristics, settling on a suitable parameter scheme for subsequent production work.

    c) Accelerated quality control process
    To optimize the whole data quality control process, we have designed a streamlined pipeline that splits large sequencing datasets and parallelizes processing automatically.
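The split-and-parallelize pattern described here reads, in sketch form, as below. Threads stand in for the processes or cluster jobs a real pipeline would use, and all names are illustrative.

```python
# Illustrative chunk-and-parallelize pattern: split a batch of reads into
# chunks, run a QC pass on each chunk concurrently, and merge the counts.
from concurrent.futures import ThreadPoolExecutor

def qc_chunk(reads):
    # stand-in per-chunk QC: count reads at least 5 bases long
    return sum(1 for r in reads if len(r) >= 5)

def parallel_qc(reads, n_chunks=4):
    chunks = [reads[i::n_chunks] for i in range(n_chunks)]
    with ThreadPoolExecutor(max_workers=n_chunks) as pool:
        return sum(pool.map(qc_chunk, chunks))

reads = ["ACGTACG", "ACG", "GGGGG", "TT", "ACGTA", "CCCCCCC"]
print(parallel_qc(reads))  # 4
```

Because per-read QC is embarrassingly parallel, the merged result is the same regardless of how the data is split.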

    [1] Chen Y, Chen Y, Shi C, et al. SOAPnuke: a MapReduce acceleration-supported software for integrated quality control and preprocessing of high-throughput sequencing data. GigaScience. 2018;7(1):1-6. doi:10.1093/gigascience/gix120.

    Simon Valentine, Chief Commercial Officer, Basepair

    Quality control (QC) of sequencing data is critical to performing successful genomic analyses. With a few clicks, Basepair allows you to QC raw reads from a variety of datatypes (RNA-seq, ChIP-seq, ATAC-seq, single-cell RNA-seq, WGS/WES, etc.) using industry-standard bioinformatics tools, while also providing helpful visualizations and reports to assess your data. The first step in each of our analysis workflows always includes trimming of low-quality bases as well as detection and removal of adapter contamination.

    We provide summary statistics and visualizations that allow you to quickly compare each sample within a project in order to check for the proper enrichment, signal/noise ratio, and any potential experimental bias before moving on to downstream analyses.

    QIAGEN Digital Insights Team

    QIAGEN CLC Genomics Workbench Premium has all the tools scientists need for success with sequencing data analysis – whether they are analyzing RNA, DNA, microbial data, single-cell data, etc. It even has pre-defined workflows that can be customized depending on specific needs. To ensure high-quality sequencing data for downstream analyses using QIAGEN CLC, here are some recommended steps for quality control, which are generally universal:

    a. Quality control and trimming of raw data: This is the first step in the sequencing data analysis pipeline; you should check the quality of the raw data and trim any low-quality reads
    b. De novo assembly of reads: This will generate an assembled sequence
    c. Mapping of reads: The trimmed reads should then be mapped to a reference genome. Important QC metrics are the number of reads mapped to a reference genome, and for panels, the number of reads mapped to target regions in addition to coverage
    d. Variant calling: The mapped reads will serve to find germline or somatic variants
    e. Post-variant calling quality control: Once variant calling is complete, it is essential to check the quality of the called variants
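The mapping QC metrics mentioned in step (c) can be illustrated with a toy calculation. The counts and names below are made up for the example; a real pipeline would take them from the aligner's statistics reports.

```python
# Toy versions of the mapping QC metrics from step (c): fraction of reads
# mapped, on-target fraction for a panel, and approximate mean coverage.

def mapping_qc(total_reads, mapped_reads, on_target_reads,
               read_length, target_bp):
    mapped_frac = mapped_reads / total_reads
    on_target_frac = on_target_reads / mapped_reads
    mean_coverage = on_target_reads * read_length / target_bp
    return mapped_frac, on_target_frac, mean_coverage

m, t, cov = mapping_qc(total_reads=1_000_000, mapped_reads=980_000,
                       on_target_reads=882_000, read_length=150,
                       target_bp=500_000)
print(f"{m:.1%} mapped, {t:.1%} on target, ~{cov:.0f}x coverage")
```

Low mapped or on-target fractions at this stage flag problems worth fixing before variant calling (steps d and e).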

    To ensure the data is high enough quality for downstream analyses, it is essential to follow best practices for quality control and consider the specific requirements for the downstream analysis. It is also possible to validate the analysis results using additional methods, such as PCR or Sanger sequencing. Additionally, it may be helpful to consult with experts in the field and seek advice from the scientific community to ensure your experimental design is robust for the best possible outcome.

    Mike Lelivelt, VP of Software Product Management and Marketing, Illumina

    Data quality starts with manufacturing at Illumina, where our reagents are created under ISO-certified processes. Once an instrument is in the customer's hands, each run provides extensive performance feedback to the operator, ensuring everything is within normal tolerances. Each base is assigned a quality score that predicts the accuracy of the base call, and data below tolerance is removed. These are the Q scores people are fond of talking about: they measure the quality of the base call, not the variant. And yes, Q score is a critical measure, but not the only one.
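The Q scores mentioned here follow the standard Phred scale, Q = -10 · log10(p), where p is the estimated probability that a base call is wrong; Q30 thus corresponds to a 1-in-1,000 error chance. A quick converter:

```python
# Standard Phred conversions between quality scores and error probability.
import math

def phred_to_error(q):
    return 10 ** (-q / 10)

def error_to_phred(p):
    return -10 * math.log10(p)

print(phred_to_error(30))    # 0.001
print(error_to_phred(0.01))  # 20.0
```

This is why a run's fraction of Q30-or-better bases is such a common headline metric: it directly bounds the expected per-base error rate.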

    Quality base-level data from millions of reads is then fed into algorithms that map the reads to the genome and call variants. Each of these steps has its own performance evaluation process. Illumina routinely benchmarks data against standard performance metrics from the PrecisionFDA Challenges, where the Illumina DRAGEN™ pipelines are routinely awarded as the most accurate.

    Check out the second, third, fourth, fifth, and sixth (final) installment of our Q&A series!

    • AndrewO commented:
      It's interesting to see how each company took the question a bit differently. I guess it depends on what kind of data you're talking about (RNA, DNA, etc) and at what stage/step in your workflow.

    • Ben3 commented:
      AndrewO that's true. A lot of these questions could be taken into several different directions so it's interesting to see how each provider answers them.
