From Algorithms to Assemblies: An Interview with Sequencing Analysis Experts—Part 3

Published: 04-10-2023, 01:51 PM

This is part three of our Q&A article series with our expert sequencing analysis providers. We’re asking them important questions to learn how they handle different aspects of the analysis process.

In this segment of our series, we ask our participants about the complex processes of transcript quantification and differential expression analysis.

Review the first installment to see how they handle the quality control process, or the second installment to see how they evaluate assemblies and alignments.

How do you perform transcript quantification and differential expression analysis?

Simon Valentine, Chief Commercial Officer, Basepair

The most commonly used workflow on the Basepair platform provides QC, alignment, and quantification of RNA-seq data. Reads are first aligned to the genome with STAR, and featureCounts is then used to measure expression at both the gene and transcript level. We can go from raw reads to RPKM values in about 15 minutes for a typical human dataset, with the ability to process hundreds of samples in parallel. Individual samples can then be combined in downstream differential expression analyses using DESeq2. We also provide workflows for measuring de novo transcript isoform abundance and differential expression using Cufflinks and Cuffdiff.
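
As a rough illustration of the final step in that workflow, the sketch below converts raw gene counts into RPKM using the standard definition (reads per kilobase of feature length per million mapped reads). It is not Basepair's implementation; the function name, gene names, counts, lengths, and read total are hypothetical values chosen for illustration.

```python
def rpkm(counts, lengths_bp, total_mapped_reads):
    """Return RPKM per gene: count / (length in kb) / (mapped reads in millions)."""
    return {
        gene: count * 1e9 / (lengths_bp[gene] * total_mapped_reads)
        for gene, count in counts.items()
    }


# Hypothetical toy inputs (not real data).
counts = {"geneA": 500, "geneB": 1000}
lengths_bp = {"geneA": 2000, "geneB": 4000}

values = rpkm(counts, lengths_bp, total_mapped_reads=1_000_000)
print(values)
```

In this toy case both genes receive the same RPKM: geneB has twice as many reads but is also twice as long, which is exactly the length bias the normalization is meant to remove.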

QIAGEN Digital Insights Team

The tools available in QIAGEN CLC Genomics Workbench Premium are compatible both with the Expression Tracks created by the RNA-Seq Analysis tool and the tables created by the miRNA quantification tool. Two tools are available in the Workbench for calculating differential expression. The Differential Expression in Two Groups tool performs a statistical differential expression test for a set of Expression Tracks and a set of control tracks. The Differential Expression for RNA-Seq tool performs a statistical differential expression test for a set of Expression Tracks with associated metadata. Both tools use multi-factorial statistics based on a negative binomial Generalized Linear Model (GLM).

Regarding the number of replicates, the Differential Expression for RNA-Seq tool is capable of running without replicates, but this is not recommended and the results should be treated with caution. You want as many biological replicates as possible, typically at least three. Replication is important because it allows the ‘within group’ variation to be accurately estimated for a gene. Without replication, the Differential Expression for RNA-Seq tool assumes that genes with similar average expression levels have similar variability.

When considering technical vs. biological replicates, Auer and Doerge (2010) illustrate the importance of biological replicates with the example of an alien visiting Earth. The alien wishes to know if men are taller than women. It abducts one man and one woman and measures their heights several times (i.e., performs several technical replicates). Because it has no biological replicates, the alien would erroneously conclude that women are taller than men if the abducted woman happened to be taller than the abducted man.
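
The alien's mistake can be made concrete with a small calculation. The sketch below compares a Welch t-statistic computed from technical replicates (one woman and one man, each measured three times) with one computed from biological replicates (three different women and three different men). All heights are invented purely for illustration, and the helper name `welch_t` is an assumption, not something taken from Auer and Doerge or from the Workbench.

```python
import math
import statistics


def welch_t(a, b):
    """Welch's t-statistic for the difference in means, mean(a) - mean(b)."""
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / se


# Technical replicates: one woman and one man, each measured three times (cm).
woman_repeated = [170.0, 170.2, 169.8]
man_repeated = [165.0, 165.2, 164.8]

# Biological replicates: three different women and three different men (cm).
women = [170.0, 160.0, 162.0]
men = [165.0, 178.0, 182.0]

t_technical = welch_t(woman_repeated, man_repeated)
t_biological = welch_t(women, men)

print(f"technical replicates:  t = {t_technical:.1f}")
print(f"biological replicates: t = {t_biological:.1f}")
```

With technical replicates the variance reflects only measurement error, so the statistic is very large and the alien is confidently wrong. With biological replicates the between-individual variation enters the denominator, and in this toy data the statistic is modest and has the opposite sign.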

The use of the GLM formalism allows us to fit curves to expression values without assuming that the error on the values is normally distributed. Similarly to edgeR and DESeq2, we assume that the read counts follow a Negative Binomial distribution, as explained in McCarthy et al., 2012. The Negative Binomial distribution can be understood as a ‘Gamma-Poisson’ mixture distribution, i.e., the distribution resulting from a mixture of Poisson distributions, where the Poisson parameter λ is itself Gamma-distributed. In an RNA-Seq context, this Gamma distribution is controlled by the dispersion parameter, such that the Negative Binomial distribution reduces to a Poisson distribution when the dispersion is zero.
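
The limiting behavior described above can be checked numerically. The sketch below uses the common mean/dispersion parameterization of the Negative Binomial (size r = 1/dispersion, variance = μ + dispersion × μ²). This is the standard textbook form rather than anything taken from the Workbench itself, and the function names and numeric values are illustrative assumptions.

```python
import math


def nb_log_pmf(k, mu, dispersion):
    """Log-probability of count k under a Negative Binomial with mean mu and dispersion > 0."""
    r = 1.0 / dispersion  # "size" parameter; the shape of the Gamma mixing distribution
    return (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mu))
        + k * math.log(mu / (r + mu))
    )


def poisson_log_pmf(k, mu):
    """Log-probability of count k under a Poisson distribution with mean mu."""
    return k * math.log(mu) - mu - math.lgamma(k + 1)


def nb_moments(mu, dispersion, k_max=5000):
    """Mean and variance obtained by summing the pmf over 0..k_max."""
    probs = [math.exp(nb_log_pmf(k, mu, dispersion)) for k in range(k_max + 1)]
    mean = sum(k * p for k, p in enumerate(probs))
    var = sum((k - mean) ** 2 * p for k, p in enumerate(probs))
    return mean, var


# Overdispersed case: the variance exceeds the mean by dispersion * mu**2.
mean, var = nb_moments(mu=10.0, dispersion=0.5)
print(f"mean = {mean:.3f}, variance = {var:.3f}")

# Near-zero dispersion: the log-pmf is almost identical to the Poisson log-pmf.
gap = max(
    abs(nb_log_pmf(k, 10.0, 1e-6) - poisson_log_pmf(k, 10.0)) for k in range(31)
)
print(f"largest log-pmf gap to Poisson = {gap:.2e}")
```

For a mean of 10 and a dispersion of 0.5 the variance comes out at about 60 (10 + 0.5 × 10²), six times what a Poisson model would allow, while a dispersion close to zero makes the two distributions practically indistinguishable.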

Mike Lelivelt, VP of Software Product Management and Marketing, Illumina

DRAGEN offers a complete secondary analysis solution for RNA-seq data. The DRAGEN RNA pipeline leverages a graph-capable mapper and a Multigenome (graph) Reference that improve mapping accuracy, especially in difficult-to-map regions. Transcript expression quantification by DRAGEN shows high concordance with widely adopted algorithms, e.g., Salmon and kallisto, with a significant runtime improvement thanks to hardware (FPGA) acceleration. The individual transcript quantifications can then be aggregated into sample groups for differential expression analysis by the DESeq2 module. Fusion gene detection and RNA small variant calling can also be optionally enabled, allowing users to maximize the findings from their RNA-seq data.

DRAGEN comes with lossless ORA Compression, which reduces the size of RNA-seq FASTQ files by more than 70%. This allows users to easily manage their datasets and reduce data storage costs, especially for high-throughput users.

The DRAGEN RNA pipeline meets customers where they need it: onboard selected Illumina sequencers (NextSeq™ 1000 and 2000, NovaSeq™ X series) and on Illumina cloud platforms (BaseSpace™ Sequence Hub, Illumina Connected Analytics). The onboard or cloud analysis can be configured at sequencing run planning, enabling seamless and ultra-efficient end-to-end workflows.

Richard Moir, Director of Product and Technology, Geneious

Geneious offers several options for mapping or assembly depending on the data used, including SPAdes, STAR and our own Geneious algorithm. Geneious calculates raw counts along with basic stats such as TPM while accounting for reads that are mapped to multiple locations.
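
For readers unfamiliar with TPM, the sketch below shows the standard definition (transcripts per million): each count is divided by the transcript length, and the resulting rates are rescaled to sum to one million. It is not Geneious' implementation. It uses annotated rather than effective lengths and does not attempt the multi-mapped read handling mentioned above, and the function name and toy values are hypothetical.

```python
def tpm(counts, lengths_bp):
    """Return TPM per transcript: length-normalized rates rescaled to sum to one million."""
    rates = {tx: counts[tx] / lengths_bp[tx] for tx in counts}
    total = sum(rates.values())
    return {tx: rate / total * 1e6 for tx, rate in rates.items()}


# Hypothetical toy inputs (not real data).
counts = {"tx1": 100, "tx2": 300, "tx3": 200}
lengths_bp = {"tx1": 1000, "tx2": 3000, "tx3": 1000}

tpm_values = tpm(counts, lengths_bp)
for tx, value in tpm_values.items():
    print(f"{tx}: {value:.0f}")
```

Because the values always sum to one million within a sample, TPM is convenient for comparing the relative abundance of transcripts inside one sample; differential expression tools such as DESeq2 still expect the raw counts as input.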

Differential gene expression can then be analyzed using DESeq2 without the need for configuring an R environment. Geneious downloads and configures all the necessary files for you. Results are presented in multiple integrated views that can be used in concert for identifying genes of interest. These include an interactive table view, volcano plots, and PCA plots, as well as heatmap coloring of genes in the interactive genome viewer.

Dr. Ni Ming, Senior Vice-President, MGI

a) Our strategy in MegaBOLT is to help clients with analyses that require substantial computing time and resources. The pipeline that we use is heavily based on GATK Best Practices and on well-recognized, publicly available software. The output of our RNA-seq pipeline is the standard gene expression matrix for each sample. Other downstream analyses of gene expression are also included, e.g., heatmaps and cluster analysis.

b) In terms of differential expression analysis, many software tools are available, and the choice depends heavily on the customer’s project design, e.g., the selection of control/treatment pairs and the grouping of different samples. The customer can use the standard expression matrix as input for most of these tools, e.g., DEGseq¹, DESeq2², edgeR³, EBSeq⁴, NOISeq⁵, and PoissonDis⁶.

References:
1. Wang L, Feng Z, Wang X, Wang X, Zhang X. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics. 2010;26(1):136-138. doi:10.1093/bioinformatics/btp612.
2. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi:10.1186/s13059-014-0550-8.
3. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139-140. doi:10.1093/bioinformatics/btp616.
4. Leng N, Dawson JA, Thomson JA, et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics. 2013;29(8):1035-1043. doi:10.1093/bioinformatics/btt087.
5. Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A. Differential expression in RNA-seq: a matter of depth. Genome Res. 2011;21(12):2213-2223. doi:10.1101/gr.124321.111.
6. Audic S, Claverie JM. The significance of digital gene expression profiles. Genome Res. 1997;7(10):986-995. doi:10.1101/gr.7.10.986.

Follow up with the fourth, fifth, and sixth (final) installments of our Q&A series.