From Algorithms to Assemblies: An Interview with Sequencing Analysis Experts—Part 3


    This is part three of our Q&A article series with our expert sequencing analysis providers. We’re asking them important questions to learn how they handle different aspects of the analysis process.


    In this segment of our series, we ask our participants about the complex processes of transcript quantification and differential expression analysis.

    Review the first installment to see how they handle the quality control process, or the second installment to see how they evaluate assemblies and alignments.






    How do you perform transcript quantification and differential expression analysis?




    Basepair
    Simon Valentine, Chief Commercial Officer, Basepair



    The most commonly used workflow on the Basepair platform provides QC, alignment, and quantification of RNA-seq data. Reads are first aligned to the genome with STAR, and featureCounts is then used to measure expression at both the gene and transcript level. We can go from raw reads to RPKM values in about 15 minutes for a typical human dataset, with the ability to process hundreds of samples in parallel. Individual samples can then be combined in downstream differential expression analyses using DESeq2. We also provide workflows for measuring de novo transcript isoform abundance and differential expression using Cufflinks and Cuffdiff.
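
    As an editorial illustration of the RPKM values mentioned above (not Basepair's implementation), the following minimal sketch applies the standard definition: reads per kilobase of feature length per million mapped reads. The function name, gene names, counts, and lengths are all hypothetical, and the total of the counted reads is assumed to stand in for the mapped library size.

```python
def rpkm(counts, lengths_bp):
    """Reads Per Kilobase of feature length per Million mapped reads.

    counts      -- dict of feature name -> read count (e.g. from a counting tool)
    lengths_bp  -- dict of feature name -> feature length in base pairs
    Assumption: the sum of all counts is used as the mapped library size.
    """
    total_reads = sum(counts.values())
    if total_reads == 0:
        raise ValueError("no mapped reads to normalize against")
    # count / (length / 1e3) / (total / 1e6) == count * 1e9 / (length * total)
    return {
        gene: counts[gene] * 1e9 / (lengths_bp[gene] * total_reads)
        for gene in counts
    }


# Hypothetical toy data: three genes in one sample.
counts = {"geneA": 500, "geneB": 1500, "geneC": 8000}
lengths_bp = {"geneA": 1000, "geneB": 3000, "geneC": 2000}

values = rpkm(counts, lengths_bp)
for gene, value in values.items():
    print(f"{gene}\t{value:.1f}")
```

    In this toy sample, geneA and geneB end up with the same RPKM even though geneB has three times as many reads, because geneB is also three times as long.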











    QIAGEN
    QIAGEN Digital Insights Team
    The tools available in QIAGEN CLC Genomics Workbench Premium are compatible both with the Expression Tracks created by the RNA-Seq Analysis tool and the tables created by the miRNA quantification tool. Two tools are available in the Workbench for calculating differential expression. The Differential Expression in Two Groups tool performs a statistical differential expression test for a set of Expression Tracks and a set of control tracks. The Differential Expression for RNA-Seq tool performs a statistical differential expression test for a set of Expression Tracks with associated metadata. Both tools use multi-factorial statistics based on a negative binomial Generalized Linear Model (GLM).

    Regarding the number of replicates, the Differential Expression for RNA-Seq tool is capable of running without replicates, but this is not recommended and the results should be treated with caution. Aim for as many biological replicates as possible, typically at least three. Replication is important because it allows the ‘within group’ variation to be accurately estimated for each gene. Without replication, the Differential Expression for RNA-Seq tool assumes that genes with similar average expression levels have similar variability.

    When considering technical vs. biological replicates, Auer and Doerge (2010) illustrate the importance of biological replicates with the example of an alien visiting Earth. The alien wishes to know if men are taller than women. It abducts one man and one woman and measures their heights several times (i.e., performs several technical replicates). Without biological replicates, however, the alien would erroneously conclude that women are taller than men if the abducted woman happened to be taller than the abducted man.
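
    The alien example can be made concrete with a small simulation. This is an editorial sketch, not part of the QIAGEN tools: all heights, the measurement noise, and the variable names are hypothetical. It shows that repeated measurements of two individuals give a tight, convincing-looking difference that says nothing about the groups, while the spread between individuals is far larger.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is reproducible

MEASUREMENT_SD_CM = 0.2  # hypothetical technical (measurement) noise


def measure(true_height_cm, n_repeats):
    """Technical replicates: measure the same individual several times."""
    return [random.gauss(true_height_cm, MEASUREMENT_SD_CM) for _ in range(n_repeats)]


# The two abducted individuals (hypothetical): a short man and a tall woman.
man_reads = measure(165.0, 10)
woman_reads = measure(178.0, 10)

technical_sd = statistics.stdev(man_reads)
observed_gap = statistics.mean(woman_reads) - statistics.mean(man_reads)

# Biological replicates: one measurement each from several individuals
# (hypothetical true heights in cm).
men = [165.0, 176.0, 181.0, 172.0, 185.0]
women = [178.0, 160.0, 166.0, 158.0, 163.0]

biological_sd = statistics.stdev(men)
group_gap = statistics.mean(men) - statistics.mean(women)

print(f"technical SD (one man, 10 measurements): {technical_sd:.2f} cm")
print(f"woman minus man, two individuals only:   {observed_gap:.2f} cm")
print(f"biological SD (five men):                {biological_sd:.2f} cm")
print(f"men minus women, five of each:           {group_gap:.2f} cm")
```

    The technical replicates only pin down the heights of the two abducted individuals. Estimating the within-group variation, and therefore testing the group difference, requires the biological replicates.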


    The use of the GLM formalism allows us to fit curves to expression values without assuming that the error on the values is normally distributed. Like edgeR and DESeq2, we assume that the read counts follow a Negative Binomial distribution, as explained in McCarthy et al. (2012). The Negative Binomial distribution can be understood as a ‘Gamma-Poisson’ mixture distribution, i.e., the distribution resulting from a mixture of Poisson distributions, where the Poisson parameter λ is itself Gamma-distributed. In an RNA-Seq context, this Gamma distribution is controlled by the dispersion parameter, such that the Negative Binomial distribution reduces to a Poisson distribution when the dispersion is zero.
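
    The relationship between the Negative Binomial and Poisson distributions described above can be checked numerically. The sketch below is an editorial illustration, not the Workbench implementation. It assumes the common mean/dispersion parameterization (the one used in the edgeR and DESeq2 literature), in which the variance is mean + dispersion × mean²; the function names are hypothetical.

```python
import math


def poisson_pmf(k, mean):
    """P(K = k) for a Poisson distribution with the given mean (> 0)."""
    return math.exp(k * math.log(mean) - mean - math.lgamma(k + 1))


def neg_binomial_pmf(k, mean, dispersion):
    """P(K = k) for a Negative Binomial with the given mean (> 0) and dispersion.

    Parameterized so that variance = mean + dispersion * mean**2.
    With dispersion == 0 the Gamma mixing distribution collapses to a point
    and the result is exactly Poisson.
    """
    if dispersion == 0:
        return poisson_pmf(k, mean)
    r = 1.0 / dispersion  # "size" parameter of the Negative Binomial
    log_p = (
        math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
        + r * math.log(r / (r + mean))
        + k * math.log(mean / (r + mean))
    )
    return math.exp(log_p)


def neg_binomial_variance(mean, dispersion):
    """Variance implied by the mean/dispersion parameterization."""
    return mean + dispersion * mean ** 2


mean = 10.0
for dispersion in (0.5, 0.1, 1e-6, 0.0):
    print(
        f"dispersion={dispersion:<8} "
        f"variance={neg_binomial_variance(mean, dispersion):8.3f} "
        f"P(K=10)={neg_binomial_pmf(10, mean, dispersion):.6f}"
    )
print(f"Poisson    P(K=10)={poisson_pmf(10, mean):.6f}")
```

    As the dispersion shrinks, the variance approaches the mean and the probabilities converge to the Poisson values, which is the limiting behavior the paragraph describes.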


    Illumina
    Mike Lelivelt, VP of Software Product Management and Marketing, Illumina


    DRAGEN offers a complete secondary analysis solution for RNA-seq data. The DRAGEN RNA pipeline leverages a graph-capable mapper and a Multigenome (graph) Reference that improve mapping accuracy, especially in difficult-to-map regions. Transcript expression quantification by DRAGEN shows high concordance with widely adopted algorithms, e.g., Salmon and Kallisto, with a significant runtime improvement thanks to hardware (FPGA) acceleration. The individual transcript quantifications can then be aggregated into sample groups for differential expression analysis by the DESeq2 module. Fusion gene detection and RNA small variant calling can also be optionally enabled, allowing users to maximize the findings from their RNA-seq data.

    DRAGEN comes with lossless ORA Compression, which reduces the size of RNA-seq FASTQ files by more than 70%. This allows users to easily manage their datasets and reduce data storage costs, especially for high-throughput users.
    The DRAGEN RNA pipeline meets customers where they need it: onboard selected Illumina sequencers (NextSeq™ 1000 and 2000, NovaSeq™ X series) and on Illumina cloud platforms (BaseSpace™ Sequence Hub, Illumina Connected Analytics). The onboard or cloud analysis can be configured at sequencing run planning, enabling seamless and ultra-efficient end-to-end workflows.




    Geneious
    Richard Moir, Director of Product and Technology, Geneious



    Geneious offers several options for mapping or assembly depending on the data used, including SPAdes, STAR and our own Geneious algorithm. Geneious calculates raw counts along with basic stats such as TPM while accounting for reads that are mapped to multiple locations.
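
    For readers unfamiliar with TPM (transcripts per million), the following editorial sketch applies the standard definition: each count is divided by the feature length, and the resulting rates are rescaled to sum to one million. It is not Geneious's implementation. The source does not say how multi-mapped reads are handled, so the fractional count below (a read split evenly across its mapping locations) is only one hypothetical convention; all names and numbers are made up.

```python
def tpm(counts, lengths_bp):
    """Transcripts Per Million from per-feature counts and lengths.

    counts      -- dict of feature name -> read count; may be fractional if
                   multi-mapped reads were split across locations (assumption)
    lengths_bp  -- dict of feature name -> feature length in base pairs
    """
    # Step 1: length-normalize each count to a per-base rate.
    rates = {name: counts[name] / lengths_bp[name] for name in counts}
    total_rate = sum(rates.values())
    if total_rate == 0:
        raise ValueError("all counts are zero")
    # Step 2: rescale so the values sum to one million within the sample.
    return {name: rate / total_rate * 1e6 for name, rate in rates.items()}


# Hypothetical toy data; tx3 has a fractional count standing in for
# multi-mapped reads shared with other locations.
counts = {"tx1": 10.0, "tx2": 20.0, "tx3": 2.5}
lengths_bp = {"tx1": 1000, "tx2": 4000, "tx3": 500}

tpm_values = tpm(counts, lengths_bp)
for name, value in tpm_values.items():
    print(f"{name}\t{value:.0f}")
```

    Because TPM values always sum to one million within a sample, they describe relative abundance inside that sample; differential expression tools such as DESeq2 still expect the raw counts as input.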

    Differential gene expression can then be analyzed using DESeq2 without the need to configure an R environment. Geneious downloads and configures all the necessary files for you. Results are presented in multiple integrated views that can be used in concert to identify genes of interest. These include an interactive table view, volcano plots and PCA plots, as well as heatmap coloring of genes in the interactive genome viewer.






    MGI (Complete Genomics)
    Dr. Ni Ming, Senior Vice-President, MGI


    a) Our strategy in MegaBOLT is to help clients with analyses that require substantial computing time and resources. The pipeline that we use is heavily based on GATK best practices and on well-recognized, publicly available software. The output of our RNA-seq pipeline is the standard gene expression matrix for each sample. Other downstream analyses of gene expression are also included, e.g., heatmaps and cluster analysis.

    b) For differential expression analysis, many software tools are available, and the choice depends heavily on the customer’s project design, e.g., the selection of control/treatment pairs and the grouping of different samples. The customer can use the standard expression matrix as input for most of these tools, e.g., DEGseq [1], DESeq2 [2], edgeR [3], EBSeq [4], NOISeq [5] and PoissonDis [6].

    References:
    1. Wang L, Feng Z, Wang X, Wang X, Zhang X. DEGseq: an R package for identifying differentially expressed genes from RNA-seq data. Bioinformatics. 2010;26(1):136-138. doi:10.1093/bioinformatics/btp612.
    2. Love MI, Huber W, Anders S. Moderated estimation of fold change and dispersion for RNA-seq data with DESeq2. Genome Biol. 2014;15(12):550. doi:10.1186/s13059-014-0550-8.
    3. Robinson MD, McCarthy DJ, Smyth GK. edgeR: a Bioconductor package for differential expression analysis of digital gene expression data. Bioinformatics. 2010;26(1):139-140. doi:10.1093/bioinformatics/btp616.
    4. Leng N, Dawson JA, Thomson JA, et al. EBSeq: an empirical Bayes hierarchical model for inference in RNA-seq experiments. Bioinformatics. 2013;29(8):1035-1043. doi:10.1093/bioinformatics/btt087.
    5. Tarazona S, García-Alcalde F, Dopazo J, Ferrer A, Conesa A. Differential expression in RNA-seq: a matter of depth. Genome Res. 2011;21(12):2213-2223. doi:10.1101/gr.124321.111.
    6. Audic S, Claverie JM. The significance of digital gene expression profiles. Genome Res. 1997;7(10):986-995. doi:10.1101/gr.7.10.986.
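
    To make the inputs described in (b) concrete, the sketch below pairs a small expression matrix with a control/treatment grouping and computes a naive log2 fold change per gene. This is an editorial illustration only: it is not MegaBOLT output and not the statistical model of any of the tools listed above, which additionally estimate dispersion and test for significance. The function name, sample names, group labels, and counts are hypothetical.

```python
import math


def log2_fold_changes(matrix, groups, treatment="treatment", control="control",
                      pseudocount=1.0):
    """Naive per-gene log2 fold change between two sample groups.

    matrix  -- dict of gene -> dict of sample -> raw count
    groups  -- dict of sample -> group label (the project design)
    Counts are scaled to counts-per-million within each sample, averaged per
    group, and a pseudocount is added to avoid division by zero.
    """
    samples = list(groups)
    library_size = {s: sum(row[s] for row in matrix.values()) for s in samples}

    def group_mean_cpm(row, label):
        members = [s for s in samples if groups[s] == label]
        return sum(row[s] * 1e6 / library_size[s] for s in members) / len(members)

    return {
        gene: math.log2(
            (group_mean_cpm(row, treatment) + pseudocount)
            / (group_mean_cpm(row, control) + pseudocount)
        )
        for gene, row in matrix.items()
    }


# Hypothetical expression matrix (genes x samples) and project design.
matrix = {
    "geneA": {"c1": 100, "c2": 100, "t1": 400, "t2": 400},
    "geneB": {"c1": 900, "c2": 900, "t1": 600, "t2": 600},
}
groups = {"c1": "control", "c2": "control", "t1": "treatment", "t2": "treatment"}

lfc = log2_fold_changes(matrix, groups)
for gene, value in lfc.items():
    print(f"{gene}\t{value:+.3f}")
```

    The grouping dictionary plays the role of the project design: changing which samples are labeled control or treatment changes every fold change, which is why the choice of tool and comparison depends on the experiment.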




    Follow up with the fourth, fifth, and sixth (final) installments of our Q&A series.

    About the Author

    Benjamin Atha holds a B.A. in biology from Hood College and an M.S. in biological sciences from Towson University. With over 9 years of hands-on laboratory experience, he's well-versed in next-generation sequencing systems. Ben is currently the editor for SEQanswers.
