
Differential Expression and Data Visualization: Recommended Tools for Next-Level Sequencing Analysis



    After covering QC and alignment tools in the first segment and variant analysis and genome assembly in the second segment, we’re wrapping up with a discussion about tools for differential gene expression analysis and data visualization. In this article, we include recommendations from the following experts: Dr. Mark Ziemann, Senior Lecturer in Biotechnology and Bioinformatics, Deakin University; Dr. Medhat Mahmoud, Postdoctoral Research Fellow at Baylor College of Medicine; and Dr. Ming "Tommy" Tang, Director of Computational Biology at Immunitas and author of From Cell Line to Command Line.

    Differential gene expression analysis tools

    Differential gene expression is the variation in gene activity levels between different conditions or cell types. Differential expression analysis identifies the genes that are upregulated or downregulated in response to specific stimuli or in different disease states. This gives researchers insight into the underlying molecular mechanisms, the cellular processes involved, and potential therapeutic targets associated with those conditions.
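Whether a gene is up- or downregulated is usually summarized as a log2 fold change between two conditions. The minimal Python sketch below computes it for a single gene from counts that are assumed to be already normalized. The sample values are hypothetical, and the pseudocount of 1 is an arbitrary choice that avoids taking the log of zero. Dedicated tools such as DESeq2 estimate fold changes with a statistical model rather than this simple ratio of means.

```python
import math


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """Return log2((mean(B) + pc) / (mean(A) + pc)) for one gene.

    Positive values mean higher expression in group B (upregulated);
    negative values mean lower expression in group B (downregulated).
    The pseudocount is an illustrative assumption, not a tool default.
    """
    mean_a = sum(group_a) / len(group_a)
    mean_b = sum(group_b) / len(group_b)
    return math.log2((mean_b + pseudocount) / (mean_a + pseudocount))


# Hypothetical normalized counts for one gene: control vs. treated samples
control = [99.0, 101.0, 100.0]
treated = [399.0, 401.0, 400.0]

print(log2_fold_change(control, treated, pseudocount=0.0))  # → 2.0
print(log2_fold_change(control, treated))  # slightly below 2 because of the pseudocount
```

A log2 fold change of 1 corresponds to a doubling of expression, and -1 to a halving.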

    When asked about his procedure and preferred tools for differential expression analysis, Ziemann explains, “I use a PCA plot to visualize the sample variation, and I omit samples from the downstream analysis if they appear like outliers and are supported by the QC. I load the Kallisto counts [detailed in the first article of the series] into R and collapse these to the gene level, as I'm not that interested in alternative splicing. I then use DESeq2 for differential expression, as it is the most accurate according to my unpublished simulation work. DESeq2 also allows for complex experimental designs, which allow us to correct for potential confounders.”
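Collapsing to the gene level means summing the counts of all transcripts that map to the same gene. The Python sketch below illustrates this with a hypothetical transcript-to-gene table. It is not Ziemann's R code. In R, Bioconductor packages such as tximport perform this summarization and can also account for differences in transcript length, which a plain sum ignores.

```python
from collections import defaultdict


def collapse_to_genes(tx_counts, tx2gene):
    """Sum transcript-level estimated counts into gene-level counts.

    tx_counts: dict mapping transcript ID -> estimated count (one sample)
    tx2gene:   dict mapping transcript ID -> gene ID
    Transcripts missing from tx2gene are skipped.
    """
    gene_counts = defaultdict(float)
    for tx_id, count in tx_counts.items():
        gene_id = tx2gene.get(tx_id)
        if gene_id is not None:
            gene_counts[gene_id] += count
    return dict(gene_counts)


# Hypothetical transcript IDs, gene IDs, and counts for a single sample
tx_counts = {"TX1.1": 10.5, "TX1.2": 4.5, "TX2.1": 7.0, "TX3.1": 2.0}
tx2gene = {"TX1.1": "GENE1", "TX1.2": "GENE1", "TX2.1": "GENE2"}

print(collapse_to_genes(tx_counts, tx2gene))  # → {'GENE1': 15.0, 'GENE2': 7.0}
```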

    To interpret his data, Ziemann uses the Bioconductor package mitch for enrichment analysis. “Mitch is quite unique in that it accommodates multiple DESeq2 comparisons into an analysis, which gives a more integrated overview of the trends in a complex dataset with many contrasts.”

    Tang also recommends DESeq2 and describes it as the standard tool for differential gene expression analysis. That view is supported by the tens of thousands of journal articles that cite DESeq2, making it a de facto gold standard for differential analysis. Common alternatives that our experts did not mention include edgeR, limma, NOISeq, and sleuth.
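Before testing, DESeq2 normalizes for sequencing depth using per-sample size factors. By default, it estimates these with the median-of-ratios method. Each count is divided by the geometric mean of that gene across all samples, and a sample's size factor is the median of those ratios. The stdlib-only Python sketch below is a simplified illustration of that calculation, not DESeq2's implementation. The count matrix is hypothetical.

```python
import math
from statistics import median


def size_factors(counts):
    """Estimate per-sample size factors with the median-of-ratios method.

    counts: list of rows, one per gene, each a list of raw counts per sample.
    Genes with a zero count in any sample are excluded because their
    geometric mean is zero. Raises StatisticsError if no gene is usable.
    """
    n_samples = len(counts[0])
    log_ratios = [[] for _ in range(n_samples)]
    for row in counts:
        if any(c == 0 for c in row):
            continue
        # Work in log space: log of the geometric mean = mean of the logs
        log_geo_mean = sum(math.log(c) for c in row) / n_samples
        for j, c in enumerate(row):
            log_ratios[j].append(math.log(c) - log_geo_mean)
    # Median of log ratios, back-transformed, equals the median of the ratios
    return [math.exp(median(r)) for r in log_ratios]


# Hypothetical counts: sample 2 was sequenced about twice as deeply
counts = [
    [10, 20],
    [50, 100],
    [30, 60],
    [0, 5],  # excluded: zero count in sample 1
]

print([round(s, 3) for s in size_factors(counts)])  # → [0.707, 1.414]
```

Dividing each sample's counts by its size factor puts the samples on a comparable scale. In this example, the second sample's counts are scaled down by twice as much as the first's.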

    Data visualization tools

    While each step of the analysis process is important, the final step, data visualization, is critical for an accurate understanding of the data. It allows researchers to interpret complex patterns and relationships, highlight significant results, and communicate their findings effectively.

    Tang recommends ComplexHeatmap, ggplot2, and other Bioconductor visualization packages. ComplexHeatmap, which is distributed through Bioconductor, is well suited to building heatmaps that reveal associations and patterns in the data. ggplot2 is a general-purpose R plotting package, while Bioconductor hosts a wide range of visualization packages tailored to specific applications and analyses.
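Expression heatmaps are often drawn on row-scaled values (z-scores). This makes each gene's pattern across samples visible regardless of its absolute expression level. heatmap.2, for example, offers this through its scale = "row" argument, and ComplexHeatmap users commonly scale the matrix before plotting. The Python sketch below shows the scaling step on hypothetical log-expression values. It uses the sample standard deviation, matching R's scale() function.

```python
from statistics import mean, stdev


def row_zscores(matrix):
    """Scale each row (gene) to mean 0 and sample standard deviation 1.

    Rows with zero variance are returned as all zeros to avoid dividing by 0.
    """
    scaled = []
    for row in matrix:
        mu = mean(row)
        sd = stdev(row)
        scaled.append([(x - mu) / sd if sd > 0 else 0.0 for x in row])
    return scaled


# Hypothetical log-expression values: 2 genes x 4 samples
expr = [
    [2.0, 4.0, 6.0, 8.0],
    [5.0, 5.0, 5.0, 5.0],
]

for row in row_zscores(expr):
    print([round(z, 2) for z in row])
```

The scaled matrix, rather than the raw values, is what would then be passed to the heatmap function.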

    “For [visualization of] differential expression analysis, I keep it fairly simple,” says Ziemann. “PCA plots to understand overall trends, base R for volcano or smear plots, heatmap.2 for heatmaps, and I like beeswarm charts to show gene expression differences between groups. For pathway enrichment, mitch provides a set of nice visualizations.” All of these plot types can also be created in R, either directly or with existing Bioconductor packages.
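A volcano plot places each gene's log2 fold change on the x-axis and its -log10 adjusted p-value on the y-axis. Genes that are both strongly and significantly changed therefore sit in the upper corners. The Python sketch below prepares those coordinates and labels each gene as up, down, or not significant. The cutoffs (|log2 fold change| >= 1, adjusted p < 0.05) are common but arbitrary choices, and the results table is hypothetical.

```python
import math


def volcano_points(results, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Convert (gene, log2FC, adjusted p) tuples into volcano-plot points.

    Returns (gene, x, y, label) tuples, where x is the log2 fold change,
    y is -log10(adjusted p), and label is 'up', 'down', or 'ns'.
    """
    points = []
    for gene, lfc, padj in results:
        # Floor tiny p-values so log10 never receives 0 (illustrative choice)
        y = -math.log10(max(padj, 1e-300))
        if padj < padj_cutoff and lfc >= lfc_cutoff:
            label = "up"
        elif padj < padj_cutoff and lfc <= -lfc_cutoff:
            label = "down"
        else:
            label = "ns"
        points.append((gene, lfc, y, label))
    return points


# Hypothetical differential expression results
results = [
    ("GENE1", 2.5, 1e-6),
    ("GENE2", -1.8, 0.001),
    ("GENE3", 0.3, 0.6),
    ("GENE4", 3.0, 0.2),  # large change but not significant
]

for gene, x, y, label in volcano_points(results):
    print(gene, x, round(y, 2), label)
```

Any plotting library can then draw the points, for example Matplotlib's scatter function, colored by label.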

    Mahmoud combines R and Python libraries for his visualization needs. In R, he uses ggplot2 to create versatile plots. In Python, he uses Matplotlib for general figure generation, Seaborn (built on Matplotlib) for informative statistical graphics, and Plotly, an interactive, browser-based graphing library. He also relies on the Integrative Genomics Viewer (IGV) for much of his work.

    Mahmoud also recommends samplot for visualizing structural variants. Lastly, he suggests Circos, a Perl-based tool for displaying data in a circular layout. Circos has greatly improved how scientific results are visualized, particularly in genomics.


    There are many more influential tools and important sequencing analysis applications that this article series did not cover, so we’ll ask the community: what are some of your preferred tools for these processes? Make sure you are logged in so you can comment below!
    Attached is a PDF containing additional details about some of the tools recommended above.

    About the Author


    Benjamin Atha (seqadmin) holds a B.A. in biology from Hood College and an M.S. in biological sciences from Towson University. With over 9 years of hands-on laboratory experience, he is well-versed in next-generation sequencing systems. Ben is currently the editor for SEQanswers.
