
Best Practices for Single-Cell Sequencing Analysis



    Isolating and preparing single cells for sequencing was historically the bottleneck, but recent technological advances have shifted the challenge to data analysis, a sign of how rapidly single-cell sequencing is evolving. The analysis itself has grown more complex as data volumes surge and studies incorporate more diverse and complex datasets. This article explores the main analysis challenges, examines common pitfalls, offers advice on validating findings, and shares recommendations for researchers in the field.

    Analysis Challenges
    Adam Cribbs, Ph.D., Associate Professor at the University of Oxford, said the primary hurdle is “disentangling the biology from the technical variability.” Technical inconsistencies can obscure true biological signals, making the data hard to interpret accurately. Cribbs emphasized that well-designed experiments mitigate these issues and maximize the chance of a successful project: proper design reduces technical noise and helps ensure that the data reflect true biological differences rather than artifacts introduced during the sequencing process. He also pointed out that although the high cost of single-cell sequencing often restricts sample sizes, sequencing enough samples is imperative for meaningful biological insights, particularly in disease studies.

    Ming "Tommy" Tang, Ph.D., Director of Computational Biology at Immunitas and author of From Cell Line to Command Line, finds cell annotation the most challenging aspect of single-cell analysis. “Every single-cell dataset is unique in terms of data quality and QC has to be carried out in a dataset-specific manner,” explained Tang. Even when automated annotation tools are used, he noted, immunologists often introduce new cell-type labels, adding to the difficulty. Annotating cells from scRNA-seq data is also challenging because gene expression levels are continuous rather than discrete, and differences in gene expression do not always correspond to differences in cellular function [1].
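    Dataset-specific QC is often implemented with adaptive thresholds computed from each dataset's own distribution rather than with fixed cutoffs. The sketch below flags cells whose QC metric lies more than a chosen number of median absolute deviations (MADs) from the median. The helper name `mad_outliers`, the 3-MAD default, and the toy counts are illustrative assumptions (choices between 3 and 5 MADs are common), not Tang's own workflow.

```python
import math
from statistics import median


def mad_outliers(values, n_mads=3.0, direction="both"):
    """Flag values more than n_mads scaled MADs from the median (hypothetical helper).

    direction: "lower", "higher", or "both" tails.
    """
    x = [float(v) for v in values]
    center = median(x)
    # 1.4826 scales the MAD so it approximates a standard deviation for normal data
    mad = 1.4826 * median([abs(v - center) for v in x])
    lower, upper = center - n_mads * mad, center + n_mads * mad
    flags = []
    for v in x:
        low, high = v < lower, v > upper
        if direction == "lower":
            flags.append(low)
        elif direction == "higher":
            flags.append(high)
        else:
            flags.append(low or high)
    return flags


# Toy total UMI counts per cell (illustrative only); QC metrics are usually log-transformed
total_counts = [5000, 5200, 4800, 5100, 4900, 300]
log_counts = [math.log1p(c) for c in total_counts]
flagged = mad_outliers(log_counts, n_mads=3, direction="lower")
```

    Here only the last cell, with far fewer total counts than the rest, is flagged. Applied to another dataset, the same rule yields different absolute cutoffs because it adapts to that dataset's distribution.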

    Common Mistakes in Single-Cell Analysis
    The complexity of single-cell analysis creates many opportunities for error, some easier to detect than others. “People don't realize that there are inherent biases with single-cell sequencing data,” Cribbs noted. He explained that while many assume the sequencing step itself is the main source of errors, another culprit is largely responsible. “The biggest source of errors is PCR amplification,” Cribbs stated. Recent work by Cribbs and colleagues in Nature Methods showed how PCR errors undermine accurate quantification at the single-cell level [2]. The study demonstrated that synthesizing unique molecular identifiers (UMIs) from homotrimeric nucleotide blocks enables error correction and allows accurate counting of sequenced molecules. Cribbs also emphasized the importance of quality control and cautioned against over-trusting the data without adequate validation.
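    The homotrimeric design can be illustrated with a simple majority vote: if every UMI position is synthesized as a block of three identical nucleotides (for example, AAA), a single PCR or sequencing error within a block can be outvoted by the two correct bases. The following is a minimal sketch under that assumption. The function names and toy reads are hypothetical, blocks with no majority are marked "N" here, and the published pipeline may resolve such ambiguous blocks differently.

```python
from collections import Counter


def collapse_homotrimer_umi(umi):
    """Collapse a homotrimeric UMI such as 'AAACCCGGG' to one base per 3-nt block.

    Hypothetical helper: each block takes its majority base; a block with
    three different bases has no majority and is marked 'N'.
    """
    if len(umi) % 3 != 0:
        raise ValueError("homotrimeric UMI length must be a multiple of 3")
    corrected = []
    for i in range(0, len(umi), 3):
        base, count = Counter(umi[i:i + 3]).most_common(1)[0]
        corrected.append(base if count >= 2 else "N")
    return "".join(corrected)


def count_molecules(umis):
    """Count distinct corrected UMIs, skipping any with an unresolved ('N') block."""
    corrected = {collapse_homotrimer_umi(u) for u in umis}
    return len({u for u in corrected if "N" not in u})


# Toy reads for one gene in one cell: the first three derive from the same molecule (ACGT)
reads = ["AAACCCGGGTTT", "AAACCAGGGTTT", "AATCCCGGGTTT", "CCCAAATTTGGG"]
n_molecules = count_molecules(reads)
```

    Counting raw UMI strings would report four molecules for these reads, whereas majority-vote collapsing merges the three reads derived from ACGT (two of which carry a single-base error) and reports two.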

    Tang explained that another common mistake arises in differential gene expression analysis between two conditions (e.g., healthy vs. disease), each with multiple samples. “People just group all the cells from each condition together and do a differential gene expression at the cell level,” Tang noted. “The cells from each sample are not independent and when you use so many cells you get inherently small p-values.” Instead, Tang advised researchers to use pseudobulk methods, which aggregate counts per sample before testing and have been shown to outperform many cell-level differential expression methods [3,4].
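    The core pseudobulk step is to sum raw counts over all cells from the same sample, so that the statistical unit becomes the sample rather than the cell. The minimal sketch below, using pure Python and hypothetical helper names, builds a pseudobulk table and compares log2 counts-per-million between conditions with a Welch t statistic computed from per-sample values. The toy counts are invented for illustration, and the t statistic only shows where the replication lives; in practice the pseudobulk matrix would typically be analyzed with count-based tools such as edgeR, DESeq2, or limma-voom.

```python
import math
from collections import defaultdict
from statistics import mean, variance


def pseudobulk(cell_counts, cell_samples):
    """Sum raw counts per gene over all cells of each sample (hypothetical helper).

    cell_counts: one {gene: count} dict per cell; cell_samples: sample ID per cell.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for counts, sample in zip(cell_counts, cell_samples):
        for gene, c in counts.items():
            totals[sample][gene] += c
    return {s: dict(genes) for s, genes in totals.items()}


def log_cpm(sample_counts, gene, pseudocount=1.0):
    """log2 counts-per-million for one gene in one pseudobulk sample."""
    library_size = sum(sample_counts.values())
    return math.log2((sample_counts.get(gene, 0) + pseudocount) / library_size * 1e6)


def welch_t(a, b):
    """Welch t statistic; each list holds one value per sample, not per cell."""
    return (mean(a) - mean(b)) / math.sqrt(variance(a) / len(a) + variance(b) / len(b))


# Toy data (illustrative only): two cells from each of four samples
cells = [
    {"GENE_A": 5, "GENE_B": 5}, {"GENE_A": 3, "GENE_B": 7},   # H1
    {"GENE_A": 4, "GENE_B": 6}, {"GENE_A": 6, "GENE_B": 4},   # H2
    {"GENE_A": 10, "GENE_B": 2}, {"GENE_A": 8, "GENE_B": 0},  # D1
    {"GENE_A": 9, "GENE_B": 1}, {"GENE_A": 7, "GENE_B": 3},   # D2
]
samples = ["H1", "H1", "H2", "H2", "D1", "D1", "D2", "D2"]
condition = {"H1": "healthy", "H2": "healthy", "D1": "disease", "D2": "disease"}

pb = pseudobulk(cells, samples)
disease = [log_cpm(pb[s], "GENE_A") for s in pb if condition[s] == "disease"]
healthy = [log_cpm(pb[s], "GENE_A") for s in pb if condition[s] == "healthy"]
t_stat = welch_t(disease, healthy)  # n = 2 samples per group
```

    Because each sample contributes one value per gene, the comparison is two samples against two samples rather than four cells against four cells, which avoids treating non-independent cells as replicates.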

    Tang also pointed out issues with batch correction, or integrating data from different datasets. Each method operates under specific assumptions, and integration can sometimes erase biological signals. A recent preprint comparing seven batch correction methods for scRNA-seq data highlighted this issue, finding that most methods, including MNN, scVI, and LIGER, introduce artifacts [5]. The authors recommended Harmony because it consistently minimized data distortion while preserving biological signal.

    Both Tang and Cribbs mentioned the controversy surrounding non-linear dimensionality reduction methods such as UMAP. Tang shared that while UMAP is useful for visualization, distances between points on a UMAP plot carry little meaning. “UMAP is a non-linear dimension reduction and one should not read too much into the points on the UMAP,” he emphasized. A recent article in Nature Methods offered a more nuanced discussion of this debate [6]. Echoing the advice of Tang and Cribbs, it detailed how researchers should select parameters judiciously, avoid confirmation bias, and recognize that these tools are starting points for analysis, not definitive conclusions.
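    One way to make Tang's caution concrete is to measure how much of each cell's local neighborhood survives in a 2-D embedding. The sketch below, with hypothetical function names, computes the average fraction of each point's k nearest neighbors in the original space that remain among its k nearest neighbors in the embedding. It is an illustrative diagnostic, not a method from the sources cited here; scikit-learn's `sklearn.manifold.trustworthiness` implements a related, established measure. The toy coordinates stand in for real expression profiles and UMAP output.

```python
import math


def knn_indices(points, k):
    """Brute-force Euclidean k nearest neighbors (excluding self) for every point."""
    neighbors = []
    for i, p in enumerate(points):
        # Sort by distance, breaking ties by index so results are deterministic
        ranked = sorted((math.dist(p, q), j) for j, q in enumerate(points) if j != i)
        neighbors.append({j for _, j in ranked[:k]})
    return neighbors


def knn_preservation(high_dim, embedding, k=5):
    """Mean fraction of each point's k nearest neighbors retained in the embedding.

    Hypothetical diagnostic: 1.0 means every local neighborhood is intact.
    """
    if len(high_dim) != len(embedding):
        raise ValueError("both inputs must describe the same cells in the same order")
    original = knn_indices(high_dim, k)
    embedded = knn_indices(embedding, k)
    return sum(len(a & b) / k for a, b in zip(original, embedded)) / len(high_dim)


# Toy 1-D "expression" values and a deliberately scrambled 1-D "embedding"
high_dim = [(0.0,), (1.0,), (2.0,), (10.0,)]
embedding = [(0.0,), (10.0,), (1.0,), (2.0,)]
score = knn_preservation(high_dim, embedding, k=1)
```

    In this toy case only one of the four nearest-neighbor relationships survives, giving a score of 0.25. Even a perfect score would say nothing about distances between clusters, which is precisely the kind of structure that should not be read off a UMAP plot.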

    Best Practices for Validating Findings
    Another important aspect of single-cell analysis is ensuring the validity and accuracy of findings. Tang encourages researchers to validate results with multiple data types and data sources. “For example, validate the scRNAseq data with some protein data,” Tang explained. “If there are publicly available datasets to answer the same question, see if you have the same conclusion with a different dataset.” He added, however, that wet-lab validation remains the gold standard.

    Cribbs noted that validating single-cell data requires linking differential gene expression to functional outputs. He advocated using time-course and paired data for more detailed biological insights, along with techniques such as pseudotime analysis and receptor-ligand interaction analysis that can help infer functional consequences. Cribbs also stressed the need for orthogonal approaches, such as downstream functional assays and CRISPR, to confirm findings. Ultimately, he suggested forming hypotheses from single-cell data and verifying them in in vitro and in vivo model systems to establish functional relevance.

    Resources and Recommendations
    For researchers aiming to deepen their understanding of single-cell analysis, Tang encourages R learners to use Bioconductor’s single-cell analysis book, and Python users to take the online course on single-cell best practices from the Theis Lab. Tang also regularly shares advice and guidance through his website and newsletter, which includes a detailed presentation on the best practices and unresolved issues in single-cell analysis.

    Beyond these recommendations, Cribbs advises reading the scientific literature extensively to build a strong foundation. This is particularly important for researchers from backgrounds outside biology, given the breadth of biological knowledge the field requires. Cribbs stressed that while learning technical skills like coding and statistics isn’t easy, the real challenge is applying them effectively to build meaningful biological narratives. Integrating knowledge from molecular biology, statistics, and programming is itself a significant challenge, one often mastered only through sustained practice, such as during a Ph.D.

    Tangential to single-cell research itself, Cribbs highlighted the importance of ethics and data sharing, which has been central to his work on large projects like the Human Cell Atlas (HCA). Because restrictions differ across countries, understanding how data can be shared and complying with the relevant laws to avoid legal repercussions is an unexpected yet crucial consideration. Cribbs and colleagues also discussed these factors in a recent article describing the collaborative efforts needed to overcome barriers to single-cell RNA sequencing adoption in low- and middle-income countries [7]. Finally, Cribbs advised researchers to consult information governance experts to navigate these regulations, noting that although this issue extends beyond single-cell analysis, it is essential for collaboration and for advancing science.

    1. Pasquini G, Rojo Arias JE, Schäfer P, Busskamp V. Automated methods for cell type annotation on scRNAseq data. Computational and Structural Biotechnology Journal. 2021;19:961-969.
    2. Sun J, Philpott M, Loi D, et al. Correcting PCR amplification errors in unique molecular identifiers to generate accurate numbers of sequencing molecules. Nature Methods. 2024;21(3):401-405.
    3. Squair JW, Gautier M, Kathe C, et al. Confronting false discoveries in single-cell differential expression. Nature Communications. 2021;12(1):5692.
    4. Murphy AE, Skene NG. A balanced measure shows superior performance of pseudobulk methods in single-cell RNA-sequencing analysis. Nature Communications. 2022;13(1):7851.
    5. Antonsson SE, Melsted P. Batch correction methods used in single cell RNA-sequencing analyses are often poorly calibrated. bioRxiv. Published online January 1, 2024:2024.03.19.585562.
    6. Marx V. Seeing data as t-SNE and UMAP do. Nature Methods. Published online 2024.
    7. Boakye Serebour T, Cribbs AP, Baldwin MJ, et al. Overcoming barriers to single-cell RNA sequencing adoption in low- and middle-income countries. European Journal of Human Genetics. Published online 2024.

    About the Author


    Benjamin Atha holds a B.A. in biology from Hood College and an M.S. in biological sciences from Towson University. With over 9 years of hands-on laboratory experience, he's well-versed in next-generation sequencing systems. Ben is currently the editor for SEQanswers.

    Published by seqadmin, 06-06-2024, 07:15 AM