Best Practices for Single-Cell Sequencing Analysis
While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers advice for validating findings, and makes helpful recommendations.

Analysis Challenges
Adam Cribbs, Ph.D., Associate Professor at the University of Oxford, shared that the primary hurdle is “disentangling the biology from the technical variability.” This difficulty arises because technical inconsistencies can obscure true biological signals, making it hard to interpret the data accurately. Cribbs emphasized the importance of well-designed experiments to mitigate these issues and maximize the chance of a successful project. Proper experimental design helps to reduce technical noise and ensures that the data reflect true biological differences rather than artifacts introduced during the sequencing process. Cribbs also pointed out that although the high cost of single-cell sequencing often restricts sample sizes, it is imperative to sequence the appropriate number of samples to obtain meaningful biological insights, particularly in disease studies.

Ming "Tommy" Tang, Ph.D., Director of Computational Biology at Immunitas and author of From Cell Line to Command Line, finds that cell annotation is the most challenging aspect of single-cell analysis. “Every single-cell dataset is unique in terms of data quality and QC has to be carried out in a dataset-specific manner,” explained Tang. Despite using automatic tools, he noted that immunologists often introduce new cell-type labels, adding to the difficulty. Annotating cells from scRNA-seq data is also challenging because gene expression levels are continuous rather than discrete, and differences in gene expression do not always correspond to differences in cellular function¹.

Common Mistakes in Single-Cell Analysis
The complexity of single-cell analysis opens up many possibilities for errors, some of which are easier to detect than others. “People don't realize that there are inherent biases with single-cell sequencing data,” Cribbs noted. He explained that while many assume the sequencing step itself is the most common source of sequencing-related errors, another culprit is largely responsible. “The biggest source of errors is PCR amplification,” Cribbs stated. Some of Cribbs’s most recent work in Nature Methods highlighted how PCR errors are detrimental to accurate quantification at the single-cell level². Results from the study demonstrated that synthesizing unique molecular identifiers with homotrimeric nucleotide blocks provides a means of error correction and allows for accurate counting of sequenced molecules. Cribbs also emphasized the importance of quality control and cautioned against over-trusting the data without adequate validation.
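To make the homotrimer idea concrete, here is a minimal sketch of how such a UMI could be error-corrected, assuming each UMI base is written three times in a row so that a single error within a block can be outvoted by the other two copies. The function names, the example reads, and the choice to mark unresolvable blocks as "N" are illustrative assumptions; this is a simplified majority vote, not the published pipeline.

```python
from collections import Counter


def correct_homotrimer_umi(umi: str) -> str:
    """Collapse a homotrimer-encoded UMI to one base per block by majority vote."""
    if len(umi) % 3 != 0:
        raise ValueError("homotrimer UMI length must be a multiple of 3")
    corrected = []
    for i in range(0, len(umi), 3):
        block = umi[i:i + 3]
        base, count = Counter(block).most_common(1)[0]
        # Assumption: a block with three different bases has no majority,
        # so it is marked ambiguous rather than guessed.
        corrected.append(base if count >= 2 else "N")
    return "".join(corrected)


def count_molecules(umis) -> int:
    """Count distinct molecules after collapsing each UMI."""
    return len({correct_homotrimer_umi(u) for u in umis})


# Hypothetical reads from one gene in one cell. The third read carries a
# single error (AAT instead of AAA) in its first block.
reads = ["AAACCCGGGTTT", "AAACCCGGGTTT", "AATCCCGGGTTT", "CCCAAATTTGGG"]

naive_count = len(set(reads))             # treats the erroneous read as a new molecule
corrected_count = count_molecules(reads)  # the error is outvoted within its block
```

In this toy input, counting raw UMI strings would report one molecule more than the corrected count, which is the kind of inflation that PCR errors can introduce into molecule counts.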

In addition to these issues, Tang explained that other common analysis mistakes involve differential gene expression analysis for two conditions (healthy vs. disease), each with multiple samples. “People just group all the cells from each condition together and do a differential gene expression at the cell level,” Tang noted. “The cells from each sample are not independent and when you use so many cells you get inherently small p-values.” Instead of this approach, Tang advised researchers to use pseudobulk methods, which have been shown to outperform numerous differential expression analysis methods³,⁴.
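The aggregation step behind a pseudobulk analysis can be sketched as follows. This is a minimal illustration, assuming per-cell counts are available as plain dictionaries; the `pseudobulk` function, the sample names, and the count values are hypothetical and stand in for whatever data structure a real workflow uses.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum per-cell counts into one profile per (sample, cell_type) pair."""
    totals = defaultdict(lambda: defaultdict(int))
    for cell in cells:
        key = (cell["sample"], cell["cell_type"])
        for gene, count in cell["counts"].items():
            totals[key][gene] += count
    # Convert nested defaultdicts to plain dicts for easier inspection.
    return {key: dict(genes) for key, genes in totals.items()}


# Hypothetical toy data: two cells from each of two samples.
cells = [
    {"sample": "healthy_1", "cell_type": "T cell", "counts": {"CD3E": 4, "GZMB": 0}},
    {"sample": "healthy_1", "cell_type": "T cell", "counts": {"CD3E": 2, "GZMB": 1}},
    {"sample": "disease_1", "cell_type": "T cell", "counts": {"CD3E": 3, "GZMB": 5}},
    {"sample": "disease_1", "cell_type": "T cell", "counts": {"CD3E": 1, "GZMB": 7}},
]

profiles = pseudobulk(cells)
# Four cells collapse into two sample-level profiles, one per sample.
```

After aggregation the unit of replication is the sample rather than the cell, so the sample-level profiles can be compared with tools designed for bulk RNA-seq, such as edgeR or DESeq2, using the number of samples per condition as the sample size.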

Furthermore, Tang pointed out issues with batch correction or data integration from different datasets. Each method operates under specific assumptions, and data integration can sometimes erase biological signals. This issue was highlighted in a recent preprint that compared seven batch correction methods for scRNA-seq data, finding that most methods, including MNN, scVI, and LIGER, introduce artifacts⁵. The authors recommended Harmony for batch correction due to its consistent minimization of data distortion and preservation of biological integrity.

Both Tang and Cribbs mentioned the controversy with dimension reduction methods like UMAP. Tang shared that while UMAP is useful for visualization, the distance between points on a UMAP plot does not mean much. “UMAP is a non-linear dimension reduction and one should not read too much into the points on the UMAP,” he emphasized. The debate over dimension reduction techniques was also recently covered in a more nuanced discussion in Nature Methods⁶. Similar to the advice provided by Tang and Cribbs, the article detailed how researchers should select parameters judiciously, avoid confirmation bias, and recognize that these tools are just starting points for analysis, not definitive conclusions.

Best Practices for Validating Findings
Another important aspect of single-cell analysis is ensuring the validity and accuracy of the findings. Tang encourages researchers to use multiple data types and sources of data for proper validation. “For example, validate the scRNAseq data with some protein data,” Tang explained. “If there are publicly available datasets to answer the same question, see if you have the same conclusion with a different dataset.” However, he added that validation through a wet-lab experiment remains the gold standard.

Cribbs noted that validating single-cell data requires linking differential gene expression to functional outputs. He advocated for using temporal processes and paired data for more detailed biological insights, as well as techniques like pseudotime analysis and receptor-ligand interaction studies that can help infer functional consequences. Cribbs also mentioned the need for orthogonal approaches, such as downstream functional assays and CRISPR, to confirm findings. Ultimately, Cribbs suggested forming hypotheses from single-cell data and verifying them through in vitro and in vivo model systems to ensure functional relevance.

Resources and Recommendations
For researchers aiming to deepen their understanding of single-cell analysis, Tang encourages R learners to use Bioconductor’s single-cell analysis book, and Python users to take the online course on single-cell best practices from the Theis Lab. Tang also regularly shares advice and guidance through his website and newsletter, which includes a detailed presentation on the best practices and unresolved issues in single-cell analysis.

Beyond these recommendations, Cribbs advises extensively reading scientific literature to build a strong foundation. This is particularly important for researchers with backgrounds outside of biology because the field demands substantial domain knowledge. Cribbs stressed that while learning technical skills like coding and statistics isn’t easy, the real challenge is effectively applying them to create meaningful biological narratives. Moreover, integrating different areas of knowledge like molecular biology, statistics, and programming is a significant challenge often mastered only through constant application in Ph.D. programs.

Tangential to single-cell research, Cribbs highlighted the importance of ethics and data sharing. This has been central to his work on large projects like the Human Cell Atlas (HCA). Given the differing restrictions across countries, understanding how data can be shared and complying with the relevant laws to avoid legal repercussions is an unexpected yet crucial consideration. These factors were also part of a recent discussion where Cribbs and colleagues described the collaborative efforts needed to overcome barriers to single-cell RNA sequencing adoption in low- and middle-income countries⁷. Finally, Cribbs advised researchers to consult information governance experts to navigate these regulations and noted that while this issue extends beyond single-cell analysis, it is essential for collaboration and advancing science.

References
1. Pasquini G, Eduardo J, Schäfer P, Busskamp V. Automated methods for cell type annotation on scRNAseq data. Computational and Structural Biotechnology Journal. 2021;19:961-969. doi:https://doi.org/10.1016/j.csbj.2021.01.015

2. Sun J, Philpott M, Loi D, et al. Correcting PCR amplification errors in unique molecular identifiers to generate accurate numbers of sequencing molecules. Nature Methods. 2024;21(3):401-405. doi:https://doi.org/10.1038/s41592-024-02168-y

3. Squair JW, Gautier M, Kathe C, et al. Confronting false discoveries in single-cell differential expression. Nature Communications. 2021;12(1):5692. doi:https://doi.org/10.1038/s41467-021-25960-2

4. Murphy AE, Skene NG. A balanced measure shows superior performance of pseudobulk methods in single-cell RNA-sequencing analysis. Nature Communications. 2022;13(1):7851. doi:https://doi.org/10.1038/s41467-022-35519-4

5. Antonsson SE, Melsted P. Batch correction methods used in single cell RNA-sequencing analyses are often poorly calibrated. bioRxiv. Published online January 1, 2024:2024.03.19.585562. doi:https://doi.org/10.1101/2024.03.19.585562

6. Marx V. Seeing data as tSNE and UMAP do. Nature Methods. Published online 2024. doi:https://doi.org/10.1038/s41592-024-02301-x

7. Boakye Serebour T, Cribbs AP, Baldwin MJ, et al. Overcoming barriers to single-cell RNA sequencing adoption in low- and middle-income countries. European Journal of Human Genetics. Published online 2024. doi:https://doi.org/10.1038/s41431-024-01564-4