Best Practices for Single-Cell Sequencing Analysis

    While isolating and preparing single cells for sequencing was historically the bottleneck, recent technological advancements have shifted the challenge to data analysis. This highlights the rapidly evolving nature of single-cell sequencing. The inherent complexity of single-cell analysis has intensified with the surge in data volume and the incorporation of diverse and more complex datasets. This article explores the challenges in analysis, examines common pitfalls, offers advice for validating findings, and makes helpful recommendations.

    Analysis Challenges
    Adam Cribbs, Ph.D., Associate Professor at the University of Oxford, shared that the primary hurdle is “disentangling the biology from the technical variability.” This difficulty arises because technical inconsistencies can obscure true biological signals, making it hard to interpret the data accurately. Cribbs emphasized the importance of well-designed experiments to mitigate these issues and maximize the chance of a successful project. Proper experimental design helps to reduce technical noise and ensures that the data reflect true biological differences rather than artifacts introduced during the sequencing process. Cribbs also pointed out that although the high cost of single-cell sequencing often restricts sample sizes, it is imperative to sequence the appropriate number of samples to obtain meaningful biological insights, particularly in disease studies.

    Ming "Tommy" Tang, Ph.D., Director of Computational Biology at Immunitas and author of From Cell Line to Command Line, finds that cell annotation is the most challenging aspect of single-cell analysis. “Every single-cell dataset is unique in terms of data quality and QC has to be carried out in a dataset-specific manner,” explained Tang. He noted that even with automatic annotation tools, the task is complicated by immunologists frequently introducing new cell-type labels. Annotating cells from scRNA-seq data is also challenging because gene expression levels are continuous rather than discrete, and differences in gene expression do not always correspond to differences in cellular function [1].


    Common Mistakes in Single-Cell Analysis
    The complexity of single-cell analysis opens up many possibilities for errors, some of which are easier to detect than others. “People don't realize that there are inherent biases with single-cell sequencing data,” Cribbs noted. He explained that many researchers assume the sequencing step itself is the main source of errors, when a different step is largely responsible. “The biggest source of errors is PCR amplification,” Cribbs stated. Some of Cribbs’s most recent work in Nature Methods highlighted how PCR errors are detrimental to accurate quantification at the single-cell level [2]. The study demonstrated that synthesizing unique molecular identifiers (UMIs) from homotrimeric nucleotide blocks enables error correction and allows accurate counting of sequenced molecules. Cribbs also emphasized the importance of quality control and cautioned against over-trusting the data without adequate validation.
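
    The idea behind homotrimeric UMIs can be sketched in a few lines of Python. In this simplified illustration, each UMI base is assumed to be written as a block of three identical nucleotides, and a single substitution inside a block is corrected by majority vote. The function name `collapse_homotrimer_umi` and the use of "N" for unresolvable blocks are assumptions made for this example; the published method may handle cases such as insertions and deletions differently.

```python
from collections import Counter


def collapse_homotrimer_umi(umi: str) -> str:
    """Collapse a homotrimer-encoded UMI to one base per 3-nt block.

    Each block is resolved by majority vote, so a single substitution
    error within a block is corrected. Blocks with three different
    bases have no majority and are reported as "N" (an assumption of
    this sketch, not necessarily what the published method does).
    """
    if len(umi) % 3 != 0:
        raise ValueError("homotrimer UMI length must be a multiple of 3")
    bases = []
    for start in range(0, len(umi), 3):
        block = umi[start:start + 3]
        base, count = Counter(block).most_common(1)[0]
        bases.append(base if count >= 2 else "N")
    return "".join(bases)


# Toy sequences made up for illustration.
clean = collapse_homotrimer_umi("AAACCCGGGTTT")      # error-free UMI
one_error = collapse_homotrimer_umi("AATCCCGGGTTT")  # one substitution in block 1
ambiguous = collapse_homotrimer_umi("ACGCCCGGGTTT")  # block 1 has no majority

print(clean, one_error, ambiguous)
```

    In this toy case the error-free and single-error UMIs collapse to the same sequence, so the two reads would be counted as one molecule rather than two.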

    In addition to these issues, Tang explained that another common mistake involves differential gene expression analysis between two conditions (e.g., healthy vs. diseased), each with multiple samples. “People just group all the cells from each condition together and do a differential gene expression at the cell level,” Tang noted. “The cells from each sample are not independent and when you use so many cells you get inherently small p-values.” Instead, Tang advised researchers to use pseudobulk methods, which aggregate cells within each sample before testing and have been shown to outperform numerous other differential expression methods [3,4].
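
    As a rough illustration of the pseudobulk idea, the sketch below sums raw counts over all cells belonging to the same sample, so that the sample rather than the cell becomes the unit of replication. The function name `pseudobulk`, the input layout, and the toy counts are all assumptions made for this example; in practice the aggregation is usually done separately for each cell type.

```python
from collections import defaultdict


def pseudobulk(cells):
    """Sum raw counts over all cells of each (condition, sample) pair.

    `cells` is an iterable of (condition, sample_id, {gene: count}) tuples.
    Returns {(condition, sample_id): {gene: summed_count}}.
    """
    totals = defaultdict(lambda: defaultdict(int))
    for condition, sample_id, counts in cells:
        for gene, count in counts.items():
            totals[(condition, sample_id)][gene] += count
    return {key: dict(genes) for key, genes in totals.items()}


# Toy, made-up data: six cells drawn from four samples.
toy_cells = [
    ("healthy", "H1", {"GENE_A": 2, "GENE_B": 0}),
    ("healthy", "H1", {"GENE_A": 1, "GENE_B": 3}),
    ("healthy", "H2", {"GENE_A": 4, "GENE_B": 1}),
    ("disease", "D1", {"GENE_A": 0, "GENE_B": 5}),
    ("disease", "D1", {"GENE_A": 1, "GENE_B": 2}),
    ("disease", "D2", {"GENE_A": 3, "GENE_B": 4}),
]

bulk = pseudobulk(toy_cells)
n_replicates = len(bulk)  # number of samples, not number of cells

for key, genes in sorted(bulk.items()):
    print(key, genes)
```

    The resulting sample-level count table is typically passed to a differential expression tool designed for bulk RNA-seq, where the number of replicates per condition equals the number of samples.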

    Furthermore, Tang pointed out issues with batch correction and the integration of data from different datasets. Each method operates under specific assumptions, and data integration can sometimes erase biological signals. This issue was highlighted in a recent preprint that compared seven batch correction methods for scRNA-seq data, finding that most methods, including MNN, SCVI, and LIGER, introduce artifacts [5]. The authors recommended Harmony for batch correction due to its consistent minimization of data distortion and preservation of biological integrity.

    Both Tang and Cribbs mentioned the controversy surrounding dimensionality reduction methods like UMAP. Tang shared that while UMAP is useful for visualization, the distance between points on a UMAP plot does not mean much. “UMAP is a non-linear dimension reduction and one should not read too much into the points on the UMAP,” he emphasized. The debate over dimension reduction techniques was also recently covered in a more nuanced discussion in Nature Methods [6]. Similar to the advice provided by Tang and Cribbs, the article detailed how researchers should select parameters judiciously, avoid confirmation bias, and recognize that these tools are just starting points for analysis, not definitive conclusions.


    Best Practices for Validating Findings
    Another important aspect of single-cell analysis is ensuring the validity and accuracy of the findings. Tang encourages researchers to use multiple data types and data sources for validation. “For example, validate the scRNAseq data with some protein data,” Tang explained. “If there are publicly available datasets to answer the same question, see if you have the same conclusion with a different dataset.” However, he added that validation through a wet-lab experiment remains the gold standard.
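
    One simple way to compare RNA and protein measurements for the same marker is a rank correlation across cell clusters. The sketch below is only an illustration of that idea: the helper names, the choice of Spearman correlation, and the toy values are assumptions for this example, not a procedure described by Tang.

```python
def average_ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    if len(x) != len(y):
        raise ValueError("inputs must have the same length")
    return pearson(average_ranks(x), average_ranks(y))


# Toy, made-up per-cluster averages for one marker (five clusters).
rna_mean = [0.1, 2.3, 1.5, 4.0, 0.0]
protein_mean = [12, 85, 40, 300, 5]

rho = spearman(rna_mean, protein_mean)
print(f"Spearman rho = {rho:.2f}")
```

    A high correlation in such a comparison would support the RNA-based result, while a low one would flag the marker for closer inspection; neither replaces experimental validation.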

    Cribbs noted that validating single-cell data requires linking differential gene expression to functional outputs. He advocated for using temporal processes and paired data for more detailed biological insights, as well as techniques like pseudotime analysis and receptor-ligand interaction studies that can help infer functional consequences. Cribbs also mentioned the need for orthogonal approaches, such as downstream functional assays and CRISPR, to confirm findings. Ultimately, Cribbs suggested forming hypotheses from single-cell data and verifying them through in vitro and in vivo model systems to ensure functional relevance.


    Resources and Recommendations
    For researchers aiming to deepen their understanding of single-cell analysis, Tang encourages R learners to use Bioconductor’s single-cell analysis book, and Python users to take the online course on single-cell best practices from the Theis Lab. Tang also regularly shares advice and guidance through his website and newsletter, which includes a detailed presentation on the best practices and unresolved issues in single-cell analysis.

    Beyond these recommendations, Cribbs advises reading the scientific literature extensively to build a strong foundation. This is particularly important for researchers with backgrounds outside of biology, because the field demands substantial domain knowledge. Cribbs stressed that while learning technical skills like coding and statistics isn’t easy, the real challenge is applying them effectively to create meaningful biological narratives. Moreover, integrating different areas of knowledge, such as molecular biology, statistics, and programming, is a significant challenge that is often mastered only through constant application in Ph.D. programs.

    Tangential to single-cell research, Cribbs highlighted the importance of ethics and data sharing. This has been central to his work on large projects like the Human Cell Atlas (HCA). Given the differing restrictions across countries, understanding how data can be shared and complying with the relevant laws to avoid legal repercussions is an unexpected yet crucial consideration. These factors were also part of a recent discussion where Cribbs and colleagues described the collaborative efforts needed to overcome barriers to single-cell RNA sequencing adoption in low- and middle-income countries [7]. Finally, Cribbs advised researchers to consult information governance experts to navigate these regulations and noted that while this issue extends beyond single-cell analysis, it is essential for collaboration and advancing science.

    References
    1. Pasquini G, Rojo Arias JE, Schäfer P, Busskamp V. Automated methods for cell type annotation on scRNAseq data. Computational and Structural Biotechnology Journal. 2021;19:961-969. https://doi.org/10.1016/j.csbj.2021.01.015
    2. Sun J, Philpott M, Loi D, et al. Correcting PCR amplification errors in unique molecular identifiers to generate accurate numbers of sequencing molecules. Nature Methods. 2024;21(3):401-405. https://doi.org/10.1038/s41592-024-02168-y
    3. Squair JW, Gautier M, Kathe C, et al. Confronting false discoveries in single-cell differential expression. Nature Communications. 2021;12(1):5692. https://doi.org/10.1038/s41467-021-25960-2
    4. Murphy AE, Skene NG. A balanced measure shows superior performance of pseudobulk methods in single-cell RNA-sequencing analysis. Nature Communications. 2022;13(1):7851. https://doi.org/10.1038/s41467-022-35519-4
    5. Antonsson SE, Melsted P. Batch correction methods used in single cell RNA-sequencing analyses are often poorly calibrated. bioRxiv. Published online 2024:2024.03.19.585562. https://doi.org/10.1101/2024.03.19.585562
    6. Marx V. Seeing data as tSNE and UMAP do. Nature Methods. Published online 2024. https://doi.org/10.1038/s41592-024-02301-x
    7. Boakye Serebour T, Cribbs AP, Baldwin MJ, et al. Overcoming barriers to single-cell RNA sequencing adoption in low- and middle-income countries. European Journal of Human Genetics. Published online 2024. https://doi.org/10.1038/s41431-024-01564-4

    About the Author
    Benjamin Atha (seqadmin) holds a B.A. in biology from Hood College and an M.S. in biological sciences from Towson University. With over 9 years of hands-on laboratory experience, he's well-versed in next-generation sequencing systems. Ben is currently the editor for SEQanswers.

    Posted by seqadmin, 06-06-2024, 07:15 AM
