A Brief Overview and Common Challenges in Single-cell Sequencing Analysis

    The introduction of single-cell sequencing has advanced the ability to study cell-to-cell heterogeneity. Its use has improved our understanding of somatic mutations [1], cell lineages [2], cellular diversity and regulation [3], and development in multicellular organisms [4].

    Single-cell sequencing encompasses hundreds of techniques with different approaches to studying the genomes, transcriptomes, epigenomes, and other omics of individual cells. The analysis of single-cell sequencing data is a complex process, and many papers have explained this procedure in depth [5-9]. Instead of detailing the intricacies of the analysis for particular techniques, this article will cover the basic process, challenges, and recommendations to complete a successful analysis of single-cell sequencing data.

    Basic Process of Data Analysis

    The first step of the data analysis process is basecalling, which can occur in parallel with or immediately after the sequencing run. Basecalling varies by sequencing platform, and the resulting data files may need to be converted to another format, such as FASTQ, for further processing. Once the files are in the appropriate format, the indexes, unique molecular identifiers (UMIs), and other molecular barcodes are used to demultiplex the reads and remove unnecessary duplicates.
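    As a rough illustration of demultiplexing and duplicate removal, the Python sketch below groups reads by cell barcode and counts distinct UMIs per gene, so that reads sharing the same barcode, UMI, and gene collapse into one molecule. The tag layout (a 4-base barcode followed by a 3-base UMI), the function name, and the toy reads are assumptions made for this example. Gene assignments are only available after alignment in a real workflow, and established tools also correct sequencing errors within barcodes and UMIs, which this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical layout: each read tag is a cell barcode followed by a UMI.
BARCODE_LEN = 4
UMI_LEN = 3


def count_unique_molecules(reads):
    """Demultiplex reads by cell barcode and collapse duplicate reads.

    `reads` is an iterable of (tag, gene) pairs, where `tag` is the
    barcode+UMI sequence and `gene` is the gene assigned to the read.
    Returns {barcode: {gene: number_of_distinct_UMIs}}.
    """
    umis_seen = defaultdict(lambda: defaultdict(set))
    for tag, gene in reads:
        barcode = tag[:BARCODE_LEN]
        umi = tag[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
        # A set keeps each UMI once, so duplicate reads are not recounted.
        umis_seen[barcode][gene].add(umi)
    return {
        barcode: {gene: len(umis) for gene, umis in genes.items()}
        for barcode, genes in umis_seen.items()
    }


reads = [
    ("AAAACCC", "GeneA"),
    ("AAAACCC", "GeneA"),  # same barcode, UMI, and gene: a duplicate
    ("AAAAGGG", "GeneA"),  # same cell, different UMI: a second molecule
    ("TTTTCCC", "GeneB"),  # different cell
]
counts = count_unique_molecules(reads)
print(counts)
```

    Counting distinct UMIs instead of reads is what keeps amplification duplicates from inflating the expression estimate for a gene.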

    The next steps are to perform alignments and review the QC metrics applicable to the scope of the study. Standard QC metrics include expected library size, duplicate rates, number of reads per cell, number of genes detected per cell, and the proportion of reads aligned to mitochondrial or ribosomal genes. Low-quality cells and reads, along with contamination, are identified during the QC process and filtered out before further analysis.
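    The per-cell QC metrics described above can be sketched in a few lines of Python. This is a minimal example, not a recommended pipeline: the thresholds are placeholder values chosen for the toy data, the function names are hypothetical, and mitochondrial genes are identified by the "MT-" prefix used in human gene symbols. Appropriate cutoffs depend on the tissue, protocol, and aim of the study.

```python
def qc_metrics(gene_counts, mito_prefix="MT-"):
    """Compute per-cell QC metrics from a {gene: count} mapping."""
    total = sum(gene_counts.values())
    detected = sum(1 for count in gene_counts.values() if count > 0)
    mito = sum(
        count for gene, count in gene_counts.items()
        if gene.startswith(mito_prefix)
    )
    return {
        "total_counts": total,
        "genes_detected": detected,
        "mito_fraction": mito / total if total else 0.0,
    }


def passes_qc(metrics, min_counts=10, min_genes=2, max_mito_fraction=0.2):
    """Apply placeholder thresholds; real cutoffs are study-specific."""
    return (
        metrics["total_counts"] >= min_counts
        and metrics["genes_detected"] >= min_genes
        and metrics["mito_fraction"] <= max_mito_fraction
    )


cells = {
    "cell_1": {"GeneA": 30, "GeneB": 15, "MT-CO1": 5},   # healthy profile
    "cell_2": {"GeneA": 2, "GeneB": 0, "MT-CO1": 1},     # too few counts
    "cell_3": {"GeneA": 10, "GeneB": 10, "MT-CO1": 30},  # high mito share
}
kept = [
    name for name, gene_counts in cells.items()
    if passes_qc(qc_metrics(gene_counts))
]
print(kept)
```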

    After quality control, the analysis pipeline varies considerably based on the requirements of the study. In many single-cell RNA sequencing (scRNA-seq) projects, the next steps are normalization and differential expression analysis, whereas single-cell DNA sequencing (scDNA-seq) workflows may use a variety of techniques to identify SNPs or DNA modifications.
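    To make the normalization step concrete, the sketch below shows one widely used approach for scRNA-seq counts: scaling each cell to a common total and then applying a log(1 + x) transform. The article does not prescribe a particular method, so this choice, the target of 10,000 counts per cell, and the function name are assumptions for illustration.

```python
import math


def normalize_cell(gene_counts, target_sum=10_000):
    """Scale one cell's counts to `target_sum`, then apply log(1 + x)."""
    total = sum(gene_counts.values())
    if total == 0:
        # An empty cell has nothing to scale; return zeros.
        return {gene: 0.0 for gene in gene_counts}
    return {
        gene: math.log1p(count / total * target_sum)
        for gene, count in gene_counts.items()
    }


# Two cells with the same composition but a tenfold difference in depth.
shallow = {"GeneA": 10, "GeneB": 30}
deep = {"GeneA": 100, "GeneB": 300}

norm_shallow = normalize_cell(shallow)
norm_deep = normalize_cell(deep)
print(norm_shallow)
print(norm_deep)
```

    After normalization the two cells have matching values, so a difference in sequencing depth alone is not mistaken for a difference in expression.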

    The final portion of the analysis involves data visualization and interpretation. The best data visualization methods fit the study’s aim and reinforce the interpretation. For example, t-SNE (t-distributed stochastic neighbor embedding), UMAP (uniform manifold approximation and projection), and PCA (principal component analysis) are common methods for analyzing and visualizing scRNA-seq datasets, but each highlights different aspects of the results.
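    Of the three methods named above, PCA is the simplest to show from first principles. The sketch below finds the first principal component of a small cells-by-genes matrix using power iteration on the sample covariance matrix, then projects each cell onto it. It is a teaching example under stated assumptions: the function names and toy data are hypothetical, power iteration fails if the starting vector happens to be orthogonal to the leading component, and real analyses use optimized library implementations that return many components.

```python
import math


def center(rows):
    """Subtract each column mean so every gene has mean zero."""
    n_rows, n_cols = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n_rows for j in range(n_cols)]
    return [[row[j] - means[j] for j in range(n_cols)] for row in rows]


def first_principal_component(rows, iterations=100):
    """Estimate the direction of greatest variance by power iteration."""
    centered = center(rows)
    n_rows, n_cols = len(centered), len(centered[0])
    # Sample covariance matrix between genes (columns).
    cov = [
        [sum(r[i] * r[j] for r in centered) / (n_rows - 1)
         for j in range(n_cols)]
        for i in range(n_cols)
    ]
    vec = [1.0] * n_cols
    for _ in range(iterations):
        nxt = [sum(cov[i][j] * vec[j] for j in range(n_cols))
               for i in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0:
            break  # no variance in the data
        vec = [x / norm for x in nxt]
    return vec


def project(rows, component):
    """Return each cell's coordinate along the given component."""
    return [sum(v * c for v, c in zip(row, component)) for row in center(rows)]


# Toy matrix: 4 cells x 2 genes, where gene 2 is always twice gene 1.
expression = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]
pc1 = first_principal_component(expression)
scores = project(expression, pc1)
print(pc1)
print(scores)
```

    Plotting the scores for the first two components gives the familiar PCA scatter plot; t-SNE and UMAP are nonlinear methods that are commonly run on a PCA-reduced matrix.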

    Single-cell analysis can be simplified by using popular tools like Seurat [10] and SCANPY [11], which were developed to process and analyze the data throughout the entire workflow. The expansion of analysis techniques has provided researchers with many free and commercial options for analyzing sequencing data; however, some tools require familiarity with a programming language such as R or Python.

    Challenges and Considerations

    The success of single-cell sequencing has facilitated many significant discoveries and continues to improve our understanding of the genome, epigenome, transcriptome, and other important biological components. While beneficial in many experiments, single-cell sequencing can introduce many challenges that increase the difficulty of data analysis. The following are common challenges along with the recommended solutions.

    Challenge: Bulk sequencing versus single-cell
    Single-cell sequencing analysis is often more complex than bulk sequencing, and requires larger storage, higher amounts of memory, and more time to process the data and run the analysis.
    Solution: Choose bulk sequencing if the study doesn’t require single-cell resolution. Bulk sequencing is still a valuable resource and the preparation and analysis are typically much less demanding.

    Challenge: Biological variability
    Cell cycle [12] and transcriptional bursting [13] are examples of biological variability that can affect the results if they are unrelated to the scope of the study.
    Solution: Use biological and technical replicates to reduce variability within the experiment. Also, consider computational pipelines that help detect biological variability [14].

    Challenge: Amplification errors
    Amplification of the target nucleic acid is typically required due to the low abundance of DNA and RNA in individual cells. The amplification step is known to introduce errors that accumulate during library prep and sequencing [15,16].
    Solution: Select a polymerase with a lower error rate and use bioinformatic tools that assist in detecting and removing amplification errors. In addition, utilize cell barcodes and unique molecular identifiers (UMI), which can also be used to identify amplification errors.

    Challenge: Amplification bias
    Uneven amplification during sample preparation can cause regions in the target DNA or RNA to be overrepresented, underrepresented, or never amplified [17,18]. In scRNA-seq experiments, it is difficult to determine whether unamplified transcripts were not expressed or simply undetected.
    Solution: Use bioinformatic tools that reduce the noise, detect outliers, and impute missing values from reference sequences [19,20]. Furthermore, review the amplification strategy and ensure the polymerase is best suited for covering the regions of interest.

    Challenge: Batch effects
    Batch effects occur when variability in the experimental data is the result of non-biological factors, which can affect the analysis and interpretation. This variability is often the result of changes during extractions, library preparations, sequencing, and cell culturing, or when these tasks are performed at different times.
    Solution: Reduce possible batch effects by performing cell culturing, extractions, library preparations, and sequencing together, along with using single-cell analysis tools [14,21,22] that adjust for these variations.

    Challenge: Low depth and coverage
    The low input of DNA and RNA from individual cells makes them susceptible to low and non-uniform coverage. Amplification errors and bias, low capture efficiency, and technical variability are common causes of non-uniform coverage, which can interfere with the results of the analysis.
    Solution: Determine the number of reads, number of cells, and the depth and coverage needed to perform the analysis. Confirm the cell isolation, amplification, and library preparation strategies that will generate the appropriate data needed for analysis. Plan to obtain more data than needed due to any variability in the experiment.
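    The planning advice for depth and coverage comes down to simple arithmetic, sketched below in Python. The function name, the 20,000 reads per cell, and the 20 percent margin are hypothetical placeholders, not recommendations; the right values depend on the protocol and the question being asked.

```python
def reads_to_request(n_cells, reads_per_cell, margin_percent=20):
    """Estimate total reads to sequence, padded by a safety margin.

    Integer arithmetic is used throughout, and the result is rounded up
    so the request never falls short of the padded target.
    """
    if n_cells < 0 or reads_per_cell < 0 or margin_percent < 0:
        raise ValueError("inputs must be non-negative")
    padded = n_cells * reads_per_cell * (100 + margin_percent)
    return -(-padded // 100)  # ceiling division


# Hypothetical plan: 5,000 cells at 20,000 reads each plus a 20% margin.
total_reads = reads_to_request(5_000, 20_000, margin_percent=20)
print(f"{total_reads:,} reads")
```

    Running the numbers before sequencing makes it easier to check that the planned run can deliver the required depth once failed cells and filtered reads are taken into account.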

    Future Directions and Conclusion

    Recent developments in single-cell analysis involve the incorporation of two or more techniques for a multi-omic approach [23,24]. Single-cell multi-omic analyses are often more demanding but also provide a more comprehensive view than individual methods alone. Workflows that combine these data sets have been used to advance our understanding of how the different omics of the cell interact and regulate each other [25].

    Single-cell sequencing analysis is a complex process that involves many challenges, including batch effects, biological variability, low depth and coverage, and problems during amplification. These challenges can often be avoided by proper planning, following good experimental practices, and using up-to-date bioinformatic tools. Although there are no official guidelines for analysis, following these recommendations and other accepted practices is crucial for accurately analyzing and interpreting single-cell sequencing data. As with all sequencing analysis methods, current practices will change as new developments are made, so it is worth checking regularly for updated bioinformatics tools and published practices.

    References

    1. Huang Z, Sun S, Lee M, et al. Single-cell analysis of somatic mutations in human bronchial epithelial cells in relation to aging and smoking. Nature Genetics. 2022;54(4):492-498. doi:10.1038/s41588-022-01035-w

    2. Elorbany R, Popp JM, Rhodes K, et al. Single-cell sequencing reveals lineage-specific dynamic genetic regulation of gene expression during human cardiomyocyte differentiation. Li M, ed. PLOS Genetics. 2022;18(1):e1009666. doi:10.1371/journal.pgen.1009666

    3. Sebé-Pedrós A, Saudemont B, Chomsky E, et al. Cnidarian Cell Type Diversity and Regulation Revealed by Whole-Organism Single-Cell RNA-Seq. Cell. 2018;173(6):1520-1534.e20. doi:10.1016/j.cell.2018.05.019

    4. Wang P, Chen Y, Yong J, et al. Dissecting the Global Dynamic Molecular Profiles of Human Fetal Kidney Development by Single-Cell RNA Sequencing. Cell Reports. 2018;24(13):3554-3567.e3. doi:10.1016/j.celrep.2018.08.056

    5. Nayak R, Hasija Y. A hitchhiker’s guide to single-cell transcriptomics and data analysis pipelines. Genomics. 2021;113(2):606-619. doi:10.1016/j.ygeno.2021.01.007

    6. Yan F, Powell DR, Curtis DJ, Wong NC. From reads to insight: a hitchhiker’s guide to ATAC-seq data analysis. Genome Biology. 2020;21(1). doi:10.1186/s13059-020-1929-3

    7. Kashima Y, Sakamoto Y, Kaneko K, Seki M, Suzuki Y, Suzuki A. Single-cell sequencing techniques from individual to multiomics analyses. Experimental & Molecular Medicine. 2020;52(9):1419-1427. doi:10.1038/s12276-020-00499-2

    8. Stuart T, Satija R. Integrative single-cell analysis. Nature Reviews Genetics. 2019;20(5):257-272. doi:10.1038/s41576-019-0093-7

    9. Slovin S, Carissimo A, Panariello F, et al. Single-Cell RNA Sequencing Analysis: A Step-by-Step Overview. Methods in Molecular Biology. 2021;2284:343-365. doi:10.1007/978-1-0716-1307-8_19

    10. Hao Y, Hao S, Andersen-Nissen E, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573-3587. doi:10.1016/j.cell.2021.04.048

    11. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biology. 2018;19(1). doi:10.1186/s13059-017-1382-0

    12. Riba A, Oravecz A, Durik M, et al. Cell cycle gene regulation dynamics revealed by RNA velocity and deep-learning. Nature Communications. 2022;13(1). doi:10.1038/s41467-022-30545-8

    13. Tunnacliffe E, Chubb JR. What Is a Transcriptional Burst? Trends in Genetics. 2020;36(4):288-297. doi:10.1016/j.tig.2020.01.003

    14. Chu SK, Zhao S, Shyr Y, Liu Q. Comprehensive evaluation of noise reduction methods for single-cell RNA sequencing data. Briefings in Bioinformatics. 2022;23(2). doi:10.1093/bib/bbab565

    15. Ning L, Liu G, Li G, Hou Y, Tong Y, He J. Current Challenges in the Bioinformatics of Single Cell Genomics. Frontiers in Oncology. 2014;4. doi:10.3389/fonc.2014.00007

    16. Huang L, Ma F, Chapman A, Lu S, Xie XS. Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications. Annual Review of Genomics and Human Genetics. 2015;16(1):79-102. doi:10.1146/annurev-genom-090413-025352

    17. Navin NE. Cancer genomics: one cell at a time. Genome Biology. 2014;15(8). doi:10.1186/s13059-014-0452-9

    18. Valecha M, Posada D. Somatic variant calling from single-cell DNA sequencing data. Computational and Structural Biotechnology Journal. 2022;20:2978-2985. doi:10.1016/j.csbj.2022.06.013

    19. Das S, Abecasis GR, Browning BL. Genotype Imputation from Large Reference Panels. Annual Review of Genomics and Human Genetics. 2018;19(1):73-96. doi:10.1146/annurev-genom-083117-021602

    20. Li WV, Li JJ. An accurate and robust imputation method scImpute for single-cell RNA-seq data. Nature Communications. 2018;9(1). doi:10.1038/s41467-018-03405-7

    21. Tran HTN, Ang KS, Chevrier M, et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biology. 2020;21(1). doi:10.1186/s13059-019-1850-9

    22. Haghverdi L, Lun ATL, Morgan MD, Marioni JC. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nature Biotechnology. 2018;36(5):421-427. doi:10.1038/nbt.4091

    23. Lee J, Hyeon DY, Hwang D. Single-cell multiomics: technologies and data analysis methods. Experimental & Molecular Medicine. 2020;52(9):1428-1442. doi:10.1038/s12276-020-0420-2

    24. Dimitriu MA, Lazar-Contes I, Roszkowski M, Mansuy IM. Single-Cell Multiomics Techniques: From Conception to Applications. Frontiers in Cell and Developmental Biology. 2022;10. doi:10.3389/fcell.2022.854317

    25. Hou Y, Guo H, Cao C, et al. Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas. Cell Research. 2016;26(3):304-319. doi:10.1038/cr.2016.23

    About the Author

    Benjamin Atha (seqadmin) holds a B.A. in biology from Hood College and an M.S. in biological sciences from Towson University. With over 9 years of hands-on laboratory experience, he's well-versed in next-generation sequencing systems. Ben is currently the editor for SEQanswers.
