

A Brief Overview and Common Challenges in Single-cell Sequencing Analysis




    The introduction of single-cell sequencing has advanced the ability to study cell-to-cell heterogeneity. Its use has improved our understanding of somatic mutations [1], cell lineages [2], cellular diversity and regulation [3], and development in multicellular organisms [4].

    Single-cell sequencing encompasses hundreds of techniques with different approaches to studying the genomes, transcriptomes, epigenomes, and other omics of individual cells. Analyzing single-cell sequencing data is a complex process that many papers have explained in depth [5-9]. Rather than detailing the intricacies of the analysis for particular techniques, this article covers the basic process, common challenges, and recommendations for completing a successful analysis of single-cell sequencing data.

    Basic Process of Data Analysis

    The first step of the data analysis process is basecalling, which can occur in parallel with or immediately after the sequencing run. Basecalling varies by sequencing platform, and the resulting data files may need to be converted to another format, such as FASTQ, for further processing. Once the files are in the appropriate format, the indexes, unique molecular identifiers (UMIs), and other molecular barcodes are used to demultiplex the reads and remove duplicate reads.
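
    As a rough illustration of demultiplexing and duplicate removal, the sketch below groups reads by cell barcode and counts reads that share the same barcode, UMI, and gene as a single molecule. The read layout (a 4-base cell barcode followed by a 3-base UMI) and the function name `count_molecules` are assumptions made for this example, not the convention of any particular platform. Real pipelines also correct sequencing errors in barcodes and UMIs, which this sketch does not attempt.

```python
from collections import defaultdict

# Hypothetical layout: first 4 bases = cell barcode, next 3 bases = UMI.
BARCODE_LEN = 4
UMI_LEN = 3


def count_molecules(reads):
    """Collapse duplicate reads into unique molecules per cell and gene.

    `reads` is an iterable of (tag_sequence, gene) pairs, where the tag
    holds the cell barcode followed by the UMI.
    Returns {cell_barcode: {gene: number_of_unique_umis}}.
    """
    seen = defaultdict(set)  # (barcode, gene) -> set of UMIs observed
    for tag, gene in reads:
        barcode = tag[:BARCODE_LEN]
        umi = tag[BARCODE_LEN:BARCODE_LEN + UMI_LEN]
        seen[(barcode, gene)].add(umi)  # a repeated UMI is stored only once

    counts = defaultdict(dict)
    for (barcode, gene), umis in seen.items():
        counts[barcode][gene] = len(umis)
    return dict(counts)


example_reads = [
    ("AAAACCC", "GeneA"),
    ("AAAACCC", "GeneA"),  # same barcode, UMI, and gene: treated as a duplicate
    ("AAAAGGG", "GeneA"),  # same cell, different UMI: a second molecule
    ("TTTTCCC", "GeneB"),  # different cell barcode
]
molecule_counts = count_molecules(example_reads)
```

    In this toy input, the two identical reads from cell `AAAA` collapse into one molecule, so the cell ends up with two GeneA molecules rather than three reads.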

    The next steps are to align the reads and review the QC metrics applicable to the scope of the study. Standard QC metrics include expected library size, duplicate rates, number of reads per cell, number of genes detected per cell, and the proportion of reads aligned to mitochondrial or ribosomal genes. Low-quality cells and reads, along with contamination, are identified during the QC process and then filtered out of further analysis.
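
    The per-cell metrics above can be sketched in a few lines. The example below computes total counts, genes detected, and the mitochondrial fraction for each cell, then applies simple cutoffs. The function names and the threshold values are placeholders chosen for this toy data, not recommended settings. The `MT-` prefix follows the human gene-naming convention for mitochondrial genes and would need to change for other species.

```python
def qc_metrics(cell_counts, mito_prefix="MT-"):
    """Return basic QC metrics for one cell given {gene: count}."""
    total = sum(cell_counts.values())
    genes_detected = sum(1 for count in cell_counts.values() if count > 0)
    mito = sum(count for gene, count in cell_counts.items()
               if gene.startswith(mito_prefix))
    return {
        "total_counts": total,
        "genes_detected": genes_detected,
        "mito_fraction": mito / total if total else 0.0,
    }


def passes_qc(metrics, min_counts=10, min_genes=2, max_mito_fraction=0.2):
    """Apply illustrative cutoffs; real thresholds depend on the dataset."""
    return (metrics["total_counts"] >= min_counts
            and metrics["genes_detected"] >= min_genes
            and metrics["mito_fraction"] <= max_mito_fraction)


cells = {
    "cell_1": {"GeneA": 12, "GeneB": 7, "MT-CO1": 1},  # passes all cutoffs
    "cell_2": {"GeneA": 2, "GeneB": 0, "MT-CO1": 8},   # mostly mitochondrial
    "cell_3": {"GeneA": 3, "GeneB": 0, "MT-CO1": 0},   # too few counts
}
kept = [name for name, counts in cells.items() if passes_qc(qc_metrics(counts))]
```

    Only `cell_1` survives filtering here: `cell_2` exceeds the mitochondrial cutoff and `cell_3` falls below the minimum count.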

    After quality control, the analysis pipeline varies considerably based on the requirements of the study. In many single-cell RNA sequencing (scRNA-seq) projects, the next steps are often normalization and differential expression analysis; meanwhile, single-cell DNA sequencing (scDNA-seq) workflows may use a variety of techniques to identify single-nucleotide polymorphisms (SNPs) or DNA modifications.
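
    One widely used normalization for scRNA-seq counts scales each cell to a common total and then applies a log transform, so that cells sequenced to different depths become comparable. The sketch below shows that idea only; the function name `log_normalize` and the scale factor of 10,000 are assumptions for this example, and other normalization methods may suit a given study better.

```python
import math


def log_normalize(cell_counts, scale_factor=10_000):
    """Scale one cell's counts to `scale_factor` total, then take log(1 + x)."""
    total = sum(cell_counts.values())
    if total == 0:
        return {gene: 0.0 for gene in cell_counts}
    return {gene: math.log1p(count / total * scale_factor)
            for gene, count in cell_counts.items()}


# Two cells with the same expression proportions but a tenfold depth difference.
shallow_cell = {"GeneA": 5, "GeneB": 15}
deep_cell = {"GeneA": 50, "GeneB": 150}

norm_shallow = log_normalize(shallow_cell)
norm_deep = log_normalize(deep_cell)
```

    After normalization, the two cells have the same values for each gene, because the difference between them was sequencing depth rather than relative expression.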

    The final portion of the analysis involves data visualization and interpretation. The best data visualization methods fit the study’s aim and reinforce the interpretation. For example, t-SNE (t-distributed stochastic neighbor embedding), UMAP (uniform manifold approximation and projection), and PCA (principal component analysis) are common methods for analyzing and visualizing scRNA-seq datasets, but each highlights different aspects of the results.
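
    To give a sense of what PCA does with expression data, the sketch below estimates only the first principal component of a tiny cells-by-genes matrix using power iteration on the covariance matrix. This is a teaching shortcut under stated assumptions: analysis packages typically compute many components with SVD-based routines, and the function name, starting vector, and toy values here are invented for the example.

```python
import math


def first_principal_component(rows, iterations=200):
    """Estimate the first principal component by power iteration.

    `rows` is a list of cells, each a list of per-gene values.
    Returns (component, scores): the unit-length loading vector and each
    cell's projection onto it.
    """
    n_cells, n_genes = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n_cells for j in range(n_genes)]
    centered = [[row[j] - means[j] for j in range(n_genes)] for row in rows]

    # Sample covariance matrix between genes.
    cov = [[sum(r[i] * r[j] for r in centered) / (n_cells - 1)
            for j in range(n_genes)] for i in range(n_genes)]

    # Uneven starting vector; a start exactly orthogonal to the leading
    # component would not converge, which this sketch does not guard against.
    vec = [1.0 / (j + 1) for j in range(n_genes)]
    for _ in range(iterations):
        new = [sum(cov[i][j] * vec[j] for j in range(n_genes))
               for i in range(n_genes)]
        norm = math.sqrt(sum(v * v for v in new))
        if norm == 0:  # no variance in the data
            break
        vec = [v / norm for v in new]

    scores = [sum(r[j] * vec[j] for j in range(n_genes)) for r in centered]
    return vec, scores


# Toy matrix: two cells high in gene 1, two cells high in gene 2; gene 3 is flat.
expression = [
    [10, 1, 5],
    [9, 2, 5],
    [1, 10, 5],
    [2, 9, 5],
]
component, scores = first_principal_component(expression)
```

    In this toy matrix the first component contrasts gene 1 with gene 2 and gives the flat gene 3 no weight, so the scores separate the first two cells from the last two.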

    Single-cell analysis can be simplified by using popular tools like Seurat [10] and SCANPY [11], which were developed to process and analyze the data throughout the entire workflow. The expansion of analysis techniques has provided researchers with many free and commercial options for analyzing sequencing data; however, some tools may require the use of programming languages such as R or Python.

    Challenges and Considerations

    The success of single-cell sequencing has facilitated many significant discoveries and continues to improve our understanding of the genome, epigenome, transcriptome, and other important biological components. While beneficial in many experiments, single-cell sequencing can introduce many challenges that increase the difficulty of data analysis. The following are common challenges along with the recommended solutions.

    Bulk sequencing versus single-cell
    Challenge: Single-cell sequencing analysis is often more complex than bulk sequencing, and requires larger storage, higher amounts of memory, and more time to process the data and run the analysis.
    Solution: Choose bulk sequencing if the study doesn’t require single-cell resolution. Bulk sequencing is still a valuable resource and the preparation and analysis are typically much less demanding.

    Biological variability
    Challenge: Cell cycle [12] and transcriptional bursting [13] are examples of biological variability that can affect the results if they are unrelated to the scope of the study.
    Solution: Use biological and technical replicates to reduce variability within the experiment. Also, consider computational pipelines that help detect biological variability [14].

    Amplification errors
    Challenge: Amplification of the target nucleic acid is typically required due to the low abundance of DNA and RNA in individual cells. The amplification step is known to introduce errors that accumulate during library prep and sequencing [15,16].
    Solution: Select a polymerase with a lower error rate and use bioinformatic tools that assist in detecting and removing amplification errors. In addition, utilize cell barcodes and unique molecular identifiers (UMIs), which can also be used to identify amplification errors.

    Amplification bias
    Challenge: Uneven amplification during sample preparation can cause regions in the target DNA or RNA to be overrepresented, underrepresented, or never amplified [17,18]. In scRNA-seq experiments, it is difficult to determine whether unamplified transcripts were not expressed or simply undetected.
    Solution: Use bioinformatic tools that reduce the noise, detect outliers, and impute missing values from reference sequences [19,20]. Furthermore, review the amplification strategy and ensure the polymerase is best suited for covering the regions of interest.

    Batch effects
    Challenge: Batch effects occur when variability in the experimental data is the result of non-biological factors, which can affect the analysis and interpretation. This variability is often the result of changes during extractions, library preparations, sequencing, and cell culturing, or when these tasks are performed at different times.
    Solution: Reduce possible batch effects by performing cell culturing, extractions, library preparations, and sequencing together, along with using single-cell analysis tools [14,21,22] that adjust for these variations.

    Low depth and coverage
    Challenge: The low input of DNA and RNA from individual cells makes them susceptible to low and non-uniform coverage. Amplification errors and bias, low capture efficiency, and technical variability are common causes of non-uniform coverage, which can interfere with the results of the analysis.
    Solution: Determine the number of reads, number of cells, and the depth and coverage needed to perform the analysis. Confirm the cell isolation, amplification, and library preparation strategies that will generate the appropriate data needed for analysis. Plan to obtain more data than needed due to any variability in the experiment.

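
    The planning advice for depth and coverage comes down to simple arithmetic: multiply the number of cells by the reads wanted per cell, then add a margin for experimental variability. The sketch below works through one case. Every number in it (cell count, reads per cell, the 20% margin, and the per-run capacity) is a hypothetical value for illustration, not a recommendation for any instrument or protocol.

```python
def ceil_div(numerator, denominator):
    """Integer ceiling division, avoiding floating-point rounding."""
    return -(-numerator // denominator)


def reads_to_request(n_cells, reads_per_cell, margin_percent=20):
    """Total reads to plan for, padded by a safety margin in percent."""
    base = n_cells * reads_per_cell
    return ceil_div(base * (100 + margin_percent), 100)


def runs_needed(total_reads, reads_per_run):
    """Number of sequencing runs required to reach `total_reads`."""
    return ceil_div(total_reads, reads_per_run)


# Hypothetical study: 5,000 cells at 20,000 reads per cell, plus a 20% margin.
total_reads = reads_to_request(n_cells=5_000, reads_per_cell=20_000)

# Hypothetical capacity of 100 million reads per run.
runs = runs_needed(total_reads, reads_per_run=100_000_000)
```

    With these placeholder values the plan calls for 120 million reads, which needs two runs at the assumed capacity even though the unpadded target of 100 million would have fit in one.
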
    Future Directions and Conclusion

    Recent developments in single-cell analysis involve combining two or more techniques in a multi-omic approach [23,24]. Single-cell multi-omic analyses are often more demanding but also provide a more comprehensive view than individual methods alone. Workflows that combine data sets have been used to advance our understanding of how the different omics of the cell interact and regulate each other [25].

    Single-cell sequencing analysis is a complex process that involves many challenges, including batch effects, biological variability, low depth and coverage, and problems during amplification. These challenges can often be avoided by proper planning, following good experimental practices, and using up-to-date bioinformatic tools. Although there are no official guidelines for analysis, following these recommendations and other accepted practices is crucial for accurately analyzing and interpreting single-cell sequencing data. As with all sequencing analysis methods, current practices will gradually change as new developments are made, so it is recommended to check regularly for updated bioinformatics tools and published practices.


    1. Huang Z, Sun S, Lee M, et al. Single-cell analysis of somatic mutations in human bronchial epithelial cells in relation to aging and smoking. Nature Genetics. 2022;54(4):492-498. doi:10.1038/s41588-022-01035-w

    2. Elorbany R, Popp JM, Rhodes K, et al. Single-cell sequencing reveals lineage-specific dynamic genetic regulation of gene expression during human cardiomyocyte differentiation. Li M, ed. PLOS Genetics. 2022;18(1):e1009666. doi:10.1371/journal.pgen.1009666

    3. Sebé-Pedrós A, Saudemont B, Chomsky E, et al. Cnidarian Cell Type Diversity and Regulation Revealed by Whole-Organism Single-Cell RNA-Seq. Cell. 2018;173(6):1520-1534.e20. doi:10.1016/j.cell.2018.05.019

    4. Wang P, Chen Y, Yong J, et al. Dissecting the Global Dynamic Molecular Profiles of Human Fetal Kidney Development by Single-Cell RNA Sequencing. Cell Reports. 2018;24(13):3554-3567.e3. doi:10.1016/j.celrep.2018.08.056

    5. Nayak R, Hasija Y. A hitchhiker’s guide to single-cell transcriptomics and data analysis pipelines. Genomics. 2021;113(2):606-619. doi:10.1016/j.ygeno.2021.01.007

    6. Yan F, Powell DR, Curtis DJ, Wong NC. From reads to insight: a hitchhiker’s guide to ATAC-seq data analysis. Genome Biology. 2020;21(1). doi:10.1186/s13059-020-1929-3

    7. Kashima Y, Sakamoto Y, Kaneko K, Seki M, Suzuki Y, Suzuki A. Single-cell sequencing techniques from individual to multiomics analyses. Experimental & Molecular Medicine. 2020;52(9):1419-1427. doi:10.1038/s12276-020-00499-2

    8. Stuart T, Satija R. Integrative single-cell analysis. Nature Reviews Genetics. 2019;20(5):257-272. doi:10.1038/s41576-019-0093-7

    9. Slovin S, Carissimo A, Panariello F, et al. Single-Cell RNA Sequencing Analysis: A Step-by-Step Overview. Methods in Molecular Biology. 2021;2284:343-365. doi:10.1007/978-1-0716-1307-8_19

    10. Hao Y, Hao S, Andersen-Nissen E, et al. Integrated analysis of multimodal single-cell data. Cell. 2021;184(13):3573-3587. doi:10.1016/j.cell.2021.04.048

    11. Wolf FA, Angerer P, Theis FJ. SCANPY: large-scale single-cell gene expression data analysis. Genome Biology. 2018;19(1). doi:10.1186/s13059-017-1382-0

    12. Riba A, Oravecz A, Durik M, et al. Cell cycle gene regulation dynamics revealed by RNA velocity and deep-learning. Nature Communications. 2022;13(1). doi:10.1038/s41467-022-30545-8

    13. Tunnacliffe E, Chubb JR. What Is a Transcriptional Burst? Trends in Genetics. 2020;36(4):288-297. doi:10.1016/j.tig.2020.01.003

    14. Chu SK, Zhao S, Shyr Y, Liu Q. Comprehensive evaluation of noise reduction methods for single-cell RNA sequencing data. Briefings in Bioinformatics. 2022;23(2). doi:10.1093/bib/bbab565

    15. Ning L, Liu G, Li G, Hou Y, Tong Y, He J. Current Challenges in the Bioinformatics of Single Cell Genomics. Frontiers in Oncology. 2014;4. doi:10.3389/fonc.2014.00007

    16. Huang L, Ma F, Chapman A, Lu S, Xie XS. Single-Cell Whole-Genome Amplification and Sequencing: Methodology and Applications. Annual Review of Genomics and Human Genetics. 2015;16(1):79-102. doi:10.1146/annurev-genom-090413-025352

    17. Navin NE. Cancer genomics: one cell at a time. Genome Biology. 2014;15(8). doi:10.1186/s13059-014-0452-9

    18. Valecha M, Posada D. Somatic variant calling from single-cell DNA sequencing data. Computational and Structural Biotechnology Journal. 2022;20:2978-2985. doi:10.1016/j.csbj.2022.06.013

    19. Das S, Abecasis GR, Browning BL. Genotype Imputation from Large Reference Panels. Annual Review of Genomics and Human Genetics. 2018;19(1):73-96. doi:10.1146/annurev-genom-083117-021602

    20. Li WV, Li JJ. An accurate and robust imputation method scImpute for single-cell RNA-seq data. Nature Communications. 2018;9(1). doi:10.1038/s41467-018-03405-7

    21. Tran HTN, Ang KS, Chevrier M, et al. A benchmark of batch-effect correction methods for single-cell RNA sequencing data. Genome Biology. 2020;21(1). doi:10.1186/s13059-019-1850-9

    22. Haghverdi L, Lun ATL, Morgan MD, Marioni JC. Batch effects in single-cell RNA-sequencing data are corrected by matching mutual nearest neighbors. Nature Biotechnology. 2018;36(5):421-427. doi:10.1038/nbt.4091

    23. Lee J, Hyeon DY, Hwang D. Single-cell multiomics: technologies and data analysis methods. Experimental & Molecular Medicine. 2020;52(9):1428-1442. doi:10.1038/s12276-020-0420-2

    24. Dimitriu MA, Lazar-Contes I, Roszkowski M, Mansuy IM. Single-Cell Multiomics Techniques: From Conception to Applications. Frontiers in Cell and Developmental Biology. 2022;10. doi:10.3389/fcell.2022.854317

    25. Hou Y, Guo H, Cao C, et al. Single-cell triple omics sequencing reveals genetic, epigenetic, and transcriptomic heterogeneity in hepatocellular carcinomas. Cell Research. 2016;26(3):304-319. doi:10.1038/cr.2016.23
