Modern Methods for Phased Genomes



    Continual advancements in genomic and computational technologies have allowed researchers to construct precise pictures of individual genomes and identify variations that distinguish one genome from another. To fully understand the impact of these variants, it is essential to determine their chromosomal context, a process known as phasing. In this article, we'll explore some of the current technologies and approaches fundamental to this process.

    What is phasing?
    “We inherit half our DNA from our mother and half from our father,” explained Jonas Korlach, Ph.D., Chief Scientific Officer at Pacific Biosciences (PacBio). “Each set of this DNA we inherit will contain a unique collection of variants, which is often referred to as a haplotype. When you sequence, phasing refers to the process of identifying those unique variants on each DNA sequencing read and then separating (phasing) those reads into their respective parental haplotypes.”

    This process matters because accurate phasing allows researchers to determine which genetic variants sit together on the same parental allele, or gene copy. “This improves the ability to associate genetic differences with disease and disease severity, genetic traits, or to know if someone is a silent carrier of a genetic disease,” noted Korlach. “For example, if you discovered two variants at different locations within a gene that had the potential to disrupt the expression of that gene, it would be important to know if those variants resided on the same copy (one bad and one good copy) or both copies (two bad copies of the gene).”
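    The same-copy versus both-copies distinction Korlach describes can be illustrated with a small sketch. It assumes phased, VCF-style genotype strings such as `0|1`, where `|` separates the two haplotypes, and it assumes both variants belong to the same phase block. The function name is hypothetical and only illustrates the logic.

```python
def variant_configuration(gt_a: str, gt_b: str) -> str:
    """Classify two phased heterozygous genotypes as 'cis' or 'trans'.

    Genotypes are VCF-style strings such as '0|1' (assumed to share a phase block).
    'cis'   -> both variants on the same gene copy (one bad, one good copy)
    'trans' -> variants on different copies (two bad copies)
    """
    hap_a = gt_a.split("|")
    hap_b = gt_b.split("|")
    if len(hap_a) != 2 or len(hap_b) != 2:
        raise ValueError("expected phased diploid genotypes such as '0|1'")

    # Indices (0 or 1) of the haplotypes that carry a non-reference allele.
    alt_a = {i for i, allele in enumerate(hap_a) if allele != "0"}
    alt_b = {i for i, allele in enumerate(hap_b) if allele != "0"}
    if len(alt_a) != 1 or len(alt_b) != 1:
        raise ValueError("both variants must be heterozygous")

    return "cis" if alt_a == alt_b else "trans"


print(variant_configuration("0|1", "0|1"))  # cis
print(variant_configuration("0|1", "1|0"))  # trans
```

    Unphased genotypes (written `0/1` in VCF) cannot be classified this way, which is exactly the information that phasing adds.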

    Alex Hastie, Ph.D., Vice President of Clinical and Scientific Affairs at Bionano, added that in addition to assessing pathogenicity, “Accurate phasing can enable complex haplotype reconstruction, allowing researchers to discriminate different structures in the most variable and functionally interesting regions of the genome (e.g., MHC, 22q, etc.).”


    Techniques used for phasing
    Long-read sequencing
    Earlier phasing techniques frequently depended on short-read sequencing and, in certain instances, imputation; today, however, one of the most common approaches uses long-read sequencing technologies. Researchers can directly phase two genomic variants when a single sequence read spans the interval between them. Korlach highlighted that PacBio’s sequencers excel at this process because they generate reads that are over 100 times longer than standard short reads and can easily cover the length of these regions.
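    As a rough illustration of direct read-based phasing, the sketch below counts which allele pairs are observed together on reads that span two variant positions. It is a simplification under stated assumptions: reads are given as hypothetical `(start, sequence)` tuples with 0-based reference coordinates and gapless alignments, whereas real tools work from full alignments and model sequencing errors.

```python
from collections import Counter


def phase_pair(reads, pos_a, pos_b):
    """Count allele pairs seen on reads covering both pos_a and pos_b.

    reads: iterable of (start, sequence) with 0-based reference start
           and a gapless alignment (an assumption of this sketch).
    Returns a Counter mapping (allele_at_a, allele_at_b) -> read count.
    """
    counts = Counter()
    for start, seq in reads:
        end = start + len(seq)  # exclusive end coordinate on the reference
        # Only reads spanning both variants link their alleles.
        if start <= pos_a < end and start <= pos_b < end:
            counts[(seq[pos_a - start], seq[pos_b - start])] += 1
    return counts


reads = [
    (0, "ACGTACGT"),  # spans both positions: carries T at 3 and G at 6
    (2, "GATCCA"),    # spans both positions: carries A at 3 and C at 6
    (5, "CGT"),       # too short to reach position 3, so it is ignored
]
print(phase_pair(reads, pos_a=3, pos_b=6))
```

    Each distinct allele pair with consistent read support corresponds to one haplotype. A short read such as the third one cannot link the two sites, which is why read length matters for this approach.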

    After the appropriate sequencing reads are generated, the next steps are to assemble the reads, detect the variants, and phase the haplotypes. There are various tools that can assist with this process. The de novo genome assembler Hifiasm is frequently used for phased assemblies with PacBio HiFi datasets [1]. Other tools like HiPhase can be used to enhance the phasing of variant calls from whole-genome datasets [2]. Meanwhile, Paraphase is a tool capable of phasing haplotypes from highly homologous, medically significant genes, such as SMN1/SMN2, within targeted sequencing HiFi datasets [3]. “Additional strategies can be implemented to help improve phasing such as using sequence data from parents to bin long reads into their respective parental haplotypes during the assembly process, known as trio-binning, and/or by including long-range chromosomal contact information from Hi-C sequencing,” Korlach stated.
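    The trio-binning idea Korlach mentions can be sketched in a few lines. This is a toy version, not how any of the tools above implement it: it assumes parent-specific k-mers are derived from plain sequences, ignores reverse complements and sequencing errors, and uses hypothetical function names.

```python
def kmers(seq, k):
    """Return the set of all k-length substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def bin_read(read, maternal_only, paternal_only, k):
    """Assign a read to a parental bin by counting parent-specific k-mers."""
    read_kmers = kmers(read, k)
    maternal_hits = len(read_kmers & maternal_only)
    paternal_hits = len(read_kmers & paternal_only)
    if maternal_hits > paternal_hits:
        return "maternal"
    if paternal_hits > maternal_hits:
        return "paternal"
    return "unassigned"  # no informative k-mers, or a tie


k = 4
mother = kmers("ACGTACGTAA", k)
father = kmers("ACGTTCGTAA", k)
maternal_only = mother - father  # k-mers seen only in the mother
paternal_only = father - mother  # k-mers seen only in the father

print(bin_read("CGTACG", maternal_only, paternal_only, k))  # maternal
print(bin_read("GTTCGT", maternal_only, paternal_only, k))  # paternal
print(bin_read("ACGT", maternal_only, paternal_only, k))    # unassigned
```

    Reads that fall entirely in regions where the parents are identical stay unassigned, which is one reason complementary long-range data such as Hi-C can help.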

    Optical genome mapping
    The optical genome mapping (OGM) technique used by Bionano can also be used for phasing and is complementary to long-read sequencing. Explaining how it contributes to this process, Hastie detailed that “OGM utilizes ultra-high molecular weight DNA, with molecules ≥150 kbp at an average (N50) length of ≈250–400 kbp used in the genome assembly. These molecules have a label pattern introduced at a 6-mer sequence occurring every 5 kbp, on average. OGM measures the physical distance between labels and creates a barcode that can be used to create whole chromosome maps of genomes.”
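    The label "barcode" Hastie describes can be mimicked in silico by locating a 6-mer motif in a sequence and measuring the distances between consecutive sites. The sketch below is only illustrative: the motif `CTTAAG` is an assumed example of a 6-mer, the toy sequence is far shorter than real molecules, and the function names are hypothetical.

```python
def label_positions(seq, motif):
    """Return 0-based start positions of every occurrence of motif in seq."""
    positions = []
    i = seq.find(motif)
    while i != -1:
        positions.append(i)
        i = seq.find(motif, i + 1)
    return positions


def label_spacings(positions):
    """Distances between consecutive labels: the pattern OGM compares."""
    return [b - a for a, b in zip(positions, positions[1:])]


MOTIF = "CTTAAG"  # assumed example motif, not confirmed by the article
seq = "AACTTAAGGGGGCTTAAGTTCTTAAG"

sites = label_positions(seq, MOTIF)
print(sites)                  # [2, 12, 20]
print(label_spacings(sites))  # [10, 8]
```

    A SNP that destroys or creates a motif site changes this spacing pattern on one haplotype only, and a deletion between two sites shortens one spacing, which is the kind of difference that long labeled molecules can reveal.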



    Figure: Reference genome map (from hg19/hg38) with genes annotated. The blue bar is the map of an individual with a deletion in the dystrophin gene (DMD); this map was created using long molecules with labeled sequence motifs. (Courtesy of Bionano)

    Because the molecules retain their native length, structural variant breakpoints and SNPs that alter label motifs can be captured within long individual molecules in the assembly and phased, Hastie added. “OGM adds value as a standalone technique to anchor and span complex repeats and can serve as an orthogonal quality check complement to sequencing-based phasing approaches.”

    Phasing with OGM follows a standardized analysis process that uses labeled, linearized, and imaged DNA molecules. “Phasing with OGM is performed directly by interrogating long (150 kbp–2 Mbp) contiguous molecules in the assembly for structures that can be resolved with the label patterns,” explained Hastie. “The molecules are assembled into longer maps by overlap tiling across the chromosome and phasing is done whenever there are heterozygous SVs or SNP-containing label motifs.”


    Phasing in genomic research
    According to Korlach, a crucial benefit of phasing in genomic research is its potential to produce a reference-quality, haplotype-resolved assembly using PacBio HiFi sequencing reads. He believes that this idea was best conveyed by the Human Pangenome Reference Consortium (HPRC) when they wrote, “We no longer consider collapsed 3-Gbp genome assemblies as state of the art (i.e., one representation of an individual where both haplotypes are merged) but instead consider two genomes for every diploid genome assembled (i.e., 6 Gbp vs. 3 Gbp) where parental haplotypes are phased and fully resolved” [4]. Korlach elaborated that this presents the genome in its true diploid state as it exists within the cell. This representation improves the detection of small and structural variations, highlights epigenetic attributes like allele-specific methylation and chromatin-accessible areas, and enriches transcriptomics by uncovering allele-specific gene expression.

    Hastie, discussing advancements beyond OGM and long-read sequencing, pointed to the rise of various complementary technologies that facilitate genome phasing. These include linked-read sequencing, Hi-C-based conformation capture sequencing, Strand-seq, and trio sequencing. Like the HPRC, he stressed that phasing distinct haplotypes can capture the full genomic diversity of organisms more accurately by avoiding the collapse of two alleles into a single hybrid allele. “At scale, this pangenome view informs population genomics and enables applications from human health to conservation genomics,” he said.

    While there have been many advances in recent years, Korlach emphasized that some of the most exciting have been on the data analysis side. “Fully haplotype-resolved datasets have enabled the construction of large pangenomes that better catalog the genetic diversity within populations. For example, the recent releases of the first draft human pangenome [5] and a first regional Chinese pangenome [6] dataset, alongside a pangenome bioinformatics tool kit [7], have shown improvements in reference-based sequence mapping and variant calling workflows and will hopefully replace current workflows using a single reference genome such as GRCh38 in the future.”

    Looking ahead, Korlach envisions a future where long-read sequencers like the Revio sequencing system play a pivotal role. These instruments could accelerate the construction of larger and more comprehensive pangenome datasets and, by letting researchers scale their long-read workflows, make it possible to generate more haplotype-resolved assemblies.

    References
    1. Yu W, Luo H, Yang J, et al. Comprehensive assessment of eleven de novo HiFi assemblers on complex eukaryotic genomes and metagenomes. bioRxiv. Published online 2023. doi:https://doi.org/10.1101/2023.06.29.546998
    2. Holt JM, Saunders CT, Rowell WJ, Kronenberg Z, Wenger AM, Eberle M. HiPhase: Jointly phasing small and structural variants from HiFi sequencing. bioRxiv. Published online 2023. doi:https://doi.org/10.1101/2023.05.03.539241
    3. Chen X, Harting J, Farrow E, et al. Comprehensive SMN1 and SMN2 profiling for spinal muscular atrophy analysis using long-read PacBio HiFi sequencing. The American Journal of Human Genetics. 2023;110(2):240-250. doi:https://doi.org/10.1016/j.ajhg.2023.01.001
    4. Porubsky D, Vollger MR, Harvey WT, et al. Gaps and complex structurally variant loci in phased genome assemblies. Genome Research. 2023;33:496-510. doi:https://doi.org/10.1101/gr.277334.122
    5. Liao W, Asri M, Ebler J, et al. A draft human pangenome reference. Nature. 2023;617(7960):312-324. doi:https://doi.org/10.1038/s41586-023-05896-x
    6. Gao Y, Yang X, Chen H, et al. A pangenome reference of 36 Chinese populations. Nature. 2023;619(7968):112-121. doi:https://doi.org/10.1038/s41586-023-06173-7
    7. Chin C, Behera S, Khalak A, et al. Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes. Nature Methods. 2023;20(8):1213-1221. doi:https://doi.org/10.1038/s41592-023-01914-y



    About the Author


    Benjamin Atha (seqadmin) holds a B.A. in biology from Hood College and an M.S. in biological sciences from Towson University. With over 9 years of hands-on laboratory experience, he's well-versed in next-generation sequencing systems. Ben is currently the editor for SEQanswers.
