Modern Methods for Phased Genomes

Continual advances in genomic and computational technologies have allowed researchers to construct precise pictures of individual genomes and to identify the variants that distinguish one genome from another. To fully understand the impact of these variants, it is essential to determine their chromosomal context, that is, which copy of a chromosome each variant lies on, in a process known as phasing. In this article, we explore some of the current technologies and approaches fundamental to this process.

What is phasing?
“We inherit half our DNA from our mother and half from our father,” explained Jonas Korlach, Ph.D., Chief Scientific Officer at Pacific Biosciences (PacBio). “Each set of this DNA we inherit will contain a unique collection of variants, which is often referred to as a haplotype. When you sequence, phasing refers to the process of identifying those unique variants on each DNA sequencing read and then separating (phasing) those reads into their respective parental haplotypes.”
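The read-separation step Korlach describes can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions, not how any particular tool works: each read is reduced to the alleles it shows at heterozygous sites whose phase is already known, and the read is assigned to whichever haplotype it matches at more sites. PHASED_SITES and assign_read are hypothetical names, and the positions and alleles are made up.

```python
from collections import Counter

# Hypothetical phased heterozygous sites: position -> (hap1 allele, hap2 allele)
PHASED_SITES = {
    1_000: ("A", "G"),
    4_200: ("C", "T"),
    9_800: ("G", "GA"),
}


def assign_read(read_alleles: dict[int, str]) -> str:
    """Assign a read to 'hap1', 'hap2', or 'unassigned' by majority vote.

    read_alleles maps each variant position the read covers to the allele
    observed on that read. Positions that are not phased sites, and alleles
    that match neither haplotype, contribute no vote.
    """
    votes = Counter()
    for pos, allele in read_alleles.items():
        if pos not in PHASED_SITES:
            continue
        hap1_allele, hap2_allele = PHASED_SITES[pos]
        if allele == hap1_allele:
            votes["hap1"] += 1
        elif allele == hap2_allele:
            votes["hap2"] += 1
    if votes["hap1"] == votes["hap2"]:
        return "unassigned"  # no informative sites, or evidence is tied
    return "hap1" if votes["hap1"] > votes["hap2"] else "hap2"


reads = {
    "read_a": {1_000: "A", 4_200: "C"},   # matches hap1 at both sites
    "read_b": {4_200: "T", 9_800: "GA"},  # matches hap2 at both sites
    "read_c": {1_000: "A", 4_200: "T"},   # one vote each: a tie
    "read_d": {20_000: "C"},              # covers no phased site
}
for name, alleles in reads.items():
    print(name, assign_read(alleles))
```

Reads with no informative sites, or with conflicting evidence, stay unassigned rather than being forced onto a haplotype. Real tools also weigh base and mapping quality and model sequencing errors, which this sketch ignores.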

Phasing matters because it lets researchers determine which genetic variants lie together on the same parental allele, or gene copy. “This improves the ability to associate genetic differences with disease and disease severity, genetic traits, or to know if someone is a silent carrier of a genetic disease,” noted Korlach. “For example, if you discovered two variants at different locations within a gene that had the potential to disrupt the expression of that gene, it would be important to know if those variants resided on the same copy (one bad and one good copy) or both copies (two bad copies of the gene).”
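Korlach's example can be expressed with the phased-genotype notation used in VCF files, where '0|1' means the reference allele is on one haplotype and the alternate allele on the other, and a '/' in place of '|' marks an unphased call. The minimal sketch below, built around a hypothetical classify_pair function, assumes two biallelic heterozygous variants phased within the same phase block, since phase is only comparable within a block. Variants in cis share a haplotype and leave one intact copy; variants in trans sit on different haplotypes, so both copies are affected.

```python
def classify_pair(gt1: str, gt2: str) -> str:
    """Classify two heterozygous variants as 'cis', 'trans', or 'unknown'.

    Genotypes follow the VCF convention: '0|1' is phased (the allele left of
    '|' lies on one haplotype, the allele right of it on the other), whereas
    '0/1' is unphased. Both variants are assumed to be biallelic and in the
    same phase block.
    """
    if "|" not in gt1 or "|" not in gt2:
        return "unknown"  # at least one call is unphased
    left1, right1 = gt1.split("|")
    left2, right2 = gt2.split("|")
    if {left1, right1} != {"0", "1"} or {left2, right2} != {"0", "1"}:
        raise ValueError("expected heterozygous genotypes such as '0|1' or '1|0'")
    # The alternate alleles share a haplotype exactly when they sit on the same side.
    return "cis" if left1 == left2 else "trans"


print(classify_pair("0|1", "0|1"))  # cis: one affected copy, one intact copy
print(classify_pair("0|1", "1|0"))  # trans: both copies affected
print(classify_pair("0/1", "0|1"))  # unknown: phase not resolved
```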

Alex Hastie, Ph.D., Vice President of Clinical and Scientific Affairs at Bionano, added that in addition to assessing pathogenicity, “Accurate phasing can enable complex haplotype reconstruction, allowing researchers to discriminate different structures in the most variable and functionally interesting regions of the genome (e.g., MHC, 22q, etc.).”

Techniques used for phasing
Long-read sequencing
Earlier phasing approaches often relied on short-read sequencing and, in some cases, imputation; today, one of the most common approaches uses long-read sequencing. Two variants can be phased directly when a single read spans the interval between them. Korlach highlighted that PacBio’s sequencers excel at this because they generate reads more than 100 times longer than standard short reads, long enough to span the distance between many neighboring variants.
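The read-spanning requirement can be illustrated with a small sketch: for two heterozygous SNVs, count how often each combination of alleles appears on reads that cover both sites. Reads carrying ALT-ALT or REF-REF support the two alternate alleles being in cis; ALT-REF or REF-ALT support trans. The function name and the 0.8 support threshold are illustrative assumptions, not parameters of any PacBio software.

```python
from collections import Counter


def phase_two_sites(
    read_pairs: list[tuple[str, str]],
    ref1: str, alt1: str,
    ref2: str, alt2: str,
    min_support: float = 0.8,
) -> str:
    """Infer whether two heterozygous SNVs are in cis or trans.

    Each element of read_pairs is (allele at site 1, allele at site 2) as seen
    on one read that spans both sites. Pairs containing any other allele (for
    example, a sequencing error) are ignored.
    """
    counts = Counter(
        pair for pair in read_pairs
        if pair[0] in (ref1, alt1) and pair[1] in (ref2, alt2)
    )
    cis = counts[(alt1, alt2)] + counts[(ref1, ref2)]
    trans = counts[(alt1, ref2)] + counts[(ref1, alt2)]
    total = cis + trans
    if total == 0:
        return "unphased"  # no usable read spans both sites
    if cis / total >= min_support:
        return "cis"
    if trans / total >= min_support:
        return "trans"
    return "ambiguous"


# Site 1 is G>A, site 2 is C>T; reads spanning both positions:
spanning = [("A", "T")] * 6 + [("G", "C")] * 5 + [("A", "C")]
print(phase_two_sites(spanning, "G", "A", "C", "T"))  # cis (11 of 12 reads)
```

With short reads, the same function would often receive an empty list for variants a few kilobases apart, which is the practical advantage of longer reads.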

Once the reads are generated, the next steps are to assemble them, detect variants, and phase the haplotypes, and several tools can assist. The de novo genome assembler Hifiasm is frequently used for phased assemblies with PacBio HiFi datasets¹. HiPhase can be used to improve the phasing of variant calls from whole-genome datasets². Paraphase phases haplotypes of highly homologous, medically significant genes, such as SMN1/SMN2, in targeted HiFi sequencing datasets³. “Additional strategies can be implemented to help improve phasing such as using sequence data from parents to bin long reads into their respective parental haplotypes during the assembly process, known as trio-binning, and/or by including long-range chromosomal contact information from Hi-C sequencing,” Korlach stated.
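Trio binning, which Korlach mentions, rests on a simple idea: k-mers found in only one parent's sequence act as markers for that parent's haplotype, so each of the child's reads can be binned by which parent's markers it contains. The toy sketch below shows only that core idea, using exact sets, k = 5, and hypothetical function names. Practical implementations are considerably more involved; for example, they work from k-mer counts of whole parental datasets, use a much larger k, and typically discard low-frequency k-mers that likely stem from sequencing errors.

```python
def kmers(seq: str, k: int) -> set[str]:
    """Return the set of all length-k substrings of seq."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def parent_specific_kmers(
    maternal: list[str], paternal: list[str], k: int
) -> tuple[set[str], set[str]]:
    """Return (maternal-only k-mers, paternal-only k-mers)."""
    mat = set().union(*(kmers(s, k) for s in maternal))
    pat = set().union(*(kmers(s, k) for s in paternal))
    return mat - pat, pat - mat


def bin_read(read: str, mat_only: set[str], pat_only: set[str], k: int) -> str:
    """Bin a child's read by which parent's specific k-mers it contains more of."""
    read_kmers = kmers(read, k)
    m = len(read_kmers & mat_only)
    p = len(read_kmers & pat_only)
    if m > p:
        return "maternal"
    if p > m:
        return "paternal"
    return "unassigned"  # no parent-specific k-mers, or a tie


K = 5  # toy value for this tiny example
mat_only, pat_only = parent_specific_kmers(["ACGTACGTTT"], ["ACGTACGAAA"], K)
for read in ["GTACGTTT", "TACGAAA", "ACGTAC"]:
    print(read, bin_read(read, mat_only, pat_only, K))
```

Binning happens before assembly, so each parental bin can then be assembled separately into one haplotype.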

Optical genome mapping
Bionano’s optical genome mapping (OGM) technique can also be used for phasing, in a way that complements long-read sequencing. Describing how it contributes, Hastie explained that “OGM utilizes ultra-high molecular weight DNA, with molecules ≥150 kbp at an average (N50) length of ≈250–400 kbp used in the genome assembly. These molecules have a label pattern introduced at a 6-mer sequence occurring every 5 kbp, on average. OGM measures the physical distance between labels and creates a barcode that can be used to create whole chromosome maps of genomes.”
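To illustrate how a short sequence motif becomes a barcode, the sketch below finds every occurrence of a 6-mer and reports the distances between consecutive labels. The motif CTTAAG is an assumption chosen for illustration; the article does not name the motif. Because CTTAAG is its own reverse complement, scanning one strand finds every site. In a uniformly random sequence, a given 6-mer occurs on average once every 4^6 = 4,096 bp, the same order of magnitude as the roughly 5 kbp average spacing Hastie describes, and the simulated sequence at the end of the block checks that figure empirically.

```python
import random
import re

MOTIF = "CTTAAG"  # assumed labeling motif, chosen for illustration only


def label_positions(seq: str, motif: str = MOTIF) -> list[int]:
    """0-based start of every motif occurrence (the lookahead also allows overlaps)."""
    return [m.start() for m in re.finditer(f"(?={re.escape(motif)})", seq)]


def inter_label_distances(positions: list[int]) -> list[int]:
    """Distances between consecutive labels: the molecule's 'barcode'."""
    return [b - a for a, b in zip(positions, positions[1:])]


toy = "AAAA" + "CTTAAG" + "T" * 10 + "CTTAAG" + "GGGG" + "CTTAAG"
print(label_positions(toy))                         # [4, 20, 30]
print(inter_label_distances(label_positions(toy)))  # [16, 10]

# Mean label spacing on a uniformly random 1 Mbp sequence (expected about 4,096 bp).
random.seed(0)
genome = "".join(random.choices("ACGT", k=1_000_000))
gaps = inter_label_distances(label_positions(genome))
print(f"{len(gaps) + 1} labels, mean spacing {sum(gaps) / len(gaps):.0f} bp")
```

Real genomes are not uniformly random, so observed spacing depends on base composition and on the motif actually used.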

Figure: Reference genome map (from hg19/hg38) with genes annotated. The blue bar is the map of an individual with a deletion in the dystrophin gene (DMD); this map was created using long molecules with labeled sequence motifs. (Courtesy of Bionano)

Because the molecules retain their native length, structural variant breakpoints and SNPs that alter label motifs can be captured, and therefore phased, within single long molecules in the assembly, Hastie added. “OGM adds value as a standalone technique to anchor and span complex repeats and can serve as an orthogonal quality check complement to sequencing-based phasing approaches.”

Phasing with OGM follows a standardized analysis of labeled, linearized, and imaged DNA molecules. “Phasing with OGM is performed directly by interrogating long (150 kbp–2 Mbp) contiguous molecules in the assembly for structures that can be resolved with the label patterns,” explained Hastie. “The molecules are assembled into longer maps by overlap tiling across the chromosome and phasing is done whenever there are heterozygous SVs or SNP-containing label motifs.”
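The label-motif case can be sketched as follows, assuming molecules have already been aligned to reference coordinates (a step this sketch skips entirely). At a label site that is heterozygous, for instance because a SNP destroys the motif on one haplotype, molecules spanning the site split into those that carry the label and those that do not. Molecule, split_by_label, and the 500 bp tolerance for label-position measurement error are hypothetical. In this simplified picture, each group can then be carried over to other heterozygous features its molecules also span.

```python
from dataclasses import dataclass, field


@dataclass
class Molecule:
    """A molecule already aligned to reference coordinates (hypothetical model)."""
    name: str
    start: int
    end: int
    labels: list[int] = field(default_factory=list)  # aligned label positions, bp


def split_by_label(
    molecules: list[Molecule], site: int, tolerance: int = 500
) -> tuple[list[str], list[str]]:
    """Split molecules spanning a heterozygous label site by label presence.

    Returns (names with a label within `tolerance` bp of site, names without).
    Molecules that do not span the site carry no information and are skipped.
    """
    with_label, without_label = [], []
    for mol in molecules:
        if not mol.start <= site <= mol.end:
            continue
        if any(abs(pos - site) <= tolerance for pos in mol.labels):
            with_label.append(mol.name)
        else:
            without_label.append(mol.name)
    return with_label, without_label


molecules = [
    Molecule("m1", 0, 300_000, [50_000, 120_000, 180_000]),
    Molecule("m2", 100_000, 400_000, [120_300, 250_000]),
    Molecule("m3", 60_000, 350_000, [180_000, 250_000]),  # label missing at 120 kbp
    Molecule("m4", 200_000, 500_000, [250_000]),          # does not span the site
]
print(split_by_label(molecules, site=120_000))  # (['m1', 'm2'], ['m3'])
```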

Phasing in genomic research
According to Korlach, a crucial benefit of phasing in genomic research is the ability to produce reference-quality, haplotype-resolved assemblies from PacBio HiFi sequencing reads. He believes this idea was best conveyed by the Human Pangenome Reference Consortium (HPRC) when they wrote, “We no longer consider collapsed 3-Gbp genome assemblies as state of the art (i.e., one representation of an individual where both haplotypes are merged) but instead consider two genomes for every diploid genome assembled (i.e., 6 Gbp vs. 3 Gbp) where parental haplotypes are phased and fully resolved⁴.” Korlach elaborated that this presents the genome in its true diploid state, as it exists within the cell. This representation improves the detection of small and structural variants, reveals epigenetic features such as allele-specific methylation and chromatin accessibility, and enriches transcriptomics by uncovering allele-specific gene expression.

Looking beyond OGM and long-read sequencing, Hastie pointed to several complementary technologies that facilitate genome phasing. These include linked-read sequencing, Hi-C-based chromosome conformation capture sequencing, Strand-seq, and trio sequencing. Echoing the HPRC, he stressed that resolving distinct haplotypes can capture an organism’s full genomic diversity more accurately, because it avoids collapsing two alleles into a single hybrid allele. “At scale, this pangenome view informs population genomics and enables applications from human health to conservation genomics,” he said.

Korlach emphasized that, among the many recent advances, some of the most exciting have come on the data analysis side. “Fully haplotype-resolved datasets have enabled the construction of large pangenomes that better catalog the genetic diversity within populations. For example, the recent releases of the first draft human pangenome⁵ and a first regional Chinese pangenome⁶ dataset, alongside a pangenome bioinformatics tool kit⁷, have shown improvements in reference-based sequence mapping and variant calling workflows and will hopefully replace current workflows using a single reference genome such as GRCh38 in the future.”

Looking ahead, Korlach envisions long-read sequencers such as the Revio sequencing system playing a pivotal role. By enabling researchers to scale their long-read workflows, these instruments could help generate more haplotype-resolved assemblies and accelerate the construction of larger, more comprehensive pangenome datasets.

References
1. Yu W, Luo H, Yang J, et al. Comprehensive assessment of eleven de novo HiFi assemblers on complex eukaryotic genomes and metagenomes. bioRxiv. Published online 2023. doi:https://doi.org/10.1101/2023.06.29.546998
2. Holt JM, Saunders CT, Rowell WJ, Kronenberg Z, Wenger AM, Eberle M. HiPhase: Jointly phasing small and structural variants from HiFi sequencing. bioRxiv. Published online 2023. doi:https://doi.org/10.1101/2023.05.03.539241
3. Chen X, Harting J, Farrow E, et al. Comprehensive SMN1 and SMN2 profiling for spinal muscular atrophy analysis using long-read PacBio HiFi sequencing. The American Journal of Human Genetics. 2023;110(2):240-250. doi:https://doi.org/10.1016/j.ajhg.2023.01.001
4. Porubsky D, Vollger MR, Harvey WT, et al. Gaps and complex structurally variant loci in phased genome assemblies. Genome Research. 2023;33:496-510. doi:https://doi.org/10.1101/gr.277334.122
5. Liao W, Asri M, Ebler J, et al. A draft human pangenome reference. Nature. 2023;617(7960):312-324. doi:https://doi.org/10.1038/s41586-023-05896-x
6. Gao Y, Yang X, Chen H, et al. A pangenome reference of 36 Chinese populations. Nature. 2023;619(7968):112-121. doi:https://doi.org/10.1038/s41586-023-06173-7
7. Chin C, Behera S, Khalak A, et al. Multiscale analysis of pangenomes enables improved representation of genomic diversity for repetitive and clinically relevant genes. Nature Methods. 2023;20(8):1213-1221. doi:https://doi.org/10.1038/s41592-023-01914-y