The first complete, gapless assembly of the human Y chromosome has been finalized, including annotations of its gene, repeat, and organizational structure. The Y chromosome was the last chromosome to be completed because its complex repeat regions are difficult to sequence and assemble. In the previous GRCh38 reference sequence, over half of the chromosome was absent. This work was completed by the Telomere-to-Telomere (T2T) consortium, which also produced the first complete assemblies of chromosomes X and 8 in recent years.
Sequencing of the chromosome’s most difficult regions was enabled by PacBio’s high-fidelity (HiFi) reads and Oxford Nanopore Technologies’ (ONT) ultra-long reads, with refinements that drew on Illumina’s shorter reads. The new assembly, referred to as T2T-Y, is a 62,460,029 base pair sequence that added more than 30 million base pairs to the reference. T2T-Y was combined with the T2T-CHM13 assembly to create T2T-CHM13+Y, a comprehensive reference that includes all human chromosomes.
In addition to completing the Y chromosome, the study reported several other accomplishments. Notable results include improved detection of human contamination within genomic databases and fewer false-positive variant calls in XY-bearing samples, which had been caused by inaccuracies in the previous Y chromosome reference (GRCh38Y). The study also informed development of the Verkko assembler, a tool for diploid human genome assembly that automates the integration of HiFi and ONT reads.
The completion of the Y chromosome paves the way for larger projects that will analyze hundreds of human samples, such as those of the Human Pangenome Reference Consortium. Furthermore, this study improves sequencing and assembly methods for complex regions of genomes.
Although the results have not yet been peer-reviewed, a preprint describing this work was recently released. The full details are available in the current preprint.