In 2022, the first complete human genome with no gaps was revealed, marking a huge moment for human genetics. On its release to the public, scientists described the painstaking work that goes into sequencing a genome of more than 6 billion base pairs, around 200 million of which were added in this new research. The new genome added 99 genes likely to code for proteins and 2,000 candidate genes that were previously unknown. Now, it’s available for all to view.
Many will be asking: "Wait, didn’t we already sequence the human genome?" In part, yes – in 2000, the Human Genome Sequencing Consortium published their first drafts of the human genome, results that subsequently paved the way for almost every facet of human genetics available today.
The most recent draft of the human genome has been used as a reference since 2013. But limited by the sequencing techniques of the time, these drafts left out the most complex regions of our DNA, which make up around 8 percent of the total genome. This is because these sequences are highly repetitive and contain many duplicated regions – attempting to put them together in the right places is like trying to complete a jigsaw puzzle where all the pieces are the same shape and have no image on the front. Long gaps and the underrepresentation of large, repeating sequences meant that this genetic material had been excluded for the past 20 years. Scientists had to come up with more accurate methods of sequencing to illuminate the darkest corners of the genome.
“These parts of the human genome that we haven’t been able to study for 20-plus years are important to our understanding of how the genome works, genetic diseases, and human diversity and evolution,” said Karen Miga, assistant professor of biomolecular engineering at UC Santa Cruz, when the results were published in Science last year.
Just as the first drafts were the work of the Human Genome Sequencing Consortium, the new reference genome (called T2T-CHM13) was produced by the Telomere-to-Telomere (T2T) Consortium, a group of researchers dedicated to finally mapping each chromosome from one telomere to the other. T2T-CHM13 is now available on the UCSC Genome Browser for everyone to explore, complementing the standard human reference genome, GRCh38.
The new reference genome was created using two modern sequencing techniques, Oxford Nanopore ultra-long read sequencing and PacBio HiFi sequencing, which together massively increase the length of DNA that can be read while also improving accuracy. With these, the researchers could sequence stretches of DNA that were unreadable with older techniques, while also correcting some structural errors in the previous reference genomes.
Looking to the future, the consortium hopes to add even more reference genomes as part of the Human Pangenome Reference Consortium to improve diversity in human genetics, something sorely lacking at present.
“We’re adding a second complete genome, and then there will be more,” said David Haussler, director of the UC Santa Cruz Genomics Institute.
“The next phase is to think about the reference for humanity’s genome as not being a single genome sequence. This is a profound transition, the harbinger of a new era in which we will eventually capture human diversity in an unbiased way.”
An earlier version of this article was published in March 2022.