Scientists from the Telomere-to-Telomere (T2T) Consortium claim to have finally sequenced the entire human genome from end to end, including all the segments that were missing from the famous 2001 reference human genome and the most recent 2013 draft. Published as a preprint on bioRxiv (meaning it has yet to be peer-reviewed), the new results claim to have uncovered the missing 8 percent of human DNA and, should they be verified, will represent the first complete sequence of a human genome ever produced.
Getting to this point has not been an easy journey. The complete human genome is a hefty 3.055 billion base pairs long – that's 3.055 billion individual letters that need to be identified, placed in the right position with overlapping sections merged, and stitched together into one very, very long string.
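To make the "stitching" idea concrete, here is a minimal sketch of greedy overlap assembly on toy reads. It is an illustration only and not the T2T Consortium's pipeline: the function names (`overlap`, `greedy_assemble`), the `min_len` threshold, and the example reads are all hypothetical, and real assemblers must also cope with sequencing errors, reverse-complement strands, and billions of bases.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `a` that equals a prefix of `b`.

    Returns 0 if the best overlap is shorter than `min_len`.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads: list[str], min_len: int = 3) -> list[str]:
    """Repeatedly merge the pair of reads with the largest overlap.

    Returns the remaining contigs once no pair overlaps by `min_len` or more.
    """
    reads = list(dict.fromkeys(reads))  # drop exact duplicates, keep order
    while len(reads) > 1:
        best_k, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # nothing left to merge; the pieces stay as separate contigs
        # Keep the overlapping stretch only once when joining the two reads.
        merged = reads[best_i] + reads[best_j][best_k:]
        reads = [r for idx, r in enumerate(reads) if idx not in (best_i, best_j)]
        reads.append(merged)
    return reads


# Three hypothetical reads that overlap by three letters each.
toy_reads = ["ATGCGT", "CGTTAC", "TACGGA"]
print(greedy_assemble(toy_reads))  # ['ATGCGTTACGGA']
```

The greedy choice is the simplest possible strategy. It works on this tidy example because every overlap is unambiguous, and that is exactly the condition that breaks down in repetitive DNA.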
The International Human Genome Sequencing Consortium came close in 2001, when it published its first draft of the human genome, results that subsequently paved the way for almost every facet of human genetics available today. The most recent draft of the human genome has been used as a reference since 2013. But limited by the sequencing techniques of the time, these drafts left out the most complex regions of our DNA. This is because these sequences are highly repetitive and contain many duplicated regions – attempting to put them together in the right places is like trying to complete a jigsaw puzzle where all the pieces are the same shape and have no image on the front. Long gaps and underrepresentation of large, repeating sequences meant that 8 percent of the genetic material was excluded. Scientists had to come up with more accurate methods of sequencing to illuminate the darkest corners of the genome.
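The jigsaw problem can be shown with a small sketch that asks what fraction of possible reads of a given length occur exactly once in a sequence, and so could be placed unambiguously. Everything here is a made-up illustration: the `unique_fraction` helper, the toy sequence, and the read lengths are assumptions chosen for clarity, not values from the study.

```python
from collections import Counter


def unique_fraction(genome: str, read_len: int) -> float:
    """Fraction of all length-`read_len` windows that appear exactly once."""
    windows = [genome[i:i + read_len] for i in range(len(genome) - read_len + 1)]
    counts = Counter(windows)
    return sum(1 for w in windows if counts[w] == 1) / len(windows)


# Hypothetical sequence: a 7-letter unit repeated six times between unique flanks.
toy_genome = "ACGTTGCA" + "GATTACA" * 6 + "TTGACCGT"

short = unique_fraction(toy_genome, 5)   # reads shorter than the repeat unit
long_ = unique_fraction(toy_genome, 50)  # reads longer than the whole repeat array

print(f"unique short reads: {short:.2f}")
print(f"unique long reads:  {long_:.2f}")
```

In this toy case, most short reads fall entirely inside the repeat and match several positions equally well, while every read long enough to span the repeat and reach a unique flank has only one possible home.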
So, the researchers turned to two modern long-read sequencing techniques: PacBio HiFi and Oxford Nanopore ultra-long-read sequencing. HiFi sequencing allows long sections of DNA to be sequenced at once while maintaining the accuracy usually only afforded by short-read sequencing. Meanwhile, Oxford Nanopore is a technique in which single strands of DNA are pushed through a tiny pore, and changes in the electrical current tell the scientists which base is currently passing through. The two techniques complement each other, and so using them together allowed the researchers to finally understand what is lurking in the mysterious 8 percent.
The T2T Consortium discovered 200 million new base pairs within the missing chunk of the genome. Within this, there were 2,226 genes, with 115 expected to code for a protein. What do these protein-coding genes do? The researchers aren’t sure yet – that will be for future studies to discover. For now, provided it is verified by peer review, this is one of the biggest updates to the human reference genome since it was released almost two decades ago.
It is important to note that while this appears to be the full human genome, the team estimates that about 0.3 percent may be erroneous. It is also not a complete map of all human chromosomes. The cells used to obtain T2T-CHM13 (the name of the reference genome created in this study) contained 23 chromosomes, not the full 46 found in most human cells. The Y chromosome was not included for this reason, and the researchers are now working to sequence and add the remaining chromosomes.