Since the Human Genome Project declared the human genome sequence "complete" in 2003, our understanding of DNA and how it varies from person to person has improved enormously. We use the reference genome to look for disease variants and discover gene functions, and it serves as a scaffold for assembling other large pieces of DNA. Mapping out where our genes sit within the chromosomes is vital to genetics. So it may surprise you to learn that our current reference still has a lot of gaps in it.
That is, until now. In a paper published in Nature on Wednesday, researchers reached a huge milestone in genetics by producing the first-ever end-to-end (telomere-to-telomere) sequence of the human X chromosome. They sequenced the entire X chromosome, roughly 155 million base pairs long, including highly repetitive regions that could not be sequenced before.
The team, led by Karen Miga of the UC Santa Cruz Genomics Institute, used a combination of sequencing techniques to complete the chromosome and said the key to their success was modern ultra-long-read nanopore sequencing. Traditional sequencing technology breaks DNA into many tiny fragments, which software then pieces back together like the world's most complicated jigsaw puzzle. This mostly works, but when stretches of DNA are extremely similar to each other, the software struggles to fit them into the right place. Some regions of the chromosome consist of huge amounts of repetitive DNA, and until now researchers have been unable to map them accurately.
"These repeat-rich sequences were once deemed intractable, but now we've made leaps and bounds in sequencing technology," Miga said in a press release.
Ultra-long-read nanopore sequencing changes this. The technology threads a strand of DNA through a tiny pore and measures the changes in electrical current as the strand passes through, so it can read very long pieces of DNA in one go, leaving fewer gaps.
"With nanopore sequencing, we get ultra-long reads of hundreds of thousands of base pairs that can span an entire repeat region, so that bypasses some of the challenges," Miga said. Even so, several gaps remained in the sequence that the team had to resolve manually.
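To make the repeat problem concrete, here is a toy sketch in Python. It is an illustration under stated assumptions, not the team's software: exact string matching stands in for real read alignment, and the sequences are invented. A read that sits entirely inside a repeat fits in many places, while a read long enough to span the repeat and reach the unique DNA on both sides fits in only one.

```python
def placements(genome: str, read: str) -> list[int]:
    """Return every start index where `read` occurs exactly in `genome` (overlaps allowed).

    Exact matching is a simplifying assumption; real aligners tolerate errors.
    """
    hits = []
    start = genome.find(read)
    while start != -1:
        hits.append(start)
        start = genome.find(read, start + 1)  # step by one so overlapping hits are found
    return hits


# Hypothetical toy "chromosome": unique flanks around a 10-copy tandem repeat.
left_flank = "GATTACA"
repeat_unit = "CAGT"
right_flank = "TTGACCA"
genome = left_flank + repeat_unit * 10 + right_flank

# A short read lying entirely inside the repeat matches many positions.
short_read = repeat_unit * 2
# A long read spanning the whole repeat plus a little unique flank on each side.
long_read = left_flank[-3:] + repeat_unit * 10 + right_flank[:3]

print(placements(genome, short_read))  # [7, 11, 15, 19, 23, 27, 31, 35, 39]
print(placements(genome, long_read))   # [4]
```

Real reads contain sequencing errors and real repeats are far larger, but the principle matches Miga's point: once a read spans the entire repeat, its position is no longer ambiguous.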
A complete reference will allow researchers to compare patients' DNA against it and identify genetic changes that may contribute to disease.
"We're starting to find that some of these regions where there were gaps in the reference sequence are actually among the richest for variation in human populations, so we've been missing a lot of information that could be important to understanding human biology and disease," Miga said.
The new sequence closes a series of gaps in the current reference genome, Genome Reference Consortium Human Build 38 (GRCh38), and should aid large-scale studies in the future. Next, Miga and the Telomere-to-Telomere (T2T) consortium aim to sequence all the chromosomes in a specific cell line, CHM13, whose two copies of each chromosome are essentially identical. That work would open new opportunities for genetic research and for understanding our genome as a whole.
However, challenges remain in applying these approaches to other genomes. In diploid samples (samples with two copies of each chromosome per cell, one inherited from each parent), the two copies are similar but not identical, which makes it difficult to keep their sequences from being mixed up during assembly. The T2T consortium hopes to develop the existing technology further so that the entire genome can be completed.