An international genomics collaboration has announced a draft of the first-ever human pangenome, which attempts to incorporate the incredible diversity of the human race into a single reference genome. The team hope that by bringing a range of ethnicities and populations from across the world into the pangenome, they can more accurately represent all humans and the diseases that may afflict them, rather than relying on the small samples that have been used previously.
The pangenome builds upon decades of human genomic work, from the first human reference genome in 2003 to the first fully complete human genome in 2022. These first characterizations of all the genetic material in the chromosomes of a human being were giant leaps forward for modern science – they have allowed evolution researchers to compare other animals to humans, medical researchers to compare people with a disease to a reference genome to identify underlying mechanisms, and genetics researchers to build new genomes on the foundations created by a reference. Having a reference against which to compare all new genetic discoveries is vital, and it underpins almost every step forward in genetic understanding that we have made so far.
Except there’s one problem with those original reference genomes – they are typically made from a very small, very specific group of volunteers who do not represent the whole human race. The Human Genome Project, which created the first reference genome, looked to recruit 20 volunteers to create its sequence, but 93 percent of it is patched together from just 11 volunteers and 70 percent of it originates from a single person. These people remain anonymous, but all were recruited from Buffalo, New York, and so did not represent the rest of the world particularly well.
Now, that is all changing. The first draft pangenome is just the beginning of an international effort to incorporate genomes from as many different ethnicities and populations as possible, including isolated communities. This initial draft contains the DNA of 47 different people from across the world (though sadly none from Oceania, yet), but the next phase will hopefully push that up to 350. It’s a huge undertaking that takes time, work, and a lot of money, but it will make the reference genome applicable to far more of the world than it currently is.
“A pangenome which represents the diversity of all humanity – that's the goal,” said Professor Evan Eichler, from the University of Washington School of Medicine and a member of the pangenome consortium, in a press conference.
“If you have a reference that has complete information in it of all common variations, when you analyze your next patient genome, none of the sequence is left behind. Currently, when we map the sequence of a patient, a fraction of that sequence – sometimes a significant fraction – can't be mapped. But now that we have a pangenome framework, essentially the goal is such that all the information of the variations in the millions of patients sequenced in the future will now be mapped against a reference.”
Not only is the pangenome more diverse, but the technology used to create it has also allowed the discovery of an incredible number of new genetic variations. Compared to the existing reference genome, GRCh38, the team have now added 119 million base pairs and 1,115 gene duplications, providing a much better picture of human diversity.
“There are many forms of rare diseases that have not been explained because the complexity of their variation has not been resolved,” Professor Eichler continued.
“Straight off the bat, the mechanics of how we are building these reference [genomes] are essentially going to transform the discovery of rare diseases or the genetic causes of them.”
The pangenome is expected to help in the discovery of new disease-causing variations and aid existing clinical trials, while the team continue to add different ancestral lines. While it is impossible to encompass the entirety of human diversity in one reference genome (each child is born with around 60-70 new mutations that neither parent has), it is possible to get a small sample from every major population on the globe – provided they consent to being sequenced, that is.