DNA is a powerful tool to understand our history. Coded into its twisted molecules are insights into the migration of our species, where we came from, and how we came to be where we are today.
Humans may have a deep-seated fascination with their past, but untangling that history requires gobs of data and plenty of patience.
To investigate the fine-scale population structure of the United States, researchers used one of the largest human genetic data sets assembled to date. The study is published in Nature Communications.
The team analyzed genotype data from over 777,000 individuals to paint a genetic portrait of how North America’s population has migrated over the last few hundred years.
The participants were recruited through the DNA-testing company Ancestry, which has over 3 million customers in its AncestryDNA database. Each of these individuals has spit in a tube, shaken it, and sent it to the lab for processing – that’s a lot of saliva.
With all this information, the researchers built a network of genetic connections and then used various analysis techniques to identify clusters of individuals. They then annotated these clusters using 20 million user-generated genealogical records to infer patterns of migration and settlement.
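The general idea of clustering a network and then labelling the clusters can be illustrated with a toy sketch. This is not the study's method: the article does not name the clustering technique, so plain connected components stand in for it here. The names, the "shared DNA" scores, the threshold, and the birthplace records are all made up for illustration.

```python
from collections import Counter, defaultdict


def cluster_individuals(pairs, min_shared):
    """Group people who are linked, directly or indirectly, by strong matches.

    `pairs` holds (person_a, person_b, shared_score) tuples. The score and
    the `min_shared` threshold are hypothetical, not values from the study.
    """
    parent = {}

    def find(person):
        # Union-find lookup with path halving.
        parent.setdefault(person, person)
        while parent[person] != person:
            parent[person] = parent[parent[person]]
            person = parent[person]
        return person

    for a, b, shared in pairs:
        root_a, root_b = find(a), find(b)  # registers both people
        if shared >= min_shared and root_a != root_b:
            parent[root_a] = root_b

    clusters = defaultdict(set)
    for person in list(parent):
        clusters[find(person)].add(person)
    return list(clusters.values())


def annotate_cluster(cluster, birthplaces):
    """Label a cluster with the most common ancestral birthplace in its records."""
    counts = Counter(
        place for person in cluster for place in birthplaces.get(person, [])
    )
    return counts.most_common(1)[0][0] if counts else None


# Invented example data.
pairs = [
    ("ana", "ben", 40.0),
    ("ben", "cy", 35.0),
    ("cy", "dee", 2.0),   # too weak to link the two groups
    ("dee", "eli", 50.0),
]
birthplaces = {
    "ana": ["Quebec"],
    "ben": ["Quebec"],
    "cy": ["Maine"],
    "dee": ["Nova Scotia"],
    "eli": ["Nova Scotia", "Louisiana"],
}

clusters = cluster_individuals(pairs, min_shared=10.0)
for cluster in clusters:
    print(sorted(cluster), "->", annotate_cluster(cluster, birthplaces))
```

With hundreds of thousands of people, a real analysis would need a clustering method that can separate groups that are still loosely connected to each other, which simple connected components cannot do.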
Genetics has been used to track human migration history before. However, this large-scale study focused on recent history, with a particular emphasis on specificity.
For example, the study can pinpoint the particular region of France that French Canadians' ancestors came from generations ago, where they subsequently settled in Canada, and then how they spread across North America.
The pink dots are French Canadians and the blue dots are Acadians. Nature Communications
“We’ve never had a map like this of North America,” Eric Topol, a geneticist with Scripps who is not associated with the research, told Gizmodo. “This is a massive data set.”
The genetic landscape of post-colonial America has been shaped by many factors, including war, slavery, disease, and climate change. However, the researchers were surprised by one particularly large influence – the Mason-Dixon line.
“I have to admit I was surprised by that,” lead author Catherine Ball, chief scientific officer at Ancestry, told Wired. “This political boundary had the same effect as what you’d expect from a huge desert or mountain range.”
The Mason-Dixon line is a boundary that was originally drawn to settle an 80-year land dispute between two colonies. It later became a symbolic divider between slave states and non-slave states during the Civil War.
In addition to demographic insights, the research could contribute to targeted biomedical research. For instance, the researchers found that certain disease-risk variants occur at higher frequencies in certain clusters.
The African American cluster, for example, carries a risk allele for prostate cancer at a frequency of 5.6 percent, while the allele is rare outside the cluster at only 0.1 percent. The Finnish cluster has a protective allele for squamous cell lung carcinoma that is 10 times more common than it is outside the cluster.
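To put the two examples on the same scale, the prostate cancer figures can be expressed as a fold difference, as in this minimal sketch. The function name is made up for illustration, and the only inputs are the two percentages quoted above.

```python
def fold_enrichment(freq_in_cluster, freq_outside):
    """Return how many times more common an allele is inside a cluster."""
    if freq_outside <= 0:
        raise ValueError("outside frequency must be positive")
    return freq_in_cluster / freq_outside


# Frequencies in percent, as quoted in the article.
prostate_risk_fold = fold_enrichment(5.6, 0.1)
print(f"Prostate cancer risk allele: about {prostate_risk_fold:.0f}-fold more common")
```

By this measure the prostate cancer risk allele is roughly 56 times more common inside the cluster, compared with the 10-fold difference reported for the Finnish protective allele.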
Ancestry hopes to use this data to give users even more specific results. The study is also a reminder, in these politically divided times, that we have all come from somewhere else.
These genetic clusters throughout North America reveal the distribution of ancestral birth locations. The clusters are separated into two maps for clarity. Nature Communications