Over the course of the seven Star Wars films, an intergalactic tale has played out, full of vivid, mythical characters, each with a complex backstory. This rich detail means that even die-hard fans can have trouble making sense of it all. With this in mind, a group of researchers at École Polytechnique Fédérale de Lausanne (EPFL) in Switzerland, led by Ph.D. student Kirell Benzi, set out to organize this forest of information by developing a series of novel algorithms.
The algorithms, known as “web scrapers,” were designed to identify individual characters in the contents of Wookieepedia, a Star Wars version of Wikipedia. They also extracted information about when each character lived during the history of the saga and any major events that happened during their lives.
By putting this information into graph form, the researchers uncovered a wealth of information that had previously been hidden. Crucially, the scrapers looked for every instance in which two or more characters crossed paths, either directly or indirectly. Considering that the saga spans 36,000 years, this was no easy task.
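To make the idea of “putting information into graph form” concrete, here is a minimal Python sketch. It assumes a simplified, hypothetical input in which each article has been reduced to the set of characters it mentions, and it links two characters whenever they appear in the same article. The researchers' actual scrapers, and how they handled indirect encounters, are not described in detail here, so this is an illustration rather than their method. Counting a character's distinct neighbours (its degree) is one simple way to measure how “connected” that character is.

```python
from collections import defaultdict
from itertools import combinations


def build_graph(articles):
    """Build an undirected character graph as {character: set of neighbours}.

    Hypothetical input format: `articles` maps an article title to the set
    of character names mentioned in it. Two characters are linked whenever
    they appear in the same article (a simplifying assumption).
    """
    graph = defaultdict(set)
    for characters in articles.values():
        # Link every pair of characters mentioned together in one article.
        for a, b in combinations(sorted(characters), 2):
            graph[a].add(b)
            graph[b].add(a)
    return dict(graph)


def most_connected(graph, k=3):
    """Return the k characters with the most distinct neighbours.

    Ties are broken alphabetically so the result is deterministic.
    """
    return sorted(graph, key=lambda c: (-len(graph[c]), c))[:k]


if __name__ == "__main__":
    # Made-up toy data, not taken from Wookieepedia.
    articles = {
        "Article 1": {"Luke Skywalker", "Darth Vader", "Han Solo"},
        "Article 2": {"Darth Vader", "Obi-Wan Kenobi"},
        "Article 3": {"Darth Vader", "Palpatine"},
    }
    graph = build_graph(articles)
    print(most_connected(graph))
    # prints ['Darth Vader', 'Han Solo', 'Luke Skywalker']
```

Using a dictionary of sets keeps repeated co-appearances from being double-counted, so degree here means the number of different characters someone has met, not the number of shared articles.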
The most prominent species within the Star Wars universe. Kirell Benzi
In total, there are 21,647 characters, 19,612 of whom have a name and at least some degree of backstory. Perhaps unsurprisingly, 78 percent of them are human. Twi’leks, Rodians and Wookiees are the first, second and third most populous galactic minorities.
All of these lifeforms are spread over 294 planets, on which 1,367 Jedi and 724 Sith have at some point lived. For much of the more recent part of the timeline, only two Sith were alive at any one time, a master and an apprentice, so it’s not surprising that they are outnumbered overall.
The connections between the various factions within the Star Wars universe. The larger the node, the more characters belong to that faction. Kirell Benzi
Some of the characters do not have a designated place in the timeline, but because the algorithms can work out whom they most likely interacted with, they can be placed at their probable points in the saga’s history. Of all these characters, 7,500 play an “important role,” in that they are highly connected to the lives of others.
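The placement of undated characters can be illustrated with a simple stand-in technique: a one-pass majority vote in which each undated character is assigned the era that is most common among its dated neighbours. This is a hypothetical sketch, not the EPFL team's published algorithm; the input format (`graph`, `eras`) and the alphabetical tie-breaking rule are assumptions made for the example.

```python
from collections import Counter


def assign_eras(graph, eras):
    """Assign undated characters to the most common era among their neighbours.

    Hypothetical inputs:
      graph: {character: set of linked characters}
      eras:  {character: era} for characters whose era is already known
    Only originally dated neighbours vote (no iterative propagation).
    Characters with no dated neighbours are left unassigned.
    """
    assigned = dict(eras)
    for character in graph:
        if character in eras:
            continue
        votes = Counter(eras[n] for n in graph[character] if n in eras)
        if votes:
            # Highest vote count wins; ties go to the alphabetically first era.
            assigned[character] = min(votes, key=lambda e: (-votes[e], e))
    return assigned


if __name__ == "__main__":
    # Made-up toy graph: "A" and "B" have no known era.
    graph = {
        "A": {"X", "Y", "Z"},
        "B": {"X", "Q"},
        "Q": {"B"},
        "X": {"A", "B"},
        "Y": {"A"},
        "Z": {"A"},
    }
    eras = {"X": "Rebellion", "Y": "Rebellion", "Z": "Rise of the Empire"}
    result = assign_eras(graph, eras)
    print(result["A"], result["B"], "Q" in result)
    # prints Rebellion Rebellion False
```

A single voting pass is the simplest choice; a fuller version might repeat the vote so that newly placed characters can in turn inform their own undated neighbours.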
Anakin Skywalker, the future Darth Vader, is the most connected character, having crossed paths or influenced the lives of over 1,600 other characters. His master, Emperor Palpatine, comes in a close second, with the now-elusive Jedi Knight Luke Skywalker coming in third.
Left: Mapping connections between those in the Rebellion Era (blue), the Rise of the Galactic Empire era (red), and those that belong to both (green). Black nodes are characters with uncertain timelines. Right: The black nodes have been placed into various timelines based on their strongest connections. Kirell Benzi
On a more serious note, the researchers wanted to use this study to demonstrate how their algorithms could quickly extract and analyze data while finding all the possible connections between the data points. Although Wookieepedia contains 125,455 highly detailed articles, the bulk of this analysis took only two days.
As Benzi told IFLScience, the technique is already finding real-world applications. “We are working with neuroscientists to apply these [algorithms] to diagnose diseases and to better understand how the brain works,” he said.
Xavier Bresson, another project member, said in a statement that “once enough documents and archives have been digitized, this method could be useful in filling knowledge gaps that remain in historical and sociological research, and in numerous scientific fields as well.”