If you’ve ever played the game GeoGuessr, you’ll know how hard it is to pinpoint a location from an image alone. The online game plonks you in a random spot on Google Street View, and it’s your job to work out where in the world you are using only the visible clues. But even as spatial creatures, we humans find it hard to tell the Australian outback from the South African bush.
Now, Google has developed an artificial intelligence capable of “superhuman levels of accuracy” when it comes to guessing a location based on just the pixels of a photograph.
Tobias Weyand, a computer vision specialist at Google, and his team have created PlaNet – a deep neural network capable of determining where an image was taken purely from its pixels.
To build the computer’s “spatial memory,” the team fed it 91 million geotagged images. Using this data, they divided a map of the world into a grid of over 26,000 cells, with each cell’s size inversely proportional to how many of the images had been shot there. Densely populated big cities were therefore covered by more cells than rural areas, while the oceans and polar regions were left out entirely because too few images were taken there.
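The adaptive partitioning described above can be sketched as a simple recursive subdivision: any cell containing too many photos is split into four, so photo-dense areas end up covered by many small cells while empty regions get none. This is only an illustrative quadtree over latitude/longitude with made-up thresholds; the actual paper partitions the globe using Google's S2 cell hierarchy.

```python
# Illustrative adaptive grid: cells with many photos are subdivided, so
# dense cities end up covered by more, smaller cells. The thresholds and
# the flat lat/lon quadtree are assumptions for the sketch; the paper
# itself uses Google's S2 cell hierarchy on the sphere.

def build_cells(photos, bounds, max_photos=10_000, min_size=0.05):
    """photos: list of (lat, lon); bounds: (lat0, lon0, lat1, lon1).

    Returns a list of cell bounds covering every photo. Intervals are
    half-open, so each photo lands in exactly one cell.
    """
    lat0, lon0, lat1, lon1 = bounds
    inside = [(la, lo) for la, lo in photos
              if lat0 <= la < lat1 and lon0 <= lo < lon1]
    if not inside:                       # oceans/poles: no photos, no cell
        return []
    too_small = (lat1 - lat0) <= min_size
    if len(inside) <= max_photos or too_small:
        return [bounds]                  # keep this cell as one class
    mid_la, mid_lo = (lat0 + lat1) / 2, (lon0 + lon1) / 2
    cells = []
    for sub in [(lat0, lon0, mid_la, mid_lo), (lat0, mid_lo, mid_la, lon1),
                (mid_la, lon0, lat1, mid_lo), (mid_la, mid_lo, lat1, lon1)]:
        cells += build_cells(inside, sub, max_photos, min_size)
    return cells
```

Each resulting cell then becomes one class for the network to predict, which is why cities contribute many fine-grained classes and the wilderness only a few coarse ones.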
Throughout this process, the deep neural network picks up on trends and visual cues that recur in photographs from the same grid cell, such as color, texture, and shape. For example, the network could learn the distinctive red and brown hues of the Grand Canyon’s rock from the countless photographs of the area.
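This framing turns geolocation into an ordinary classification problem: the network outputs a probability for every grid cell, and the guess is the most probable cell. A minimal sketch of that final prediction step, with illustrative names and toy numbers rather than the paper's actual architecture:

```python
# Sketch: geolocation as classification over grid cells. The network's
# final layer scores every cell; a softmax turns the scores into
# probabilities, and the predicted location is the center of the most
# probable cell. Everything here is illustrative, not PlaNet's real code.
import math

def softmax(logits):
    m = max(logits)                      # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict_location(logits, cell_centers):
    """logits: one raw score per grid cell; cell_centers: (lat, lon) each."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return cell_centers[best], probs[best]
```

In this view, a photo of red canyon rock simply pushes up the score of the Grand Canyon's cell relative to the other 26,000-odd cells.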
Some of PlaNet's guesses. Image credit: Google/Tobias Weyand et al.
To test the neural network, the team gave it 2.3 million geotagged photographs from Flickr to see if it could guess where each image was taken. The computer correctly guessed the continent for 48 percent of the images and the country for 28.4 percent. On top of this, the study claims it can "localize 3.6 percent of the images at street-level accuracy and 10.1 percent at city-level accuracy.”
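Figures like these come from scoring each guess by its great-circle distance to the true location: a guess counts as correct at "street level" or "city level" if it lands within that level's distance threshold. A rough sketch of such scoring, where the 1 km / 25 km / 750 km / 2,500 km thresholds are assumed to match the paper's street, city, country, and continent levels:

```python
# Sketch of threshold-based geolocation scoring: a guess is a hit at a
# given level if its great-circle (haversine) distance to the truth is
# within that level's radius. Threshold values are assumptions intended
# to mirror the paper's evaluation levels.
import math

EARTH_RADIUS_KM = 6371.0

def haversine_km(a, b):
    """Great-circle distance between two (lat, lon) points in degrees."""
    la1, lo1, la2, lo2 = map(math.radians, (*a, *b))
    h = (math.sin((la2 - la1) / 2) ** 2
         + math.cos(la1) * math.cos(la2) * math.sin((lo2 - lo1) / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))

def accuracy_at(pairs, threshold_km):
    """pairs: list of (guess, truth) coordinate tuples; returns hit rate."""
    hits = sum(haversine_km(g, t) <= threshold_km for g, t in pairs)
    return hits / len(pairs)
```

So "3.6 percent at street-level accuracy" means that for 3.6 percent of the test photos, the predicted point fell within the tightest radius of the true spot.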
While this might not sound massively accurate, it's a lot better than humans. The team put PlaNet up against “10 well-traveled human subjects” in a game of GeoGuessr. PlaNet won 28 times out of 50 with an average error of 1,131.7 kilometers (703 miles), while the humans were off-target by an average of 2,320.75 kilometers (1,442 miles).
Remarkably, the researchers claim PlaNet even has the ability to determine the location of images of indoor environments by picking up on specific items, MIT Technology Review reports.
You might think humans would have an advantage, as the study explains: “In the absence of obvious and discriminative landmarks, humans can fall back on their world knowledge and use multiple cues to infer the location of a photo.
“The language of street signs or the driving direction of cars can help narrow down possible locations. Traditional computer vision algorithms typically lack this kind of world knowledge, relying on the features provided to them during training.”
However, Weyand and colleagues suggest PlaNet actually has the advantage: it has been trained on images of more places than any human could ever visit, and has learned to recognize subtle cues from millions of different scenes.
You can read a full write-up of their work in a study on arXiv.
[H/T: MIT Technology Review]