As we've learned (or apparently not) time and time again, AI and machine learning technology have a racism problem. From soap dispensers that don't register dark-skinned hands to self-driving cars that are 5 percent more likely to run you over if you are Black because their systems struggle to recognize darker skin tones, there are numerous examples of algorithms that don't function as they should because they weren't sufficiently tested on non-white people.
Last year, one such algorithm with apparent bias drew attention after cryptographer and infrastructure engineer Tony Arcieri tried a simple experiment on Twitter. Arcieri took two photos: One of Barack Obama and one of Mitch McConnell. He then arranged them as below.
He then uploaded them to Twitter and hit send. At this point, Twitter's algorithm crops the photos automatically, a function intended to select the most relevant part of the photograph to display to other users.
Here's what the algorithm selected when given those two photographs.
As you can see, the algorithm selected Mitch McConnell in both instances. Arcieri and others tried variations to see whether the same result occurred, including changing the color of the ties and increasing the number of Obamas within the images.
However, using a different photo of Obama with a high-contrast smile did seem to reverse the situation.
So, what's going on? Well, Twitter has now confirmed, via research published on arXiv, that the problem stemmed from the machine-learning system it uses to crop images for saliency.
"The saliency algorithm works by estimating what a person might want to see first within a picture so that our system could determine how to crop an image to an easily-viewable size," Twitter wrote in an update on their investigations. "Saliency models are trained on how the human eye looks at a picture as a method of prioritizing what's likely to be most important to the most people. The algorithm, trained on human eye-tracking data, predicts a saliency score on all regions in the image and chooses the point with the highest score as the center of the crop."
In essence, it tries to predict the part of the photo that users will be most interested in, based on data gained from humans. Twitter's tests of their algorithm found significant gender and race-based biases, namely:
- In comparisons of men and women, there was an 8% difference from demographic parity in favor of women.
- In comparisons of black and white individuals, there was a 4% difference from demographic parity in favor of white individuals.
- In comparisons of black and white women, there was a 7% difference from demographic parity in favor of white women.
- In comparisons of black and white men, there was a 2% difference from demographic parity in favor of white men.
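One plausible reading of the "difference from demographic parity" figures above is as follows: in paired comparisons, parity would mean each group's photo is chosen as the crop 50 percent of the time, and the reported percentage is how far the observed win rate deviates from that. This is an illustrative interpretation, not Twitter's exact definition.

```python
def parity_difference(wins_a, wins_b):
    """Deviation from demographic parity, in percentage points.

    Under demographic parity, group A would win the paired comparison
    50% of the time; a positive result favors group A. Illustrative
    reading of the metric, not Twitter's precise methodology.
    """
    total = wins_a + wins_b
    return 100 * wins_a / total - 50
```

For example, if white individuals won 54 of 100 black/white pairings, `parity_difference(54, 46)` gives a 4-point difference in favor of white individuals, matching the second bullet above.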
Moreover, the team looked into another alleged problem with the algorithm: when cropping photographs of women, it appeared to focus on their chests. However, the researchers found that this didn't happen at a "significant" rate: for every 100 images, about three were cropped at a location other than the head.
In order to address the biases involved in the algorithm, Twitter decided to get rid of it entirely in a rollout earlier this month.
"We considered the tradeoffs between the speed and consistency of automated cropping with the potential risks we saw in this research," the team said. "One of our conclusions is that not everything on Twitter is a good candidate for an algorithm, and in this case, how to crop an image is a decision best made by people."
A version of this article first appeared in September 2020. It has been updated with Twitter's comments following their new research.