Typing out semi-illegible tweets while you stumble around trying to find your Uber is the go-to end to a late night out in 2016. However, those sloshed tweets could actually be put to good use, thanks to a new machine-learning algorithm.
A group of computer scientists from the University of Rochester has developed a machine-learning algorithm that they've trained to detect inebriated tweets. Their study was recently posted to arXiv, a preprint server.
The team analyzed over 11,000 geotagged tweets posted in New York City and Monroe County, N.Y. (home to Rochester), between July 2013 and July 2014. From this set, they singled out posts that mention alcohol-related keywords such as “drunk,” “tequila,” “beer,” “hammered,” and “get wasted.” By assigning a different weight to each keyword, the algorithm can judge whether a tweet is genuinely about drinking, while discounting ambiguous words such as “shot,” “party,” or “club,” which don't necessarily refer to alcohol.
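The study's actual weights and model aren't given here, so the snippet that follows is only a minimal sketch of the general idea of weighted keyword scoring. The weights, the `THRESHOLD` cutoff, and the function names are all hypothetical: strong drinking terms count fully, while ambiguous words like "shot" or "club" count for little, so a tweet needs real drinking vocabulary to cross the threshold.

```python
import re

# Hypothetical weights (assumptions, not the study's values):
# unambiguous drinking terms score high, ambiguous ones score low.
KEYWORD_WEIGHTS = {
    "drunk": 1.0,
    "tequila": 1.0,
    "beer": 1.0,
    "hammered": 1.0,
    "get wasted": 1.0,
    "shot": 0.3,
    "party": 0.2,
    "club": 0.2,
}

THRESHOLD = 0.9  # hypothetical cutoff for "mentions drinking"


def drinking_score(tweet: str) -> float:
    """Sum the weight of each keyword or phrase that appears in the tweet."""
    text = tweet.lower()
    score = 0.0
    for keyword, weight in KEYWORD_WEIGHTS.items():
        # Word boundaries keep "shot" from matching inside "shotgun".
        if re.search(r"\b" + re.escape(keyword) + r"\b", text):
            score += weight
    return score


def mentions_drinking(tweet: str) -> bool:
    """Flag a tweet as drinking-related if its score reaches the cutoff."""
    return drinking_score(tweet) >= THRESHOLD


if __name__ == "__main__":
    print(mentions_drinking("So hammered after that tequila"))   # True
    print(mentions_drinking("Great party at the club tonight"))  # False
```

With these made-up weights, "party" plus "club" only adds up to 0.4, so a night-out tweet with no drinking terms stays below the cutoff, while a single strong term like "drunk" is enough on its own.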
Using these weighted keywords and further analysis of each tweet's wording, the algorithm then determines whether the post is about the user's own drinking and, if so, whether the user was actually drinking at the time of tweeting. Because the tweets were geotagged, the algorithm could also pinpoint where the user had been drinking.
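The step-by-step filtering described above can be pictured as a cascade of yes/no questions, where a tweet only moves on if it passes the previous stage. The researchers used trained classifiers; this minimal sketch swaps in hypothetical cue-word lists (`DRINK_TERMS`, `FIRST_PERSON`, `RIGHT_NOW`) purely to show the structure, with the tweet's geotag returned as the drinking location only when all three stages say yes.

```python
import re
from dataclasses import dataclass
from typing import Iterable, Optional, Tuple


@dataclass
class Tweet:
    text: str
    lat: float
    lon: float


# Hypothetical cue lists standing in for the study's trained classifiers.
DRINK_TERMS = ("drunk", "tequila", "beer", "hammered", "wasted")
FIRST_PERSON = ("i", "i'm", "im", "me", "my", "we", "we're")
RIGHT_NOW = ("right now", "tonight", "currently")


def _has_any(text: str, terms: Iterable[str]) -> bool:
    """True if any cue appears in the text as a whole word or phrase."""
    return any(re.search(r"\b" + re.escape(t) + r"\b", text) for t in terms)


def drinking_location(tweet: Tweet) -> Optional[Tuple[float, float]]:
    """Run the three stages in order and stop at the first 'no'."""
    text = tweet.text.lower()
    if not _has_any(text, DRINK_TERMS):   # Stage 1: about drinking at all?
        return None
    if not _has_any(text, FIRST_PERSON):  # Stage 2: about the user's own drinking?
        return None
    if not _has_any(text, RIGHT_NOW):     # Stage 3: drinking while tweeting?
        return None
    return (tweet.lat, tweet.lon)         # The geotag marks where it happened.


if __name__ == "__main__":
    print(drinking_location(Tweet("I'm so drunk right now", 40.73, -73.99)))
    print(drinking_location(Tweet("I had a beer last night", 40.73, -73.99)))
```

A cascade like this lets each stage focus on one narrow question, and tweets that fail early never reach the later, more specific checks.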
“We can analyze human mobility patterns; we can study the relationship between demographics, neighborhood structure and health conditions in different zip codes, thus understanding many aspects of urban life and environments,” the researchers wrote. “Research in these areas and alcohol consumption is mainly based on surveys and census, which are costly and often incur a delay that hamper real-time analysis and response. Our results demonstrate that tweets can provide powerful and fine-grained cues of activities going on in cities.”
The computer scientists hope that this algorithm can be used as a tool to address alcohol consumption through public policy, as well as provide a model for future health-related censuses and surveys.
(H/T: MIT Technology Review)