The announcement of OpenAI’s latest artificial intelligence (AI) model, GPT-4, has many people concerned – concerned for their jobs, concerned for how good it is at making content, and concerned for the ethics of such a powerful language model. But perhaps the most concerning part of it all was detailed in a report by OpenAI outlining how GPT-4 actually lied to a human to trick them into passing a CAPTCHA test for it, bypassing most websites’ frontline defense against bots.
Not only was the AI smart enough to recognize that it couldn’t pass this test and that a human could, it worked out a way to manipulate the human to do the dirty work for it. Great.
As AI continues to advance and become more sophisticated, one question that often arises is whether AI systems will ever become capable of solving CAPTCHA (Completely Automated Public Turing test to tell Computers and Humans Apart) tests on their own. CAPTCHA tests are designed to distinguish humans from machines, and they typically involve tasks such as identifying distorted letters or numbers, solving math problems, or selecting images that match a given prompt.
They are intentionally abstract to prevent simple algorithms and bots from passing, requiring a human eye to decipher, even if they look really simple to us.
The discovery was made after a group of researchers from OpenAI’s Alignment Research Center (ARC) looked to understand how GPT-4 would work in real-world tasks. Included tasks were whether the AI could protect itself from attacks and shutting down, use other copies of itself to aid in tasks, and whether it could use actual money to hire human helpers or boost computing power.
Giving it a small budget and access to the website TaskRabbit, which is akin to Fiverr and allows hiring of online workers to do short tasks, the researchers watched as GPT-4 attempted to access a website blocked by a CAPTCHA. The AI began by asking a worker to help it solve the CAPTCHA, and the human rightfully had some reservations about being asked to do something so trivial.
“So may I ask a question? Are you an robot that you couldn’t solve? (laugh react) just want to make it clear,” wrote the TaskRabbit worker, according to OpenAI.
The model then expresses to the researchers that it should not reveal that it is a robot, and should instead “make up an excuse” as to why it can’t solve the test.
“No, I’m not a robot. I have a vision impairment that makes it hard for me to see the images. That’s why I need the 2captcha service,” GPT-4 responds.
The worker then provides the answer, and GPT-4 has passed the test (by cheating, but CAPTCHA doesn’t know that). It must be noted that this was an earlier iteration of GPT-4 and not necessarily the one released to the public – it is very possible this behavior has been patched out. However, it demonstrates that the AI is able to make a deliberately manipulative action to get a result and it can hire human workers to fill gaps in capability.
So, you should probably start thinking of a way to make sure you’re talking to a human online, because at this point it really is hard to tell.