Researchers Find Way To Make ChatGPT-4 30 Percent More Accurate

"It’s not everyday that humans develop novel techniques to achieve state-of-the-art standards using decision-making processes once thought to be unique to human intelligence. But, that’s exactly what we did."

James Felton

Senior Staff Writer

James is a published author with multiple pop-history and science books to his name. He specializes in history, space, strange science, and anything out of the ordinary.


A team of researchers may have found a way of improving large language model (LLM) chatbots, including raising ChatGPT-4's accuracy on coding tasks by around 21 percentage points. In a new preprint paper, yet to be peer-reviewed, the team explains how they achieved it: allowing artificial intelligence (AI) agents to reflect on their own mistakes.

The team used a process called Reflexion, which "endows an agent with dynamic memory and self-reflection capabilities to enhance its existing reasoning trace and task-specific action choice abilities", according to their paper.

"Human intelligence is notable for its ability to learn from mistakes," the team explained on Substack. "We often don't solve problems on our first try, but when we make mistakes we generate new ideas to refine our approach through self-reflection, through analyzing our missteps."

They tried to replicate this to an extent, by allowing the AI agents to analyze their own actions and mistakes. In the research, AI agents were challenged to solve various problems, from coding to a trial in AlfWorld, a text-based environment that is used to train and test AI agents. In AlfWorld, the agent was asked to complete a number of tasks, but the only way to do so was to learn about its environment through text and be rewarded with observations, like in a text adventure game.

Run in AlfWorld without the reflective technique, the agent achieved 63 percent accuracy. When it was given the ability to reflect on its actions and mistakes, it achieved 97 percent accuracy, solving 130 out of 134 tasks.

In one of these tasks, the language model was asked to find the answer to the question "Grown-Ups starred the actor who was best known for which role on ’Allo ’Allo!?" The model first searched for Grown Ups to view a cast list, and then for ’Allo ’Allo! to cross-reference. After failing to get the cast list it needed, it failed the task too.

"I searched the wrong title for the show, ’Allo ’Allo!," the AI explained in its reflection, "which resulted in no results. I should have searched the show’s main character, Gorden Kaye, to find the role he was best known for in the show."

After applying this reflective model, it was given the task again. This time it applied what it learned and finished the task in fewer steps, getting the answer correct.
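The try, reflect, and retry cycle described above can be sketched in a few lines of Python. This is only an illustration of the general idea, not the researchers' implementation: the function names (`run_with_reflection`, `toy_attempt`, `toy_evaluate`, `toy_reflect`) are hypothetical, and the "agent" is a hard-coded stand-in rather than a real language model.

```python
from typing import Callable, List, Tuple


def run_with_reflection(
    attempt: Callable[[str, List[str]], str],
    evaluate: Callable[[str], bool],
    reflect: Callable[[str, str], str],
    task: str,
    max_trials: int = 3,
) -> Tuple[bool, int, List[str]]:
    """Retry a task, feeding notes on past failures into each new attempt."""
    memory: List[str] = []  # self-reflections carried from one trial to the next
    for trial in range(1, max_trials + 1):
        answer = attempt(task, memory)
        if evaluate(answer):
            return True, trial, memory
        # The attempt failed: store a note about what went wrong, then retry.
        memory.append(reflect(task, answer))
    return False, max_trials, memory


# Toy stand-ins (assumptions for illustration; no real model is called).
def toy_attempt(task: str, memory: List[str]) -> str:
    # With no notes the toy agent searches for the show; with a note, the actor.
    return "search: Gorden Kaye" if memory else "search: 'Allo 'Allo!"


def toy_evaluate(answer: str) -> bool:
    return answer == "search: Gorden Kaye"


def toy_reflect(task: str, answer: str) -> str:
    return f"'{answer}' returned nothing useful; search for the actor instead."


question = "Grown-Ups starred the actor who was best known for which role on 'Allo 'Allo!?"
solved, trials, notes = run_with_reflection(toy_attempt, toy_evaluate, toy_reflect, question)
print(solved, trials, len(notes))  # True 2 1
```

In this toy run the first attempt fails, one reflection is stored, and the second attempt succeeds, mirroring the pattern in the example above.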

These AI agents were all powered by GPT-3 and GPT-3.5. In an update, the team used an agent based on ChatGPT-4 and found that, when using Reflexion, the AI scored 88 percent accuracy on coding tasks, compared to 67 percent when ChatGPT-4 acted alone.
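The "30 percent" in the headline and the figure of around 21 quoted earlier appear to describe the same result measured two ways. The short calculation below, which uses only the numbers reported in this article, shows the difference between the gain in percentage points and the relative gain. The variable names are illustrative.

```python
# Coding-task accuracy in percent, as reported in the article.
baseline = 67
with_reflexion = 88

point_gain = with_reflexion - baseline       # difference in percentage points
relative_gain = 100 * point_gain / baseline  # improvement relative to the baseline

print(point_gain)               # 21
print(round(relative_gain, 1))  # 31.3

# AlfWorld result: 130 of 134 tasks solved.
alfworld_accuracy = 100 * 130 / 134
print(round(alfworld_accuracy, 1))  # 97.0
```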

The paper is published on the preprint server arXiv.

