OpenAI has released a new version of the model behind ChatGPT, claiming that the new large language model is capable of passing – and even excelling in – a variety of academic exams.
GPT-4, which will be available on Bing as well as the OpenAI website, is more reliable and more creative than its predecessor, according to OpenAI. The team tested the model on a number of exams designed for humans, from the bar exam to biology, using publicly available papers. Although the model was given no additional training ahead of the tests, it did well in most subjects, scoring in the estimated 90th percentile on the bar exam and the 86th-100th percentile in art history.
The previous model was accused of being bad at math, and this version also fared worse in calculus than in other subjects, scoring in the 43rd-59th percentile.
For casual users who just want a creative bot to chat with, the main advantages over previous models are its greater creativity and its ability to better understand user instructions.
In a demonstration on OpenAI's website, the chatbot was asked to summarize the plot of Cinderella in one sentence, with each word beginning with the next letter of the alphabet, from A to Z. Impressively, given the constraints of the brief, it came back with: "A beautiful Cinderella, dwelling eagerly, finally gains happiness; inspiring jealous kin, love magically nurtures opulent prince; quietly rescues, slipper triumphs, uniting very wondrously, xenial youth zealously."
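For readers who want to check the alphabet constraint themselves, here is a minimal Python sketch. The function name `follows_alphabet` is made up for illustration and is not part of any OpenAI tool; it strips punctuation from each word and compares first letters against the alphabet in order.

```python
import string


def follows_alphabet(sentence: str) -> bool:
    """Return True if the n-th word starts with the n-th letter of the alphabet."""
    # Split on whitespace and strip surrounding punctuation from each token.
    words = [token.strip(string.punctuation) for token in sentence.split()]
    words = [word for word in words if word]
    # More than 26 words cannot satisfy the constraint.
    if len(words) > len(string.ascii_lowercase):
        return False
    # Pair each word with its expected letter: a, b, c, ...
    return all(
        word[0].lower() == letter
        for word, letter in zip(words, string.ascii_lowercase)
    )


summary = (
    "A beautiful Cinderella, dwelling eagerly, finally gains happiness; "
    "inspiring jealous kin, love magically nurtures opulent prince; "
    "quietly rescues, slipper triumphs, uniting very wondrously, "
    "xenial youth zealously."
)

print(follows_alphabet(summary))
```

Run on the quoted sentence, the check passes: the summary has 26 words, one for each letter from A to Z.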
The model is also able to process images. In one demonstration, GPT-4 took a crude hand-drawn sketch of a website and then created that website in HTML.
While the text version of GPT-4 is available to users, the image input feature is only available to subscribers at the moment.
Elsewhere in the demonstration, the model did the taxes of a fictional couple, and even wrote its own bot. Though this is impressive, it's important to remember that the bot is still capable of hallucinations, meaning it can state false information with confidence. It is also trained only on datasets and text from before September 2021, so it is unaware of anything that has happened since then. Which, in a way, is a blessing.