Scientists Show ChatGPT Can Pass Tests For Medical Qualifications

"ChatGPT is now comfortably within the passing range."

James Felton

Senior Staff Writer

James is a published author with multiple pop-history and science books to his name. He specializes in history, space, strange science, and anything out of the ordinary.

A team of researchers has tested ChatGPT, an artificial intelligence (AI) chatbot, on its clinical reasoning skills using questions from the United States Medical Licensing Examination (USMLE).

The team, publishing their results on the preprint server medRxiv, wrote that they chose to test the generative language AI on questions from the USMLE because it is a "high-stakes, comprehensive three-step standardized testing program covering all topics in physicians’ fund of knowledge, spanning basic science, clinical reasoning, medical management, and bioethics".

The language model, trained on massive amounts of text from the internet, was not trained on the version of the test used by the researchers, nor was it given any supplementary medical training before the study, in which it answered a number of open-ended and multiple-choice questions.

"In this present study, ChatGPT performed at >50% accuracy across all examinations, exceeding 60% in most analyses," the team wrote in their study.

"The USMLE pass threshold, while varying by year, is approximately 60%. Therefore, ChatGPT is now comfortably within the passing range. Being the first experiment to reach this benchmark, we believe this is a surprising and impressive result."

The team writes that the AI's performance could be improved with more prompting and interaction with the model. Where the AI performed poorly, giving answers that were less concordant, they believe this was partly because the questions relied on information the AI had not encountered.

However, they believe that the OpenAI bot had an advantage over models trained entirely on medical text, as it gained a broader overview of the clinical context.

"Paradoxically, ChatGPT outperformed PubMedGPT (accuracy 50.8%, unpublished data), a counterpart [large language model] with similar neural structure, but trained exclusively on biomedical domain literature," the team wrote in their discussion.

"We speculate that domain-specific training may have created greater ambivalence in the PubMedGPT model, as it absorbs real-world text from ongoing academic discourse that tends to be inconclusive, contradictory, or highly conservative or noncommittal in its language."

The team writes that, given the industry's rapid progress, AI may soon become commonplace in healthcare settings, perhaps by improving risk assessment or by providing support with clinical decisions.

The study is available on the preprint server medRxiv and has not yet been peer-reviewed.
