AI models rival doctors on complex medical reasoning tasks, study finds
Researchers have found that an AI model outperformed human doctors on most medical reasoning tasks, from diagnoses to patient management advice.
Artificial intelligence models outperformed physicians in emergency care medical decisions, according to a new study.
Researchers at Harvard Medical School and Beth Israel Deaconess Medical Center in the United States compared artificial intelligence and physicians across a wide range of clinical reasoning tasks.
They found that large language models (LLMs) outperformed physicians across several tasks, including making emergency-room decisions based on the available information, identifying likely diagnoses, and choosing the next steps in management.
“We tested the AI model against virtually every benchmark, and it eclipsed both prior models and our physician baselines,” said Arjun Manrai, co-senior author and professor at Harvard Medical School.
“However, this does not mean AI will necessarily improve care – how and where it should be deployed remain understudied, and we desperately need rigorous prospective trials to evaluate the impact of AI on clinical practice.”
How was the AI model tested?
The researchers first evaluated o1-preview, OpenAI’s reasoning model released in 2024, to which they gave a range of clinical cases, including published case conferences and real-world emergency department records.
AI outperformed human physicians across most experiments, especially in management reasoning, clinical reasoning, documentation, and real-world emergency settings with limited information.
“Models are increasingly capable. We used to evaluate models with multiple-choice tests; now they are consistently scoring close to 100%, and we can’t track progress anymore because we’re already at the ceiling,” said co-first author Peter Brodeur, HMS clinical fellow in medicine at Beth Israel Deaconess.
In one test, researchers asked the LLMs, o1 and GPT-4o, to evaluate patients at various points in a standard emergency department setting, ranging from early triage to later admission decisions.
At each stage, the model was given only the information available at that point and asked to generate likely diagnoses and recommend what should happen next.
The biggest gap between AI and human physicians was at the triage stage, when the least information about the patient is available.
As with human physicians, AI models improved their diagnostic abilities as more information became available.
“Although applying AI to assist with clinical decision support is sometimes viewed as a high-risk endeavour, greater use of these tools might serve to mitigate the human and financial costs of diagnostic error, delay, and lack of access,” the authors wrote.
More research still needed
The researchers called for prospective trials to evaluate these technologies in real-world settings and for health care systems to invest in computing infrastructure and to develop frameworks that can support the safe integration of AI tools into clinical workflows.
“A model might get the top diagnosis right but also suggest unnecessary testing that could expose a patient to harm,” Brodeur said. “Humans should be the ultimate baseline when it comes to evaluating performance and safety.”
The study has some limitations. The authors noted the study reflects only model performance and primarily focuses on the preview version of the o1 model, which has since been supplanted by newer models such as OpenAI’s o3 model.
“Although we expect performance to be sustained or improved with newer models, further studies should be done to elucidate how performance varies across models and to study how humans and LLMs may collaborate,” the authors wrote.