A recent study explores how well large language models perform in medical settings, including emergency rooms. Interestingly, one model showed better accuracy than human doctors in some cases.
Published in Science, the research comes from a team of physicians and computer scientists at Harvard Medical School and Beth Israel Deaconess Medical Center. They aimed to see how OpenAI’s models stacked up against human physicians.
In their research, they looked at 76 patients in the emergency room. They compared the diagnoses made by two attending physicians to those generated by OpenAI’s models, o1 and GPT-4o. Evaluators, who didn’t know which diagnoses came from humans and which from AI, reviewed their accuracy.
The findings were striking. The o1 model achieved a close or exact diagnosis in 67% of triage cases, while one physician managed 55% and the other 50%. The gap was most pronounced during initial assessments, where urgency and limited information make diagnosis especially challenging.
Dr. Arjun Manrai, a leader in the study, noted, “We tested the AI model against almost every benchmark, and it outperformed previous models and our physician baselines.”
However, the study wasn’t a declaration that AI is ready to take over emergency rooms. The authors call for further trials to test these technologies in real-world medical settings. They pointed out that their tests were limited to text-based data and emphasized that current models may struggle to fully interpret non-text inputs.
Dr. Adam Rodman, another author of the study, raised an important concern. He emphasized that there’s currently no accountability framework for AI diagnoses. Patients still want human guidance in critical health decisions.
Reactions from experts highlight the need for caution. Dr. Kristen Panthagani, an emergency physician, remarked that the study’s comparisons might be misleading: evaluating AI against internal medicine doctors isn’t the same as evaluating it against specialists in emergency medicine. Her main goal as an ER doctor, she stressed, is to identify life-threatening conditions, not to pinpoint an exact diagnosis right away.
Currently, a growing number of medical professionals are skeptical about the capabilities of AI in real-life scenarios. As reported by the Journal of the American Medical Association, about 74% of doctors take a cautious approach to adopting AI in their practices, fearing potential consequences for misdiagnoses.
In short, while this study showcases AI’s potential, it also underscores the importance of human expertise in healthcare. Balancing technology and traditional medical practices could pave the way for better patient care, but the conversation is just beginning.