Microsoft AI tool outperforms doctors in diagnosing complex medical cases

CAT scan images. (National Cancer Institute Photo via Unsplash)

Microsoft today announced an artificial intelligence tool that outperformed a panel of medical doctors in diagnosing complicated cases.

The Microsoft AI Diagnostic Orchestrator (MAI-DxO) faced off against 21 experienced physicians from the U.S. and the United Kingdom, all working through complex cases documented in the New England Journal of Medicine. MAI-DxO gave a correct diagnosis for 85.5% of the test cases, while the doctors hit the mark 20% of the time.

The tool is not yet available for clinical use.

“We’re taking a big step towards medical superintelligence,” said Mustafa Suleyman, CEO of Microsoft AI, in a LinkedIn post.

Microsoft used the tool with well-known AI models including GPT, Llama, Claude, Gemini, Grok and DeepSeek. The best setup was MAI-DxO paired with OpenAI’s o3.

Like a human physician, MAI-DxO diagnoses by analyzing symptoms, posing questions, and recommending medical tests. A key feature is its ability to optimize costs, preventing the ordering of superfluous diagnostics that contribute to overspending in healthcare.

While MAI-DxO outperformed the medical providers, the tech company acknowledged that under normal conditions the doctors would not be operating in isolation and would be able to consult colleagues as well as online and print resources.

The new diagnostic performance benchmark was created from 304 recent cases documented by the New England Journal of Medicine.

The tool builds on earlier tests of AI performance in medicine that quizzed the bots on the U.S. Medical Licensing Examination (USMLE), a standardized test. Microsoft said AI tools have evolved to get nearly perfect scores on the exam, but its multiple-choice structure favors “memorization over deep understanding.”

The new benchmark using complicated cases requires higher-order skills to perform “sequential diagnosis, a cornerstone of real-world medical decision making,” according to Microsoft’s blog post.

Next steps for developing the tool for public use include testing its abilities against more commonplace ailments. Before it could be deployed in healthcare practices, it would require testing in a clinical setting for safety and performance and approval from regulators.

Bay Gross, Microsoft AI’s vice president of health, described the effort in a LinkedIn post as “a proof-of-concept showing that [large language model] systems can master medicine’s most intricate diagnostic challenges by following the same step-by-step reasoning and debate process that expert physicians use every day.”

As they have with other AI applications, Microsoft leaders emphasized that the bots are not meant to replace people but to optimize their output, in this case by automating routine tasks, assisting in diagnosis and creating personalized care strategies.

More details on the research are available in a paper that has not yet been accepted by a scientific journal.
