AI Reasoning Model Helps Diagnose 4.8% of Unsolved Rare Disease Cases in NEJM AI Study

A new NEJM AI study details how OpenAI's o3 Deep Research model helped physicians reach 18 rare genetic disease diagnoses in children, adding a 4.8% diagnostic yield after years of expert analysis had failed. The AI surfaced evidence-linked hypotheses, not diagnoses, showing promise for scalable reanalysis of unsolved cases.

By Inside AI June 18, 2026

June 18, 2026 (Inside AI): A new study shows that an AI reasoning model helped physicians diagnose rare genetic diseases in 18 previously unsolved cases, adding a 4.8% diagnostic yield after years of expert analysis had failed. Researchers from Boston Children's Hospital, Harvard University, and OpenAI used the OpenAI o3 Deep Research model to reanalyze de-identified data from 376 cases, surfacing evidence-linked hypotheses for clinicians to review.

The findings, published today in NEJM AI, demonstrate how AI can help experts generate leads when revisiting difficult cases, not by diagnosing patients but by connecting scattered clues across genomes, clinical records, and scientific literature. Every diagnosis was made by qualified physicians following standard confirmation processes.

Why Old Data Hides New Answers

Rare-disease diagnosis is a moving target. A patient's genome may stay static, but knowledge around it evolves: new gene-disease links emerge, variants get reclassified, and case reports accumulate. This creates a maintenance backlog, as institutions must periodically reanalyze old genomes against fresh evidence.

In this study, the AI acted as an explanation-first reasoning layer atop existing genomic pipelines. It was tasked with connecting clinical features, inheritance patterns, variant evidence, and literature into justifications a human reviewer could interrogate. For each case, researchers assembled de-identified packets with standardized phenotype terms, variant tables, and metadata.

The model proposed plausible molecular explanations and showed its work. At least two experts then reviewed each candidate using the ACMG/AMP framework. A finding counted as a diagnosis only after expert consensus, pathogenic classification, lab confirmation, and return of results to the family.
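The four-part rule described above works like a gate: a candidate becomes a diagnosis only when every condition holds. The following is a minimal Python sketch of that logic, not the study's software. The class name, field names, and the choice to model "expert consensus" as at least two agreeing reviewers are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class CandidateReview:
    """Review status for one model-proposed candidate (hypothetical fields)."""
    reviewers_agreeing: int      # experts who support the candidate
    classified_pathogenic: bool  # pathogenic under the ACMG/AMP framework
    lab_confirmed: bool          # confirmed by laboratory testing
    returned_to_family: bool     # result returned to the family


# Assumption: consensus is modeled as at least two agreeing experts,
# matching the article's "at least two experts" review requirement.
MIN_REVIEWERS = 2


def counts_as_diagnosis(review: CandidateReview) -> bool:
    """Return True only when all four criteria from the article are met."""
    return (
        review.reviewers_agreeing >= MIN_REVIEWERS
        and review.classified_pathogenic
        and review.lab_confirmed
        and review.returned_to_family
    )


confirmed = CandidateReview(2, True, True, True)
pending = CandidateReview(2, True, False, False)
print(counts_as_diagnosis(confirmed))  # True
print(counts_as_diagnosis(pending))    # False
```

Under this reading, a strong model hypothesis with no lab confirmation stays a lead rather than a diagnosis, which is the distinction the researchers emphasize.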

Testing the Workflow on Known Cases

Before tackling unsolved cases, the team refined the workflow on cases with established diagnoses. Across 51 cases spanning varied rare conditions, the model recovered the correct gene and variant in duplicate runs for 48. It succeeded in 45 of 57 neuromuscular cases. For 15 long-read genome cases, it named the correct gene in all 15 and both disease-causing alleles in 12.
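The benchmark counts are easier to compare as percentages. The short Python sketch below converts the figures reported in this article into recovery rates. The dictionary labels and the function name are illustrative, and the percentages are derived here rather than quoted from the study.

```python
# Counts as reported in the article: (recovered, total) per benchmark set.
benchmarks = {
    "varied rare conditions (gene and variant)": (48, 51),
    "neuromuscular cases": (45, 57),
    "long-read genomes (correct gene)": (15, 15),
    "long-read genomes (both alleles)": (12, 15),
}


def recovery_rate(recovered: int, total: int) -> float:
    """Percentage of cases recovered, rounded to one decimal place."""
    return round(100 * recovered / total, 1)


rates = {name: recovery_rate(r, t) for name, (r, t) in benchmarks.items()}
for name, rate in rates.items():
    print(f"{name}: {rate}%")
```

By this arithmetic the model recovered about 94% of the varied rare-condition cases, about 79% of the neuromuscular cases, and both alleles in 80% of the long-read cases.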

Model confidence scores tracked with accuracy: the mean minimum score was 85.6 for consistently correct calls versus 42.1 for incorrect ones. These scores were not calibrated probabilities, but they helped experts prioritize candidates.
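One way to picture this kind of prioritization is to rank candidates by their lowest score across repeated runs, so that a candidate is only as strong as its weakest run. The Python sketch below does that. It assumes "minimum" refers to the lowest score across duplicate runs, which the article does not spell out. The candidate names and scores are invented for illustration and are not study data.

```python
def rank_candidates(run_scores: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Order candidates by their lowest score across runs, highest first."""
    minimums = {name: min(scores) for name, scores in run_scores.items()}
    return sorted(minimums.items(), key=lambda item: item[1], reverse=True)


# Hypothetical candidates and scores, for illustration only.
example = {
    "candidate_A": [91.0, 88.0],
    "candidate_B": [75.0, 40.0],
    "candidate_C": [60.0, 62.0],
}

ranking = rank_candidates(example)
for name, score in ranking:
    print(name, score)
```

Because the scores are not calibrated probabilities, a ranking like this only orders the review queue. It does not say how likely any candidate is to be correct.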

Surfacing Hidden Diagnoses

Applied to four unsolved cohorts—neurodevelopmental, neuromuscular, early psychosis, and sudden pediatric death—the workflow yielded 18 diagnoses. Seven were rediscoveries, where known pathogenic variants had been missed due to fragmented records. This highlights the operational challenge of synthesizing data across silos.
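The headline yield follows directly from these counts. The Python sketch below reproduces the arithmetic, assuming the 376 reanalyzed cases are the denominator for the reported 4.8% figure. The variable names are illustrative.

```python
total_cases = 376   # cases reanalyzed, per the article
diagnoses = 18      # diagnoses confirmed by physicians
rediscoveries = 7   # known pathogenic variants that had been missed

# Added diagnostic yield as a percentage of all reanalyzed cases.
yield_pct = round(100 * diagnoses / total_cases, 1)

# Diagnoses that were not rediscoveries of already-known variants.
other_diagnoses = diagnoses - rediscoveries

# Share of the diagnoses that were rediscoveries.
rediscovery_pct = round(100 * rediscoveries / diagnoses, 1)

print(f"Added yield: {yield_pct}%")
print(f"Rediscoveries: {rediscoveries} of {diagnoses} ({rediscovery_pct}%)")
print(f"Other diagnoses: {other_diagnoses}")
```

On these numbers, roughly four in ten of the diagnoses came from evidence that already existed somewhere in the records, which is why the researchers frame data fragmentation as part of the problem.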

In one early-psychosis case, the model inferred a structural variant not in the input data. It connected low-quality calls on chromosome 22 with cardiac, immune, and psychiatric features, hypothesizing a 22q11.2 deletion linked to DiGeorge syndrome. Follow-up sequencing confirmed it.

In some cases the model surfaced two genes that together better explained a complex presentation, such as LAMA2 and FOXP1 in one case, or a digenic explanation involving TTN and SRPK3 in another.

From Scattered Clues to Testable Hypotheses

The model also identified a possible novel mechanism for vitiligo. In a neurodevelopmental case, it highlighted an 11-amino-acid deletion in S1PR1, a receptor involved in immune cell movement. It integrated evidence suggesting the deletion could alter signaling, reducing pigment production while helping immune cells persist in skin.

This proposed S1PR1-vitiligo link requires experimental validation but shows AI's power to translate scattered findings into concrete hypotheses. The team also saw possible phenotype expansion: damaging variants in HSPB8 and CDK13 didn't perfectly match known disorders, hinting at broader clinical spectra.

Limits and Next Steps

The study was retrospective, cohorts were heterogeneous, and reviewers weren't blinded to model confidence. Researchers didn't measure time saved, cost, or false-positive workload. The model didn't systematically evaluate structural variants, repeat expansions, or mosaicism.

Every result passed through human adjudication. The model widened the search and focused expert analysis; it didn't decide what to return to families. Broader deployment will need privacy, security, and regulatory safeguards.

Prospective studies should compare AI-assisted reanalysis with standard practice on diagnostic yield, time, effort, and cost. Newer models like GPT‑Rosalind may offer deeper life-sciences capabilities but require separate evaluation.

The Manton Center will lead next-stage work through an OpenAI Foundation grant, aiming to build a platform-agnostic, low-cost genetics AI copilot. The long-term promise is not AI replacing diagnosis, but helping specialists identify evidence worth investigating—so unanswered questions don't stay that way forever.




© 2026 Inside AI. All rights reserved.
