AI safety in healthcare is a care model architecture problem
Why model benchmarks alone are not enough to make AI in healthcare safe
Last week, I wrote about the Care Model Stack: four layers of clinical infrastructure (knowledge, intelligence, application, and workflow) that have to be deliberately engineered for AI intelligence to effectively drive care delivery. I also described how the stack is being applied to build an AI eConsult model at Stanford. This post discusses how to use the Care Model Stack to design systems for AI safety.
The dominant conversation about AI safety in healthcare is focused almost entirely on AI model performance, which is important but insufficient, for the same reason that model capability alone is insufficient for clinical transformation. What drives healthcare outcomes is not what the model can do in isolation, but how the broader sociotechnical system enabled by the model impacts care. A model that performs well on benchmarks can still cause harm when the knowledge it draws on is stale, the uncertainty it detects is never surfaced, or the workflow it feeds into has no one watching what comes out.
The Care Model Stack provides a more comprehensive framework for AI safety that brings it closer to real-world implementation. If AI intelligence has to be engineered into each layer of the stack in order to land in care, then safety has to be engineered into each layer of the stack as well. A safety review that only examines model performance is the equivalent of a care model design that only considers the AI model. It addresses one layer and ignores the three others where the system can, and does, fail.
Why current approaches to AI safety fall short in healthcare
The standard approach to AI safety relies on testing model performance against a set of benchmarks, focusing on tasks such as clinical diagnosis and reasoning, and on identifying when models fail at these tasks.
This approach misses how safety failures occur not just at the model, but across the layers of the care model stack that the AI model either relies on or propagates into. The knowledge base the AI draws on may drift and become outdated without anyone noticing. The AI model, using an outdated knowledge base, may appear overconfident in its outdated and incorrect outputs. The modification patterns that consultants exhibit when they edit AI-generated drafts may be captured in an audit log that no one reviews. The workflow layer may have no designated person responsible for monitoring any of it.
To make this concrete: imagine an AI nephrology eConsult program that has been running well for eighteen months. The underlying foundation AI model that powers the chart summarization and draft responses has performed well against standard medical benchmarks. But the knowledge base was built against the prior formulary, and the institution recently substituted a newer SGLT2 inhibitor with a different renal dosing threshold. Every eConsult touching SGLT2 dosing in a CKD patient is now drawing on a stale knowledge layer. The model generates recommendations, with incorrect dosing, for a medication that is no longer available in the hospital, and it does so with high expressed confidence. Consultants, moving through a high-volume queue, may not catch the error, or may not even be aware of the formulary change.
Standard healthcare benchmarks for validating AI models would not catch these errors, because the model’s clinical reasoning capabilities are intact; it is simply reasoning from outdated knowledge. The signal that something is wrong may be sitting in modification logs: certain nephrologists have been lightly editing the SGLT2 recommendations for weeks, consistently, in the same direction. But without a system that proactively monitors these logs and connects them to the versioning of the knowledge base, this hypothetical AI eConsult system would fail to detect and adapt to these changes until a safety event prompts a formal review.
This failure of AI safety did not come from the AI model. Rather, it is a knowledge layer failure propagating through an intelligence layer that cannot detect it, an application layer that is not capturing the right signal, and a workflow layer with no one to close the loop.
AI safety architecture for each layer of the care model stack
Building a safety system for an AI-enabled consult program means engineering a safety function into each of the four layers, and building connectors that carry safety signals across them. The concept of care model connectors I described, components that propagate AI intelligence into where care delivery happens, applies here in the opposite direction: safety connectors propagate signal from where care happens back into the systems and roles that govern AI behavior.
Here is what this could look like for each layer of an AI eConsult care model.
Knowledge layer: bounding and updating what the AI knows
Safety at the knowledge layer means encoding not just what the AI knows, but where its knowledge ends. Every specialty knowledge base should carry an explicit boundary specification: the patient presentations, question types, and clinical contexts that fall outside the validated domain of that template. A nephrology knowledge base designed for an adult general nephrology eConsult service may be excellent on CKD management yet have no reliable grounding in pediatric dosing or rare hereditary nephropathies. That boundary should be documented, flagged to the clinician at the point of care as a limitation and area of uncertainty for the AI, and used to route those cases to a different kind of review.
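A boundary specification of this kind could be sketched as a small data structure plus a routing check. This is a minimal illustration under stated assumptions, not a description of the Stanford system: the class names, the topic labels, the age cutoff, and the two review queues are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass(frozen=True)
class BoundarySpec:
    """Hypothetical record of what a knowledge template was validated for."""
    template_id: str
    min_age_years: int
    validated_topics: frozenset[str]
    excluded_topics: frozenset[str] = field(default_factory=frozenset)


@dataclass(frozen=True)
class Consult:
    patient_age_years: int
    topics: frozenset[str]


def out_of_bounds_reasons(spec: BoundarySpec, consult: Consult) -> list[str]:
    """List the reasons a consult falls outside the validated domain."""
    reasons = []
    if consult.patient_age_years < spec.min_age_years:
        reasons.append(
            f"patient age {consult.patient_age_years} is below the "
            f"validated minimum of {spec.min_age_years}"
        )
    for topic in sorted(consult.topics & spec.excluded_topics):
        reasons.append(f"topic '{topic}' is explicitly excluded")
    uncovered = consult.topics - spec.validated_topics - spec.excluded_topics
    for topic in sorted(uncovered):
        reasons.append(f"topic '{topic}' has no validated coverage")
    return reasons


def route(spec: BoundarySpec, consult: Consult) -> str:
    """Send out-of-bounds consults to a different (assumed) review queue."""
    if out_of_bounds_reasons(spec, consult):
        return "enhanced_review"
    return "standard_review"


# Illustrative values only.
spec = BoundarySpec(
    template_id="adult-general-nephrology",
    min_age_years=18,
    validated_topics=frozenset({"ckd_management"}),
    excluded_topics=frozenset({"hereditary_nephropathy"}),
)
pediatric_case = Consult(patient_age_years=9, topics=frozenset({"ckd_management"}))
print(route(spec, pediatric_case), out_of_bounds_reasons(spec, pediatric_case))
```

Returning the reasons, rather than only a yes/no answer, is what lets the same check both route the case and tell the clinician why the AI is uncertain.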
Knowledge base versioning is also a safety function, not just a quality management one. Linking template versions to the consultations they informed creates the audit trail needed to investigate whether a shift in clinical performance correlates with a change in knowledge infrastructure. And a connector that links external change events, like formulary updates or guideline revisions, to the specific templates they affect (a “knowledge base update connector”) is what makes knowledge staleness detectable before it causes harm rather than after.
Currently, the predominant clinical consultation model in healthcare has no equivalent, as specialty knowledge lives within individual clinicians and is not systematically maintained, versioned, or bounded. To safely apply AI to a specialty consult care model, this knowledge layer architecture and its associated AI safety connectors must be designed and built.
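A knowledge base update connector could look roughly like the following sketch. It assumes each template version declares the external items it depends on as simple tags, and that each consultation records which template version informed it. The tag format, class names, and dates are hypothetical and chosen only for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class TemplateVersion:
    template_id: str
    version: str
    reviewed_on: date
    depends_on: frozenset[str]  # hypothetical tags, e.g. "formulary:sglt2_inhibitors"


@dataclass(frozen=True)
class ChangeEvent:
    tag: str
    effective_on: date


@dataclass(frozen=True)
class ConsultRecord:
    consult_id: str
    template_id: str
    template_version: str
    consult_date: date


def stale_templates(templates, event):
    """Templates that depend on the changed item and were last reviewed before it took effect."""
    return [
        t for t in templates
        if event.tag in t.depends_on and t.reviewed_on < event.effective_on
    ]


def consults_to_audit(consults, stale, event):
    """Consultations informed by a stale template version on or after the change date."""
    stale_keys = {(t.template_id, t.version) for t in stale}
    return [
        c.consult_id for c in consults
        if (c.template_id, c.template_version) in stale_keys
        and c.consult_date >= event.effective_on
    ]


# Illustrative data only.
templates = [
    TemplateVersion("ckd", "v3", date(2023, 1, 10), frozenset({"formulary:sglt2_inhibitors"})),
    TemplateVersion("aki", "v1", date(2023, 2, 1), frozenset({"guideline:aki"})),
]
event = ChangeEvent("formulary:sglt2_inhibitors", date(2024, 6, 1))
print([t.template_id for t in stale_templates(templates, event)])
```

The first function is the forward-looking half (which templates need review now), and the second is the audit-trail half (which past consultations may have been affected).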
Intelligence layer: AI should be calibrated to communicate uncertainty based on its knowledge
In addition to model accuracy, another consequential safety property of a clinical AI system is calibration: whether the model’s expressed confidence tracks its actual reliability. A model that produces uncertain outputs with high expressed confidence is as dangerous as the inexperienced physician who is confidently wrong. Clinicians testing our pilot AI eConsult system at Stanford have reported that the AI can appear highly confident even when its clinical logic is flawed. This misplaced certainty is not only off-putting to experienced practitioners but also represents a potential safety risk.
How can we solve this problem for an AI eConsult program? One idea is to generate “confidence profiles” alongside AI-generated clinical recommendations. These profiles would be derived from dimensions that mirror what an experienced clinician would consider in diagnostic reasoning: completeness of the retrieved EHR data, coverage of the clinical question within the validated knowledge base, how recently the knowledge base itself was updated, and internal coherence of the reasoning chain. In the above nephrology scenario, the system may note that the knowledge base for CKD had not been updated with a new formulary for over three years, and flag the confidence profile to reflect that the recommendations may contain outdated information.
These profiles serve two functions simultaneously. In addition to informing the clinician reviewing the current case, they accumulate as a dataset across consultations, making it possible to detect patterns over time to enable prospective safety monitoring: e.g. a particular case type consistently generating low confidence, a particular specialty showing coherence degradation as its caseload shifts.
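As a rough sketch of the two functions described above, a confidence profile could be a record with one score per dimension, plus a helper that aggregates accumulated profiles. Everything here is an assumption for illustration: the 0-to-1 scoring scale, the 0.6 flag threshold, the linear freshness decay over one year, and the choice to report the weakest dimension instead of an average (an average can hide a single failing dimension).

```python
from __future__ import annotations

from dataclasses import asdict, dataclass
from datetime import date


def freshness_score(last_reviewed: date, today: date, max_age_days: int = 365) -> float:
    """Linear decay from 1.0 (just reviewed) to 0.0 (max_age_days old or older)."""
    age_days = (today - last_reviewed).days
    return min(1.0, max(0.0, 1.0 - age_days / max_age_days))


@dataclass(frozen=True)
class ConfidenceProfile:
    data_completeness: float    # completeness of retrieved EHR data
    knowledge_coverage: float   # coverage of the question in the validated knowledge base
    knowledge_freshness: float  # how recently the knowledge base was updated
    reasoning_coherence: float  # internal coherence of the reasoning chain

    def flags(self, threshold: float = 0.6) -> list[str]:
        """Dimensions to surface to the reviewing clinician."""
        return [name for name, score in asdict(self).items() if score < threshold]

    def overall(self) -> float:
        """The weakest dimension bounds the overall confidence."""
        return min(asdict(self).values())


def low_confidence_rate(profiles, dimension: str, threshold: float = 0.6) -> float:
    """Share of accumulated profiles scoring below threshold on one dimension."""
    if not profiles:
        return 0.0
    low = sum(1 for p in profiles if getattr(p, dimension) < threshold)
    return low / len(profiles)


# Illustrative values only.
stale = ConfidenceProfile(
    data_completeness=0.9,
    knowledge_coverage=0.8,
    knowledge_freshness=freshness_score(date(2021, 1, 1), date(2024, 6, 1)),
    reasoning_coherence=0.85,
)
print(stale.flags(), stale.overall())
```

Tracking `low_confidence_rate` per case type or specialty over time is one simple way to turn individual profiles into the prospective monitoring signal.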
Application layer: proactively capture safety signals
The application layer, which is where the specialist interacts with the AI system to complete the consultation, is the interface where safety signals can be proactively captured and fed back into the rest of the care model stack.
Every edit a consultant makes to an AI-generated draft recommendation is a potential safety signal. For example, in the above hypothetical nephrology eConsult scenario, there may have been an accumulation of modifications by nephrologists who were aware of the formulary change and noticed the error. A system designed to monitor these modifications and connect the patterns back to updates in the knowledge layer may surface this gap much earlier, before compounded failures result in user frustration or, worse, patient safety events.
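One way to picture this kind of monitoring: group draft reviews by clinical topic and look for topics where a large share of drafts were edited in the same direction. The sketch below assumes edits have already been classified into a coarse direction label, which is itself a nontrivial step. The labels, field names, and thresholds are hypothetical.

```python
from __future__ import annotations

from collections import Counter, defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftReview:
    topic: str
    clinician_id: str
    edit_direction: str | None  # None means the draft was accepted unchanged


def consistent_edit_signals(reviews, min_reviews: int = 5, min_share: float = 0.5):
    """Flag topics where one edit direction accounts for a large share of reviews."""
    by_topic = defaultdict(list)
    for review in reviews:
        by_topic[review.topic].append(review)

    signals = []
    for topic, items in by_topic.items():
        if len(items) < min_reviews:
            continue  # too few reviews to call it a pattern
        directions = Counter(
            r.edit_direction for r in items if r.edit_direction is not None
        )
        if not directions:
            continue
        direction, count = directions.most_common(1)[0]
        share = count / len(items)
        if share >= min_share:
            editors = {r.clinician_id for r in items if r.edit_direction == direction}
            signals.append({
                "topic": topic,
                "direction": direction,
                "share": share,
                "clinicians": len(editors),
            })
    return signals


# Illustrative data only.
reviews = (
    [DraftReview("sglt2_dosing", "neph_a", "dose_threshold_changed")] * 2
    + [DraftReview("sglt2_dosing", "neph_b", "dose_threshold_changed")] * 2
    + [DraftReview("sglt2_dosing", "neph_c", None)] * 2
    + [DraftReview("anemia", "neph_a", None)] * 4
    + [DraftReview("anemia", "neph_b", "wording")]
)
for signal in consistent_edit_signals(reviews):
    print(signal)
```

Counting distinct clinicians alongside the share helps separate one person's editing habit from a pattern that several specialists independently agree on.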
Workflow layer: the roles and routines that close the loop
The workflow layer is where safety architecture most often fails to be built. A well-designed knowledge base, a calibrated intelligence layer, and a signal-capturing interface fall short without roles and routines designed to act on what they generate.
An AI-enabled eConsult program needs at least one human “connector” between the clinical signal layer and the governance layer: a role I think of as a “Safety Manager” for an AI care model. This is a role whose function is not to investigate specific incidents but to maintain situational awareness of how the AI system is performing across the whole program. Their cadenced review covers the distribution of modification patterns, confidence profiles, and outcome correlations, not just the adverse events that surface through incident reporting. They own the feedback loop to knowledge base maintenance and to institutional AI governance, and are accountable for detecting slow drifts in the system before they become patient safety events.
Taken together, the layer-by-layer safety functions and their connectors compose a monitoring system with a specific architecture: a signal bus that receives structured events from all four layers, a drift detection engine that watches modification patterns and confidence distributions for threshold-crossing changes, a safe operating zone model that characterizes the conditions under which AI-assisted consultations perform well and monitors for drift away from them, and a Safety Manager console that brings all of it to a human with the role and authority to act.
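To make the shape of that architecture more tangible, here is a deliberately small sketch of three of its pieces: a signal bus, a drift detector subscribed to one kind of event, and an inbox standing in for the Safety Manager console. It does not attempt the safe operating zone model. The event names, the rolling-mean drift rule, and the baseline and tolerance values are assumptions made for illustration.

```python
from __future__ import annotations

from collections import defaultdict, deque
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SafetyEvent:
    layer: str   # "knowledge", "intelligence", "application", or "workflow"
    kind: str    # hypothetical event name, e.g. "draft_modified"
    value: float


class SignalBus:
    """Receives structured events from any layer and fans them out to subscribers."""

    def __init__(self) -> None:
        self._subscribers: dict[str, list[Callable[[SafetyEvent], None]]] = defaultdict(list)

    def subscribe(self, kind: str, handler: Callable[[SafetyEvent], None]) -> None:
        self._subscribers[kind].append(handler)

    def publish(self, event: SafetyEvent) -> None:
        for handler in self._subscribers.get(event.kind, ()):
            handler(event)


class DriftDetector:
    """Alerts when the rolling mean of a signal departs from its baseline."""

    def __init__(self, baseline: float, tolerance: float, window: int,
                 on_alert: Callable[[str], None]) -> None:
        self.baseline = baseline
        self.tolerance = tolerance
        self.on_alert = on_alert
        self._recent: deque[float] = deque(maxlen=window)

    def __call__(self, event: SafetyEvent) -> None:
        self._recent.append(event.value)
        if len(self._recent) < self._recent.maxlen:
            return  # wait for a full window before judging drift
        mean = sum(self._recent) / len(self._recent)
        if abs(mean - self.baseline) > self.tolerance:
            self.on_alert(
                f"{event.kind}: rolling mean {mean:.2f} vs baseline {self.baseline:.2f}"
            )


# Illustrative wiring: value is 1.0 when a draft was modified, 0.0 otherwise.
console_inbox: list[str] = []
bus = SignalBus()
bus.subscribe(
    "draft_modified",
    DriftDetector(baseline=0.10, tolerance=0.20, window=4, on_alert=console_inbox.append),
)
for modified in [0.0, 0.0, 0.0, 0.0, 1.0, 1.0]:
    bus.publish(SafetyEvent("application", "draft_modified", modified))
print(console_inbox)
```

The same bus can carry knowledge layer change events and intelligence layer confidence profiles, so each new detector is a subscriber added to existing infrastructure, not a new pipeline.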
How this changes AI safety strategy in healthcare
AI safety strategy in healthcare is organized around an incomplete question: is this model safe to deploy? The question may have been sufficient when AI existed as discrete models applied to discrete tasks. It makes less sense now that foundation AI models are embedded in a care model, operating continuously across a population of patients, drawing on a knowledge base that evolves, surfacing outputs through an interface that shapes clinical behavior, and feeding into workflows where the people responsible for oversight may not have been defined.
The Care Model Stack reframes the question into whether the system built around the model is designed to detect when it stops working well, learn from what it produces, and improve over time. That is a different kind of safety strategy, and it has direct implications for how health systems build, govern, and scale AI-enabled care.
Below are examples of how this framing may drive the evolution of operationalizing AI safety:
AI model validation and governance → overall accountability of the care model stack. Pre-deployment validation certifies a model at a point in time. A stack-based safety architecture assigns accountability to each layer continuously: who owns the knowledge base, who monitors the confidence signals, who reviews the modification patterns, who connects all of it to outcomes.
Incident detection → drift detection. Current AI safety governance is largely reactive: something goes wrong, an incident is filed, a review is conducted. The modification signal and confidence profile infrastructure described here shift the detection window earlier, to the pattern of small divergences that precede failures rather than the failures themselves. Health systems that build this infrastructure will catch safety problems that their current governance processes are structurally incapable of seeing.
AI safety as a constraint → AI safety as an enabler of transformation. Health systems that want to expand AI into higher-acuity, higher-stakes care model applications face a credibility problem: how do they demonstrate that the AI is performing safely enough to justify the expansion? A stack-based safety architecture addresses these concerns with a defined safe operating zone built from real-world performance, outcome correlation data, and a versioned record of how the system has behaved. That evidence base is what makes it possible to extend AI into new care contexts with institutional confidence rather than institutional anxiety.
One-time deployments → compounding infrastructure. The connectors, signal bus, and outcome linkage built for the safety architecture of an eConsult care model are not program-specific. They are reusable across care models. Each new AI-enabled care model inherits a safety architecture rather than building one from scratch. The investment compounds rather than repeating.
We are beginning to understand that AI needs a place to land in healthcare: that clinical transformation requires engineering the stack into which AI intelligence is connected, not just deploying capable models. The same logic applies to safety. Safety in AI-enabled care models needs to be engineered into each layer of that stack, with connectors that carry safety signals across layers and into the people and processes responsible for acting on them. This will enable organizations not only to use AI more safely, but also to leverage it to drive care model transformations that scale the return on intelligence in healthcare.
