By Ben Crocker, MD, Senior Vice President, Care Design and Innovation, IKS Health
The debate over the return on investment (ROI) of ambient AI clinical documentation has reached a critical juncture. What began as a narrow question – Do AI scribes increase productivity enough to justify their cost? – has turned into something more revealing: a broader reckoning with how health systems define value in the clinical AI era.
Across recent studies and commentaries, a consistent message emerges. Ambient AI documentation is not underperforming. What it is doing is exposing the limits of the legacy frameworks we continue to use to evaluate it. The problem is not the technology. It is the lens.
To understand why, it helps to look back at the evolution of clinical documentation. Dictation and transcription have existed for decades, and recent advances in AI-driven speech recognition have dramatically reduced their cost and improved turnaround time. Even so, adoption remains constrained by workflow fit rather than economics alone. These models still place the burden of narration and synthesis on the clinician after the encounter, but remain well suited for clinicians and specialties that value narrative control, asynchronous workflows, or highly structured reporting.
Over the last decade, the emergence of live and asynchronous remote scribes marked a meaningful shift. By ambiently capturing the clinical encounter, they reduced documentation fatigue and moved clinicians from note authors to note reviewers and editors. Patient engagement improved and notes closed sooner. After-hours documentation declined and burnout eased. For many clinicians, live and remote scribes remain a preferred model, offering trusted human judgment and adaptability that technology alone cannot always replicate. In practice, clinicians who use these models consistently describe high satisfaction, strong trust, and a willingness to advocate for them – signals that experiential value can be both durable and deeply personal, even when financial metrics are harder to isolate.
The trade-offs, however, are structural rather than qualitative. Scribes require sustained operational investment and thoughtful deployment to perform well at scale. While they can support incremental visits or wRVU gains, their primary value has never been purely throughput-driven. In practice, these models continue to make sense in environments where continuity, training investment, and predictable performance outweigh the efficiencies of automation at scale.
When generative AI arrived, ambient documentation was framed as the long-awaited solution. The same relief, without the labor expense. Vendors and health systems alike positioned AI scribes as productivity accelerators that would finally make the math work. For clinicians burned out by documentation, the promise was compelling, and hard to ignore.
The data, however, tell a more constrained story.
Recent studies show that ambient AI documentation does not compromise billing integrity and can produce incremental financial gains. Increases in work RVUs per encounter and per week are real and statistically significant. These findings matter. They establish financial credibility even as they highlight the limits of RVUs as a sole measure of value. At the same time, they clarify a boundary: ambient AI documentation is unlikely to be transformative if judged solely by its ability to increase throughput.
Across multiple studies, the most consistent effects of ambient AI documentation are not financial. They are experiential. Clinicians report less work exhaustion, lower task load, and reduced emotional fatigue. These effects persist even when documentation time savings are variable or minimal. The evidence points to a distinct conclusion: AI scribes deliver real productivity gains, but their primary and most durable impact lies beyond throughput. When success is defined solely by increased visit volume, the risk is obvious: reintroducing the very workflow pressures that ambient documentation was designed to relieve.
This is not a failure of the technology. It reflects a mismatch between longstanding assumptions and how clinical AI documentation actually creates value. Healthcare ROI analyses have long assumed that time saved should be converted into more patient visits. This is understandable in systems built around throughput-driven revenue models. But it obscures less visible, and often more consequential, sources of value.
Ambient AI documentation primarily works through reduced cognitive load, improved patient engagement, faster note completion, and greater clinician control over the workday. Such gains show up as avoided loss: less burnout, fewer delayed notes, lower compliance risk, and reduced workforce attrition. These outcomes compound and accrue over time, which makes them hard to capture in traditional ROI models that are designed to value more immediate gains. When ambient AI does not rapidly translate into higher visit counts, it is labeled as underperforming – even as it meaningfully improves clinician sustainability and system reliability.
Challenges arise when ambient AI is evaluated as a standalone solution rather than as part of an integrated clinical system. Clinical care is a system, not a stopwatch. The real throughput effects and long-term value of ambient AI depend on what happens upstream and downstream, not simply on how fast a note is generated.
There is also a deeper paradox at play. While individual AI models may be cheap and theoretically interchangeable, ambient AI documentation at scale is not. The reason is architectural. Delivering durable value at scale requires deep EHR integration or robust read/write access. Early deployments can deliver meaningful clinician relief and experiential ROI without this level of integration, but the most durable, compounding value emerges when ambient documentation becomes embedded clinical infrastructure rather than a standalone tool.
Realizing this value, however, depends not only on the tools themselves, but on the environments in which they operate. EHR platforms were built to ensure stability and control, but those same design principles can inadvertently constrain innovation when data access and workflow integration are tightly siloed. Ambient AI documentation – and the capabilities that depend on it – benefits most when EHRs enable secure, standardized sharing of data and embrace vendor collaboration rather than limit connection points. The future value of clinical AI will be shaped as much by interoperability and openness as by model performance itself.
Deep integration of this kind enables adjacent, dependent capabilities beyond draft note generation, including chart summarization, order assistance, medication renewal, coding and charge capture support, prior authorization prediction, and denial prevention. At that point, ambient AI is no longer an experiment. It becomes part of the clinical backbone: sticky, and difficult and somewhat costly to reverse, but precisely where lasting value tends to accrue.
Longitudinal, real-world evidence reinforces this view. Hybrid models that combine AI-generated drafts with trained human reviewers follow a predictable pattern: early friction followed by substantial reductions in after-hours work, faster note closure, and eventual financial gains. These findings resolve much of the apparent contradiction in the literature. Early implementation pain is not failure; it is a phase. Documentation timeliness matters as much as (if not more than) typing speed. Trust matters as much as automation. Human-in-the-loop design is not a transitional compromise, but a deliberate architectural choice that balances scalability with judgment and efficiency with safety.
Taken together, the evidence points to a clear conclusion. Ambient AI documentation should be evaluated not as a discrete productivity tool, but as an enabling layer within a broader clinical and administrative AI system. That reality demands a different approach to ROI, one that moves beyond visit counts and RVUs alone. Such a framework accounts for adoption dynamics, recognizing early costs alongside steady-state value. It prioritizes reductions in cognitive load, workforce stabilization, documentation reliability, and readiness for adjacent automation. And it values organizational learning: how quickly systems adapt and compound returns once this infrastructure is in place.
Under this framework, modest early RVU gains serve as confirmation signals rather than the sole measure of success. The platforms most likely to succeed will treat documentation as infrastructure, embrace hybrid models grounded in safety and scalability, and expand deliberately into adjacent clinical and administrative workflows where integration and oversight compound value.
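For readers who think in spreadsheets, the difference between the two lenses can be sketched in a few lines of Python. This is a minimal illustration, not a validated model. Every name and figure below is a hypothetical assumption chosen to make the arithmetic easy to follow, and none of it is drawn from the studies discussed here. It compares cumulative net value per clinician when only incremental RVU revenue is counted against a view that also counts avoided loss, with both benefits ramping up linearly during adoption.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Assumptions:
    """All values are hypothetical, per clinician, in arbitrary currency units."""
    onboarding_cost: float       # one-time early cost (training, setup)
    monthly_cost: float          # recurring license/support cost
    steady_rvu_value: float      # monthly incremental RVU revenue at steady state
    steady_avoided_loss: float   # monthly avoided loss at steady state
    ramp_months: int             # months until adoption reaches steady state


def adoption_factor(month: int, ramp_months: int) -> float:
    """Linear ramp from 0 to 1; `month` is 1-indexed (an assumed adoption curve)."""
    return min(1.0, month / ramp_months)


def cumulative_net_value(a: Assumptions, months: int,
                         include_avoided_loss: bool = True) -> List[float]:
    """Running total of benefits minus costs at the end of each month."""
    total = -a.onboarding_cost
    series = []
    for month in range(1, months + 1):
        factor = adoption_factor(month, a.ramp_months)
        benefit = factor * a.steady_rvu_value
        if include_avoided_loss:
            benefit += factor * a.steady_avoided_loss
        total += benefit - a.monthly_cost
        series.append(total)
    return series


def break_even_month(series: List[float]) -> Optional[int]:
    """First 1-indexed month with non-negative cumulative value, else None."""
    for index, value in enumerate(series, start=1):
        if value >= 0:
            return index
    return None


# Illustrative inputs only; replace with an organization's own estimates.
example = Assumptions(onboarding_cost=1000, monthly_cost=400,
                      steady_rvu_value=300, steady_avoided_loss=500,
                      ramp_months=4)

rvu_only = cumulative_net_value(example, 12, include_avoided_loss=False)
full_view = cumulative_net_value(example, 12)

print("RVU-only break-even month:", break_even_month(rvu_only))
print("Full-view break-even month:", break_even_month(full_view))
```

With these made-up inputs, the RVU-only view never breaks even within the year, while the broader view does so in month 6. The point is not the specific numbers. It is that the verdict depends on which categories of value the model is allowed to see, and on whether early ramp-up costs are read as failure or as a phase.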
This shift cannot rest with vendors alone. Health systems are increasingly moving beyond the false binary of “AI-only versus human-only” and recognizing that deep EHR integration is essential for sustained, system-level value. Metrics can evolve to capture avoided loss, reliability, and system readiness, all dimensions of value that traditional productivity measures routinely miss.
Ambient AI clinical documentation is not underperforming. It is doing exactly what foundational technologies tend to do: quietly reshaping workflows, redistributing cognitive work, and forcing health systems to confront what they truly value. The real return on investment will not appear in the first month’s RVU report. It will be found in clinician retention, systems that scale responsibly, and organizations better prepared for what comes next. That is the ROI that matters, and the one we must now learn how to measure.
Technologies that change how work gets done rarely look impressive at first glance. But they reshape everything that follows. Ambient AI is already delivering returns, but many of its most meaningful benefits will only become visible as systems mature, integrations deepen, and clinicians regain control of their time. If we rely exclusively on yesterday’s productivity math, we risk overlooking the forms of value that will matter most tomorrow.