Efficiency theater
The curtain’s rising on AI’s second act: the one where the numbers finally matter.
Unprompted Dialogue is where scripted answers go quiet. Here, we explore the messy middle between human intention and machine interpretation—the place where real intelligence starts to sound less artificial. If you’re tired of hype, half-truths, and hollow buzzwords, this is your stop.
The accountability era
Three years into the AI cycle, boardrooms are asking a deceptively simple question:
“If AI is everywhere… where are the profits?”
According to new research from MIT’s Media Lab, 95% of enterprise AI initiatives have produced no measurable P&L return. The New Yorker dubbed it the “A.I. profits drought,” comparing today’s hype cycle to the 1980s “productivity paradox,” when the computer age was visible everywhere but in the productivity statistics.
Like the web and cloud software before it, AI’s economic promise is real, but unrealized. And the problem, as always, isn’t technology. It’s measurement.
In the 2000s, we learned to translate web traffic into unit economics → CAC, LTV, conversion rate.
In the 2010s, we learned to make cloud spending auditable → usage-based billing, uptime SLAs, observability.
Now, in the 2020s, AI will demand the same evolution: translating conversation quality, model safety, and automation depth into auditable financial impact.
The tools have advanced faster than the telemetry.
Executives have built astonishing AI capabilities without the accounting frameworks to prove impact. Pilots get press releases. Dashboards get prettier. But when the CFO asks, “Show me the financial delta,” the story collapses.
AI doesn’t have an adoption problem.
It has an audit problem.
The audit problem
Inside most organizations, AI’s value chain breaks at the last mile: the link between operational metrics and financial outcomes.
Dashboards overflow with activity data: intents handled, conversations resolved, seconds saved. Those metrics are directionally positive but financially meaningless.
“Deflections” sound efficient until you realize they mask repeat contacts and churn risk.
“Containment rate” looks impressive until you account for escalation leakage.
“Average handle time” improves while total cost per resolution worsens.
It’s efficiency theater—optics without economics.
Executives can’t defend those metrics to boards, auditors, or regulators. And in 2026, they’ll have to.
Investors are already asking where the productivity gains are. Regulators are starting to ask how AI decisions are monitored, audited, and remediated.
We’ve optimized for adoption; now we have to optimize for defensibility.
The executive scorecard
If “Stop Celebrating Deflections” was about reframing what success means, this scorecard is about proving it.
Four metrics form the backbone of a defensible AI ROI system. Each one auditable, economically traceable, and human-verifiable.
Together, they translate model behavior into business language: accuracy, completion, safety, and trust.
1. Resolution Accuracy Rate (RAR)
Question: Did the AI actually solve the problem?
The percentage of AI-handled conversations independently verified as correctly resolved.
Equivalent to completion rate in traditional service QA, but grounded in outcome validation rather than intent matching.
Maps directly to cost per resolution and repeat contact rate, the most direct P&L lever.
Why it matters: Accuracy is the foundation of ROI. Without verified resolution, efficiency is just an illusion.
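To make that concrete, here’s a minimal Python sketch of RAR. The record fields (ai_handled, verified_resolved) are stand-ins for whatever your QA pipeline actually captures, not any particular platform’s schema:

```python
# A minimal sketch of RAR: verified resolutions over AI-handled volume.
# Field names ("ai_handled", "verified_resolved") are illustrative.

def resolution_accuracy_rate(records: list[dict]) -> float:
    """Share of AI-handled conversations a reviewer verified as resolved."""
    ai_handled = [r for r in records if r["ai_handled"]]
    if not ai_handled:
        return 0.0
    verified = sum(1 for r in ai_handled if r["verified_resolved"])
    return verified / len(ai_handled)

records = [
    {"ai_handled": True,  "verified_resolved": True},
    {"ai_handled": True,  "verified_resolved": False},  # "contained," not solved
    {"ai_handled": False, "verified_resolved": True},   # human-handled: excluded
]
print(f"RAR: {resolution_accuracy_rate(records):.0%}")  # RAR: 50%
```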
2. First Contact Resolution (FCR)
Question: Did the issue stay solved?
Measures whether a customer’s problem was fully resolved on the first attempt, regardless of whether the responder was human or AI.
Each failed FCR creates downstream labor, repeat contact cost, and hidden churn risk.
Ties directly to both cost efficiency (repeat volume ↓) and revenue protection (retention ↑).
Why it matters: Containment isn’t success; durable resolution is.
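FCR is subtler because it’s defined by what doesn’t happen afterward. A sketch, assuming hypothetical contact records and a seven-day re-contact window (the window length is a policy choice, not a standard):

```python
# A minimal sketch of channel-agnostic FCR. "issue_key" is a hypothetical
# field that groups contacts about the same underlying problem.
from collections import defaultdict
from datetime import datetime, timedelta

REPEAT_WINDOW = timedelta(days=7)

def first_contact_resolution(contacts: list[dict]) -> float:
    """Share of issues with no repeat contact inside the window."""
    by_issue = defaultdict(list)
    for c in contacts:
        by_issue[c["issue_key"]].append(datetime.fromisoformat(c["ts"]))
    held = 0
    for stamps in by_issue.values():
        stamps.sort()
        # The first contact "held" if every later one fell outside the window.
        held += int(all(t - stamps[0] > REPEAT_WINDOW for t in stamps[1:]))
    return held / len(by_issue)

contacts = [
    {"issue_key": "cust1-billing",  "ts": "2025-01-02T10:00:00"},
    {"issue_key": "cust1-billing",  "ts": "2025-01-04T09:00:00"},  # repeat: FCR miss
    {"issue_key": "cust2-shipping", "ts": "2025-01-03T14:00:00"},
]
print(f"FCR: {first_contact_resolution(contacts):.0%}")  # FCR: 50%
```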
3. Safety Incident Rate (SIR)
Question: How often does the AI break its own rules?
Tracks the number of compliance, factual, or tone violations per 1,000 interactions.
Functions as a leading indicator for governance breakdowns and regulatory exposure.
Can be independently audited against internal policy or external standards.
Why it matters: Every incident is an exposure event — reputational, regulatory, or legal. Safety isn’t a side metric; it’s a balance-sheet risk.
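SIR is a simple rate, which is exactly what makes it auditable. A sketch with an illustrative violation taxonomy; map it to your own policy categories:

```python
# A minimal sketch of SIR: violations per 1,000 interactions.
# The taxonomy below is an assumption, not a standard.

VIOLATION_TYPES = {"compliance", "factual", "tone"}

def safety_incident_rate(interactions: int, incidents: list[str]) -> float:
    """Policy violations per 1,000 interactions."""
    counted = sum(1 for kind in incidents if kind in VIOLATION_TYPES)
    return counted / interactions * 1000

# 4 logged violations across 10,000 interactions -> 0.4 per 1,000
print(safety_incident_rate(10_000, ["factual", "tone", "tone", "compliance"]))
```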
4. Handoff Continuity Score (HCS)
Question: When the AI transfers to a human, does it preserve context and trust?
Measures how effectively the AI captures and conveys context during escalation.
High continuity reduces friction, duplication, and frustration — keeping CSAT and resolution cost in balance.
Low continuity increases cost and erodes brand credibility.
Why it matters: Escalation shouldn’t be seen as a failure. It’s a brand moment to be optimized. Continuity protects trust when automation ends.
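HCS is the fuzziest of the four, so any implementation leans on a rubric. A sketch assuming hypothetical 0-to-1 rubric dimensions, scored by human reviewers or a calibrated evaluation model:

```python
# A minimal sketch of HCS. The rubric dimensions and scores are
# assumptions; in practice they come from QA review.
from statistics import mean

RUBRIC = ("context_carried", "no_repeat_questions", "sentiment_preserved")

def handoff_continuity_score(escalations: list[dict]) -> float:
    """Average rubric score across AI-to-human handoffs."""
    return mean(mean(e[dim] for dim in RUBRIC) for e in escalations)

escalations = [
    {"context_carried": 1.0, "no_repeat_questions": 1.0, "sentiment_preserved": 0.5},
    {"context_carried": 0.5, "no_repeat_questions": 0.0, "sentiment_preserved": 0.5},
]
print(f"HCS: {handoff_continuity_score(escalations):.2f}")  # HCS: 0.58
```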
Together, these four metrics create a CFO-safe measurement layer.
When the path from AI metric to P&L line is visible, credibility replaces persuasion, and AI finally becomes accountable.
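One way to make that path visible is to price the scorecard directly. A back-of-the-envelope sketch in which every unit cost and volume is an invented placeholder, not a benchmark:

```python
# A sketch of the metric-to-P&L path. All figures are illustrative.

CONVERSATIONS = 100_000
HUMAN_COST = 6.50    # assumed fully loaded cost per human resolution
AI_COST = 0.80       # assumed cost per AI-handled conversation
RAR = 0.82           # verified resolution accuracy from the scorecard
REPEAT_COST = 5.00   # assumed downstream cost of each unresolved contact

ai_spend = CONVERSATIONS * AI_COST
failure_spend = CONVERSATIONS * (1 - RAR) * REPEAT_COST  # the hidden tax
human_baseline = CONVERSATIONS * HUMAN_COST

savings = human_baseline - (ai_spend + failure_spend)
print(f"Auditable savings: ${savings:,.0f}")  # Auditable savings: $480,000
```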
Designing for auditability
At some point, every AI initiative reaches the same moment. The pilot’s over, the dashboards look good, and someone in finance asks:
“Can we prove any of this?”
That’s where most stories stall.
Auditability is how they move forward. It’s what turns experiments into systems that can stand up to scrutiny — and keep improving under it.
It starts with traceability.
Every AI decision should leave behind evidence: a conversation ID, a resolution code, a human-verified record that says, this is what happened and why.
Traceability gives machines memory: the context that lets you investigate, learn, and adjust.
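What might that evidence look like in practice? One possible shape, with every field name an assumption about your own audit store:

```python
# A sketch of a traceable decision record; field names are illustrative.
from dataclasses import dataclass
from datetime import datetime

@dataclass(frozen=True)  # immutable: audit evidence shouldn't be editable
class AuditRecord:
    conversation_id: str    # joins back to the raw transcript
    resolution_code: str    # what the AI believes it did
    verified_outcome: str   # what a human reviewer confirmed happened
    verified_by: str        # who signed off, for accountability
    verified_at: datetime   # when, for audit timelines

record = AuditRecord(
    conversation_id="conv-0042",
    resolution_code="refund_issued",
    verified_outcome="refund_issued",
    verified_by="qa-reviewer-7",
    verified_at=datetime(2025, 1, 15, 9, 30),
)
print(record)
```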
Then comes governance.
Not the compliance kind that lives in shared drives, but the operational kind that happens on cadence.
Quarterly reviews. Cross-functional checkpoints. Performance reports that hold AI to the same standard as any financial system.
Add a layer of shadow audits: a small but regular sample of re-scored conversations, examined for accuracy, empathy, and consistency.
It’s amazing how much you learn when you slow the tape down.
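The mechanics can stay simple. A sketch of the sampling step, with an illustrative sample size and a fixed seed so the audit draw itself is reproducible; a real program would also stratify by intent, channel, and risk tier:

```python
# A sketch of shadow-audit sampling. The seed is deliberate:
# auditors can re-draw the exact same set.
import random

def shadow_audit_sample(conversation_ids: list[str], k: int = 50) -> list[str]:
    """Reproducible random sample of conversations for human re-scoring."""
    rng = random.Random(2025)
    return rng.sample(conversation_ids, min(k, len(conversation_ids)))

ids = [f"conv-{i:05d}" for i in range(10_000)]
for cid in shadow_audit_sample(ids, k=3):
    print(cid)  # hand these to reviewers for accuracy, empathy, consistency
```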
And finally, look downstream.
Don’t stop at the conversation. Follow what it triggers: the recontacts, the refunds, the retention curves.
That’s where financial truth hides.
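A sketch of that downstream join, with hypothetical event names and an assumed 30-day attribution window:

```python
# A sketch of downstream attribution. Event names and the window
# are assumptions for illustration.
from datetime import datetime, timedelta

WINDOW = timedelta(days=30)
COSTLY = {"recontact", "refund", "cancellation"}

def downstream_events(conversation: dict, events: list[dict]) -> list[dict]:
    """Costly events attributable to one closed conversation."""
    start = datetime.fromisoformat(conversation["closed_at"])
    return [
        e for e in events
        if e["customer_id"] == conversation["customer_id"]
        and e["type"] in COSTLY
        and start <= datetime.fromisoformat(e["ts"]) <= start + WINDOW
    ]

convo = {"customer_id": "cust-9", "closed_at": "2025-01-10T12:00:00"}
events = [{"customer_id": "cust-9", "type": "refund", "ts": "2025-01-20T08:00:00"}]
print(downstream_events(convo, events))  # the refund counts against this convo
```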
The companies that master this won’t be the ones with the flashiest demos.
They’ll be the ones whose AI you can trust, because every number, every judgment, every outcome has a trail.
That’s what auditability gives you: not bureaucracy, but confidence.
The leadership shift
In the early computer era, productivity gains didn’t show up until managers learned how to redesign work around the technology. AI will be no different.
As John Cassidy wrote in The New Yorker, quoting the MIT research,
“Just 5% of integrated AI pilots are extracting millions in value, while the vast majority remain stuck with no measurable P&L impact.”
Management discipline determines who escapes the pilot phase and who stays stuck proving concept.
Executives who treat AI as an operational curiosity will stay in the 95%.
Executives who treat it as an auditable asset will define the next benchmark.
Progress will belong to those who can prove value, not just promise it.
The new executive mandate
AI in service has entered its accountability phase. The era of “we think it’s working” is over.
Boards are asking for evidence.
Auditors are asking for control.
CFOs are asking for numbers that hold up under pressure.
The next generation of leaders won’t be defined by how quickly they deploy AI — but by how confidently they can stand behind its results.
The systems that endure won’t just automate work.
They’ll account for it.