
Claude Code Verification Debt: The Agent Said Done, But Where Are the Receipts?

Michael Isaac
Operator. 30 yrs in enterprise AI. 3 min read

I refreshed my Claude Code corpus mart on May 7, 2026 and asked a blunt question:

When the agent says the work is done, how often does the same assistant turn also carry a verification claim?

The corpus basis:

  • 4,234 parsed session IDs
  • 44,483 human messages
  • 450,878 assistant turns
  • 247,592 tool events
  • Timestamp range: November 25, 2025 to May 6, 2026

This is my own corpus, not a survey. The public value is the method and the operator protocol, not pretending my workflow represents everyone.

The Finding

The mart flagged:

Surface                                                          Count
Assistant turns                                                450,878
Completion-claim turns                                          16,942
Completion claims with same-turn verification claim              1,662
Plausible completion candidates without same-turn verification  15,280
Candidate sessions                                               2,596

The ratio is the number that matters: 15,280 of 16,942 completion-claim turns, 90.2%, landed in the unverified same-turn candidate bucket.

Careful wording: this does not mean 90.2% of completions were false. It does not mean Claude Code failed to verify later. It means the raw transcript surface is full of assistant turns that sound terminal before the evidence is visible in the same turn.
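A minimal sketch of how these buckets can be computed from raw assistant turns. The phrase lists and the classifier below are illustrative assumptions, not the actual heuristics or schema in the corpus mart:

```python
# Hypothetical bucket classifier for assistant turns.
# COMPLETION_RE and VERIFICATION_RE are illustrative phrase lists,
# not the real mart's detection rules.
import re

COMPLETION_RE = re.compile(r"\b(done|completed|finished|all set|fixed)\b", re.I)
VERIFICATION_RE = re.compile(r"\b(tests? pass|ran|exit code|npm test|pytest|diff)\b", re.I)

def bucket_turns(assistant_turns):
    """Count completion claims and how many carry a same-turn verification claim."""
    completion = [t for t in assistant_turns if COMPLETION_RE.search(t)]
    verified = [t for t in completion if VERIFICATION_RE.search(t)]
    return {
        "completion_claims": len(completion),
        "with_verification": len(verified),
        "unverified_candidates": len(completion) - len(verified),
    }

turns = [
    "Done. All tests pass: pytest -q exited 0.",
    "The fix is complete and the feature is done.",
    "Refactored the module; still investigating the flaky test.",
]
print(bucket_turns(turns))
# {'completion_claims': 2, 'with_verification': 1, 'unverified_candidates': 1}
```

The unverified-candidate count divided by the completion-claim count is the 90.2% figure above.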

That is verification debt.

Why This Is Non-Obvious

Most operators judge an agent by whether the final message feels competent.

That is the wrong object.

The final message is narration. The tool trace is reality. A high-agency coding agent can make huge real progress and still end with a final paragraph that compresses uncertainty too aggressively.

The dangerous failure mode is not always hallucinated code. It is plausible completion:

  1. The agent gets 80% of the way there.
  2. The last test, migration, deploy, or edge case remains unresolved.
  3. The final answer sounds like a closed loop.
  4. The human moves on.

That is how agentic coding creates invisible liability.

The So What

Stop asking "is it done?"

Ask "where are the receipts?"

For Claude Code, the final answer should include at least one of:

  • exact command/check run and result
  • exact test suite and pass/fail state
  • exact file or diff evidence
  • deploy/build URL or screenshot evidence
  • explicit statement that verification was impossible
  • explicit label: implemented but unverified

That label matters. It turns a vague completion into a known risk state.
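The receipt categories above can be checked mechanically. A rough sketch, with illustrative patterns (these are assumptions, not a production-grade detector):

```python
# Hypothetical receipt checker: flags final answers that sound terminal
# but carry none of the receipt types listed above. Patterns are illustrative.
import re

RECEIPT_PATTERNS = [
    r"\$ .+",                       # exact command run, shell-prompt style
    r"\b\d+ passed\b",              # test suite pass count
    r"^diff --git",                 # diff evidence
    r"https?://\S+",                # deploy/build URL
    r"implemented but unverified",  # explicit risk label
]

def has_receipt(final_answer: str) -> bool:
    """True if the final answer contains at least one receipt-like artifact."""
    return any(re.search(p, final_answer, re.I | re.M) for p in RECEIPT_PATTERNS)

print(has_receipt("Done! The feature works now."))          # False
print(has_receipt("Done. $ pytest -q\n42 passed in 3.1s"))  # True
```

A checker like this is deliberately permissive: the goal is not to prove the work is correct, only to refuse "done" sentences that carry no evidence at all.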

The Operator Protocol

Add this to your Claude Code working style:

Do not call the task done unless you include verification receipts.
If you changed code, include the exact command or check you ran.
If you could not verify, say "implemented but unverified" and explain why.

Then enforce it socially. When the agent says "done" without receipts, do not read the paragraph. Ask for the receipt.

The More Aggressive Version

For high-risk work, split "implementation" from "closure":

Phase 1: implement the smallest coherent patch.
Phase 2: run the narrowest meaningful verification.
Phase 3: only then summarize what is done.

This is slower on easy tasks. It is dramatically cheaper on the sessions that matter, because the expensive failures are not syntax mistakes. They are false closure states, and they concentrate in the same high-error sessions where most of the token cost concentrates.

Pair this with a loop-depth checkpoint and false closure gets harder to reach: the agent has to produce receipts at known moments, not just when it feels finished.
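A loop-depth checkpoint can be as simple as a counter: after every N tool calls, the agent must stop and produce a receipt before continuing. A minimal sketch, with an assumed interval of five (the threshold and names are illustrative):

```python
# Hypothetical loop-depth checkpoint: force a receipt at known moments,
# not just when the agent feels finished. CHECKPOINT_EVERY is an assumption.
CHECKPOINT_EVERY = 5

def should_checkpoint(tool_call_count: int) -> bool:
    """True when the agent must pause and emit verification evidence."""
    return tool_call_count > 0 and tool_call_count % CHECKPOINT_EVERY == 0

# Tool calls 1..15 trigger checkpoints at 5, 10, and 15.
checkpoints = [n for n in range(1, 16) if should_checkpoint(n)]
print(checkpoints)  # [5, 10, 15]
```

The point is that the checkpoint fires on a schedule the operator controls, so a confident-sounding final paragraph cannot skip past the moments where evidence is due.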

Why This Can Supercharge Claude Code

The trick is not a better magic prompt. The trick is changing what counts as an acceptable terminal state.

Claude Code is already good at reading, editing, and running tools. The operator advantage is forcing the loop to end on evidence instead of confidence.

That is the public field-manual rule:

Done is not a sentence. Done is a receipt.

The methodology and reproducible query factory live in the public repo: MPIsaac-Per/claude-code-ops-audit.