
Claude Code Verification Debt: The Agent Said Done, But Where Are the Receipts?

Michael Isaac
Operator. 30 yrs in enterprise AI. 3 min read

I refreshed my Claude Code corpus mart on May 7, 2026 and asked a blunt question:

When the agent says the work is done, how often does the same assistant turn also carry a verification claim?

The corpus basis:

  • 4,234 parsed session IDs
  • 44,483 human messages
  • 450,878 assistant turns
  • 247,592 tool events
  • Timestamp range: November 25, 2025 to May 6, 2026

This is my own corpus, not a survey. The public value is the method and the operator protocol, not pretending my workflow represents everyone.

The Finding

The mart flagged:

Surface                                                          Count
Assistant turns                                                450,878
Completion-claim turns                                          16,942
Completion claims with same-turn verification claim              1,662
Plausible completion candidates without same-turn verification  15,280
Candidate sessions                                               2,596

The ratio is the number that matters: 15,280 of 16,942 completion-claim turns, 90.2%, landed in the unverified same-turn candidate bucket.

Careful wording: this does not mean 90.2% of completions were false. It does not mean Claude Code failed to verify later. It means the raw transcript surface is full of assistant turns that sound terminal before the evidence is visible in the same turn.
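A minimal sketch of how these buckets can be computed from raw assistant turns. The phrase lists and the classifier below are illustrative assumptions, not the actual heuristics or schema in the corpus mart:

```python
# Hypothetical bucket classifier for assistant turns.
# COMPLETION_RE and VERIFICATION_RE are illustrative phrase lists,
# not the real mart's detection rules.
import re

COMPLETION_RE = re.compile(r"\b(done|completed|finished|all set|fixed)\b", re.I)
VERIFICATION_RE = re.compile(r"\b(tests? pass|ran|exit code|npm test|pytest|diff)\b", re.I)

def bucket_turns(assistant_turns):
    """Count completion claims and how many carry a same-turn verification claim."""
    completion = [t for t in assistant_turns if COMPLETION_RE.search(t)]
    verified = [t for t in completion if VERIFICATION_RE.search(t)]
    return {
        "completion_claims": len(completion),
        "with_verification": len(verified),
        "unverified_candidates": len(completion) - len(verified),
    }

turns = [
    "Done. All tests pass: pytest -q exited 0.",
    "The fix is complete and the feature is done.",
    "Refactored the module; still investigating the flaky test.",
]
print(bucket_turns(turns))
# {'completion_claims': 2, 'with_verification': 1, 'unverified_candidates': 1}
```

The unverified-candidate count divided by the completion-claim count is the 90.2% figure above.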

That is verification debt.

Why This Is Non-Obvious

Most operators judge an agent by whether the final message feels competent.

That is the wrong object.

The final message is narration. The tool trace is reality. A high-agency coding agent can make huge real progress and still end with a final paragraph that compresses uncertainty too aggressively.

The dangerous failure mode is not always hallucinated code. It is plausible completion:

  1. The agent gets 80% of the way there.
  2. The last test, migration, deploy, or edge case remains unresolved.
  3. The final answer sounds like a closed loop.
  4. The human moves on.

That is how agentic coding creates invisible liability.

The So What

Stop asking "is it done?"

Ask "where are the receipts?"

For Claude Code, the final answer should include at least one of:

  • exact command/check run and result
  • exact test suite and pass/fail state
  • exact file or diff evidence
  • deploy/build URL or screenshot evidence
  • explicit statement that verification was impossible
  • explicit label: implemented but unverified

That label matters. It turns a vague completion into a known risk state.
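The receipt categories above can be checked mechanically. A rough sketch, with illustrative patterns (these are assumptions, not a production-grade detector):

```python
# Hypothetical receipt checker: flags final answers that sound terminal
# but carry none of the receipt types listed above. Patterns are illustrative.
import re

RECEIPT_PATTERNS = [
    r"\$ .+",                       # exact command run, shell-prompt style
    r"\b\d+ passed\b",              # test suite pass count
    r"^diff --git",                 # diff evidence
    r"https?://\S+",                # deploy/build URL
    r"implemented but unverified",  # explicit risk label
]

def has_receipt(final_answer: str) -> bool:
    """True if the final answer contains at least one receipt-like artifact."""
    return any(re.search(p, final_answer, re.I | re.M) for p in RECEIPT_PATTERNS)

print(has_receipt("Done! The feature works now."))          # False
print(has_receipt("Done. $ pytest -q\n42 passed in 3.1s"))  # True
```

A checker like this is deliberately permissive: the goal is not to prove the work is correct, only to refuse "done" sentences that carry no evidence at all.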

The Operator Protocol

Add this to your Claude Code working style:

Do not call the task done unless you include verification receipts.
If you changed code, include the exact command or check you ran.
If you could not verify, say "implemented but unverified" and explain why.

Then enforce it socially. When the agent says "done" without receipts, do not read the paragraph. Ask for the receipt.

The More Aggressive Version

For high-risk work, split "implementation" from "closure":

Phase 1: implement the smallest coherent patch.
Phase 2: run the narrowest meaningful verification.
Phase 3: only then summarize what is done.

This is slower on easy tasks. It is dramatically cheaper on the sessions that matter, because the expensive failures are not syntax mistakes. They are false closure states, and they concentrate in the same high-error sessions where most of the token cost concentrates.

Pair this with a loop-depth checkpoint and false closure gets harder to reach: the agent has to produce receipts at known moments, not just when it feels finished.
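A loop-depth checkpoint can be as simple as a counter: after every N tool calls, the agent must stop and produce a receipt before continuing. A minimal sketch, with an assumed interval of five (the threshold and names are illustrative):

```python
# Hypothetical loop-depth checkpoint: force a receipt at known moments,
# not just when the agent feels finished. CHECKPOINT_EVERY is an assumption.
CHECKPOINT_EVERY = 5

def should_checkpoint(tool_call_count: int) -> bool:
    """True when the agent must pause and emit verification evidence."""
    return tool_call_count > 0 and tool_call_count % CHECKPOINT_EVERY == 0

# Tool calls 1..15 trigger checkpoints at 5, 10, and 15.
checkpoints = [n for n in range(1, 16) if should_checkpoint(n)]
print(checkpoints)  # [5, 10, 15]
```

The point is that the checkpoint fires on a schedule the operator controls, so a confident-sounding final paragraph cannot skip past the moments where evidence is due.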

Why This Can Supercharge Claude Code

The trick is not a better magic prompt. The trick is changing what counts as an acceptable terminal state.

Claude Code is already good at reading, editing, and running tools. The operator advantage is forcing the loop to end on evidence instead of confidence.

That is the public field-manual rule:

Done is not a sentence. Done is a receipt.

The methodology and reproducible query factory live in the public repo: MPIsaac-Per/claude-code-ops-audit.