MPIsaac Ventures

Stuckness Is Where Agentic Coding Gets Expensive

Michael Isaac
Operator. 30 yrs in enterprise AI.

Good coding agents hit errors.

That is not the problem.

The problem is stuckness: repeated action after repeated failure without a clean diagnosis.

In my refreshed Claude Code mart, I bucketed sessions by explicit tool-error count and looked at token volume.

The corpus basis:

  • 4,234 parsed session IDs
  • 247,592 tool events
  • 18,873 explicit tool-reported error events
  • 450,878 assistant turns

This is a correction from the broader keyword-surface draft. Keyword flags are useful for finding error-shaped text, but they fire on normal source code, search results, web pages, and delegated task summaries. The stuckness buckets below use explicit result_is_error only.
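The bucketing step itself is simple. A sketch in Python, with hypothetical field names (`explicit_errors` standing in for the mart's per-session count of result_is_error events):

```python
def error_bucket(explicit_errors: int) -> str:
    """Map a session's explicit tool-error count to its report bucket."""
    if explicit_errors == 0:
        return "no-error"
    if explicit_errors <= 2:
        return "1-2 errors"
    if explicit_errors <= 9:
        return "3-9 errors"
    return "10+ errors"

# Toy per-session rows; the real mart has 4,234 parsed sessions.
sessions = [
    {"session_id": "a", "explicit_errors": 0, "total_tokens": 1_200_000},
    {"session_id": "b", "explicit_errors": 4, "total_tokens": 9_800_000},
    {"session_id": "c", "explicit_errors": 31, "total_tokens": 72_000_000},
]

for s in sessions:
    s["bucket"] = error_bucket(s["explicit_errors"])
```

Keyword-flagged errors never enter this function; only tool results that explicitly report an error are counted.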

The Finding

Token-bearing sessions by error bucket:

| Error bucket | Sessions | Avg explicit errors | Avg tools | Avg human msgs | Avg total tokens |
| --- | --- | --- | --- | --- | --- |
| No-error sessions | 1,560 | 0.0 | 12.1 | 3.2 | 1,567,962 |
| 1-2 errors | 774 | 1.4 | 37.3 | 7.0 | 5,874,648 |
| 3-9 errors | 897 | 5.2 | 86.2 | 13.9 | 17,617,903 |
| 10+ errors | 444 | 29.6 | 275.8 | 46.0 | 85,787,221 |

Then the concentration:

| Surface | Share |
| --- | --- |
| Sessions with 10+ explicit errors | 12.1% |
| Token volume in 10+ explicit-error sessions | 62.6% |
| Sessions with no explicit errors | 42.4% |
| Token volume in no-explicit-error sessions | 4.0% |
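The concentration shares fall straight out of the bucket table's session counts and per-session averages:

```python
# Session counts and average tokens per bucket, from the bucket table above.
buckets = {
    "no-error": {"sessions": 1560, "avg_tokens": 1_567_962},
    "1-2":      {"sessions": 774,  "avg_tokens": 5_874_648},
    "3-9":      {"sessions": 897,  "avg_tokens": 17_617_903},
    "10+":      {"sessions": 444,  "avg_tokens": 85_787_221},
}

total_sessions = sum(b["sessions"] for b in buckets.values())
total_tokens = sum(b["sessions"] * b["avg_tokens"] for b in buckets.values())

# 10+ bucket: 12.1% of sessions, 62.6% of tokens.
share_sessions_10plus = buckets["10+"]["sessions"] / total_sessions
share_tokens_10plus = (
    buckets["10+"]["sessions"] * buckets["10+"]["avg_tokens"] / total_tokens
)
```

Run it and the 12.1% / 62.6% pair reproduces from the averages alone.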

This is the agentic coding cost curve.

The cheap sessions are not where the action is. The expensive sessions are where debugging, environment drift, test loops, and file churn concentrate.

Human Rescue Patterns

After a human re-entered following an explicit tool error, the agent most often went back to shell:

| Next tool family | Interventions | Share | Next explicit failure | Next success-or-test signal |
| --- | --- | --- | --- | --- |
| shell | 2,780 | 60.1% | 15.3% | 30.8% |
| file_edit | 621 | 13.4% | 5.2% | 63.3% |
| file_search | 412 | 8.9% | 3.4% | 31.6% |
| capability | 234 | 5.1% | 2.1% | 0.0% |
| delegation | 148 | 3.2% | 8.1% | 85.8% |
| planning | 132 | 2.9% | 0.0% | 78.8% |

One surprising pair of rows: delegation and planning were small shares of rescue actions, but had the strongest next success-or-test signals.

That does not mean "delegate more" or "plan more" globally. It means after repeated failure, diagnosis beats another blind command.
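The rescue query can be sketched as a small state machine over the event stream. The event shape and family names here are assumptions for illustration, not the mart's real schema:

```python
from collections import Counter

# Toy event stream; the real mart has 247,592 tool events.
events = [
    {"type": "tool", "family": "shell", "is_error": True},
    {"type": "human"},
    {"type": "tool", "family": "shell", "is_error": False},
    {"type": "tool", "family": "file_edit", "is_error": True},
    {"type": "human"},
    {"type": "tool", "family": "planning", "is_error": False},
]

rescues = Counter()     # first tool family used after each human rescue
saw_error = False       # last tool call explicitly reported an error
pending_rescue = False  # a human re-entered after that error

for ev in events:
    if ev["type"] == "human" and saw_error:
        pending_rescue = True
        saw_error = False
    elif ev["type"] == "tool":
        if pending_rescue:
            rescues[ev["family"]] += 1
            pending_rescue = False
        saw_error = ev["is_error"]
```

Tallying `rescues` across sessions produces the intervention counts in the table.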

The So What

Do not optimize for zero errors.

Optimize for fast error classification.

After two repeated failures, stop and name the failure class. This is also where the loop-depth checkpoint belongs: the agent should produce a diagnosis, not another command.

The failure classes:

  • auth or permission
  • missing file or path drift
  • command misuse
  • dependency or environment
  • test failure
  • product logic failure
  • external service or network

Then run the cheapest disambiguating check.
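One illustrative way to triage an error message into those classes is a first-match keyword pass. The patterns below are examples I chose for the sketch, not a complete taxonomy:

```python
# Ordered (class, keyword) pairs; first match wins.
FAILURE_CLASSES = [
    ("auth or permission", ("permission denied", "eacces", "401", "403")),
    ("missing file or path drift", ("no such file", "enoent", "not found")),
    ("dependency or environment",
     ("modulenotfounderror", "command not found", "version conflict")),
    ("external service or network", ("timeout", "econnrefused", "503")),
    ("test failure", ("assertionerror", "failed", "expected")),
]

def classify(error_text: str) -> str:
    """Name the failure class, or admit the error is unclassified."""
    msg = error_text.lower()
    for label, needles in FAILURE_CLASSES:
        if any(n in msg for n in needles):
            return label
    return "unclassified"
```

The point is not the keywords; it is that "unclassified" is a legitimate answer that should trigger a disambiguating check, not a retry.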

The Operator Protocol

Paste this into hard Claude Code debugging sessions:

If two consecutive attempts fail, stop execution and classify the failure.
Name the current hypothesis, evidence for it, evidence against it, and the
smallest next check. Do not retry the same action unchanged.

This is not about making the agent cautious. It is about stopping repeated failure from burning context, tokens, and operator attention.
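The two-strike rule in that protocol is mechanical enough to enforce in a harness. A minimal sketch of the policy, not part of Claude Code itself:

```python
class StuckGuard:
    """Flag when the same action has failed twice in a row unchanged."""

    def __init__(self, limit: int = 2):
        self.limit = limit
        self.last_action = None
        self.streak = 0

    def record(self, action: str, failed: bool) -> bool:
        """Return True when the agent should stop and classify the failure."""
        if failed and action == self.last_action:
            self.streak += 1
        elif failed:
            self.last_action = action
            self.streak = 1
        else:  # any success resets the streak
            self.last_action, self.streak = None, 0
        return self.streak >= self.limit
```

Changing the action resets the streak, so the guard only fires on literal unchanged retries.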

The Non-Intuitive Takeaway

The best agentic coder is not the one who prevents errors.

The best agentic coder notices when the agent has switched from learning to thrashing, and stops it before it accidentally declares done.

That moment is where elite operators intervene.

The public factory for reproducing the stuckness queries is here: MPIsaac-Per/claude-code-ops-audit.