Agentic Search Is Major, Not Half
Entire's May 6, 2026 post, "How We Improved Agentic Search," put a useful public number on something coding-agent operators feel every day: search is not a side quest.
Their public trace set:
| Surface | Count |
|---|---|
| Checkpoints | 1,983 |
| Tool calls | 202,142 |
| Search-related tool calls | 98,555 |
| Search share | 48.8% |
Their search split:
| Search bucket | Count | Share of search calls |
|---|---|---|
| Read / file retrieval | 48,322 | 49.0% |
| Bash search fallback | 23,180 | 23.5% |
| Grep / content search | 23,136 | 23.5% |
| Other | 3,917 | 4.0% |
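The shares in both tables can be sanity-checked directly from the raw counts. A minimal sketch, using only the numbers published above:

```python
# Counts taken from Entire's public trace tables; nothing else assumed.
total_tool_calls = 202_142
search_calls = 98_555

search_buckets = {
    "read/file retrieval": 48_322,
    "bash search fallback": 23_180,
    "grep/content search": 23_136,
    "other": 3_917,
}

# Overall search share of all tool calls.
share = search_calls / total_tool_calls
print(f"search share: {share:.1%}")  # 48.8%

# The buckets should sum back to the search-call total.
assert sum(search_buckets.values()) == search_calls

# Per-bucket share of search calls.
for name, count in search_buckets.items():
    print(f"{name}: {count / search_calls:.1%}")
```

Running this reproduces the 48.8% headline and the 49.0 / 23.5 / 23.5 / 4.0 bucket split, which confirms the two tables are internally consistent.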
I reran the same question against my refreshed Claude Code corpus. The answer is not identical, but it is still large enough to matter.
The Replication Cut
My denominator:
- 4,234 parsed session IDs
- 247,592 tool events
- timestamp range: November 25, 2025 to May 6, 2026
The tricky part is Bash. A dedicated Grep call is easy to classify. A shell command can be a real search, a setup step, a test, a deployment, or a pipeline that starts with cd and searches later. So I kept three cuts:
| Local definition | Search events | Share of all tool events |
|---|---|---|
| Strict first-token Bash search | 75,200 | 30.37% |
| Wide first-token Bash / file-discovery search | 87,236 | 35.23% |
| Strict command-text Bash search | 91,501 | 36.96% |
The strict first-token cut counts file retrieval, dedicated grep/content search, and Bash commands whose first command token is a search verb. The wide cut includes more file-discovery verbs. The command-text cut catches shell pipelines where the search verb appears after a setup command.
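The Bash piece of those cuts can be sketched in a few lines. This is a minimal illustration with hypothetical verb lists; the actual taxonomy lives in the query-factory repo and may differ, and the real strict cut also counts Read and Grep tool calls, which this sketch omits:

```python
import shlex

# Hypothetical verb lists -- illustrative, not the audit repo's real taxonomy.
STRICT_SEARCH_VERBS = {"grep", "rg", "ag", "find", "fd"}
WIDE_EXTRA_VERBS = {"ls", "tree", "locate", "which"}

def classify_bash(command: str) -> set[str]:
    """Return which of the three local cuts a Bash command falls into."""
    cuts: set[str] = set()
    try:
        tokens = shlex.split(command)
    except ValueError:  # unbalanced quotes etc.
        return cuts
    if not tokens:
        return cuts
    # Strict first-token cut: the command starts with a search verb.
    if tokens[0] in STRICT_SEARCH_VERBS:
        cuts.add("strict_first_token")
    # Wide cut: also accept file-discovery verbs in first position.
    if tokens[0] in STRICT_SEARCH_VERBS | WIDE_EXTRA_VERBS:
        cuts.add("wide_first_token")
    # Command-text cut: a search verb anywhere in the pipeline,
    # e.g. "cd src && grep -rn pattern ."
    if any(tok in STRICT_SEARCH_VERBS for tok in tokens):
        cuts.add("command_text")
    return cuts

print(sorted(classify_bash("grep -rn TODO src")))
print(sorted(classify_bash("cd src && grep -rn TODO .")))
```

The second example is exactly why the command-text cut produces the largest count: the pipeline searches, but its first token is a setup step.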
That gives the cleaner headline:
In this corpus, search is major, not half.
Entire saw 48.8%. My corpus lands between 30.4% and 37.0%. The exact percentage is workload-shaped and taxonomy-shaped, but both datasets reject the same bad mental model: coding agents are not mostly final-answer generators. They spend a large fraction of their life trying to find the right thing to inspect.
Why The Numbers Differ
The corpora are different. Entire's public analysis comes from real-world development checkpoints from the open source entireio/cli repo. Mine is a single-operator Claude Code corpus across many repos, remote machines, issue-tracker work, browser checks, infrastructure commands, and long-running maintenance sessions.
The tool surfaces are also different. Entire's benchmark normalizes search through a search_code abstraction. Claude Code exposes Read, Grep, Glob, Bash, WebFetch, MCP tools, and delegated agents. Classifying "search" in that environment is inherently messier.
That messiness is the point. The agent does not care whether a file was found through Grep, rg, find, ls, cat, or a shell pipeline. It cares whether the returned slice helps it decide what to read next.
What Held Up
Entire's strongest result was not "search is exactly half." It was that faster search alone is not the main bottleneck. In their speed benchmark, faster indexed search cut median search_code latency from 14.7 ms to 1.7 ms, while wall clock moved only modestly because tool execution was a tiny share of total runtime.
My latency cuts point the same way:
| Local surface | p50 |
|---|---|
| Kept tool-result delta | 450 ms |
| User-to-assistant delta | 4,182 ms |
| Ratio (user-to-assistant / tool-result) | 9.3x |
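The table reduces to a median-of-deltas computation. A minimal sketch with toy latency values chosen only to illustrate the shape of the calculation (the real corpus feeds thousands of per-event deltas into the same two medians):

```python
from statistics import median

# Toy per-event deltas in milliseconds -- illustrative placeholders,
# not the corpus data.
tool_result_deltas = [310, 450, 520, 460, 440]
user_to_assistant_deltas = [3_900, 4_182, 4_500]

p50_tool = median(tool_result_deltas)
p50_model = median(user_to_assistant_deltas)

print(f"tool p50:  {p50_tool} ms")
print(f"model p50: {p50_model} ms")
print(f"ratio:     {p50_model / p50_tool:.1f}x")
```

The point of the ratio is that even a large improvement in the tool-side p50 barely moves the sum of the two, because the model-side delta dominates.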
The Codex telemetry overlay is even cleaner:
| Codex telemetry surface | p50 |
|---|---|
| Tool dispatch | 11.62 ms |
| Model stream request | 3,601.86 ms |
So I would not read the search-share result as "make grep faster and the agent is fixed." The better read is: search is frequent enough that the quality of search output shapes the loop, but raw search latency is usually not the first-order wall-clock bottleneck.
The Better Evaluation Target
Entire's pgr result is the useful direction: ranking and presentation improved first-result relevance more clearly than raw speed improved end-to-end runtime.
That matches the local first-read cut. In 1,831 edit-anchored sessions, early reading correlated with total work, not cleanly with explicit failures:
| Relationship | Correlation |
|---|---|
| Pre-edit reads vs total tools | 0.249 |
| Pre-edit reads vs total explicit errors | 0.021 |
| Pre-edit reads vs post-edit explicit errors | 0.081 |
That does not say "read more and errors disappear." It says hard tasks require more exploration, and the useful question is whether the agent reaches the right inspection path sooner.
The Operator Protocol
If you are evaluating agentic code search, do not stop at search latency.
Track:
- search share of all tool calls
- search calls before first useful file read
- first relevant result rank
- output characters returned per search
- repeated reformulations before inspection
- downstream tool count, cost, and wall clock
The first four are the search-layer metrics. The last two are downstream system metrics. Do not collapse them into one number too early.
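Several of the search-layer metrics fall out of a single pass over a trace. A minimal sketch with a hypothetical `ToolEvent` schema (real agent logs will differ), covering the search share, searches-before-first-useful-read, and output-characters-per-search metrics:

```python
from dataclasses import dataclass

# Hypothetical trace schema -- field names are assumptions for illustration.
@dataclass
class ToolEvent:
    name: str         # e.g. "Grep", "Read", "Bash"
    is_search: bool   # per whatever search taxonomy you settled on
    output_chars: int
    useful: bool      # did the agent act on this result?

def search_layer_metrics(trace: list[ToolEvent]) -> dict:
    searches = [e for e in trace if e.is_search]
    # Count search calls issued before the first useful file read.
    before_useful = 0
    for e in trace:
        if e.name == "Read" and e.useful:
            break
        if e.is_search:
            before_useful += 1
    return {
        "search_share": len(searches) / len(trace),
        "searches_before_first_useful_read": before_useful,
        "chars_per_search": sum(e.output_chars for e in searches)
        / max(len(searches), 1),
    }

trace = [
    ToolEvent("Grep", True, 1_200, False),
    ToolEvent("Grep", True, 300, False),
    ToolEvent("Read", False, 4_000, True),
    ToolEvent("Bash", True, 800, False),
]
print(search_layer_metrics(trace))
```

First-result rank and reformulation counts need result-level labels rather than event-level flags, which is why they belong in the search layer's own evaluation harness rather than in trace post-processing.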
The practical protocol:
- Optimize search for first useful inspection, not raw scan speed.
- Show the agent fewer, better-ranked, better-trimmed candidates.
- Then measure whether it reaches the right file sooner.
That is the piece of Entire's finding I would carry into Claude Code, Codex, Cursor, Aider, or any other coding-agent loop.
Search is not the whole system. It is one of the load-bearing surfaces inside the system.
The public query factory for the local replication is here: MPIsaac-Per/claude-code-ops-audit.