Agents · LLMs · Claude

Why Claude-Style Generalist Agents Work So Well

Claude-style agents work because they repeatedly choose the next useful action, execute it through tools, observe the result, and continue. This post explains why.

By Tomer Galanti · March 31, 2026 · 15 min read · ◆ Source-verified against leaked Claude Code architecture

Introduction

Why do Claude-style generalist agents feel so effective in practice? At first glance, it is tempting to imagine a large hand-written controller somewhere inside the system: first inspect the code, then run tests, then edit the file, then verify, then summarize. But the real picture is both simpler and more interesting.

A Claude-style agent works by combining a strong language model with a tool loop, project context, and runtime structure. The model is not handed a fixed script for each task. Instead, it repeatedly answers a more local question:

The core question: Given the user request, the current context, and the result of the last action, what is the best next step?

That simple pattern turns out to be surprisingly powerful. This post explains the mechanism concretely — and cross-references every claim against the actual Claude Code source architecture, publicly leaked on March 31, 2026 via a source map file in Anthropic's npm registry.

All architectural claims have been verified against the leaked src/ directory: ~1,900 TypeScript files, 512K+ lines, built on Bun + React/Ink. Key files: QueryEngine.ts (~46K lines), Tool.ts (~29K lines), context.ts, and the src/tools/ directory with ~40 tool implementations.

This post focuses on four ideas:

1. The agent is a loop, not a monolithic planner. It repeatedly chooses the next useful action. There is no fixed script per task type.
2. Tools turn reasoning into real operations. Reading files, editing code, running commands, and searching the web are executable actions, not metaphors.
3. Context tells the agent what is plausible. Project instructions, memory, tool descriptions, and previous outputs all shape what the model does next.
4. Verification makes the loop robust. The agent does not need to be right immediately if it can test, observe failure, and recover.

A useful mental model

User request → model chooses next action → tool runs → result returns → model chooses again

This is the core of the agent. The model usually does not write code to implement file reading or shell execution. Instead, it emits a structured request — read this file, search for that pattern, run this command, edit that function — and the runtime executes it. The result returns, and the model continues from the updated state.

The agent is best understood not as a giant fixed workflow, but as a repeated decision process over an evolving workspace.

In the source, QueryEngine.ts handles this loop explicitly: streaming API responses, detecting tool-call requests, dispatching to the right tool, collecting results, and feeding them back as context for the next inference pass.
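The loop can be written down in a few lines. The sketch below is a hypothetical minimal reconstruction, not the streaming logic of QueryEngine.ts; `choose_action` stands in for a model call, and the tool registry is a plain dictionary.

```python
def run_agent(request, choose_action, tools, max_steps=20):
    # Minimal reconstruction of the loop described above: the model picks
    # a structured action, the runtime executes it, and the result is
    # appended to the history the model sees on the next pass.
    history = [("user", request)]
    for _ in range(max_steps):
        action = choose_action(history)  # e.g. {"tool": "FileReadTool", "args": {...}}
        if action["tool"] == "finish":
            return action["args"]["answer"]
        result = tools[action["tool"]](**action["args"])
        history.append(("tool_result", result))
    raise RuntimeError("step budget exhausted")
```

Everything task-specific lives in `choose_action` and `tools`; the loop itself never changes, which is the point the rest of this post develops.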

Part I — The core loop

Suppose the user says: Fix the bug in auth.py. A common misconception is that the system has a hard-coded bug-fixing pipeline hidden inside it. In reality, the behavior is a repeated decision: see the context, choose the best next action, execute, observe, repeat. Here is a plausible trajectory — mapped to real tool implementations, with uncertainty dropping at each step:

1. FileReadTool · Read auth.py
   The request names a file. Reading is the cheapest way to reduce uncertainty. The model does not guess; it looks.
   Uncertainty: 85%
2. GrepTool · Search for related tests
   Tests reveal intended behavior. The agent uses ripgrep to search for existing coverage before committing to an edit.
   Uncertainty: 65%
3. BashTool · Run pytest tests/test_auth.py -q
   Running tests exposes actual behavior, not guessed behavior. The model selects this command from project context: pyproject.toml, CI config, or persistent memory.
   Uncertainty: 40%
4. Feedback · Observe the failure
   The test output narrows "something is wrong" to a specific assertion failure. Errors are information, not dead ends; they make the next step much easier.
   Uncertainty: 20%
5. FileEditTool · Edit auth.py
   With the file read, tests found, and failure mode identified, the edit is a narrow, well-informed operation. String replacement: precise, not a wholesale rewrite.
   Uncertainty: 10%
6. BashTool · Verify by re-running tests
   The loop closes. If tests pass, confidence rises. If they fail differently, the loop continues. The agent is not powerful because it never errs; it is powerful because it can close the loop between action and feedback.
   Uncertainty: 2%

Notice what is happening. The agent is not solving the whole task in one shot. It is repeatedly asking: What should I do next, given everything I know right now? Each step reduces uncertainty. That decomposition is one of the main reasons these systems work well.
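The trajectory above can be written down as data. The uncertainty values are the post's illustrative numbers, not measurements, and the invariant check simply encodes the claim that each step reduces uncertainty.

```python
# The six-step trajectory above, as data. Uncertainty values mirror the
# post's illustrative figures.
TRAJECTORY = [
    ("FileReadTool", "read auth.py", 0.85),
    ("GrepTool", "search for related tests", 0.65),
    ("BashTool", "pytest tests/test_auth.py -q", 0.40),
    ("feedback", "observe the failure", 0.20),
    ("FileEditTool", "edit auth.py", 0.10),
    ("BashTool", "re-run tests", 0.02),
]

def uncertainty_is_decreasing(trajectory):
    # The claim being made: each step leaves the agent strictly less
    # uncertain than the step before it.
    values = [u for _, _, u in trajectory]
    return all(a > b for a, b in zip(values, values[1:]))
```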


Part II — Why the next-step decomposition works

Many real tasks are too large, too uncertain, or too stateful to solve in one forward pass. If the user says "fix the bug in auth.py," the model may not yet know what the bug is, whether tests exist, which test command the repository uses, or whether the file has hidden dependencies. A one-shot answer would force the model to guess too much.

The agent loop avoids that. Software tasks have a natural causal structure that maps cleanly to discrete tool invocations:

Read (FileReadTool) reduces uncertainty → Search (GrepTool · GlobTool) reveals dependencies → Run (BashTool) exposes real behavior → Edit (FileEditTool) applies the candidate fix → Verify (BashTool) closes the loop

Fig. 1 — The causal structure of a software task maps cleanly to tool invocations. Each step reduces a different kind of uncertainty.


Part III — What shapes the next action?

Why does the agent decide to read a file before editing it? Why does it run tests? The next action is shaped by four sources of information simultaneously.

1. The user request

The prompt gives the agent a goal. If the user says "fix the bug," the model has learned that editing without inspection is risky. Reading first is the safer move.

2. Tool descriptions

The model knows what tools exist and what each does. Tool descriptions are part of the reasoning environment — they help map an abstract intention like "inspect the code" into a concrete action like "invoke FileReadTool on auth.py."

In the source, tools.ts serves as the central tool registry, and Tool.ts (~29K lines) defines base types, input schemas, permission models, and progress state types for all tools.
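A tool definition of the kind Tool.ts is said to provide can be sketched as a small record: a name and description the model reads, an input schema the runtime validates against, and the function that actually runs. The names and fields below are illustrative, not the real types.

```python
from dataclasses import dataclass
from typing import Any, Callable

@dataclass
class Tool:
    # Illustrative stand-in for a base tool type: the description is part
    # of the model's reasoning environment; the schema and run function
    # belong to the runtime.
    name: str
    description: str
    input_schema: dict          # e.g. a JSON-Schema-like dict
    run: Callable[..., Any]

read_tool = Tool(
    name="FileReadTool",
    description="Read a file from the workspace.",
    input_schema={"path": "string"},
    run=lambda path: open(path).read(),
)
```

The description string is what lets the model map "inspect the code" onto "invoke FileReadTool on auth.py"; the schema is what lets the runtime reject a malformed call before anything executes.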

3. Project context and memory

If the project includes instructions — build commands, test commands, conventions, or warnings — the model uses them. Context narrows the space of plausible actions from "everything" to "the sensible things for this codebase."

context.ts collects system and user context. The memdir/ system provides persistent memory across sessions. The service layer includes extractMemories/ for automatic memory extraction and teamMemorySync/ for team-level memory synchronization.

4. The result of the previous action

This is what makes the process adaptive. If the agent runs pytest and the output says the repository uses tox, that failure is new information. The next action can be better than the previous one. The agent does not need to know everything at the start — it only needs a good next move and the ability to update.
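The pytest-to-tox correction can be sketched as a tiny policy. In a real agent the model reads the error output; the substring check here is a deliberately crude stand-in for that judgment.

```python
def next_test_command(previous_output=None):
    # Feedback-driven adaptation, sketched: start with a default runner,
    # switch when the last run's output points somewhere else.
    if previous_output and "tox" in previous_output:
        return ["tox"]
    return ["pytest", "-q"]

first = next_test_command()                              # no feedback yet
second = next_test_command("error: this repo uses tox")  # failure is information
```

The first call guesses the default; the second call is better because the failed run produced evidence. That asymmetry is the whole mechanism.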


Part IV — The real tool inventory

The basic description — read, search, run, edit — captures the essence but understates the full picture. The leaked source reveals roughly 40 discrete tools; a representative subset:

Tool · What it does
BashTool · Shell command execution with permission checks
FileReadTool · Reads files: images, PDFs, notebooks, plain text
FileWriteTool · Creates or overwrites files
FileEditTool · Partial modification via string replacement
GrepTool · Content search via ripgrep
GlobTool · File pattern matching
WebFetchTool · Fetches URL content
WebSearchTool · Web search
AgentTool · Spawns sub-agents for parallel work
MCPTool · Model Context Protocol server invocation
LSPTool · Language Server Protocol integration
NotebookEditTool · Jupyter notebook editing
SkillTool · Reusable workflow execution
EnterPlanModeTool · Switches to planning mode
EnterWorktreeTool · Git worktree isolation
TeamCreateTool · Team-level parallel agent management

Several tools — AgentTool, TeamCreateTool, EnterPlanModeTool — reveal capabilities beyond a simple reactive loop. The system can spawn sub-agents, coordinate team-level parallel work, and switch between planning and execution modes. The "next-step" model remains the core, but it operates within a richer infrastructure than the basic description suggests.
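Fan-out of the kind AgentTool and TeamCreateTool suggest can be sketched with a thread pool; the real coordination layer is presumably far richer, but the shape is the same: several independent loops, results gathered for the parent.

```python
from concurrent.futures import ThreadPoolExecutor

def spawn_subagents(subtasks, run_one):
    # Sketch of sub-agent fan-out: each subtask gets its own agent run,
    # executed in parallel, results returned in subtask order.
    # `run_one` stands in for a full agent loop over one subtask.
    with ThreadPoolExecutor(max_workers=4) as pool:
        return list(pool.map(run_one, subtasks))
```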


Part V — Why generalist agents work across tasks

A Claude-style agent is often called a generalist. That does not mean it knows a perfect workflow for every domain. It means it has one reusable control pattern — understand, choose, execute, observe, repeat — that transfers across tasks.

“The generalism is not in the tools. It is in the loop.”

Bug fixing, refactoring, writing tests, updating documentation, searching the web for missing information, preparing a pull request summary — the tools change, the local decisions change, but the outer control loop stays nearly the same. The source code confirms this: the same QueryEngine.ts handles every task type.


Part VI — The gap between chatbot and agent

A plain chatbot without tools must answer from its internal text distribution alone. It can explain what it thinks might be wrong in auth.py, but it cannot inspect your repository, run your tests, or verify the result.

A tool-using agent is fundamentally different. The gap is not about abstract reasoning ability — it is about the ability to interact with the environment and correct course using real feedback.

Look instead of guess.
Read the actual file, search the actual codebase, fetch the actual URL.
Test instead of speculate.
Run the actual command and observe the actual output.
Edit instead of merely suggest.
Apply a concrete change with FileEditTool, then verify the result.

Part VII — What the blog got right, and what it missed

Cross-referencing the original post against the leaked source reveals a nuanced picture. The core thesis — that Claude Code operates as a next-step decision loop — is accurate and well-supported. The four sources of action-shaping information all have clear counterparts in the codebase. But the blog describes the runtime as having "a small amount of runtime structure." The reality is 1,900 files and 512K+ lines of TypeScript.

The blog also omits several significant architectural features:

Multi-agent coordination
AgentTool spawns sub-agents. The coordinator/ directory handles orchestration. TeamCreateTool enables parallel work across team agents. The system runs multiple loops concurrently.
Permission system
Every tool invocation passes through a permission check that prompts the user or auto-resolves. This layer sits between "model chooses" and "tool executes" — a critical safety boundary not mentioned in the original post.
Plan mode
EnterPlanModeTool and ExitPlanModeTool allow the agent to switch between planning and execution modes. The control pattern has more structure than a single undifferentiated loop.
Skills and plugins
Reusable workflows in skills/ and third-party plugins in plugins/ extend the agent beyond its built-in tools. Users can add custom skills — making the "generalist" claim even stronger in practice.
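The permission gate described above can be sketched as a single checkpoint between "model chooses" and "tool executes". The names and shapes here are illustrative, not the real interface: `policy` maps tool names to "allow", "deny", or "ask", and `ask_user` resolves the "ask" case.

```python
def execute_with_permission(tool_name, args, tool_fn, policy, ask_user):
    # Sketch of a permission gate: every tool invocation passes through
    # here, and the tool function only runs if the policy (or the user)
    # allows it.
    decision = policy.get(tool_name, "ask")
    if decision == "ask":
        decision = "allow" if ask_user(tool_name, args) else "deny"
    if decision != "allow":
        return {"ok": False, "error": f"{tool_name} denied by policy"}
    return {"ok": True, "result": tool_fn(**args)}
```

Putting the check in one chokepoint, rather than inside each tool, is what makes it a safety boundary: the model can request anything, but nothing executes without passing the gate.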

These omissions don't invalidate the thesis. They reveal that the "simple" loop is the conceptual core of a much larger system — and that the engineering required to make that loop reliable, safe, and extensible is itself a substantial achievement.

Takeaway

Claude-style generalist agents work because they combine reasoning, action, and feedback in one loop.

The model chooses the next useful step. It does not need a full plan from the start. Tools make the step real — the agent can read, search, run, edit, and verify through ~40 discrete tool implementations. Context makes the step grounded — tool descriptions, project instructions, persistent memory, and previous outputs shape every decision. Feedback makes the loop robust — even imperfect choices get corrected when the agent sees real results.

What looks like mysterious general intelligence is often something more concrete: a system that keeps asking, very effectively, "what is the best next action now?" — backed by 512K lines of engineering to make that question answerable.

Epilogue

Once you see the pattern this way, many apparently sophisticated behaviors become easier to understand. The agent does not need a separate "debugging brain," "refactoring brain," and "documentation brain." It needs a strong enough model to choose sensible local actions, good enough tools to make those actions real, and enough feedback to keep improving its trajectory.

That is a much simpler recipe than it first appears. It also helps explain why this style of agent has become so influential — not because it solves everything in one shot, but because it usually does the next thing well enough to keep moving forward.

