Why Claude-Style Generalist Agents Work So Well
Claude-style agents work because they repeatedly choose the next useful action, execute it through tools, observe the result, and continue. This post explains why.
Introduction
Why do Claude-style generalist agents feel so effective in practice? At first glance, it is tempting to imagine a large hand-written controller somewhere inside the system: first inspect the code, then run tests, then edit the file, then verify, then summarize. But the real picture is both simpler and more interesting.
A Claude-style agent works by combining a strong language model with a tool loop, project context, and runtime structure. The model is not handed a fixed script for each task. Instead, it repeatedly answers a more local question: what should I do next, given everything I know right now?
That simple pattern turns out to be surprisingly powerful. This post explains the mechanism concretely and cross-references every claim against the actual Claude Code source architecture, which was publicly leaked on March 31, 2026 via a source map file included in Anthropic's npm package.
src/ directory: ~1,900 TypeScript files, 512K+ lines, built on Bun + React/Ink. Key files: QueryEngine.ts (~46K lines), Tool.ts (~29K lines), context.ts, and the src/tools/ directory with ~40 tool implementations.
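This post focuses on four ideas: the model chooses the next useful step instead of following a fixed plan, tools make that step real, context grounds it in the specific project, and feedback from each result makes the loop robust.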
A useful mental model
The mental model is a cycle: choose an action, execute it, observe the result, and choose again. This cycle is the core of the agent. The model usually does not write code to implement file reading or shell execution. Instead, it emits a structured request (read this file, search for that pattern, run this command, edit that function), and the runtime executes it. The result returns, and the model continues from the updated state.
The agent is best understood not as a giant fixed workflow, but as a repeated decision process over an evolving workspace.
QueryEngine.ts handles this loop explicitly: streaming API responses, detecting tool-call requests, dispatching to the right tool, collecting results, and feeding them back as context for the next inference pass.
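A minimal, non-streaming sketch of that loop follows, with a scripted stand-in for the model so it runs offline. The names ToolCall, ModelTurn, ChatMessage, Model, and runAgentLoop are hypothetical; they illustrate the request, dispatch, observe cycle and are not QueryEngine.ts's actual API.

```typescript
// Hypothetical names throughout; this is not QueryEngine.ts's API.
type ToolCall = { tool: string; input: Record<string, string> };

// Each model turn either requests a tool call or ends with a final answer.
type ModelTurn =
  | { kind: "tool_call"; call: ToolCall }
  | { kind: "final"; text: string };

type ChatMessage = { role: "user" | "assistant" | "tool"; content: string };

interface Model {
  next(history: ChatMessage[]): ModelTurn;
}

type ToolFn = (input: Record<string, string>) => string;

function runAgentLoop(
  model: Model,
  tools: Map<string, ToolFn>,
  userRequest: string,
  maxSteps = 10,
): { answer: string; history: ChatMessage[] } {
  const history: ChatMessage[] = [{ role: "user", content: userRequest }];
  for (let step = 0; step < maxSteps; step++) {
    const turn = model.next(history);
    if (turn.kind === "final") {
      history.push({ role: "assistant", content: turn.text });
      return { answer: turn.text, history };
    }
    // Record the request, dispatch it, and feed the result back as context.
    history.push({ role: "assistant", content: `call ${turn.call.tool}` });
    const tool = tools.get(turn.call.tool);
    // An unknown tool becomes an error observation the model can react to.
    const result = tool ? tool(turn.call.input) : `error: unknown tool ${turn.call.tool}`;
    history.push({ role: "tool", content: result });
  }
  return { answer: "stopped: step limit reached", history };
}

// Scripted stand-in for the language model: read auth.py once, then answer.
const scriptedModel: Model = {
  next(history) {
    const sawToolResult = history.some((m) => m.role === "tool");
    if (!sawToolResult) {
      return { kind: "tool_call", call: { tool: "read_file", input: { path: "auth.py" } } };
    }
    const last = history[history.length - 1];
    return { kind: "final", text: `Read ${last.content.length} characters of auth.py` };
  },
};

const demoTools = new Map<string, ToolFn>([
  ["read_file", (input) => `contents of ${input.path}`],
]);

const run = runAgentLoop(scriptedModel, demoTools, "Fix the bug in auth.py");
console.log(run.answer);
```

The step limit is a design choice for the sketch: a scripted or confused model cannot spin forever, and the caller learns that the loop stopped rather than finished.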
Part I — The core loop
Suppose the user says: Fix the bug in auth.py. A common misconception is that the system has a hard-coded bug-fixing pipeline hidden inside it. In reality, the behavior is a repeated decision: see the context, choose the best next action, execute, observe, repeat. A plausible trajectory, with uncertainty dropping at each step, reads auth.py (FileReadTool), searches for related code (GrepTool), finds the test command in pyproject.toml, CI config, or persistent memory, runs the tests (BashTool), applies a fix (FileEditTool), and re-runs the tests to verify it.
Notice what is happening. The agent is not solving the whole task in one shot. It is repeatedly asking: What should I do next, given everything I know right now? Each step reduces uncertainty. That decomposition is one of the main reasons these systems work well.
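The next-step decision in this example can be sketched as a policy over what the agent does and does not yet know. This is a hypothetical illustration: TaskState, AgentAction, nextAction, simulate, and the fake observations are assumptions, and the fixed priority order stands in for the model's judgment, which a real agent re-derives from the full context at every turn.

```typescript
// Hypothetical sketch: what the agent knows so far. `undefined` means
// "not known yet"; field and type names are illustrative only.
interface TaskState {
  fileContents?: string; // known after reading auth.py
  testCommand?: string;  // known after checking project config
  failingTest?: string;  // known after the first test run
  patched: boolean;      // true after an edit
  verified: boolean;     // true once tests pass after the edit
}

type AgentAction =
  | { tool: "FileReadTool"; path: string }
  | { tool: "GrepTool"; pattern: string }
  | { tool: "BashTool"; command: string }
  | { tool: "FileEditTool"; path: string }
  | { tool: "done" };

// Choose the action that resolves the first open question.
function nextAction(s: TaskState): AgentAction {
  if (s.fileContents === undefined) return { tool: "FileReadTool", path: "auth.py" };
  if (s.testCommand === undefined) return { tool: "GrepTool", pattern: "pytest|tox" };
  if (s.failingTest === undefined) return { tool: "BashTool", command: s.testCommand };
  if (!s.patched) return { tool: "FileEditTool", path: "auth.py" };
  if (!s.verified) return { tool: "BashTool", command: s.testCommand };
  return { tool: "done" };
}

// Drive the policy with fake observations; each one fills in an unknown.
function simulate(): string[] {
  const s: TaskState = { patched: false, verified: false };
  const trace: string[] = [];
  for (let i = 0; i < 10; i++) {
    const a = nextAction(s);
    trace.push(a.tool);
    if (a.tool === "done") break;
    if (a.tool === "FileReadTool") s.fileContents = "def login(user, pw): ...";
    else if (a.tool === "GrepTool") s.testCommand = "pytest tests/test_auth.py";
    else if (a.tool === "FileEditTool") s.patched = true;
    else if (!s.patched) s.failingTest = "test_login_rejects_bad_password";
    else s.verified = true; // BashTool after the edit: tests now pass
  }
  return trace;
}

console.log(simulate().join(" -> "));
```

Running it prints the read, search, test, edit, verify order, even though no step was scripted as a sequence; each action falls out of the current state.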
Part II — Why the next-step decomposition works
Many real tasks are too large, too uncertain, or too stateful to solve in one forward pass. If the user says "fix the bug in auth.py," the model may not yet know what the bug is, whether tests exist, which test command the repository uses, or whether the file has hidden dependencies. A one-shot answer would force the model to guess too much.
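The agent loop avoids that. Software tasks have a natural causal structure that maps cleanly to discrete tool invocations: you have to locate the relevant code before you can understand it, understand it before you change it, and change it before you can verify the change.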
Fig. 1 — The causal structure of a software task maps cleanly to tool invocations. Each step reduces a different kind of uncertainty.
Part III — What shapes the next action?
Why does the agent decide to read a file before editing it? Why does it run tests? The next action is shaped by four sources of information simultaneously.
1. The user request
The prompt gives the agent a goal. If the user says "fix the bug," the model has learned that editing without inspection is risky. Reading first is the safer move.
2. Tool descriptions
The model knows what tools exist and what each does. Tool descriptions are part of the reasoning environment — they help map an abstract intention like "inspect the code" into a concrete action like "invoke FileReadTool on auth.py."
tools.ts serves as the central tool registry, and Tool.ts (~29K lines) defines base types, input schemas, permission models, and progress state types for all tools.
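To illustrate how a description, an input schema, and a permission rule can sit together on one tool definition, here is a minimal registry sketch. ToolDef, register, invoke, and describeTools are hypothetical names, and the permission rules are invented for the example; this is not the Tool.ts interface.

```typescript
// Hypothetical tool definition: a description the model reads, a schema
// for validating input, and a permission rule. Not the Tool.ts interface.
type JsonType = "string" | "number" | "boolean";

interface ToolDef {
  name: string;
  description: string;                   // shown to the model
  inputSchema: Record<string, JsonType>; // required fields and their types
  needsPermission: (input: Record<string, unknown>) => boolean;
  run: (input: Record<string, unknown>) => string;
}

// Central registry: tool name -> definition.
const registry = new Map<string, ToolDef>();

function register(tool: ToolDef): void {
  registry.set(tool.name, tool);
}

// What the model sees: names and descriptions, not implementations.
function describeTools(): string {
  return Array.from(registry.values())
    .map((t) => `${t.name}: ${t.description}`)
    .join("\n");
}

// Look up the tool, validate input, check permission, then run it.
function invoke(name: string, input: Record<string, unknown>, approved: boolean): string {
  const tool = registry.get(name);
  if (!tool) return `error: no tool named ${name}`;
  for (const [field, type] of Object.entries(tool.inputSchema)) {
    if (typeof input[field] !== type) return `error: ${field} must be a ${type}`;
  }
  if (tool.needsPermission(input) && !approved) return "error: permission denied";
  return tool.run(input);
}

register({
  name: "FileReadTool",
  description: "Read a file from the workspace",
  inputSchema: { path: "string" },
  needsPermission: () => false, // reads are treated as safe in this sketch
  run: (input) => `<contents of ${String(input.path)}>`,
});

register({
  name: "BashTool",
  description: "Run a shell command",
  inputSchema: { command: "string" },
  needsPermission: () => true, // every command needs approval here
  run: (input) => `ran: ${String(input.command)}`, // stub: nothing executes
});

console.log(describeTools());
console.log(invoke("BashTool", { command: "pytest" }, false));
```

Validation and permission errors come back as ordinary strings rather than exceptions, so in a loop like the one above they would reach the model as observations it can act on.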
3. Project context and memory
If the project includes instructions — build commands, test commands, conventions, or warnings — the model uses them. Context narrows the space of plausible actions from "everything" to "the sensible things for this codebase."
context.ts collects system and user context. The memdir/ system provides persistent memory across sessions. The service layer includes extractMemories/ for automatic memory extraction and teamMemorySync/ for team-level memory synchronization.
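A minimal sketch of context assembly, assuming a plain-text context with three sections. MemoryEntry, ContextInputs, buildContext, and the section headings are hypothetical; the sketch shows the idea of merging system context, project instructions, and persisted memory, not how context.ts or memdir/ actually do it.

```typescript
// Hypothetical shapes; not the actual context.ts or memdir/ formats.
interface MemoryEntry {
  topic: string;
  note: string;
  savedAt: string; // ISO date from an earlier session
}

interface ContextInputs {
  gitBranch?: string;            // an example of system context
  projectInstructions: string[]; // build/test commands, conventions, warnings
  memories: MemoryEntry[];       // persisted across sessions
}

// Merge the sources into one block of text placed before the conversation.
function buildContext(inputs: ContextInputs, maxMemories = 5): string {
  const sections: string[] = [];
  if (inputs.gitBranch) sections.push(`# System\nCurrent branch: ${inputs.gitBranch}`);
  if (inputs.projectInstructions.length > 0) {
    const lines = inputs.projectInstructions.map((l) => `- ${l}`).join("\n");
    sections.push(`# Project instructions\n${lines}`);
  }
  // Newest memories first, capped so the context stays bounded.
  const recent = [...inputs.memories]
    .sort((a, b) => b.savedAt.localeCompare(a.savedAt))
    .slice(0, maxMemories);
  if (recent.length > 0) {
    const lines = recent.map((m) => `- ${m.topic}: ${m.note}`).join("\n");
    sections.push(`# Memory\n${lines}`);
  }
  return sections.join("\n\n");
}

const ctx = buildContext({
  gitBranch: "main",
  projectInstructions: ["Run tests with: tox", "Do not edit files under generated/"],
  memories: [
    { topic: "auth", note: "password checks live in auth.py", savedAt: "2026-03-20" },
    { topic: "ci", note: "CI runs tox on every push", savedAt: "2026-03-25" },
  ],
});
console.log(ctx);
```

With an instruction like "Run tests with: tox" in context, the sensible next action after an edit is no longer a guess between test runners.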
4. The result of the previous action
This is what makes the process adaptive. If the agent runs pytest and the output says the repository uses tox, that failure is new information. The next action can be better than the previous one. The agent does not need to know everything at the start — it only needs a good next move and the ability to update.
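The pytest-to-tox example can be sketched as an update step that folds each observation back into the agent's beliefs. Beliefs, Observation, update, runCommand, and the simulated outputs are all hypothetical, and the regex is a crude stand-in: a model reads the output directly rather than matching fixed patterns.

```typescript
// Hypothetical sketch; names and simulated outputs are illustrative.
interface Beliefs {
  testCommand: string;     // current best guess at how to run the tests
  triedCommands: string[]; // commands already run
}

interface Observation {
  command: string;
  exitCode: number;
  output: string;
}

// Fold one tool result back into the agent's beliefs.
function update(beliefs: Beliefs, obs: Observation): Beliefs {
  const tried = [...beliefs.triedCommands, obs.command];
  if (obs.exitCode === 0) return { testCommand: obs.command, triedCommands: tried };
  // A failure that points at another test runner is new information.
  if (/\btox\b/.test(obs.output) && !tried.includes("tox")) {
    return { testCommand: "tox", triedCommands: tried };
  }
  return { testCommand: beliefs.testCommand, triedCommands: tried };
}

// Simulated environment: pytest fails with a hint, tox succeeds.
function runCommand(command: string): Observation {
  if (command === "pytest") {
    return { command, exitCode: 1, output: "this repository runs its tests through tox" };
  }
  if (command === "tox") return { command, exitCode: 0, output: "all environments passed" };
  return { command, exitCode: 127, output: `${command}: command not found` };
}

let beliefs: Beliefs = { testCommand: "pytest", triedCommands: [] };
for (let i = 0; i < 3; i++) {
  const obs = runCommand(beliefs.testCommand);
  beliefs = update(beliefs, obs);
  console.log(`${obs.command} -> exit ${obs.exitCode}`);
  if (obs.exitCode === 0) break;
}
// beliefs.triedCommands is now ["pytest", "tox"]
```

The first guess was wrong, and that is fine: the failure's output changed the next action, which is exactly the adaptivity the section describes.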
Part IV — The real tool inventory
The basic description (read, search, run, edit) captures the essence but understates the full picture. The leaked source reveals roughly 40 discrete tools, including:
| Tool | What it does |
|---|---|
| BashTool | Shell command execution with permission checks |
| FileReadTool | Reads files — images, PDFs, notebooks, plain text |
| FileWriteTool | Creates or overwrites files |
| FileEditTool | Partial modification via string replacement |
| GrepTool | Content search via ripgrep |
| GlobTool | File pattern matching |
| WebFetchTool | Fetches URL content |
| WebSearchTool | Web search |
| AgentTool | Spawns sub-agents for parallel work |
| MCPTool | Model Context Protocol server invocation |
| LSPTool | Language Server Protocol integration |
| NotebookEditTool | Jupyter notebook editing |
| SkillTool | Reusable workflow execution |
| EnterPlanModeTool | Switches to planning mode |
| EnterWorktreeTool | Git worktree isolation |
| TeamCreateTool | Team-level parallel agent management |
Several tools — AgentTool, TeamCreateTool, EnterPlanModeTool — reveal capabilities beyond a simple reactive loop. The system can spawn sub-agents, coordinate team-level parallel work, and switch between planning and execution modes. The "next-step" model remains the core, but it operates within a richer infrastructure than the basic description suggests.
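To make one row of the table concrete: FileEditTool is described as partial modification via string replacement. Below is a minimal sketch of that technique. EditResult, applyEdit, countOccurrences, and the replaceAll flag are hypothetical, and the uniqueness rule (reject an edit whose old string matches more than once unless every occurrence should change) is an assumption about how such a tool might behave, not FileEditTool's actual code.

```typescript
// Hypothetical sketch of an edit-by-string-replacement tool.
type EditResult = { ok: true; text: string } | { ok: false; error: string };

// Count non-overlapping occurrences of `needle` in `haystack`.
function countOccurrences(haystack: string, needle: string): number {
  if (needle.length === 0) return 0;
  let count = 0;
  let from = 0;
  for (;;) {
    const idx = haystack.indexOf(needle, from);
    if (idx === -1) return count;
    count++;
    from = idx + needle.length;
  }
}

function applyEdit(text: string, oldString: string, newString: string, replaceAll = false): EditResult {
  const n = countOccurrences(text, oldString);
  if (n === 0) return { ok: false, error: "old string not found" };
  // Reject ambiguous matches so the caller must supply more surrounding
  // context; failing loudly is safer than editing the wrong spot.
  if (n > 1 && !replaceAll) return { ok: false, error: `old string matches ${n} times` };
  // split/join avoids String.replace's special `$` patterns in newString.
  return { ok: true, text: text.split(oldString).join(newString) };
}

const source = "def check(user, pw):\n    return user.password != pw\n";
const fixed = applyEdit(source, "!=", "==");
console.log(fixed.ok ? fixed.text : fixed.error);

const ambiguous = applyEdit("a = 1\nb = 1\n", "= 1", "= 2");
console.log(ambiguous.ok ? ambiguous.text : ambiguous.error);
```

Compared with rewriting the whole file, a targeted replacement keeps the model's output small and makes a mistaken edit easy to detect: if the old string is not there, nothing changes.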
Part V — Why generalist agents work across tasks
A Claude-style agent is often called a generalist. That does not mean it knows a perfect workflow for every domain. It means it has one reusable control pattern — understand, choose, execute, observe, repeat — that transfers across tasks.
Bug fixing, refactoring, writing tests, updating documentation, searching the web for missing information, preparing a pull request summary — the tools change, the local decisions change, but the outer control loop stays nearly the same. The source code confirms this: the same QueryEngine.ts handles every task type.
Part VI — The gap between chatbot and agent
A plain chatbot without tools must answer from its internal text distribution alone. It can explain what it thinks might be wrong in auth.py, but it cannot inspect your repository, run your tests, or verify the result.
A tool-using agent is fundamentally different. The gap is not about abstract reasoning ability — it is about the ability to interact with the environment and correct course using real feedback.
For the same bug, an agent can read auth.py, run the tests, apply a fix with FileEditTool, then verify the result.
Part VII — What the blog got right, and what it missed
Cross-referencing the original post against the leaked source reveals a nuanced picture. The core thesis, that Claude Code operates as a next-step decision loop, is accurate and well-supported. The four sources of action-shaping information all have clear counterparts in the codebase. But the blog describes the runtime as having "a small amount of runtime structure," while the reality is roughly 1,900 files and more than 512K lines of TypeScript.
The blog also omits several significant architectural features:
Multi-agent orchestration. AgentTool spawns sub-agents, the coordinator/ directory handles orchestration, and TeamCreateTool enables parallel work across team agents. The system can run multiple loops concurrently.
Plan mode. EnterPlanModeTool and ExitPlanModeTool let the agent switch between planning and execution modes, so the control pattern has more structure than a single undifferentiated loop.
Skills and plugins. The skills/ directory and third-party plugins in plugins/ extend the agent beyond its built-in tools. Users can add custom skills, which makes the "generalist" claim even stronger in practice.
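Two of the features listed above, plan mode and sub-agents, can be sketched in a few lines. This is a hypothetical illustration: AgentMode, READ_ONLY_TOOLS, isAllowed, runSubAgent, and coordinate are invented names, plan mode is modeled here as a restriction to read-only tools, and each sub-agent is a timer standing in for a full agent loop. None of it comes from the coordinator/ directory or the plan-mode tools themselves.

```typescript
// Hypothetical sketch; not code from coordinator/ or the plan-mode tools.
type AgentMode = "plan" | "execute";

const READ_ONLY_TOOLS = new Set(["FileReadTool", "GrepTool", "GlobTool"]);

// In plan mode the agent may investigate but not change anything.
function isAllowed(mode: AgentMode, tool: string): boolean {
  return mode === "execute" || READ_ONLY_TOOLS.has(tool);
}

interface SubAgentResult {
  task: string;
  summary: string;
}

// Stand-in for a full agent loop running with its own context.
async function runSubAgent(task: string, delayMs: number): Promise<SubAgentResult> {
  await new Promise((resolve) => setTimeout(resolve, delayMs));
  return { task, summary: `finished: ${task}` };
}

// Start every sub-agent at once and wait for all of them. Later tasks get
// shorter delays, so they finish first; Promise.all still returns results
// in the order the tasks were given.
async function coordinate(tasks: string[]): Promise<SubAgentResult[]> {
  return Promise.all(tasks.map((t, i) => runSubAgent(t, 10 * (tasks.length - i))));
}

async function main(): Promise<void> {
  console.log(isAllowed("plan", "FileEditTool"));    // false
  console.log(isAllowed("execute", "FileEditTool")); // true
  const results = await coordinate(["search callers of login()", "read tests/test_auth.py"]);
  for (const r of results) console.log(r.summary);
}

main();
```

Each sub-agent would run the same next-step loop on a narrower task; the coordinator only decides what to fan out and how to merge what comes back.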
These omissions don't invalidate the thesis. They reveal that the "simple" loop is the conceptual core of a much larger system — and that the engineering required to make that loop reliable, safe, and extensible is itself a substantial achievement.
Takeaway
Claude-style generalist agents work because they combine reasoning, action, and feedback in one loop.
The model chooses the next useful step. It does not need a full plan from the start. Tools make the step real — the agent can read, search, run, edit, and verify through ~40 discrete tool implementations. Context makes the step grounded — tool descriptions, project instructions, persistent memory, and previous outputs shape every decision. Feedback makes the loop robust — even imperfect choices get corrected when the agent sees real results.
What looks like mysterious general intelligence is often something more concrete: a system that keeps asking, very effectively, "what is the best next action now?" — backed by 512K lines of engineering to make that question answerable.
Epilogue
Once you see the pattern this way, many apparently sophisticated behaviors become easier to understand. The agent does not need a separate "debugging brain," "refactoring brain," and "documentation brain." It needs a strong enough model to choose sensible local actions, good enough tools to make those actions real, and enough feedback to keep improving its trajectory.
That is a much simpler recipe than it first appears. It also helps explain why this style of agent has become so influential — not because it solves everything in one shot, but because it usually does the next thing well enough to keep moving forward.