Introduction #
Claude Code is Anthropic’s terminal-based AI coding agent. On the surface it looks like a CLI that “just talks to Claude,” but under the hood it is a stateful software layer that sits between a stateless language model and your local development environment. The model provides the reasoning; the harness provides the hands, eyes, and workspace.
In March 2026, the complete source was accidentally exposed via a 59.8 MB JavaScript source map bundled in version 2.1.88 of the @anthropic-ai/claude-code npm package (a known Bun bundler bug shipped the .map files into production). The leak gave the community an unprecedented look at roughly 512,000 lines of unobfuscated TypeScript, and what it revealed was striking: approximately 98% of the codebase is infrastructure, not AI “decision scaffolding.” Claude Code is, at its core, a distributed-systems-style runtime for a single LLM.
This post walks through the architectural pillars that make it work. For each layer, we’ll build a simplified implementation in TypeScript to make the mechanics concrete. We start by understanding what the harness is, model the LLM call itself as a plain function, and then build every other layer around it.
The harness #
At a high level, the harness is everything that turns a reasoning model into a full-fledged coding agent:
- It receives the user prompt.
- It decides what context to load (files, prior messages, session state).
- It routes tool calls and commands, enforces permissions, and aggregates results.
- It manages the agent loop across multiple turns, along with context compaction.
Core stages in the harness #
Broadly, we can divide the harness into 3 main stages per interaction:
- Bootstrap
- Discover environment (workspace root, repo status, OS, etc.)
- Load configuration (permissions, default tools, MCP servers/skills, feature flags)
- Prefetch resources such as keychain credentials and project scans.
- Query Engine (Agent loop)
- Build the model input: system prompt + CLAUDE.md + memory + conversation history + current task context (e.g., file contents, user prompt, diffs).
- Let the model propose actions (tool calls, subagent spawns, plans).
- Route tool calls through a permission pipeline (allow/ask/deny) and execute the ones that pass.
- Update session memory and compact context as it approaches the token limit.
- Decide whether to continue another turn or stop, based on model output.
- Response rendering and persistence
- Stream markdown/text back to the UI
- Persist the session transcript locally so you can resume, rewind or diff across sessions.
- Update project-level memory (CLAUDE.md, trace.md etc.) based on what the model learned.
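The three stages above can be sketched as one top-level flow. This is an illustrative sketch only: the names `bootstrap`, `queryEngine`, and `renderAndPersist`, and the `Session` shape, are hypothetical stand-ins for the real harness internals.

```typescript
// Illustrative sketch of the per-interaction pipeline; all names here
// are hypothetical, not the real internals.
interface Session {
  workspaceRoot: string;
  transcript: string[];
}

// Stage 1 (bootstrap): discover the environment and load configuration.
function bootstrap(workspaceRoot: string): Session {
  return { workspaceRoot, transcript: [] };
}

// Stage 2 (query engine): the agent loop, stubbed as one fake model turn.
function queryEngine(session: Session, prompt: string): string {
  session.transcript.push(`user: ${prompt}`);
  const answer = `(answer to: ${prompt})`; // stand-in for callModel + tools
  session.transcript.push(`assistant: ${answer}`);
  return answer;
}

// Stage 3 (render and persist): stream output and save the transcript.
function renderAndPersist(session: Session): string {
  return session.transcript.join("\n"); // in production: UI + JSONL on disk
}

const session = bootstrap("/tmp/project");
queryEngine(session, "list the tests");
const saved = renderAndPersist(session);
```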
The model call as a function #
Throughout this post, the LLM is treated as a pure function: given a list of messages, it returns a response containing either text or tool-use requests. Everything else (state, safety, context) is the harness’s responsibility.
// The model is a pure function: messages in, response out.
// It has no memory, no side effects, no access to the filesystem.
interface Message {
role: "system" | "user" | "assistant" | "tool_result";
content: string | ToolUseBlock[];
}
interface TextBlock {
type: "text";
content: string;
}
interface ToolUseBlock {
type: "tool_use";
id: string; // unique call ID, e.g. "toolu_01A..."
name: string; // tool name, e.g. "Bash", "ReadFile"
input: Record<string, unknown>; // JSON payload matching the tool's Zod schema
}
interface ModelResponse {
content: (TextBlock | ToolUseBlock)[];
stopReason: "end_turn" | "tool_use";
}
// This is the only interface to the LLM.
// Every other component in this post wraps or feeds this function.
async function callModel(messages: Message[]): Promise<ModelResponse> {
// In production: HTTP POST to https://api.anthropic.com/v1/messages
// with model, max_tokens, tools, and the messages array.
// For this post, treat it as an opaque async function.
return await anthropicAPI.createMessage({ messages });
}
1. The Agentic Loop #
Every agentic system needs a control loop. Claude Code’s control loop is deceptively simple — a single-threaded while loop:
Plan → Act → Observe → Repeat
More concretely:
- Reason (Model) — The harness sends the current prompt plus the full conversation history to the Claude API. Claude evaluates the task and responds with either a text answer (task complete) or a structured tool_use request (e.g., “read this file,” “run this shell command”).
- Execute (Tool System) — The harness receives the tool_use block, parses it, validates permissions, and executes the corresponding local tool.
- Observe (Tool Result) — The output is wrapped in a tool_result message and appended to the conversation history.
- Repeat — The updated history is sent back to the model for the next iteration. The loop terminates when the model returns a plain text response with no further tool calls.
This is the same ReAct (Reason + Act) pattern found in most agentic frameworks, but Claude Code’s implementation is notable for what it doesn’t do: there is no explicit planning module, no chain-of-thought tree, no separate “critic” model. The loop trusts a single LLM to both plan and execute, and compensates with strong guardrails in the surrounding infrastructure.
Implementation #
class AgentLoop {
private messages: Message[] = [];
private toolRegistry: ToolRegistry;
private permissionSystem: PermissionSystem;
private hookRunner: HookRunner;
private contextManager: ContextManager;
constructor(
toolRegistry: ToolRegistry,
permissionSystem: PermissionSystem,
hookRunner: HookRunner,
contextManager: ContextManager,
) {
this.toolRegistry = toolRegistry;
this.permissionSystem = permissionSystem;
this.hookRunner = hookRunner;
this.contextManager = contextManager;
}
async run(userPrompt: string): Promise<string> {
// Inject persistent context (CLAUDE.md, memory files) before anything else
this.messages = this.contextManager.buildInitialContext();
this.messages.push({ role: "user", content: userPrompt });
// The core loop: call model, execute tools, repeat
while (true) {
// Compact context if approaching the token limit
this.messages = this.contextManager.compactIfNeeded(this.messages);
// 1. REASON: call the model
const response = await callModel(this.messages);
// Append the assistant's response to history
this.messages.push({ role: "assistant", content: response.content });
// 2. CHECK: did the model finish? (no tool calls)
if (response.stopReason === "end_turn") {
const textBlocks = response.content.filter(
(b): b is TextBlock => b.type === "text",
);
return textBlocks.map((b) => b.content).join("\n");
}
// 3. ACT: execute each tool call
const toolCalls = response.content.filter(
(b): b is ToolUseBlock => b.type === "tool_use",
);
for (const toolCall of toolCalls) {
const result = await this.executeTool(toolCall);
// 4. OBSERVE: feed the result back as a tool_result message
this.messages.push({
role: "tool_result",
content: JSON.stringify({
tool_use_id: toolCall.id,
content: result,
}),
});
}
// Loop continues — the model sees the tool results on the next iteration
}
}
private async executeTool(toolCall: ToolUseBlock): Promise<string> {
const tool = this.toolRegistry.get(toolCall.name);
// Schema validation (Zod)
const parsed = tool.inputSchema.safeParse(toolCall.input);
if (!parsed.success) {
return `Error: Invalid input — ${parsed.error.message}`;
}
// Permission check
const allowed = await this.permissionSystem.evaluate(toolCall, tool);
if (!allowed) {
return `Error: Permission denied for tool "${toolCall.name}"`;
}
// Pre-tool hooks
const hookResult = await this.hookRunner.runPreToolUse(toolCall);
if (hookResult.blocked) {
return `Blocked by hook: ${hookResult.reason}`;
}
// Execute
const output = await tool.execute(parsed.data);
// Post-tool hooks
await this.hookRunner.runPostToolUse(toolCall, output);
return output;
}
}
Why single-threaded? #
A single-threaded loop keeps the execution model predictable. Each tool call is a synchronous, blocking operation from the loop’s perspective. There is no concurrent mutation of the conversation state, no race condition between tool executions. This dramatically simplifies debugging and makes the harness’s behavior reproducible — critical properties when the agent has write access to your filesystem and shell.
The trade-off is throughput: a single loop cannot parallelize independent tasks. Claude Code addresses this through sub-agents, which spawn isolated child loops for parallel work.
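As a sketch of that fan-out idea: the `runSubAgent` stub below is hypothetical (the real SpawnAgent launches a full child agent loop with its own context window), but it shows why sub-agents preserve the single-threaded guarantee.

```typescript
// Hypothetical sketch: independent tasks fan out to isolated sub-agents.
// The parent loop stays single-threaded because it only ever sees the
// joined final summaries, never interleaved intermediate state.
interface SubAgentResult {
  task: string;
  summary: string;
}

async function runSubAgent(task: string): Promise<SubAgentResult> {
  // Stand-in for a full child AgentLoop with its own message history.
  return { task, summary: `done: ${task}` };
}

async function fanOut(tasks: string[]): Promise<SubAgentResult[]> {
  // Sub-agents run concurrently; the parent awaits all results at once.
  return Promise.all(tasks.map(runSubAgent));
}
```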
2. The Tool Layer #
The model never directly interacts with your filesystem, terminal, or network. Every action goes through the harness’s tool registry — a set of ~40 self-contained modules.
Tool anatomy #
Each tool is implemented as a discrete module that defines three things:
- Input schema — Validated at runtime using Zod. The model must produce a JSON payload that conforms to the schema or the call is rejected before execution.
- Permission requirements — Declarative metadata specifying what safety gates the tool must pass (read-only, write, destructive, network, etc.).
- Execution logic — The actual implementation: file I/O, bash execution, git operations, search, etc.
This design enforces a strict separation of concerns: the model is responsible for reasoning (what to do), and the harness is responsible for execution (how to do it safely). The model has no direct access to fs, child_process, or any system API — it can only express intent through structured tool calls.
Implementation #
import { z, ZodSchema } from "zod";
import * as fs from "fs/promises";
import { execSync } from "child_process";
// Tool permission categories
type PermissionCategory =
| "read"
| "write"
| "destructive"
| "network"
| "spawn";
// Every tool implements this interface
interface Tool {
name: string;
description: string;
permissionCategory: PermissionCategory;
inputSchema: ZodSchema;
execute(input: unknown): Promise<string>;
}
// The registry: a simple map of tool name → tool implementation
class ToolRegistry {
private tools = new Map<string, Tool>();
register(tool: Tool): void {
this.tools.set(tool.name, tool);
}
get(name: string): Tool {
const tool = this.tools.get(name);
if (!tool) throw new Error(`Unknown tool: ${name}`);
return tool;
}
// Returns tool definitions in the format the Anthropic API expects
toAPIFormat(): object[] {
return Array.from(this.tools.values()).map((tool) => ({
name: tool.name,
description: tool.description,
// NOTE: the real API expects JSON Schema, so a Zod schema would need
// conversion (e.g., via zod-to-json-schema) before being sent.
input_schema: tool.inputSchema,
}));
}
}
// --- Example tool: ReadFile ---
const ReadFileTool: Tool = {
name: "ReadFile",
description: "Read the contents of a file at the given absolute path.",
permissionCategory: "read",
inputSchema: z.object({
path: z.string().describe("Absolute path to the file"),
startLine: z.number().optional().describe("1-indexed start line"),
endLine: z.number().optional().describe("1-indexed end line"),
}),
async execute(input) {
const { path, startLine, endLine } = input as {
path: string;
startLine?: number;
endLine?: number;
};
const content = await fs.readFile(path, "utf-8");
const lines = content.split("\n");
const start = (startLine ?? 1) - 1;
const end = endLine ?? lines.length;
return lines.slice(start, end).join("\n");
},
};
// --- Example tool: Bash ---
const BashTool: Tool = {
name: "Bash",
description: "Execute a shell command and return stdout/stderr.",
permissionCategory: "destructive", // shell access is always high-risk
inputSchema: z.object({
command: z.string().describe("The shell command to execute"),
timeout: z.number().optional().default(30000).describe("Timeout in ms"),
}),
async execute(input) {
const { command, timeout } = input as { command: string; timeout: number };
try {
const stdout = execSync(command, {
timeout,
encoding: "utf-8",
maxBuffer: 1024 * 1024, // 1 MB cap
});
return stdout;
} catch (err: any) {
return `Exit code ${err.status}\nstderr: ${err.stderr}\nstdout: ${err.stdout}`;
}
},
};
// --- Example tool: EditFile (targeted edit, not full overwrite) ---
const EditFileTool: Tool = {
name: "EditFile",
description: "Replace a target string in a file with new content.",
permissionCategory: "write",
inputSchema: z.object({
path: z.string(),
targetContent: z.string().describe("Exact string to find and replace"),
replacementContent: z.string().describe("Content to replace it with"),
}),
async execute(input) {
const { path, targetContent, replacementContent } = input as {
path: string;
targetContent: string;
replacementContent: string;
};
const content = await fs.readFile(path, "utf-8");
if (!content.includes(targetContent)) {
return `Error: target content not found in ${path}`;
}
const updated = content.replace(targetContent, replacementContent);
await fs.writeFile(path, updated);
return `Successfully edited ${path}`;
},
};
// --- Registering tools ---
const registry = new ToolRegistry();
registry.register(ReadFileTool);
registry.register(BashTool);
registry.register(EditFileTool);
Built-in tools #
The leaked source revealed roughly 40 built-in tools. Some noteworthy categories:
| Category | Example tools | Notes |
|---|---|---|
| File I/O | Read, Write, Edit, MultiEdit | The Edit tool uses targeted string replacement, not full-file overwrites — a deliberate choice to minimize blast radius. |
| Shell | Bash | Executes commands in a sandboxed shell. Output is captured and returned as tool_result. |
| Search | Grep, Find, CodebaseSearch | Various search strategies for navigating large codebases. |
| Git | GitDiff, GitLog, GitStatus | First-class git operations without requiring shell exec. |
| Browser | BrowserNavigate, BrowserClick | For agents that need to interact with web UIs. |
| Sub-agent | SpawnAgent | Launches a child agent loop with its own isolated context. |
Why Zod? #
Schema validation at the tool boundary catches malformed requests before they reach the execution layer. If the model hallucinates a parameter name, passes the wrong type, or omits a required field, the Zod validator rejects it immediately and the error is fed back to the model — giving it a chance to self-correct in the next iteration. This is far cheaper and safer than executing an invalid command and dealing with the consequences.
3. The Permission System #
An agent with shell and filesystem access is a powerful thing and a dangerous one. Claude Code implements a deny-first, layered permission system to manage this risk.
Permission modes #
The harness supports multiple permission modes along a safety-autonomy gradient:
| Mode | Behavior |
|---|---|
| Default | Every potentially destructive action (file writes, shell commands) requires explicit user approval via an interactive prompt. This is the most restrictive mode. |
| Plan | The model can read and search freely, but must present a plan for approval before executing any mutations. |
| Auto-accept | Pre-approved tool categories (e.g., file reads, searches) execute without prompts; writes and shell commands still require approval. |
| Auto | Most actions execute without prompts. Only high-risk operations (e.g., rm -rf, network requests to unknown hosts) trigger safety gates. |
Implementation #
import * as readline from "readline";
type PermissionMode = "default" | "plan" | "auto-accept" | "auto";
interface PermissionPolicy {
denyPatterns: RegExp[]; // e.g. [/rm\s+-rf/, /curl.*\|.*sh/]
allowedPaths: string[]; // e.g. ["/Users/dev/project"]
}
class PermissionSystem {
private mode: PermissionMode;
private policy: PermissionPolicy;
constructor(mode: PermissionMode, policy: PermissionPolicy) {
this.mode = mode;
this.policy = policy;
}
async evaluate(toolCall: ToolUseBlock, tool: Tool): Promise<boolean> {
// Gate 1: Policy deny-list (always checked, regardless of mode)
if (this.isDeniedByPolicy(toolCall)) {
console.log(`🚫 Policy denied: ${toolCall.name}`);
return false;
}
// Gate 2: Mode-based evaluation
switch (this.mode) {
case "auto":
// Auto mode: allow everything that passes the policy
return true;
case "auto-accept":
// Auto-accept: reads are fine, writes need approval
if (tool.permissionCategory === "read") return true;
return await this.promptUser(toolCall);
case "plan":
// Plan mode: reads are fine, any mutation needs approval
if (tool.permissionCategory === "read") return true;
return await this.promptUser(toolCall);
case "default":
default:
// Default: everything except reads needs approval
if (tool.permissionCategory === "read") return true;
return await this.promptUser(toolCall);
}
}
private isDeniedByPolicy(toolCall: ToolUseBlock): boolean {
const inputStr = JSON.stringify(toolCall.input);
// Check deny-list patterns (e.g., rm -rf, curl piped to sh)
for (const pattern of this.policy.denyPatterns) {
if (pattern.test(inputStr)) return true;
}
// Check path restrictions
if ("path" in (toolCall.input as any)) {
const path = (toolCall.input as any).path as string;
const inAllowedPath = this.policy.allowedPaths.some((p) =>
path.startsWith(p),
);
if (!inAllowedPath) return true;
}
return false;
}
private async promptUser(toolCall: ToolUseBlock): Promise<boolean> {
const rl = readline.createInterface({
input: process.stdin,
output: process.stdout,
});
return new Promise((resolve) => {
const preview = JSON.stringify(toolCall.input).slice(0, 200);
rl.question(
`\n⚠️ Tool: ${toolCall.name}\n Input: ${preview}\n Allow? (y/n): `,
(answer) => {
rl.close();
resolve(answer.toLowerCase() === "y");
},
);
});
}
}
How a tool call is evaluated #
Every tool_use request passes through a permission classifier before reaching the execution layer:
- Schema validation — Is the request well-formed? (Zod layer)
- Mode check — Does the current permission mode allow this tool category?
- Policy evaluation — Does the tool call match any deny-list patterns? (e.g., certain shell commands, paths outside the workspace)
- Hook evaluation — Do any registered PreToolUse hooks block the call?
- User prompt — If all gates pass but the mode requires confirmation, the user is prompted.
Only after all five gates pass does the tool execute.
The safety-autonomy trade-off #
Research on real-world usage patterns shows that users tend to shift toward more autonomous modes as they habituate to the tool — a “safety-autonomy gradient.” The system defaults to conservative, human-in-the-loop approval precisely because of this tendency: the most dangerous moment is when a user trusts the agent just enough to stop reading the prompts.
4. Hooks — Middleware for Agents #
Hooks are Claude Code’s mechanism for deterministic, user-defined control at lifecycle boundaries. They are conceptually identical to middleware in a web framework: shell commands that intercept events, inspect payloads, and can block or modify execution.
Hook lifecycle events #
| Hook | When it fires | Common use |
|---|---|---|
| PreToolUse | Before a tool call is executed | Block dangerous commands, enforce coding standards, log tool usage |
| PostToolUse | After a tool call completes | Validate outputs, trigger follow-up actions, audit trails |
| Notification | When the agent produces a notification | Route alerts to Slack, email, or other channels |
| Stop | When the agent signals task completion | Run post-task validation, trigger CI/CD pipelines |
Implementation #
import { execSync } from "child_process";
type HookEvent = "PreToolUse" | "PostToolUse" | "Notification" | "Stop";
interface HookDefinition {
event: HookEvent;
command: string; // shell command to execute
}
interface HookResult {
blocked: boolean;
reason?: string;
}
class HookRunner {
private hooks: HookDefinition[] = [];
register(hook: HookDefinition): void {
this.hooks.push(hook);
}
async runPreToolUse(toolCall: ToolUseBlock): Promise<HookResult> {
const relevantHooks = this.hooks.filter((h) => h.event === "PreToolUse");
for (const hook of relevantHooks) {
try {
// Pass tool call info as environment variables
execSync(hook.command, {
encoding: "utf-8",
env: {
...process.env,
TOOL_NAME: toolCall.name,
TOOL_INPUT: JSON.stringify(toolCall.input),
TOOL_ID: toolCall.id,
},
timeout: 5000,
});
// Exit code 0 → allowed
} catch (err: any) {
// Non-zero exit code → blocked
return {
blocked: true,
reason: err.stdout?.trim() || `Hook blocked: ${hook.command}`,
};
}
}
return { blocked: false };
}
async runPostToolUse(
toolCall: ToolUseBlock,
toolOutput: string,
): Promise<void> {
const relevantHooks = this.hooks.filter((h) => h.event === "PostToolUse");
for (const hook of relevantHooks) {
try {
execSync(hook.command, {
encoding: "utf-8",
env: {
...process.env,
TOOL_NAME: toolCall.name,
TOOL_INPUT: JSON.stringify(toolCall.input),
TOOL_OUTPUT: toolOutput,
},
timeout: 5000,
});
} catch {
// PostToolUse hooks are advisory — failures are logged, not fatal
console.warn(`PostToolUse hook failed: ${hook.command}`);
}
}
}
}
// --- Example: registering hooks ---
const hookRunner = new HookRunner();
// Block modifications to lock files
hookRunner.register({
event: "PreToolUse",
command: `bash -c '
if echo "$TOOL_INPUT" | grep -q "package-lock.json\\|yarn.lock"; then
echo "BLOCKED: Lock file modifications are not allowed."
exit 1
fi
exit 0
'`,
});
// Log every tool execution to a file
hookRunner.register({
event: "PostToolUse",
command: `bash -c '
echo "[$(date)] $TOOL_NAME: $TOOL_INPUT" >> /tmp/agent-audit.log
'`,
});
Why deterministic hooks matter #
The key insight is that hooks run outside the LLM’s non-deterministic reasoning. A hook that blocks rm -rf / will always block it, regardless of what the model believes is appropriate. This provides a hard safety boundary that cannot be prompt-injected or reasoned around.
Because hooks are just shell scripts, they can integrate with any existing tooling: linters, security scanners, policy engines, notification systems.
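For instance, a minimal sketch of a scanner-style gate: the `secretScanHook` function shape and the AWS-access-key regex are illustrative only (not from the real codebase), but they show the pattern of wrapping an external check in a shell command whose exit code decides allow/block.

```typescript
import { execSync } from "child_process";

// Hypothetical pre-tool check: non-zero exit from the shell script means
// "block". The regex is a stand-in for a real secret scanner.
function secretScanHook(toolInput: string): { blocked: boolean; reason?: string } {
  try {
    execSync(
      `bash -c 'echo "$TOOL_INPUT" | grep -qE "AKIA[0-9A-Z]{16}" && exit 1; exit 0'`,
      { env: { ...process.env, TOOL_INPUT: toolInput }, encoding: "utf-8" },
    );
    return { blocked: false }; // exit code 0 → allowed
  } catch {
    // non-zero exit → blocked
    return { blocked: true, reason: "input appears to contain an AWS access key" };
  }
}
```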
5. Context Management #
The most complex subsystem in the harness is context management — the machinery that maintains the illusion of a continuous, aware assistant on top of a fundamentally stateless model.
The problem #
Each API call to Claude is independent. The model has no memory between calls. The harness must:
- Reconstruct the full conversational context on every call.
- Keep that context within the model’s token limit (the context window).
- Ensure critical information (project conventions, security constraints) is never lost.
As sessions grow longer — accumulating file contents, tool outputs, back-and-forth dialogue — the context window fills up. Naive truncation loses critical information. Claude Code solves this with a multi-layer compaction pipeline.
Implementation #
import * as fs from "fs/promises";
import * as path from "path";
interface CompactionResult {
messages: Message[];
compacted: boolean;
}
class ContextManager {
private maxTokens: number;
private projectRoot: string;
constructor(projectRoot: string, maxTokens: number = 200_000) {
this.projectRoot = projectRoot;
this.maxTokens = maxTokens;
}
// Load persistent, compaction-proof context
buildInitialContext(): Message[] {
const messages: Message[] = [];
// 1. System-level: CLAUDE.md is always first (never compacted)
const claudeMd = this.loadClaudeMd();
if (claudeMd) {
messages.push({ role: "system", content: claudeMd });
}
// 2. Memory files from ~/.claude/MEMORY.md
const memory = this.loadMemoryFiles();
if (memory) {
messages.push({ role: "system", content: memory });
}
return messages;
}
// The multi-layer compaction pipeline
compactIfNeeded(messages: Message[]): Message[] {
const usage = this.estimateTokenUsage(messages);
const ratio = usage / this.maxTokens;
// Stage 1: Snip compact at 80% — evict cold messages from the middle
if (ratio > 0.8) {
messages = this.snipCompact(messages);
}
// Stage 2: Microcompact at 85% — shrink content, preserve cache keys
if (ratio > 0.85) {
messages = this.microcompact(messages);
}
// Stage 3: Auto compact at 95% — LLM-based summarization
if (ratio > 0.95) {
messages = this.autoCompact(messages);
}
return messages;
}
// Stage 1: Remove old tool results from the middle of conversation
private snipCompact(messages: Message[]): Message[] {
const keep = 5; // keep first N and last N messages
if (messages.length <= keep * 2) return messages;
const head = messages.slice(0, keep);
const tail = messages.slice(-keep);
const middle = messages.slice(keep, -keep);
// Only remove tool_result messages from the middle (they're bulky)
const filtered = middle.filter((m) => m.role !== "tool_result");
return [...head, ...filtered, ...tail];
}
// Stage 2: Truncate long tool outputs while keeping cache-friendly prefix
private microcompact(messages: Message[]): Message[] {
return messages.map((msg) => {
if (msg.role === "tool_result" && typeof msg.content === "string") {
if (msg.content.length > 2000) {
return {
...msg,
content:
msg.content.slice(0, 1000) +
"\n... [truncated] ...\n" +
msg.content.slice(-500),
};
}
}
return msg;
});
}
// Stage 3: Summarize the conversation using the LLM itself
private autoCompact(messages: Message[]): Message[] {
const systemMessages = messages.filter((m) => m.role === "system");
const conversationMessages = messages.filter((m) => m.role !== "system");
// Ask the model to summarize the conversation so far
// (In production this is a separate, cheaper model call)
const summary = this.summarizeSync(conversationMessages);
return [
...systemMessages,
{
role: "assistant" as const,
content: `[SystemCompactBoundaryMessage] Summary of previous work:\n${summary}`,
},
];
}
// Stage 4: Reactive compact — called when API returns prompt_too_long
reactiveCompact(messages: Message[]): Message[] {
// Emergency: aggressively summarize and retry
const systemMessages = messages.filter((m) => m.role === "system");
const summary = this.summarizeSync(
messages.filter((m) => m.role !== "system"),
);
return [
...systemMessages,
{
role: "assistant" as const,
content: `[ReactiveCompact] ${summary}`,
},
];
}
private loadClaudeMd(): string | null {
try {
const filePath = path.join(this.projectRoot, "CLAUDE.md");
// fs.readFileSync used here for simplicity in the initializer
return require("fs").readFileSync(filePath, "utf-8");
} catch {
return null;
}
}
private loadMemoryFiles(): string | null {
try {
const memoryPath = path.join(
process.env.HOME || "~",
".claude",
"MEMORY.md",
);
return require("fs").readFileSync(memoryPath, "utf-8");
} catch {
return null;
}
}
private estimateTokenUsage(messages: Message[]): number {
// Rough estimate: 1 token ≈ 4 characters
const totalChars = messages.reduce((sum, m) => {
const content =
typeof m.content === "string" ? m.content : JSON.stringify(m.content);
return sum + content.length;
}, 0);
return Math.ceil(totalChars / 4);
}
private summarizeSync(messages: Message[]): string {
// In production, this calls the model with a summarization prompt.
// Simplified here for illustration.
const totalMessages = messages.length;
const toolCalls = messages.filter(
(m) => typeof m.content !== "string",
).length;
return `Completed ${totalMessages} interaction steps including ${toolCalls} tool calls.`;
}
}
The compaction pipeline #
The pipeline consists of five stages, each more aggressive than the last:
Stage 1: Snip Compact #
Removes older assistant and tool messages from the middle of the conversation that are deemed unlikely to be needed. Think of it as evicting cold cache lines — recent and very early messages are preserved, while the middle (often repetitive exploration) is trimmed.
Stage 2: Microcompact #
Shrinks content while preserving Anthropic API prompt cache keys. This is a cost and latency optimization: by keeping the cache-friendly prefix of the conversation intact, the harness avoids re-processing tokens the API has already seen.
Stage 3: Auto Compact #
When the context reaches ~95% of the window limit, the harness triggers an LLM-based summarization. The raw conversation history is replaced with a concise summary, marked by a SystemCompactBoundaryMessage. This is the most visible form of compaction — the agent “forgets” the raw details but retains a high-level understanding of what happened.
Stage 4: Reactive Compact #
An emergency mechanism. If the API returns a prompt_too_long error, the harness compacts context mid-request and retries automatically. This ensures the agent never hard-fails due to context overflow.
Stage 5: Context Collapse #
For very long tool chains, the harness collapses entire sequences of tool calls and results into compact representations that retain only key outcomes. A 20-step file exploration might collapse to: “Explored src/ directory; identified index.ts as entry point; found 3 test files.”
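A sketch of the collapse idea: the run-length threshold and summary format below are illustrative assumptions, and the local `Msg` type mirrors the `Message` interface defined earlier.

```typescript
// Sketch: replace long runs of adjacent tool_result messages with a
// single outcome line. Threshold (3) and summary text are illustrative.
interface Msg {
  role: string;
  content: string;
}

function collapseToolChain(messages: Msg[]): Msg[] {
  const out: Msg[] = [];
  let run: Msg[] = [];
  const flush = () => {
    if (run.length >= 3) {
      // Long tool chain: keep only a compact marker of what happened.
      out.push({
        role: "assistant",
        content: `[Collapsed ${run.length} tool interactions]`,
      });
    } else {
      out.push(...run); // short runs pass through untouched
    }
    run = [];
  };
  for (const m of messages) {
    if (m.role === "tool_result") run.push(m);
    else {
      flush();
      out.push(m);
    }
  }
  flush();
  return out;
}
```

In the real harness the collapsed marker would carry actual outcomes (files found, entry points identified) rather than a count.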
Persistent context: CLAUDE.md #
Because compaction can (and will) discard information, Claude Code uses CLAUDE.md as a persistent, compaction-proof instruction layer. This markdown file is placed in the project root and is automatically loaded into the context at the start of every session. It typically contains:
- Project conventions — Naming, styling, testing, and deployment guidelines.
- Architecture notes — Core files, libraries, and project structure.
- Workflow rules — Behaviors, constraints, and common commands.
- Compact instructions — Explicit guidance on what information must survive compaction.
CLAUDE.md is treated as a system-level instruction — it is injected before the conversation history and is never subject to compaction. This is the primary mechanism for ensuring the model “remembers” project-critical information across long sessions.
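As an illustration, a hypothetical CLAUDE.md covering those categories might look like this (the contents are invented, not taken from any real project):

```markdown
# CLAUDE.md

## Conventions
- TypeScript strict mode; tests live in `src/**/*.test.ts`.
- Run the suite with `pnpm test`; never commit with failing tests.

## Architecture
- `src/index.ts` is the entry point; `src/agent/` holds the loop.

## Workflow rules
- Never modify lock files directly.

## Compact instructions
- When compacting, always preserve the list of files modified this session.
```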
Session persistence #
Beyond CLAUDE.md, the harness maintains session state in ~/.claude/:
- MEMORY.md — An index file pointing to topic-specific markdown files that are loaded automatically.
- Session history — Each session’s message log, tool usage, and results are persisted as JSONL files, enabling claude --resume <session-id> to pick up where you left off.
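A minimal sketch of that JSONL persistence, with illustrative paths and function names (the real on-disk format is richer than one message per line):

```typescript
import * as fs from "fs";
import * as path from "path";

// Append each message as one JSON line so a session can be replayed later.
function appendToSession(dir: string, sessionId: string, msg: object): void {
  fs.mkdirSync(dir, { recursive: true });
  const file = path.join(dir, `${sessionId}.jsonl`);
  fs.appendFileSync(file, JSON.stringify(msg) + "\n");
}

// Rebuild the message list from the JSONL log (the core of a "resume").
function resumeSession(dir: string, sessionId: string): object[] {
  const file = path.join(dir, `${sessionId}.jsonl`);
  if (!fs.existsSync(file)) return [];
  return fs
    .readFileSync(file, "utf-8")
    .split("\n")
    .filter(Boolean)
    .map((line) => JSON.parse(line));
}
```

JSONL is append-only and line-oriented, which is what makes resuming, rewinding, and diffing sessions cheap: each line is an independent event.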