
Dissecting the Claude Code Harness - Part 2: Extensibility & Scale

Yash Sachdeva

Dissecting the Claude Code Harness - This article is part of a series.

1. Sub-Agents — Scaling Beyond a Single Loop

The single-threaded agentic loop is simple and predictable, but it cannot parallelize work. Claude Code addresses this with sub-agents — child agent instances that run their own isolated loops.

How sub-agents work

When the main agent encounters a task that benefits from parallelism (e.g., “run tests, check linting, and update docs”), it can spawn sub-agents via the SpawnAgent tool. Each sub-agent:

  • Has its own isolated context window — preventing “context collapse” in the parent session.
  • Receives a scoped task description — a focused instruction, not the full conversation history.
  • Has restricted tool permissions — sub-agents can be granted a subset of the parent’s tools.
  • Returns a structured result to the parent when complete.

Implementation

// The SpawnAgent tool — creates a child AgentLoop with isolated context

const SpawnAgentTool: Tool = {
  name: "SpawnAgent",
  description:
    "Spawn a sub-agent with its own isolated context to perform a focused task.",
  permissionCategory: "spawn",
  inputSchema: z.object({
    task: z.string().describe("The focused task description for the sub-agent"),
    allowedTools: z
      .array(z.string())
      .optional()
      .describe("Subset of tools the sub-agent can use"),
  }),

  async execute(input) {
    const { task, allowedTools } = input as {
      task: string;
      allowedTools?: string[];
    };

    // Create a scoped tool registry for the sub-agent
    const scopedRegistry = new ToolRegistry();
    const parentTools = registry; // reference to parent's registry

    // Only register allowed tools (or all if not specified)
    const toolNames = allowedTools ?? Array.from(parentTools.listNames());
    for (const name of toolNames) {
      if (name === "SpawnAgent") continue; // prevent recursive spawning
      try {
        scopedRegistry.register(parentTools.get(name));
      } catch {
        // Tool not found — skip
      }
    }

    // Sub-agent gets its own context manager (isolated context window)
    const childContextManager = new ContextManager(
      process.cwd(),
      100_000, // sub-agents get a smaller context budget
    );

    // Sub-agent gets full permission (parent already approved the spawn)
    const childPermissions = new PermissionSystem("auto", {
      denyPatterns: [],
      allowedPaths: [process.cwd()],
    });

    const childLoop = new AgentLoop(
      scopedRegistry,
      childPermissions,
      new HookRunner(), // sub-agents inherit hook config in production
      childContextManager,
    );

    // Run the sub-agent and return its result to the parent
    const result = await childLoop.run(task);
    return `[Sub-agent completed]\n${result}`;
  },
};
// --- Parallel sub-agent orchestration ---
// The parent agent does not call SpawnAgent in parallel itself —
// it issues multiple SpawnAgent tool_use blocks in a single response,
// and the harness executes them concurrently:

async function executeToolsConcurrently(
  toolCalls: ToolUseBlock[],
  executeTool: (tc: ToolUseBlock) => Promise<string>,
): Promise<Map<string, string>> {
  const results = new Map<string, string>();

  // Separate SpawnAgent calls (can run in parallel) from others (sequential)
  const spawnCalls = toolCalls.filter((tc) => tc.name === "SpawnAgent");
  const otherCalls = toolCalls.filter((tc) => tc.name !== "SpawnAgent");

  // Run spawn calls concurrently
  const spawnResults = await Promise.all(
    spawnCalls.map(async (tc) => ({
      id: tc.id,
      result: await executeTool(tc),
    })),
  );
  for (const { id, result } of spawnResults) {
    results.set(id, result);
  }

  // Run other calls sequentially (preserve ordering guarantees)
  for (const tc of otherCalls) {
    results.set(tc.id, await executeTool(tc));
  }

  return results;
}

This is architecturally similar to a worker pool in distributed systems: the parent acts as an orchestrator, the sub-agents are workers, and the tool interface is the communication protocol.

Why isolation matters

Without isolation, parallel tool execution would mutate the parent’s conversation history concurrently — creating race conditions and incoherent context. By giving each sub-agent its own context, the harness maintains the single-writer invariant that keeps the system predictable.
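A sketch of that invariant (the types and function are illustrative, not the actual harness internals): results from concurrent sub-agents are merged into the transcript by a single writer, in the order of the model's original tool_use blocks, not in completion order.

```typescript
interface ToolUseBlock {
  id: string;
  name: string;
}

interface ToolResultBlock {
  type: "tool_result";
  tool_use_id: string;
  content: string;
}

interface Message {
  role: "user" | "assistant" | "system";
  content: string | ToolResultBlock[];
}

function mergeResults(
  history: Message[],
  toolCalls: ToolUseBlock[],
  results: Map<string, string>, // e.g. from executeToolsConcurrently
): Message[] {
  // One tool_result per call, ordered by the original tool_use blocks:
  // deterministic no matter which sub-agent finished first
  const blocks: ToolResultBlock[] = toolCalls.map((tc) => ({
    type: "tool_result",
    tool_use_id: tc.id,
    content: results.get(tc.id) ?? "",
  }));
  // A single append by a single writer: the parent history is never
  // mutated concurrently
  return [...history, { role: "user", content: blocks }];
}
```

Concurrency stays at the edges (tool execution); the conversation history itself is only ever written sequentially.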

2. MCP — Model Context Protocol

Claude Code supports the Model Context Protocol (MCP), an open standard for connecting AI assistants to external tools and data sources. MCP acts as a universal adapter layer:

  • Tool servers — External services that expose tools (databases, APIs, monitoring systems) via a standardized protocol.
  • Resource providers — Services that provide context (documentation, codebase indices, knowledge bases).

Implementation

// MCP tools are registered into the same ToolRegistry as built-in tools.
// The harness treats them identically — same schema validation,
// same permission gates, same hook system.

interface MCPServerConfig {
  name: string;
  url: string; // e.g. "http://localhost:3001/mcp"
}

async function registerMCPTools(
  server: MCPServerConfig,
  registry: ToolRegistry,
): Promise<void> {
  // 1. Discover available tools from the MCP server
  const response = await fetch(`${server.url}/tools/list`, { method: "POST" });
  const { tools } = (await response.json()) as {
    tools: { name: string; description: string; inputSchema: object }[];
  };

  // 2. Register each MCP tool as a local tool with a remote executor
  for (const mcpTool of tools) {
    registry.register({
      name: `mcp_${server.name}_${mcpTool.name}`,
      description: `[MCP: ${server.name}] ${mcpTool.description}`,
      permissionCategory: "network", // all MCP tools go through network gates
      inputSchema: z.any(), // schema comes from the MCP server

      async execute(input) {
        const result = await fetch(`${server.url}/tools/call`, {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ name: mcpTool.name, arguments: input }),
        });
        const { content } = (await result.json()) as {
          content: { type: string; text: string }[];
        };
        return content.map((c) => c.text).join("\n");
      },
    });
  }
}

From the harness’s perspective, MCP tools are indistinguishable from built-in tools: they go through the same schema validation, permission gates, and hook system. This means organizations can extend Claude Code’s capabilities without modifying the harness itself — a critical property for enterprise adoption.

3. Skills — On-Demand Procedural Knowledge

MCP gives the agent new tools (the ability to do things). Skills give the agent new expertise (knowledge of how to do things). A skill is a self-contained directory — containing instructions, scripts, templates, and configuration — that the harness injects into the conversation on demand, teaching the model a specific workflow and giving it executable utilities to carry it out, without permanently consuming context tokens.

Skill directory structure

A skill is not just a single file; it’s a directory:

.claude/skills/
└── deploy/
    ├── SKILL.md              # Required — entry point (instructions + config)
    ├── scripts/
    │   ├── deploy.sh          # Helper script the skill references
    │   └── health-check.py    # Another utility
    ├── assets/
    │   └── deploy-config.yaml # Reference implementation
    └── references/
        └── topic1.md         # Additional documentation

The scripts/ directory is particularly important: skills can bundle executable helpers that the model runs via the Bash tool during skill execution. This makes skills more than just instructions — they’re portable workflow packages.

The progressive-disclosure pattern

Skills use a three-level loading strategy designed to conserve the context window:

| Level | What’s loaded | When | Context cost |
|---|---|---|---|
| Level 1: Metadata | Skill name and description from YAML frontmatter | Always — injected at session start | Very low (~50 tokens per skill) |
| Level 2: Instructions | Full SKILL.md body (the “playbook”) | On demand — when the skill is triggered | Moderate (hundreds to low thousands of tokens) |
| Level 3: Supporting files | Scripts, examples, templates in the skill directory | Lazy — only when the running skill reads them | Variable |

This is analogous to how an operating system loads shared libraries: metadata (the symbol table) is always available, but the actual code is only paged in when a symbol is referenced.

The SKILL.md file

The SKILL.md file has two parts: YAML frontmatter (configuration) and a markdown body (instructions).

---
name: deploy
description: Deploy the application to staging or production using our CI/CD pipeline
allowed-tools: [Bash, ReadFile, Grep] # restrict which tools this skill can use
disable-model-invocation: true # prevent autonomous triggering (require /deploy)
context: fork # run in an isolated sub-agent context
---

## Steps

1. Run `npm run build` and verify it exits cleanly.
2. Run the test suite with `npm test`. If any tests fail, stop and report.
3. Check the current branch — only `main` can deploy to production.
4. For staging: run the bundled deploy script:
   ```bash
   bash scripts/deploy.sh staging
   ```
5. For production: run `bash scripts/deploy.sh production`, then verify the health check using the bundled script:
   ```bash
   python3 scripts/health-check.py https://api.example.com/health
   ```

## Rules

  • Never deploy if there are uncommitted changes.
  • Always run tests before deploying, even if the user says to skip them.
  • After a production deploy, post a summary to #deployments on Slack.

Frontmatter configuration

| Field | Purpose |
|---|---|
| name | Becomes the /slash-command and the identifier used by the UseSkill tool. Level 1 — always in context. |
| description | The signal Claude uses to match user intent to this skill. Level 1 — always in context. |
| allowed-tools | Restricts which tools the model can call while this skill is active. Omit to allow all tools. |
| disable-model-invocation | When true, prevents Claude from triggering this skill autonomously — it can only be invoked manually via /deploy. Essential for workflows with side effects. |
| context | Set to fork to run the skill in an isolated sub-agent context, preventing it from polluting the parent session’s history. |

The markdown body is Level 2 — loaded only when the skill is triggered. Notice that the instructions freely reference bundled scripts (scripts/deploy.sh, scripts/health-check.py) and harness tools (Bash, ReadFile). The model uses these references to orchestrate tool calls during execution.

How skills are triggered

Skills can be activated in two ways:

  1. Autonomous discovery — The model reads the skill descriptions (Level 1) and decides, based on the user’s task, that a skill is relevant. It then invokes the skill to load Level 2 instructions. This requires no user action.
  2. Manual invocation — The user types a slash command (e.g., /deploy). This is preferred for workflows with side effects, where timing matters.
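The dispatch decision can be sketched as follows (a hypothetical helper, not the actual harness code): a leading slash is a manual invocation; everything else is an ordinary prompt, where the model may still trigger skills on its own via UseSkill unless disable-model-invocation is set.

```typescript
interface SkillEntry {
  description: string;
  disableModelInvocation?: boolean;
}

type Routed =
  | { kind: "skill"; name: string; args: string }
  | { kind: "prompt"; text: string };

function routeInput(input: string, skills: Map<string, SkillEntry>): Routed {
  if (input.startsWith("/")) {
    const [cmd, ...rest] = input.slice(1).split(/\s+/);
    if (skills.has(cmd)) {
      // Manual invocation: load this skill's Level 2 instructions next
      return { kind: "skill", name: cmd, args: rest.join(" ") };
    }
  }
  // Ordinary prompt: autonomous discovery happens model-side, based on
  // the Level 1 metadata already in context
  return { kind: "prompt", text: input };
}
```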

Personal vs project skills

| Scope | Location | Use case |
|---|---|---|
| Personal | ~/.claude/skills/ | Preferences that follow you across projects — commit message style, preferred test frameworks, code review checklists |
| Project | .claude/skills/ (in the repo) | Team workflows that travel with the codebase — deployment procedures, coding standards, architecture patterns |

Project skills are version-controlled and shared automatically with anyone who clones the repository.

Implementation

import * as fs from "fs/promises";
import * as path from "path";
import * as yaml from "yaml";

interface SkillConfig {
  allowedTools?: string[]; // e.g. ["Bash", "ReadFile", "Grep"]
  disableModelInvocation?: boolean; // true = manual /slash-command only
  context?: "inline" | "fork"; // fork = run in isolated sub-agent
}

interface SkillMetadata {
  name: string;
  description: string;
  basePath: string; // directory containing SKILL.md
  config: SkillConfig;
  scripts: string[]; // relative paths to files in scripts/ (discovered at startup)
}

interface LoadedSkill extends SkillMetadata {
  instructions: string; // the markdown body (Level 2)
}

class SkillRegistry {
  private skills = new Map<string, SkillMetadata>();

  // Called at startup — discovers all skills and loads Level 1 (metadata only)
  async discoverSkills(searchPaths: string[]): Promise<void> {
    for (const searchPath of searchPaths) {
      const entries = await fs.readdir(searchPath, { withFileTypes: true });

      for (const entry of entries) {
        if (!entry.isDirectory()) continue;

        const skillDir = path.join(searchPath, entry.name);
        const skillFile = path.join(skillDir, "SKILL.md");

        try {
          const raw = await fs.readFile(skillFile, "utf-8");
          const metadata = this.parseFrontmatter(raw);

          // Discover bundled scripts (Level 3)
          const scripts = await this.discoverScripts(skillDir);

          this.skills.set(metadata.name, {
            ...metadata,
            basePath: skillDir,
            scripts,
          });
        } catch {
          // No SKILL.md in this directory — skip
        }
      }
    }
  }

  // Scan the scripts/ directory for executable helpers
  private async discoverScripts(skillDir: string): Promise<string[]> {
    const scriptsDir = path.join(skillDir, "scripts");
    try {
      const entries = await fs.readdir(scriptsDir);
      return entries.map((e) => path.join("scripts", e));
    } catch {
      return []; // no scripts/ directory
    }
  }

  // Level 1: returns metadata for all skills (always in context)
  getMetadataSummary(): string {
    const lines = ["Available skills:"];
    for (const [name, skill] of this.skills) {
      lines.push(`  /${name} — ${skill.description}`);
    }
    return lines.join("\n");
  }

  // Level 2: loads the full instructions for a specific skill
  async loadSkill(name: string): Promise<LoadedSkill> {
    const metadata = this.skills.get(name);
    if (!metadata) throw new Error(`Unknown skill: ${name}`);

    const raw = await fs.readFile(
      path.join(metadata.basePath, "SKILL.md"),
      "utf-8",
    );
    const instructions = this.extractBody(raw);
    const scripts =
      metadata.scripts ?? (await this.discoverScripts(metadata.basePath));

    return { ...metadata, instructions, scripts };
  }

  // Level 3: read a supporting file from the skill's directory
  async loadSupportingFile(
    skillName: string,
    relativePath: string,
  ): Promise<string> {
    const metadata = this.skills.get(skillName);
    if (!metadata) throw new Error(`Unknown skill: ${skillName}`);

    const filePath = path.join(metadata.basePath, relativePath);
    return await fs.readFile(filePath, "utf-8");
  }

  private parseFrontmatter(raw: string): Omit<SkillMetadata, "scripts"> {
    const match = raw.match(/^---\n([\s\S]*?)\n---/);
    if (!match) throw new Error("No frontmatter found");

    const parsed = yaml.parse(match[1]) as {
      name: string;
      description: string;
      "allowed-tools"?: string[];
      "disable-model-invocation"?: boolean;
      context?: "inline" | "fork";
    };

    return {
      name: parsed.name,
      description: parsed.description,
      basePath: "", // filled in by caller
      config: {
        allowedTools: parsed["allowed-tools"],
        disableModelInvocation: parsed["disable-model-invocation"],
        context: parsed.context,
      },
    };
  }

  private extractBody(raw: string): string {
    return raw.replace(/^---[\s\S]*?---\n*/, "").trim();
  }
}
// --- The Skill tool: a meta-tool that loads instructions into context ---

const UseSkillTool: Tool = {
  name: "UseSkill",
  description:
    "Load a skill's instructions into the conversation to guide task execution.",
  permissionCategory: "read",
  inputSchema: z.object({
    skillName: z.string().describe("Name of the skill to load"),
  }),

  async execute(input) {
    const { skillName } = input as { skillName: string };

    try {
      const skill = await skillRegistry.loadSkill(skillName);

      // The skill's instructions + metadata are returned as a tool result,
      // which means they enter the conversation history and guide
      // the model's next steps.
      const sections = [
        `[Skill loaded: ${skill.name}]`,
        `Base path: ${skill.basePath}`,
      ];

      // Surface bundled scripts so the model knows what's available
      if (skill.scripts.length > 0) {
        sections.push(
          `\nBundled scripts (can be executed via Bash):`,
          ...skill.scripts.map((s) => `  - ${s}`),
        );
      }

      // Surface tool restrictions if configured
      if (skill.config.allowedTools) {
        sections.push(
          `\nAllowed tools: ${skill.config.allowedTools.join(", ")}`,
        );
      }

      sections.push("", skill.instructions);
      return sections.join("\n");
    } catch (err: any) {
      return `Error loading skill: ${err.message}`;
    }
  },
};
// --- Skill-aware context building ---
// During context initialization, skill metadata (Level 1) is injected
// alongside CLAUDE.md so the model knows what skills exist.

class SkillAwareContextManager extends ContextManager {
  private skillRegistry: SkillRegistry;

  constructor(
    projectRoot: string,
    maxTokens: number,
    skillRegistry: SkillRegistry,
  ) {
    super(projectRoot, maxTokens);
    this.skillRegistry = skillRegistry;
  }

  override buildInitialContext(): Message[] {
    const messages = super.buildInitialContext();

    // Inject skill metadata as a system message
    // This is Level 1 — just names and descriptions, very cheap
    const skillSummary = this.skillRegistry.getMetadataSummary();
    if (skillSummary) {
      messages.push({
        role: "system",
        content: `[Skills]\n${skillSummary}\n\nYou can use the UseSkill tool to load any skill when relevant.`,
      });
    }

    return messages;
  }
}

The key insight: skills are instructions + utilities, not services

Unlike MCP, skills do not run as separate processes. They are loaded into the conversation as instructions, and the model uses the harness’s existing tools to act on them. But skills are not “just markdown” either — they can bundle:

  • Executable scripts (scripts/) that the model calls via the Bash tool during execution.
  • Templates and examples (examples/, resources/) that the model reads for reference.
  • Tool restrictions (allowed-tools) that scope what the model can do while the skill is active.
  • Isolation config (context: fork) that runs the skill in a sub-agent to protect the parent session.

The result is a portable workflow package — instructions plus the utilities needed to carry them out — that requires no server, no daemon, and no deployment. A skill is just a directory you can git push.

4. Skills vs MCP — When to Use Which

Skills and MCP are complementary but serve fundamentally different purposes. The simplest mental model: MCP gives Claude new hands; Skills give Claude new expertise.

But can’t skills just call APIs?

Yes — and this is worth being precise about because the overlap is real.

A skill can bundle a scripts/jira-client.py that handles OAuth, manages tokens, retries on failure, and returns structured JSON. The model reads the skill’s instructions, which describe exactly how to call the script:

## Available scripts

- `python3 scripts/jira-client.py get-issue --key <ISSUE_KEY>` — returns issue JSON
- `python3 scripts/jira-client.py create-comment --key <ISSUE_KEY> --body <TEXT>` — posts a comment

The model is perfectly capable of reasoning about this interface from the instructions. It knows the flag names, the expected values, and the script’s capabilities — because the skill told it. For simple and moderate API usage, this works well and is often the better choice because it’s simpler to set up than an MCP server.

So when does MCP actually earn its complexity? Three situations:

1. Harness-level validation (catching errors before execution)

When the model calls a script via Bash, the harness sees one parameter: a command string. If the model hallucinates a flag name (--issue-key instead of --key), the error surfaces after the script runs and returns stderr. The model then has to parse the error, understand what went wrong, and retry — burning a full agentic loop iteration.

With MCP, the tool’s JSON Schema is registered with the harness. The Zod layer validates the input before the call reaches the server:

// MCP: harness catches this BEFORE execution
tool_use: { name: "mcp_jira_get_issue", input: { issueKey: 123 } }
// → Zod error: "issueKey must be a string" — returned instantly, no execution

// Skill script: error surfaces AFTER execution
tool_use: { name: "Bash", input: { command: "python3 scripts/jira-client.py get-issue --key" } }
// → Script runs, fails with "error: --key requires an argument", model parses stderr

This matters at scale. If the model makes ten tool calls per task, even a 5% error rate means a wasted iteration every other task. Pre-execution validation eliminates an entire class of errors.

2. Tool discovery at scale

When you have 5 scripts, the model can learn their interfaces from skill instructions. When you have 50 MCP tools across 8 servers, something changes: all MCP tool schemas are always visible in the API’s tools array. The model can browse them, compare parameters, and pick the right tool without loading any skill instructions first.

With skills, tool discovery requires loading skill instructions (Level 2) before the model even knows what’s available. For large tool ecosystems — an organization with MCP servers for GitHub, Jira, Postgres, Slack, Datadog, and more — the “always visible” property of MCP schemas is a significant advantage.

3. Cross-platform portability

An MCP server works with Claude Code, Cursor, Windsurf, Copilot, and any other MCP-compatible AI assistant. A skill script in .claude/skills/deploy/scripts/ is tied to Claude Code’s Bash tool. If your team uses multiple AI tools, MCP gives you one interface that works everywhere.

What this means in practice

| Capability | Skill script | MCP tool | Skill workaround | Verdict |
|---|---|---|---|---|
| Model reasoning | Reads interface from instructions | Reads JSON Schema from tools array | N/A — both work | Draw |
| Input validation | Errors surface at runtime | Zod rejects before execution | Script validates its own args before calling the API | Draw — both prevent the bad call; MCP is marginally faster |
| Discovery (5 tools) | Skill descriptions cover it | Schemas in tools array | N/A — both work | Draw |
| Discovery (50+ tools) | Must load skill instructions | All schemas always visible | Rich Level 1 descriptions or a “catalog” skill | Slight MCP edge — but skill catalogs close the gap |
| Authentication | Env vars, token cache | Server manages OAuth/refresh | Script handles tokens itself | Draw |
| Persistent state | Fresh process each call | Server holds connections | Sidecar daemon via Unix socket | Draw — but the sidecar is an MCP server without the protocol |
| Cross-platform | Tied to Claude Code | Any MCP-compatible assistant | Ship scripts with adapter wrappers per platform | MCP wins — one interface vs N adapters |

The real decision: skills can do almost everything MCP does, but the workarounds add up. A sidecar daemon for persistence, a catalog skill for discovery, adapter wrappers for portability — at some point you’ve built an MCP-equivalent system without the standardized protocol. MCP’s value isn’t any single capability; it’s that one protocol solves all of these at once.

Comparison

| Dimension | Skills | MCP |
|---|---|---|
| What it provides | Procedural knowledge + utility scripts — how to do something | Typed, authenticated connectivity — the ability to do something reliably |
| Analogy | An SOP manual with utility scripts attached | A typed SDK for an external system |
| Implementation | Markdown instructions + bundled scripts (SKILL.md + scripts/) | Client-server architecture via JSON-RPC |
| Runs as | Injected instructions; bundled scripts run via Bash | Persistent external process (MCP server) |
| API calls | Yes — via curl, Python, etc. in shell scripts (untyped) | Yes — via typed, schema-validated tool definitions |
| Token cost | Very low (Level 1 always; Level 2+ on demand) | Higher (full tool schemas always exposed) |
| Requires infrastructure | No — just a directory you can git push | Yes — an MCP server process must be running |
| Tool control | Can restrict available tools via allowed-tools | No built-in tool restrictions |
| Shareable | Via git (project skills in .claude/skills/) | Via server deployment or npm packages |
| Best for | Workflows, runbooks, scripts, encoding judgment | Reliable interfaces to APIs, databases, SaaS platforms |

Can Skills Completely Replace MCP?

Yes. If you look closely at the architecture of the Claude Code harness, every capability that MCP provides can be completely replaced by a well-architected Skills implementation.

1. Replacing Pre-execution Validation

Instead of relying on the harness’s Zod layer, a skill script can implement robust internal validation before making any API calls. For example, python3 scripts/billing.py charge --amount 100 --currency USD can validate that --amount is positive and --currency is a valid ISO code using argparse or pydantic before hitting the billing API. The functional result is identical: the costly call never happens. The only difference is that the validation runs in the script process rather than the harness process, surfacing errors to the model via standard output/error (which the model can parse and recover from).

2. Replacing Tool Discovery at Scale

You can replace MCP’s always-visible tool schemas by using a “catalog” skill. The Level 1 metadata (name + description) is always in context, so a rich description serves as a discovery mechanism:

---
name: infra-tools
description: |
  Infrastructure CLI tools:
  - query-db: Run SQL queries against staging/production Postgres
  - deploy: Deploy services to staging or production
  - metrics: Query Datadog metrics for the last N hours
  - slack-notify: Post messages to Slack channels
---

When managing 50+ tools, a catalog skill lists all available scripts. Because Level 1 descriptions are tiny compared to full JSON Schema definitions, this approach is actually more context-efficient than loading 50 full MCP schemas into the harness at startup.
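A rough illustration of that context-cost claim, using the common ~4 chars/token heuristic; both strings below are assumed examples, not real tool definitions.

```typescript
// A four-tool catalog description (Level 1, always in context)
const catalogDescription = [
  "Infrastructure CLI tools:",
  "- query-db: Run SQL queries against staging/production Postgres",
  "- deploy: Deploy services to staging or production",
  "- metrics: Query Datadog metrics for the last N hours",
].join("\n");

// One full JSON Schema for a single MCP tool, always in the tools array
const oneToolSchema = JSON.stringify({
  name: "query_db",
  description: "Run SQL queries against staging/production Postgres",
  inputSchema: {
    type: "object",
    properties: {
      sql: { type: "string", description: "The SQL statement to run" },
      database: { type: "string", enum: ["staging", "production"] },
      timeoutMs: { type: "number", description: "Query timeout in ms" },
    },
    required: ["sql", "database"],
  },
});

const estTokens = (s: string) => Math.ceil(s.length / 4);
// The whole four-tool catalog costs less than even two full schemas
console.log(estTokens(catalogDescription), estTokens(oneToolSchema));
```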

3. Replacing Persistent Connections

MCP servers hold persistent connections (database pools, WebSockets, long-lived sessions). Skills can achieve this exact architecture by talking to a sidecar daemon. You run the daemon in the background to hold the persistent connections, and the skill’s Bash scripts communicate with it via Unix sockets or localhost HTTP:

# scripts/db-query.sh
# Talks to a persistent sidecar instead of opening a new connection each time
curl -s --unix-socket /tmp/db-proxy.sock \
  -X POST -d "{\"sql\": \"$1\", \"params\": $2}" \
  http://localhost/query

This transforms the skill from a stateless script into an interface for a stateful microservice, matching MCP’s persistence capability.
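The daemon side of that sketch might look like the following (hypothetical code: the "pool" is simulated where a real daemon would hold e.g. a pg.Pool, and the socket path matches the scripts/db-query.sh client above).

```typescript
import * as http from "http";

interface QueryRequest {
  sql: string;
  params?: unknown[];
}

// Lives as long as the daemon: reused across every script invocation
const pool = { openedOnce: true, queriesServed: 0 };

function handleQuery(raw: string): string {
  const { sql } = JSON.parse(raw) as QueryRequest;
  pool.queriesServed++; // state persists between requests
  return JSON.stringify({ ok: true, sql, served: pool.queriesServed });
}

const server = http.createServer((req, res) => {
  let body = "";
  req.on("data", (chunk) => (body += chunk));
  req.on("end", () => {
    res.setHeader("Content-Type", "application/json");
    res.end(handleQuery(body));
  });
});

// Start only when run as the daemon; skill scripts then hit the socket
if (process.env.RUN_SIDECAR) {
  server.listen("/tmp/db-proxy.sock");
}
```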

4. Replacing Cross-Platform Portability

While MCP defines a standard JSON-RPC protocol across tools like Cursor and Windsurf, Python and Bash scripts are themselves inherently portable. To support multiple AI assistants, you ship your scripts with thin adapter wrappers (e.g., a Cursor extension that shells out to your Python script, or a Windsurf plugin that does the same). The core logic remains in the script, agnostic to the specific AI agent running it.

The Architecture of a Full Replacement: If you want to bypass the complexity of deploying and managing MCP servers, you can build a complete equivalent using Skills + Scripts + Sidecars + Catalogs. This means writing validation logic and managing daemon processes yourself, but it gives you full flexibility — you work entirely with standard scripts and shell commands, completely decoupled from the JSON-RPC spec of the Model Context Protocol.

When to use Skills

  • You need procedural guidance — a repeatable workflow with specific steps, conditions, and rules.
  • You want to encode judgment — “if the PR touches the payments module, always run the fraud-detection test suite.”
  • You want consistency — the same workflow applied identically across sessions without re-explaining it.
  • You’re making one-off API calls — a quick curl in a script is simpler than standing up an MCP server.
  • You’re optimizing for context — skills load just-in-time, keeping the baseline context footprint minimal.

How they compose

The most powerful workflows stack Skills on top of MCP:

  1. MCP provides the connection — e.g., an MCP server exposes your JIRA API.
  2. A Skill provides the methodology — e.g., a review-pr skill says: “First use the JIRA MCP to fetch the linked ticket. Then read the changed files. Then check for breaking changes against our API compatibility guidelines. Finally, post a review comment.”

5. Putting It All Together

With all the layers defined, here is how the harness bootstraps and runs:

async function main() {
  // 1. Build the tool registry
  const registry = new ToolRegistry();
  registry.register(ReadFileTool);
  registry.register(BashTool);
  registry.register(EditFileTool);
  registry.register(SpawnAgentTool);
  registry.register(UseSkillTool);

  // 2. Connect MCP servers (if configured)
  await registerMCPTools(
    { name: "postgres", url: "http://localhost:3001/mcp" },
    registry,
  );

  // 3. Discover skills (personal + project)
  const skillRegistry = new SkillRegistry();
  await skillRegistry.discoverSkills([
    path.join(os.homedir(), ".claude", "skills"), // personal (os.homedir() from "node:os")
    path.join(process.cwd(), ".claude", "skills"), // project
  ]);

  // 4. Configure permissions
  const permissions = new PermissionSystem("default", {
    denyPatterns: [/rm\s+-rf\s+\//, /curl.*\|.*sh/],
    allowedPaths: [process.cwd()],
  });

  // 5. Register hooks
  const hooks = new HookRunner();
  hooks.register({
    event: "PreToolUse",
    command: `bash -c 'if echo "$TOOL_INPUT" | grep -q "node_modules"; then echo "BLOCKED"; exit 1; fi'`,
  });

  // 6. Initialize skill-aware context manager
  const contextManager = new SkillAwareContextManager(
    process.cwd(),
    200_000,
    skillRegistry,
  );

  // 7. Create the agent loop and run
  const agent = new AgentLoop(registry, permissions, hooks, contextManager);
  const result = await agent.run("Deploy the app to staging");
  // The agent will autonomously discover the 'deploy' skill from metadata,
  // load its instructions via UseSkill, and follow the steps.

  console.log(result);
}

main().catch(console.error);

6. Architectural Lessons

Stepping back, the Claude Code harness teaches several generalizable lessons about building agentic systems:

The model is not the product

Only ~2% of Claude Code’s codebase is “AI-related” in the sense of prompt engineering or model interaction. The remaining 98% is operational infrastructure: state management, safety, tool execution, context optimization. If you are building an agentic system, expect a similar ratio.

Distributed systems patterns apply

The harness is effectively a distributed system with a single worker (the LLM) and multiple services (the tools):

| Pattern | Harness analogue |
|---|---|
| Worker pool | Sub-agents |
| Service interface | Tool registry |
| Middleware | Hooks |
| Log rotation | Context compaction |
| Configuration management | CLAUDE.md |
| Circuit breaker | Reactive compact + retry |

If you have experience building distributed systems, you already have the mental models needed to reason about agentic architectures.

Safety is infrastructure, not a feature

The permission system, hooks, and schema validation are not bolted-on safety features — they are load-bearing infrastructure that the entire execution model depends on. The deny-first design, deterministic hooks, and layered gates are what make it safe to give an LLM write access to your codebase.

Statelessness is a feature, not a bug

The model’s statelessness is often framed as a limitation, but Claude Code leverages it as a feature. Because every API call is independent, the harness can:

  • Compact the context without side effects — the model doesn’t “notice” missing history.
  • Fork sessions — two users can branch from the same conversation and diverge.
  • Resume sessions — the harness reconstructs context from persisted state; the model doesn’t need to “wake up.”

The harness transforms a liability (no memory) into a capability (flexible state management).
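A minimal sketch of that payoff (illustrative types, not the harness's actual persistence format): because the model holds no state, a session is just its message array. Resume is reloading the array; fork is copying it.

```typescript
interface Message {
  role: "user" | "assistant" | "system";
  content: string;
}

function persistSession(messages: Message[]): string {
  return JSON.stringify(messages);
}

function resumeSession(saved: string): Message[] {
  // The model doesn't "wake up": the next API call simply receives
  // the reconstructed history
  return JSON.parse(saved) as Message[];
}

function forkSession(messages: Message[]): Message[] {
  // Both branches diverge independently from here
  return messages.map((m) => ({ ...m }));
}
```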

Conclusion

Claude Code is a masterclass in the unglamorous but essential work of building agentic infrastructure. The agentic loop is simple; the tool registry is modular; the permission system is layered; the context management is multi-staged; and the extensibility surfaces (hooks, MCP, skills, sub-agents) are designed for growth without touching the core loop.

The real insight is architectural: the intelligence is in the model, but the reliability is in the harness. If you’re building systems that give LLMs agency over real-world environments, the harness is where most of your engineering effort should go.
