Skill and Harness

Why skills and harnesses overlap in implementation

#AI #Agent #Harness #Skills #Policy #Feedback

People often conflate harnesses and skills, and while the two are conceptually distinct, in actual implementations they frequently contain each other or get deployed in each other's form. A skill is a package of capabilities, and when those capabilities are aimed at achieving some goal, they're often structured to reach that goal through a feedback loop — that is, through a harness. Conversely, a harness is a feedback loop for achieving goals, so it tends to define an agent's behavioral patterns through combinations of skills and then wrap constraints around them to generate feedback. The subtle difference is this: a skill compresses "how to behave," while a harness organizes "how to know whether that behavior is getting closer to the goal."

This distinction becomes sharper when you consider that a skill can include tools, sub-agents, and hooks, yet is primarily an extension of a "prompt": specifically, a SKILL.md. A skill can include not just instructional documents but also tool-calling conventions, sub-agent usage rules, scripts, verification commands, output formats, and prohibited conditions. In the end, a skill expresses an agent's policy in much the same way a prompt did in an earlier era, because capability, at its core, is about whether a given input can take you from state a, via tool x, to outcome a', and whether it can keep you from leaking to outcome b. Where the old-era prompt tuned a model's response tendencies, a skill extends that role to cover the filesystem, tools, execution environment, and project rules.

Seen as layers, the relationship gets clear. A tool is the unit of a single action the agent calls; a skill is one level up, bundling those tool calls into a single policy; and a harness is the level above that — the feedback structure that keeps those policies on course toward a goal. Tools sit inside skills, and skills operate inside a harness: a stack built from the bottom up.
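
The layering can be sketched in a few lines of Python. This is only an illustration: `Tool`, `Skill`, and `Harness` are hypothetical names invented here, not types from any real agent runtime, and the "state" is just a string.

```python
from dataclasses import dataclass
from typing import Callable

# All names here are hypothetical, chosen only to illustrate the layering.
Tool = Callable[[str], str]  # a single action: takes a state, returns a new state


@dataclass
class Skill:
    """A policy that bundles tool calls into one ordered procedure."""
    name: str
    tools: list[Tool]

    def run(self, state: str) -> str:
        for tool in self.tools:
            state = tool(state)
        return state


@dataclass
class Harness:
    """A feedback structure: runs skills, then judges the result against a goal."""
    skills: list[Skill]
    goal_met: Callable[[str], bool]

    def run(self, state: str) -> tuple[str, bool]:
        for skill in self.skills:
            state = skill.run(state)
        return state, self.goal_met(state)


# Two toy tools, one skill that bundles them, one harness that checks the outcome.
tidy = Skill("tidy", tools=[str.strip, str.lower])
harness = Harness(skills=[tidy], goal_met=lambda s: s == "hello")
result = harness.run("  HeLLo ")
```

Here `result` is the final state paired with the harness's verdict. The skill knows how to act; only the harness knows whether the action reached the goal.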

What a harness pins down is concrete: what can be changed and what must not be touched, how success and failure get judged, and what is rolled back when something fails. So rather than defining capability itself, a harness defines the search space where capability operates and the evaluation conditions that filter its results. For a code agent, the feedback signals behind those evaluation conditions are test failures, type errors, build logs, execution results, and user feedback. Conceptually, a skill is a policy asset the agent references, while the harness is the external structure where that policy gets validated against the real environment. If a skill is the form of behavior, a harness is the selection criteria for behavior.
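
Those three commitments (what may change, how results are judged, what gets rolled back) can be written down as a small sketch. It assumes a toy in-memory "filesystem" (a dict of path to contents); `HarnessSpec` and `apply_change` are hypothetical names, and the judge is a stand-in for a real test run.

```python
from dataclasses import dataclass
from typing import Callable

Files = dict[str, str]  # toy in-memory filesystem: path -> contents


@dataclass
class HarnessSpec:
    """Hypothetical harness definition, for illustration only."""
    writable: set[str]              # the search space: paths the agent may change
    judge: Callable[[Files], bool]  # the evaluation condition: pass or fail


def apply_change(spec: HarnessSpec, files: Files, change: Files) -> tuple[Files, bool]:
    """Try a change under the harness; return (resulting files, accepted)."""
    if not set(change) <= spec.writable:
        return files, False          # touches a protected path: rejected outright
    snapshot = dict(files)           # the state to fall back to on failure
    candidate = {**files, **change}
    if spec.judge(candidate):
        return candidate, True
    return snapshot, False           # judged a failure: roll back to the snapshot


files = {"src/app.py": "x = 1", "tests/test_app.py": "assert x == 2"}
spec = HarnessSpec(
    writable={"src/app.py"},
    judge=lambda f: "x = 2" in f["src/app.py"],  # stand-in for running the tests
)

_, edited_tests = apply_change(spec, files, {"tests/test_app.py": "assert x == 1"})
after_bad, bad_ok = apply_change(spec, files, {"src/app.py": "x = 3"})
after_good, good_ok = apply_change(spec, files, {"src/app.py": "x = 2"})
```

Editing the test file is refused before anything is judged, the wrong fix is judged and rolled back, and only the change that satisfies the judge survives. Nothing in the spec says how to find the right change; that part belongs to the skill.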

Harnesses vary enormously in size. At one end sits something like an "emoji-ban hook" that blocks any file write when emoji slip into the output. It's only a few lines of code, yet it's an excellent harness — a hard gate where pass and fail are binary and unambiguous, which is exactly why it works so forcefully. At the other end, an entire operational structure spanning many sessions and tools can be bundled under the name harness. A harness's strength comes not from its size but from the clarity of its feedback signal.
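
A hook of that kind might look roughly like the following sketch. Two assumptions are worth flagging: the regular expression is only a rough approximation of "emoji" (a few common Unicode blocks, not the full Unicode emoji definition), and `emoji_gate` and `guarded_write` are hypothetical helpers writing to a dict, not the hook protocol of any particular runtime.

```python
import re

# Rough approximation of emoji: a few common Unicode blocks, not the full spec.
EMOJI_PATTERN = re.compile(
    "["
    "\U0001F300-\U0001FAFF"  # pictographs, emoticons, transport, extended symbols
    "\U0001F1E6-\U0001F1FF"  # regional indicator symbols (flags)
    "\u2600-\u27BF"          # miscellaneous symbols and dingbats
    "]"
)


def emoji_gate(text: str) -> bool:
    """Hard gate: True if the write may proceed, False if it must be blocked."""
    return EMOJI_PATTERN.search(text) is None


def guarded_write(files: dict[str, str], path: str, text: str) -> bool:
    """Write to a toy in-memory filesystem only if the gate passes."""
    if not emoji_gate(text):
        return False  # binary and unambiguous: nothing is written
    files[path] = text
    return True


files: dict[str, str] = {}
plain_ok = guarded_write(files, "notes.md", "Release is done.")
emoji_ok = guarded_write(files, "notes.md", "Release is done \U0001F389")
```

The second write is refused and the file keeps its earlier contents. The gate carries no policy about how to write well; it only makes one failure mode impossible to miss.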

Skills, too, can be seen at two layers. Skill-as-concept is the idea of packaging work capability and patterns; skill-as-implementation is that idea made concrete as a SKILL.md with hooks and scripts. The layer where the boundary blurs is precisely this implementation one. Skills can be packaged together with hooks, and scripts are often provided as tools for self-feedback — so a skill already ends up containing a mini-harness that observes and corrects its own outputs. If a deployment skill includes build, test, log inspection, health check, and rollback conditions on failure, that's no longer just execution guidance — it's a small control system. Going the other direction, a harness combines multiple skills, decides which skill to invoke when, and evaluates the results through an overarching feedback structure. A harness can be a mini-harness embedded inside a skill, but it can also mean the constraints on the entire search space within which those policies operate.
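
The deployment example can be sketched as a tiny control loop. This is an assumption-laden illustration: `run_deploy` is a hypothetical helper, and the lambdas stand in for real build, test, and health-check commands that a skill's scripts would run.

```python
from typing import Callable

Step = tuple[str, Callable[[], bool]]  # (step name, check returning pass/fail)


def run_deploy(steps: list[Step], rollback: Callable[[], None]) -> list[str]:
    """Run steps in order; on the first failure, roll back and stop."""
    log: list[str] = []
    for name, check in steps:
        ok = check()
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        if not ok:
            rollback()
            log.append("rollback: done")
            break
    return log


restored: list[str] = []
steps: list[Step] = [
    ("build", lambda: True),          # stand-in for a real build command
    ("test", lambda: True),           # stand-in for the test suite
    ("health_check", lambda: False),  # simulated failing health check
    ("announce", lambda: True),       # never reached in this run
]
log = run_deploy(steps, rollback=lambda: restored.append("previous release"))
```

Once the skill carries both the checks and the rollback condition, it is observing and correcting its own output, which is the mini-harness described above.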

So in a way, a coding agent like Claude Code is an agent runtime — but it also has skills and a harness built into it to make the agent operational in its environment, and in some respects it is the harness itself. These systems are not model-call UIs; they are agent runtimes that connect the filesystem, shell, tests, version control, approval flows, and project rules. They are skill executors with capabilities like reading code, making edits, debugging, iterating on tests, and cleaning up commits built in or addable from outside. And they are harnessed execution environments that make the agent cycle through modify, run, observe, and re-modify inside the codebase. With a clearly defined goal and feedback signal, they can perform remarkably well on the strength of their built-in capabilities alone, without needing explicit prompts or scripts as policy — or you can compose pre-packaged skills to give the agent a more refined search policy.
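
The modify, run, observe, re-modify cycle is easy to state as code. The sketch below is deliberately toy-sized and rests on stated assumptions: the "codebase" is a single integer, `run_tests` is a stand-in for a real test suite, and `modify` is a stand-in for the model's edit step. All three names are hypothetical.

```python
from typing import Callable


def agent_loop(
    candidate: int,
    run: Callable[[int], str],
    modify: Callable[[int, str], int],
    max_iters: int = 20,
) -> tuple[int, bool]:
    """Cycle run -> observe -> modify until feedback is 'pass' or budget runs out."""
    for _ in range(max_iters):
        feedback = run(candidate)                # run and observe
        if feedback == "pass":
            return candidate, True
        candidate = modify(candidate, feedback)  # re-modify using the feedback
    return candidate, False


def run_tests(value: int) -> str:
    """Stand-in test suite: passes only when the value is exactly 7."""
    if value < 7:
        return "too low"
    if value > 7:
        return "too high"
    return "pass"


def modify(value: int, feedback: str) -> int:
    """Stand-in edit step: nudge the value in the direction the feedback suggests."""
    return value + 1 if feedback == "too low" else value - 1


found, passed = agent_loop(4, run_tests, modify)
stuck, stuck_passed = agent_loop(0, run_tests, modify, max_iters=3)
```

With a clear goal and a clear signal, even this trivial edit step converges. With too small an iteration budget the loop stops and reports failure rather than pretending to have finished.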

It's also worth considering the flip side: the harnesses that get talked about most are often implemented and distributed in the form of "skills" for Claude Code, Codex, and similar platforms. A full harness is hard to port wholesale because it encompasses goals, state, evaluation, permissions, and recovery. A skill, by contrast, is easy to ship as a bundle of documents, hooks, commands, and scripts that a given runtime can read. So the act of "building a skill" is in practice a partial design of the agent's search space, feedback conditions, and failure recovery procedures. The skill becomes both a capability package and a carrier of harness fragments.

What's decisive here is that the word "skill" itself moves between two levels of abstraction. Skill-as-implementation is a protocol-level standard, with SKILL.md front matter, directory conventions, even a marketplace, whereas a harness has no such standard layer and mostly stays at the conceptual level. So to ship a harness, which has no spec, you have no choice but to borrow the form of a skill, which does have one. Addy Osmani's definition of a skill as a "workflow with evidence-producing checkpoints and a defined exit criterion" and as "one layer of agent harness engineering" is a case in point. The skill he points to isn't the SKILL.md standard but the concept-level skill, and that concept leans toward harness more than toward a capability package. Since the same word straddles two levels, it's no surprise that his usage doesn't line up exactly with skill-as-standard. It's this asymmetry that makes a skill the easiest, most pluggable way to mount a harness onto an agent runtime: a skill is a standardized loadable unit, so you can drop it straight in. Claude Code's plugins are just Claude Code's own mechanism for bundling and shipping those skills, not a standard. The skill is the standard; the plugin is one runtime's way of packaging it.

A skill-implemented harness has limits, of course. It integrates tightly with the runtime but is weak at enclosing the whole system, and it tends to lock you into a specific agent runtime. A skill is a unit mounted inside the runtime, so it can't easily become the outer control structure that wraps the runtime itself. This is why a harness like ouroboros is built as runtime-agnostic orchestration that sits above any single runtime. At that scale, calling the thing a "skill" no longer fits, and the name plainly shifts back to harness.

In long-running agents, the distinction between harness and skill becomes starker. The scale and complexity of long-running agents are only achievable through combinations of diverse skills, not a single one. Pursuing a goal over days or weeks, maintaining state across sessions, recovering from failures, coordinating multiple agents, and validating partial outputs — none of that is manageable with a single skill alone. The harness there refers to the whole system: it operates through those skills, sometimes with multiple agents interacting as they advance toward the goal, providing constraint and feedback signal throughout. How the harness itself becomes the environment that drives adaptation is taken up in Harness as Environment.
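
One of those requirements, maintaining state across sessions, can be made concrete with a minimal checkpointing sketch. It assumes a plan that is a flat list of step names and a JSON file as the checkpoint; `run_session` and the step names are hypothetical, and appending a step to `done` stands in for actually invoking a skill.

```python
import json
import tempfile
from pathlib import Path


def run_session(checkpoint: Path, plan: list[str], budget: int) -> list[str]:
    """Resume from the checkpoint, do at most `budget` new steps, persist progress."""
    done: list[str] = json.loads(checkpoint.read_text()) if checkpoint.exists() else []
    for step in plan:
        if budget == 0:
            break                # this session's budget is spent
        if step in done:
            continue             # finished in an earlier session: skip it
        done.append(step)        # stand-in for invoking the skill for this step
        budget -= 1
    checkpoint.write_text(json.dumps(done))
    return done


plan = ["research", "implement", "validate", "report"]
with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "state.json"
    first = run_session(ckpt, plan, budget=2)   # session 1 stops halfway
    second = run_session(ckpt, plan, budget=2)  # session 2 picks up where it left off
```

The state that survives between the two calls lives outside any single skill, which is one reason the long-running case pushes the vocabulary from skill toward harness.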

In the end, skills and harnesses are not mutually exclusive. A skill is an execution unit that compresses policy; a harness is the feedback environment where that policy is explored and validated. Conceptually, a skill is a behavioral policy and a harness is a feedback structure, but in implementation, skills embed harnesses and harnesses get deployed as skills. They can be distinguished yet still overlap because both are concepts rather than protocols. Protocols can't overlap, since their specs are mutually exclusive, but concepts can diverge in emphasis while sharing extension.

References

"Agent Skills (Open Standard)." agentskills.io — originally developed by Anthropic, 2026.

Addy Osmani. "Building Reliable Long-Running AI Agents." Addyo Substack, 2026.

Addy Osmani. "Agent Skills." O'Reilly Radar, 2026.

Q00. "ouroboros (Agent OS)." GitHub, 2026.