Requirements as Latent State

Why spec-driven development is a requirements-inference architecture

#AI #Agent #Requirements #Spec-Driven #Harness #Feedback #Control

Anyone who has used an AI coding agent long enough will recognize a recurring scene. The user believes they stated their requirements clearly. The agent interprets them differently and builds something off-target. Explaining in more detail seems to help, but the longer the explanation gets, the more new ambiguities it introduces. The problem is not only that natural language is imprecise. The deeper problem is that we mistake requirements for "things contained in sentences."

The substance of a requirement does not live in the sentence. It exists as a latent state, and natural language is one noisy surface expression of it. A requirement is really a tangle of the user's goals, organizational conventions, constraints imposed by existing systems, implicit expectations, past failures, and boundaries of cost and responsibility. A sentence is an observed sample of that latent state, not the state itself. When a user says "make the search better," the sentence could mean ranking quality, response speed, recall, or a query-assist UI. The agent's job is not to execute the sentence as written but to treat it as evidence and reconstruct the underlying requirement.

From this perspective, the recurring failures in AI-assisted development are not something better prompt-writing will fix. The problem is that agents treat the natural language expression as the requirement itself and jump straight to execution. Natural language is flexible, but that flexibility cuts both ways: the same underlying need can surface in many different expressions, and the same expression can point to many different needs. Requirements analysis, then, is not document interpretation but latent-state inference. The user's words, existing code, test failures, domain constraints, user feedback, and approval signals are all evidence for estimating what the latent requirement actually is.
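The inference framing can be made concrete with a toy Bayesian update. The sketch below assumes four hypothetical interpretations of "make the search better" and made-up likelihood values, so the numbers illustrate the mechanics only. The utterance alone barely separates the hypotheses, while a second observation (a hypothetical failing latency test) shifts most of the probability mass.

```python
# A minimal sketch: the utterance and a test failure as evidence about a
# hidden requirement. Hypotheses, priors, and likelihoods are all hypothetical.

def normalize(weights: dict[str, float]) -> dict[str, float]:
    """Rescale non-negative weights so they sum to 1."""
    total = sum(weights.values())
    return {k: v / total for k, v in weights.items()}


def update(prior: dict[str, float], likelihood: dict[str, float]) -> dict[str, float]:
    """Bayes' rule: posterior(h) is proportional to prior(h) * P(evidence | h)."""
    return normalize({h: prior[h] * likelihood.get(h, 0.0) for h in prior})


# Candidate latent requirements behind one utterance, uniform prior.
prior = normalize({"ranking_quality": 1.0, "latency": 1.0, "recall": 1.0, "query_assist_ui": 1.0})

# Evidence 1: the sentence itself is nearly uninformative.
utterance = {"ranking_quality": 0.3, "latency": 0.25, "recall": 0.25, "query_assist_ui": 0.2}

# Evidence 2: a test reports slow p95 response times for search.
slow_test = {"ranking_quality": 0.1, "latency": 0.8, "recall": 0.1, "query_assist_ui": 0.05}

posterior = update(update(prior, utterance), slow_test)
best = max(posterior, key=posterior.get)
print(best, round(posterior[best], 2))  # latency 0.75
```

The point is not the specific numbers but the shape of the computation: each new observation reweights the same set of candidate requirements instead of replacing the sentence with a new sentence.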

This is where EARS comes in. EARS (Easy Approach to Requirements Syntax) is a grammar that structures requirements into consistent natural language patterns: "When some event occurs, the system shall perform some action." "While a particular state persists, the system shall perform some action." The point is to separate the condition from the system behavior, so that triggers, states, subjects, and actions stay distinct rather than blurring together in free prose, which reduces ambiguity. EARS, however, was originally designed for documenting system requirements consistently. In an AI coding agent context, the subject of a requirement is not necessarily "the system"; it could be a service, component, agent, function, file, workflow, or data pipeline. That is where an extended grammar like GEARS becomes meaningful: it generalizes the EARS idea to specify where, under what conditions, for which subject, and with what action a requirement applies, across a much wider range of development contexts.
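To see what separating condition from behavior buys, a requirement can be held as structured fields and rendered back into an EARS-style sentence. Besides the event-driven (When) and state-driven (While) patterns, EARS also has unwanted-behavior (If ... then) and optional-feature (Where) patterns. The field names, the clause ordering used here, and the free-text `subject` are assumptions for illustration; the free subject loosely mirrors the GEARS idea but is not GEARS' actual syntax.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Requirement:
    """A requirement with condition and behavior kept in separate fields.

    Field names are illustrative. `subject` is free text, so it can name a
    service, function, or pipeline rather than only "the system".
    """
    subject: str                   # who must act, e.g. "the search service"
    action: str                    # the required behavior
    where: Optional[str] = None    # optional feature that must be present
    while_: Optional[str] = None   # state that must hold
    when: Optional[str] = None     # triggering event
    if_: Optional[str] = None      # unwanted condition ("If ..., then ...")

    def render(self) -> str:
        """Render as one EARS-style sentence: conditions first, then behavior."""
        clauses = []
        if self.where:
            clauses.append(f"where {self.where}")
        if self.while_:
            clauses.append(f"while {self.while_}")
        if self.when:
            clauses.append(f"when {self.when}")
        if self.if_:
            clauses.append(f"if {self.if_}")
        main = f"{self.subject} shall {self.action}"
        if self.if_:
            main = "then " + main  # unwanted-behavior pattern
        sentence = ", ".join(clauses + [main]) + "."
        return sentence[0].upper() + sentence[1:]


event = Requirement("the search service", "return results within 300 ms",
                    when="a user submits a query")
unwanted = Requirement("the search service", "serve cached results",
                       if_="the index is unavailable")
print(event.render())
print(unwanted.render())
```

Because each slot is a separate field, an agent can ask a targeted question about one slot (for example, "which subject?") instead of re-reading a whole paragraph.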

Treating EARS or GEARS as sentence templates understates their significance. They are not just tidy ways to write requirements; they normalize the observation model so that latent requirements can be estimated more reliably. Free-form natural language is a high-variance observation. Separating condition, state, subject, action, and output reduces the degrees of freedom the agent has to reason over. These grammars do not fully capture the substance of a requirement, but they give the agent a clearer observation surface to interpret.
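The "clearer observation surface" can be illustrated in reverse: a checker that accepts a sentence only if it fits one of a few EARS-style shapes and returns its fields, and otherwise returns `None`, signalling that the observation is still free prose and may need clarification. The regular expressions are a hypothetical, deliberately narrow approximation; for instance, a condition that itself contains a comma would be split incorrectly.

```python
import re
from typing import Optional

# Checked in order: the keyword patterns come before the catch-all
# ubiquitous pattern, which would otherwise also match them.
PATTERNS = [
    ("event", re.compile(r"When (?P<condition>.+?), (?P<subject>.+?) shall (?P<action>.+?)\.?")),
    ("state", re.compile(r"While (?P<condition>.+?), (?P<subject>.+?) shall (?P<action>.+?)\.?")),
    ("unwanted", re.compile(r"If (?P<condition>.+?), then (?P<subject>.+?) shall (?P<action>.+?)\.?")),
    ("ubiquitous", re.compile(r"(?P<subject>.+?) shall (?P<action>.+?)\.?")),
]


def parse_requirement(sentence: str) -> Optional[dict[str, str]]:
    """Return the named fields of the first pattern that matches the whole sentence, or None."""
    text = sentence.strip()
    for kind, pattern in PATTERNS:
        match = pattern.fullmatch(text)
        if match:
            return {"kind": kind, **match.groupdict()}
    return None


print(parse_requirement("When a user submits a query, the search service shall return results within 300 ms."))
print(parse_requirement("Make the search better."))  # free prose: no structure recovered
```

A `None` result is itself useful evidence: it tells the harness that this observation has not yet been normalized and is a candidate for a clarifying question.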

Spec-driven development is the practice of making these normalized requirements the primary artifact of development. In traditional development, specs are either reference documents written before implementation or documents abandoned after it. In spec-driven development, files like spec.md, plan.md, and tasks.md become live state that the agent reads and acts on. The specification is no longer a document only humans read; it is a state file the agent consults to choose its next action. The crucial point, however, is that the spec is not the substance of the requirement. It is the current estimate of the latent requirement: the linguistic expression of a posterior built from the user's words and the existing context. When new feedback arrives, when contradictions surface during implementation, or when tests keep failing, the spec has to be updated. Good spec-driven development therefore does not treat the spec as absolute; it continuously validates and revises it.
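Treating the spec as a revisable estimate rather than a fixed document mostly means keeping a revision counter and recording which evidence forced each change. The `Spec` class and its Markdown layout below are hypothetical and are not spec-kit's actual spec.md template.

```python
from dataclasses import dataclass, field


@dataclass
class Spec:
    """Current estimate of the latent requirement, kept as explicit, revisable state."""
    requirements: list[str]
    revision: int = 1
    history: list[str] = field(default_factory=list)

    def revise(self, evidence: str, drop: list[str], add: list[str]) -> None:
        """Replace requirements that the evidence contradicts and log the reason."""
        self.requirements = [r for r in self.requirements if r not in drop] + add
        self.revision += 1
        self.history.append(f"r{self.revision}: {evidence}")

    def to_markdown(self) -> str:
        """Serialize the estimate so the agent can re-read it on its next step."""
        lines = [f"# Spec (revision {self.revision})", "", "## Requirements"]
        lines += [f"- {r}" for r in self.requirements]
        lines += ["", "## Revision history"]
        lines += [f"- {h}" for h in self.history] or ["- (none)"]
        return "\n".join(lines)


old = "When a user submits a query, the search service shall return results within 300 ms."
new = "When a user submits a query, the search service shall return results within 500 ms."
spec = Spec([old])
spec.revise("p95 benchmark misses 300 ms on the current index; user approved 500 ms",
            drop=[old], add=[new])
print(spec.to_markdown())
```

In an actual workflow the Markdown could be written to spec.md (for example with `pathlib.Path("spec.md").write_text(spec.to_markdown())`) and re-read before each step, so the revision history travels with the estimate.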

This is where the harness becomes necessary. The harness defines the execution environment: what state the agent can observe, what it may modify, how it judges success and failure, and how far it can roll back when something goes wrong. In control-theoretic terms, it bundles current state, target state, error, control input, state transition, and feedback path into one structure. If requirements are a latent state, the harness is the control loop that tests the agent's current estimate of that state by executing changes and observing the results. Skills, by contrast, are packaged capabilities for specific tasks: a skill for writing tests, a skill for deploying, a skill for converting documents into EARS format. Each is a reusable action policy. Skills are closer to actuators; the harness is closer to a controller. Actuators without a controller give you open-loop automation. A controller without actuators produces no actual state transitions.
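The controller/actuator split can be sketched as a small closed loop. Here skills are plain functions that propose a next state, and the harness measures the error against a target, accepts a transition only if the error shrinks, and otherwise rolls back to the previous snapshot. The state shape (a count of failing tests), the skill names, and the fixed policy are hypothetical.

```python
from typing import Callable

State = dict[str, int]
Skill = Callable[[State], State]             # actuator: maps a state to a proposed next state
Policy = Callable[[State, list[str]], str]   # picks a skill from the state and the trace so far


def error(state: State, target: State) -> int:
    """Distance between observed and target state (0 means done)."""
    return sum(abs(target[k] - state.get(k, 0)) for k in target)


def run_harness(state: State, target: State, skills: dict[str, Skill],
                policy: Policy, max_steps: int = 10) -> tuple[State, list[str]]:
    """Observe, compare, act through a skill, verify, and roll back on regression."""
    trace: list[str] = []
    for _ in range(max_steps):
        current_error = error(state, target)
        if current_error == 0:
            break
        name = policy(state, trace)
        candidate = skills[name](dict(state))  # act on a copy; `state` stays as the snapshot
        if error(candidate, target) < current_error:
            state = candidate
            trace.append(f"{name}: accepted")
        else:
            trace.append(f"{name}: rolled back")  # keep the previous snapshot
    return state, trace


skills: dict[str, Skill] = {
    "fix_one_test": lambda s: {**s, "failing_tests": s["failing_tests"] - 1},
    "broad_refactor": lambda s: {**s, "failing_tests": s["failing_tests"] + 2},
}
# Hypothetical policy: try the broad refactor once, then fall back to targeted fixes.
policy: Policy = lambda s, past: "broad_refactor" if not past else "fix_one_test"

final, trace = run_harness({"failing_tests": 3}, {"failing_tests": 0}, skills, policy)
print(final, trace)
```

Dropping the error check would turn this into the open-loop automation described above: the regressing refactor would simply be applied.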

AskUserQuestion and similar user-input loops belong in the same picture. When an agent is reconstructing the latent requirement and its uncertainty rises high enough, asking a question is an information-gathering action: a way to obtain another observation. The agent should ask when three conditions hold together: there are multiple plausible interpretations, the implementation differs significantly depending on which one is right, and a wrong execution would be hard to reverse. An agent that asks about every minor ambiguity stops being useful, so a well-designed harness should encode the conditions under which asking is warranted. The approval loop works the same way. When an agent is about to make a hard-to-undo change to external state, a proposal and approval should come first. Approval is not a procedure for offloading responsibility onto the human. It is the process by which the user re-observes the agent's current estimate of the latent requirement and renders a judgment.
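The three conditions for asking can be folded into one expected-cost comparison. This is a hypothetical decision rule, not how AskUserQuestion is triggered in any SDK: `divergence` (0 to 1) stands for how differently the candidate interpretations would be implemented, and the cost constants are illustrative placeholders that a harness would need to tune.

```python
def should_ask(posterior: dict[str, float], divergence: float, reversible: bool,
               ask_cost: float = 1.0, redo_cost: float = 2.0,
               irreversible_cost: float = 20.0) -> bool:
    """Ask only when the expected cost of acting on the best guess exceeds the cost of asking."""
    p_wrong = 1.0 - max(posterior.values())  # are there several plausible readings?
    damage = divergence * (redo_cost if reversible else irreversible_cost)  # how bad if wrong?
    return p_wrong * damage > ask_cost


# Minor wording ambiguity in a local, easily reverted change: just act.
print(should_ask({"a": 0.6, "b": 0.4}, divergence=0.1, reversible=True))
# Two readings of a schema migration on production data: ask first.
print(should_ask({"a": 0.6, "b": 0.4}, divergence=0.9, reversible=False))
# Same migration, but the evidence already strongly favors one reading: act.
print(should_ask({"a": 0.98, "b": 0.02}, divergence=0.9, reversible=False))
```

Each factor maps to one of the conditions in the text, so removing any one of them (a dominant interpretation, similar implementations, or an easy undo) pushes the rule back toward acting.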

WorldLoops-style structures make this even clearer. Rather than immediately converting an incoming signal into external execution, the pattern is: open a pending loop, generate a proposal, get approval, record the local state transition, and leave a receipt. What matters is not "what the agent did" but "which signal it interpreted as which requirement, what proposal it generated, and what transition it recorded after which approval." Without this record, when a requirements estimate later turns out to be wrong, there is no way to trace which observation, interpretation, or execution step introduced the error. A receipt is not a log; it is a trace of latent-requirement inference.
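The pending-loop pattern amounts to a small state machine in which every step appends to a receipt. The sketch below is hypothetical and does not reproduce WorldLoops' actual data model or API; it only shows that a transition cannot be applied without approval, and that the receipt links the raw signal, the inferred requirement, the proposal, the approver, and the resulting state change.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Optional


class Status(Enum):
    PENDING = "pending"
    PROPOSED = "proposed"
    APPROVED = "approved"
    REJECTED = "rejected"
    APPLIED = "applied"


@dataclass
class PendingLoop:
    signal: str          # the raw observation that arrived
    interpretation: str  # the requirement the agent inferred from it
    status: Status = Status.PENDING
    proposal: Optional[str] = None
    receipts: list[str] = field(default_factory=list)

    def __post_init__(self) -> None:
        self.receipts.append(f"opened: {self.signal!r} read as {self.interpretation!r}")

    def propose(self, proposal: str) -> None:
        self._require(Status.PENDING)
        self.proposal, self.status = proposal, Status.PROPOSED
        self.receipts.append(f"proposed: {proposal!r}")

    def decide(self, approver: str, approved: bool) -> None:
        self._require(Status.PROPOSED)
        self.status = Status.APPROVED if approved else Status.REJECTED
        self.receipts.append(f"{self.status.value} by {approver}")

    def apply(self, state: dict[str, str], key: str, value: str) -> dict[str, str]:
        """Record the local transition; refuses to run without prior approval."""
        self._require(Status.APPROVED)
        self.status = Status.APPLIED
        self.receipts.append(f"applied: {key} {state.get(key)!r} -> {value!r}")
        return {**state, key: value}

    def _require(self, expected: Status) -> None:
        if self.status is not expected:
            raise RuntimeError(f"expected {expected.value}, loop is {self.status.value}")


loop = PendingLoop("support ticket: search feels slow", "p95 search latency under 500 ms")
loop.propose("add a result cache in front of the index")
loop.decide("alice", approved=True)
state = loop.apply({}, "search.cache", "enabled")
print("\n".join(loop.receipts))
```

If the latency requirement later proves to be the wrong reading, the first receipt line shows exactly which signal was interpreted as which requirement.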

This also reframes the role of memory. Agent memory is not a space for storing large amounts of past conversation. What matters more is the update rule: how stored interactions influence current requirement estimation, when to trust historical patterns, and when to discard them. If stored records merely accumulate in a vector database, that is storage, not experience. For experience to become adaptation, stored observations must update the current posterior, and that change must be reflected in the execution policy.
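One possible update rule, sketched below, discounts each stored observation by its age and reliability before folding it into the current estimate, and drops observations whose weight falls below a threshold. The exponential half-life and the weighted (tempered) likelihood are a swapped-in technique chosen for illustration, not something the text prescribes; all parameter values are hypothetical, and likelihoods must be strictly positive.

```python
import math


def memory_posterior(prior: dict[str, float], memories: list[dict],
                     half_life_days: float = 30.0, min_weight: float = 0.05) -> dict[str, float]:
    """Fold stored observations into the current estimate, trusting recent, reliable ones more."""
    log_post = {h: math.log(p) for h, p in prior.items()}
    for m in memories:
        # Weight in [0, 1]: halves every `half_life_days`, scaled by source reliability.
        weight = m["reliability"] * 0.5 ** (m["age_days"] / half_life_days)
        if weight < min_weight:
            continue  # too old or too unreliable: effectively forgotten
        for h in log_post:
            log_post[h] += weight * math.log(m["likelihood"][h])
    peak = max(log_post.values())  # subtract the max before exp for numerical stability
    unnorm = {h: math.exp(v - peak) for h, v in log_post.items()}
    total = sum(unnorm.values())
    return {h: v / total for h, v in unnorm.items()}


prior = {"ranking_quality": 0.5, "latency": 0.5}
memories = [
    # Two days ago the user flagged irrelevant top results.
    {"age_days": 2, "reliability": 0.9, "likelihood": {"ranking_quality": 0.8, "latency": 0.2}},
    # A year ago, under a different index, the complaints were about speed.
    {"age_days": 365, "reliability": 0.9, "likelihood": {"ranking_quality": 0.1, "latency": 0.9}},
]
print(memory_posterior(prior, memories))
```

The year-old record stays in storage but no longer moves the estimate; that is the difference between accumulating records and having an update rule.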

Seen from this angle, traditional requirements engineering and agent-based development differ in a fundamental way. In the traditional picture, requirements are discovered, documented, agreed upon, and then handed off for implementation. In agent-based development, requirements do not exist in finished form from the start. They stabilize gradually through the back-and-forth of user-agent interaction, the resistance of the codebase, test failures, deployment constraints, and organizational approval processes. The spec is both a starting point and a continuously updated state estimate.

The way I see it, the essence of spec-driven development is not "translating natural language requirements into code." It is building a closed-loop system that infers the latent state called requirements from unstable natural language observations, records that estimate in a state file called the spec, and keeps updating it through execution and feedback. EARS and GEARS are grammars that reduce variance in observations. spec-kit is a mechanism that anchors estimated requirements as actionable development state. AskUserQuestion and the approval loop are observation channels for handling uncertainty and responsibility boundaries. Skills are action policies available in specific states. The harness is the control surface that bundles observation, judgment, approval, and recovery. In the end, a good AI development agent is not a model that understands instructions well. It is a system that infers stable intent from unstable expression, preserves that inference as explicit state, and feeds the errors revealed during execution back into its understanding of the requirements. That is the point at which spec-driven development stops being a document-writing practice and becomes a requirements-inference architecture for the age of agents.

References

Mavin, Alistair, et al. "Easy Approach to Requirements Syntax (EARS)." IEEE International Requirements Engineering Conference (RE '09), 2009.

"Generalized EARS: The AI-Ready Spec Syntax." Medium, 2026.

"spec-kit: Spec-Driven Development." GitHub, 2026.

"Spec-Driven Development with ADK." Google Codelabs, 2026.

"User Input — Claude Code Agent SDK." Claude Code Docs, 2026.

"WorldLoops." GitHub, 2026.

"Ouroboros: Agent OS." GitHub, 2026.