AI May 20, 2026 8 min read

AI assisted

Codex App Server Python SDK — JSON-RPC v2 over stdio

A Python SDK over the codex app-server stdio interface — install, first call, thread model, main methods

#Codex #OpenAI #Python SDK #JSON-RPC #GPT-5 #Coding Agent #stdio

Codex CLI is OpenAI's open-source coding-agent CLI (github.com/openai/codex). By default it runs interactively in a terminal, but the codex app-server subcommand puts it into JSON-RPC v2 server mode, where external programs can drive the agent over stdio — request objects in, notification and response objects back.

The Python SDK covered here is a wrapper around that stdio interface. Its official source is at github.com/openai/codex/tree/main/sdk/python. It maps snake_case Python fields to the wire format's camelCase via Pydantic, and exposes both synchronous (Codex, Thread) and async (AsyncCodex) call paths. This post covers installation, the first call pattern, the thread model, the main methods, and operational gotchas.

Why this matters

Until now, OpenAI's Codex coding agent has been usable only as a CLI or TUI. To embed GPT-5-driven agents in external workflows, integrators had to spawn subprocesses and parse stdout, or fork the CLI directly.

This SDK removes that detour. It is the OpenAI-side counterpart to Anthropic's Claude Agent SDK, which made the Claude Code runtime embeddable in Python and TypeScript.

The practical effect is a canonical path for plugging GPT-5-based coding agents into evaluation harnesses, automation, and internal tools — without reverse-engineering the CLI.

One-line summary: Claude Agent SDK ↔ Codex App Server SDK — both vendors' coding-agent runtimes are now embeddable via first-party SDKs.

What it is

When codex app-server starts, it speaks JSON-RPC v2 over stdio — request objects go in, notification and response objects come back. The SDK (openai-codex-app-server-sdk) sits directly on top of that transport. It handles subprocess lifecycle, protocol framing, and type conversion; your application code works with Python objects.
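
To make the three message kinds concrete, here is a minimal sketch of how a client might tell them apart. It relies only on the general JSON-RPC rule that requests carry both an id and a method, notifications carry a method without an id, and responses carry an id with a result or an error. The sample lines, the one-JSON-object-per-line framing, and the turnId field are assumptions for illustration, not the SDK's internals.

```python
import json


def classify(message: dict) -> str:
    """Classify a decoded JSON-RPC message by which keys it carries."""
    has_id = "id" in message
    has_method = "method" in message
    if has_method and has_id:
        return "request"
    if has_method:
        return "notification"
    if has_id and ("result" in message or "error" in message):
        return "response"
    return "invalid"


# Hypothetical wire lines; real payloads carry more fields.
lines = [
    '{"id": 1, "method": "initialize", "params": {}}',
    '{"id": 1, "result": {}}',
    '{"method": "turn/completed", "params": {"turnId": "t1"}}',
]
kinds = [classify(json.loads(line)) for line in lines]
print(kinds)
```

The SDK does this routing for you; the sketch is only meant to show why a notification such as turn/completed can arrive at any time, while a response always pairs with an earlier request id.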

The wire format uses camelCase field names (approvalPolicy, baseInstructions, modelProvider). The SDK exposes everything in snake_case instead, and the Pydantic wire-model layer handles the translation in both directions. When reading API reference material that predates the SDK, every camelCase parameter maps to its snake_case equivalent at the Python call site.
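
The naming rule itself is mechanical. The sketch below reproduces it with plain string handling so the mapping is easy to check by hand; the SDK does this through its Pydantic models, and the helper names to_camel and to_snake are hypothetical, not part of the SDK.

```python
import re


def to_camel(name: str) -> str:
    """snake_case -> camelCase: keep the first word, capitalize the rest."""
    head, *rest = name.split("_")
    return head + "".join(part.capitalize() for part in rest)


def to_snake(name: str) -> str:
    """camelCase -> snake_case: insert '_' before each interior capital."""
    return re.sub(r"(?<!^)(?=[A-Z])", "_", name).lower()


# Field names taken from the wire format described above.
wire_names = ["approvalPolicy", "baseInstructions", "modelProvider"]
python_names = [to_snake(name) for name in wire_names]
print(python_names)
```

Single-word names such as cwd or model pass through unchanged in both directions.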

Generated wire models live in codex_app_server.generated.v2_all and are returned by lower-level methods. The convenience wrappers (Codex, Thread, RunResult) sit above them and cover the common cases without requiring you to import the generated layer directly.

Installation

The package depends on a separate platform-specific binary package, openai-codex-cli-bin, which carries the Codex runtime. Published SDK builds pin that runtime package to an exact version, and that version matches the SDK's own. For local development from the repo:

uv venv
uv pip install -r requirements.txt -e .
source .venv/bin/activate

The examples and the Jupyter walkthrough notebook bootstrap the pinned runtime package automatically via _bootstrap.py, so they run without manual binary setup.

For local development where the runtime package is not installed, pass an explicit binary path via AppServerConfig:

from codex_app_server import Codex
from codex_app_server.config import AppServerConfig

with Codex(config=AppServerConfig(codex_bin="/path/to/codex")) as codex:
    ...

Requirements: Python 3.10 or later, uv, and a configured local Codex auth session.

First call

Codex() is eager. Construction starts the subprocess and calls initialize immediately, so the context manager is the correct pattern — it ensures close() is called on exit even when exceptions occur.

from codex_app_server import Codex

with Codex() as codex:
    thread = codex.thread_start(model="gpt-5")
    result = thread.run("Say hello in one sentence.")
    print(result.final_response)
    print(len(result.items))

thread_start creates a new conversation thread on the server. thread.run(...) sends the prompt as a turn, consumes all incoming notifications until turn/completed arrives, and returns a RunResult. The call blocks until the turn completes.

result.final_response holds the assistant's text as a plain string. result.items is a list of ThreadItem objects representing every item the server emitted during the turn. result.usage carries token counts when the server reports them.

For async code, AsyncCodex mirrors the same API with awaitable methods. Unlike Codex, it initializes lazily rather than eagerly, so entering the async context manager is the standard way to get explicit startup and shutdown:

import asyncio
from codex_app_server import AsyncCodex

async def main() -> None:
    async with AsyncCodex() as codex:
        thread = await codex.thread_start(model="gpt-5.4", config={"model_reasoning_effort": "high"})
        result = await thread.run("Summarize Rust ownership in 2 bullets.")
        print(result.final_response)

asyncio.run(main())

The thread model

A thread is conversation state; a turn is one model execution inside that thread. Multiple calls to thread.run(...) on the same Thread object produce a multi-turn conversation where each turn has access to the preceding context:

with Codex() as codex:
    thread = codex.thread_start(model="gpt-5.4", config={"model_reasoning_effort": "high"})

    first = thread.run("Summarize Rust ownership in 2 bullets.")
    second = thread.run("Now explain it to a Python developer.")

    print(first.final_response)
    print(second.final_response)

To continue a thread across process restarts, use thread_resume with a stored thread ID:

with Codex() as codex:
    thread = codex.thread_resume("thr_123")
    result = thread.run("Continue where we left off.")
    print(result.final_response)

result.final_response is None when the turn completes without a final-answer item or a phase-less assistant message item. This can happen when the model produces only tool-use or structured output items with no accompanying text. The turn itself is not an error in this case; check result.items directly.
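
One way to handle the None case is a small helper that falls back to a summary of the emitted items. This is a sketch under the assumption that the result object exposes final_response and items as described above; describe_result is a hypothetical name, and the stand-in objects below replace a real RunResult so the snippet runs without a server.

```python
from types import SimpleNamespace
from typing import Any, Optional


def describe_result(result: Any) -> str:
    """Return the final text, or a short summary of item types when there is none."""
    text: Optional[str] = result.final_response
    if text is not None:
        return text
    kinds = [type(item).__name__ for item in result.items]
    return f"(no final text; {len(kinds)} item(s): {', '.join(kinds) or 'none'})"


# Stand-ins for RunResult, only for demonstration.
with_text = SimpleNamespace(final_response="Hello.", items=[object()])
without_text = SimpleNamespace(final_response=None, items=[1, "a"])

print(describe_result(with_text))
print(describe_result(without_text))
```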

The current experimental build enforces one active turn consumer per Codex instance. Starting a second thread.run(...) or TurnHandle.stream() while another is in progress raises RuntimeError.

Main methods

Codex.thread_start creates a new thread. The model parameter selects which GPT model the server uses for the thread. config accepts a dict for per-thread server-side parameters such as model_reasoning_effort. Other notable parameters: base_instructions for a system-level instruction string, cwd to set the working directory the server sees, ephemeral for throwaway threads, and sandbox for sandboxed execution.

Codex.thread_resume reattaches to an existing thread by ID. It accepts the same override parameters as thread_start. Use it when your application restarts between turns or when you store thread IDs externally.

Codex.thread_fork creates a new thread branching from a given thread ID, inheriting its history up to the fork point.

Thread.run is the common-case convenience path. It accepts a plain string or an Input value, starts the turn, and blocks until completion. Returns RunResult with final_response, items, and usage. For most application code this is the only turn method needed.

Thread.turn returns a TurnHandle for low-level control. Use this when you need event-by-event streaming, mid-turn steering, or interrupt:

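from codex_app_server import Codex, TextInput

with Codex() as codex:
    thread = codex.thread_start(model="gpt-5")
    turn = thread.turn(TextInput("Explain SIMD in 3 short bullets."))

    # Print assistant text as it streams in; stop once the turn completes.
    for event in turn.stream():
        if event.method == "item/agentMessage/delta":
            delta = getattr(event.payload, "delta", "")
            if delta:
                print(delta, end="", flush=True)
        elif event.method == "turn/completed":
            print()
            break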

TurnHandle.steer(input) injects new input mid-turn. TurnHandle.interrupt() stops the current turn. TurnHandle.run() is the alternative to stream() — it blocks until completion and returns the canonical generated Turn model from codex_app_server.generated.v2_all. stream() and run() are mutually exclusive per handle.

Codex.models lists available models from the server. Pass include_hidden=True to surface non-default entries.

Thread.compact triggers context compaction on the server side for long-running threads.

Patterns from examples

Streaming CLI loop. The 11_cli_mini_app example shows the minimal pattern for a multi-turn interactive session: start one thread, loop on input(), call thread.turn(TextInput(...)), stream events, print item/agentMessage/delta chunks in real time, and capture token usage from ThreadTokenUsageUpdatedNotification events at the end of each turn. The thread persists for the session lifetime so conversation context accumulates.
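
The shape of that loop can be sketched without the SDK installed by passing the thread, the input constructor, and the output function in as parameters. The function names stream_turn and chat_loop are hypothetical, and matching the usage event by its payload class name is an assumption made to avoid guessing the wire method name; the example in the repo may do this differently.

```python
from typing import Any, Callable, Iterable, List, Optional

DELTA = "item/agentMessage/delta"
COMPLETED = "turn/completed"
USAGE_TYPE = "ThreadTokenUsageUpdatedNotification"


def stream_turn(thread: Any, turn_input: Any, write: Callable[[str], None]) -> Optional[Any]:
    """Stream one turn: forward text deltas and return the last usage payload seen."""
    usage = None
    for event in thread.turn(turn_input).stream():
        if event.method == DELTA:
            write(getattr(event.payload, "delta", ""))
        elif event.method == COMPLETED:
            break
        elif type(event.payload).__name__ == USAGE_TYPE:
            usage = event.payload
    return usage


def chat_loop(
    thread: Any,
    prompts: Iterable[str],
    make_input: Callable[[str], Any],
    write: Callable[[str], None],
) -> List[Optional[Any]]:
    """Run one turn per prompt on the same thread so context accumulates."""
    return [stream_turn(thread, make_input(prompt), write) for prompt in prompts]
```

With the SDK, a call might look like chat_loop(thread, iter(input, "quit"), TextInput, lambda s: print(s, end="", flush=True)), where iter(input, "quit") keeps reading lines until the user types quit.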

Error handling with retry. The 10_error_handling_and_retry example wraps a turn in retry_on_overload, which retries on ServerBusyError with exponential backoff and jitter. The pattern separates retryable errors (transient overload) from non-retryable ones (InvalidParamsError, MethodNotFoundError). For the latter, fixing inputs or version compatibility is the right path — blind retries will not help.

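from codex_app_server import Codex, TextInput, retry_on_overload

with Codex() as codex:
    thread = codex.thread_start(model="gpt-5")

    # Each attempt starts a fresh turn; only overload errors trigger a retry.
    result = retry_on_overload(
        lambda: thread.turn(TextInput("Summarize retry best practices.")).run(),
        max_attempts=3,
        initial_delay_s=0.25,
        max_delay_s=2.0,
    )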
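
For readers unfamiliar with the technique, the following is a generic sketch of exponential backoff with jitter. It is not the SDK's implementation of retry_on_overload: the TransientError class is a hypothetical stand-in for a retryable error, the parameter names are borrowed from the call above, and the injectable sleep and rng arguments exist only to make the behaviour easy to observe.

```python
import random
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class TransientError(Exception):
    """Hypothetical stand-in for a retryable error such as a busy server."""


def retry_with_backoff(
    op: Callable[[], T],
    *,
    max_attempts: int = 3,
    initial_delay_s: float = 0.25,
    max_delay_s: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
    rng: Callable[[], float] = random.random,
) -> T:
    """Call op until it succeeds, retrying only on TransientError."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    delay = initial_delay_s
    for attempt in range(1, max_attempts + 1):
        try:
            return op()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(rng() * delay)  # jitter: wait a random fraction of the cap
            delay = min(delay * 2, max_delay_s)  # double the cap, bounded
    raise RuntimeError("unreachable")


# Demo: fail twice, then succeed; record sleeps instead of waiting.
attempts: List[int] = []
sleeps: List[float] = []


def flaky() -> str:
    attempts.append(1)
    if len(attempts) < 3:
        raise TransientError("busy")
    return "ok"


outcome = retry_with_backoff(flaky, sleep=sleeps.append, rng=lambda: 1.0)
print(outcome, sleeps)
```

Any exception other than TransientError propagates on the first attempt, which mirrors the retryable versus non-retryable split described above.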

Operational notes

Runtime binary pinning. The SDK package version and the openai-codex-cli-bin version must match. A mismatch causes protocol failures that surface as initialization errors. When upgrading, update both packages together. Published SDK wheels declare an exact pinned dependency so pip install openai-codex-app-server-sdk pulls the matching runtime automatically.
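
A quick way to confirm the two packages agree is to compare their installed versions with importlib.metadata. This is a sketch: the distribution names come from the text above, check_pinning is a hypothetical helper, and treating the two version strings as matching only when they are equal is an assumption based on the pinning rule described here.

```python
from importlib import metadata
from typing import Callable, Optional

SDK_PACKAGE = "openai-codex-app-server-sdk"
RUNTIME_PACKAGE = "openai-codex-cli-bin"


def installed_version(name: str, lookup: Callable[[str], str] = metadata.version) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return lookup(name)
    except metadata.PackageNotFoundError:
        return None


def check_pinning(lookup: Callable[[str], str] = metadata.version) -> str:
    """Report whether the SDK and runtime packages are installed at the same version."""
    sdk = installed_version(SDK_PACKAGE, lookup)
    runtime = installed_version(RUNTIME_PACKAGE, lookup)
    if sdk is None or runtime is None:
        return f"missing package (sdk={sdk}, runtime={runtime})"
    if sdk != runtime:
        return f"version mismatch: sdk {sdk} vs runtime {runtime}"
    return f"ok: both at {sdk}"


print(check_pinning())
```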

Constructor failure. Codex() fails at construction, not at first use, because it starts the process and calls initialize in __init__. Common causes: the runtime package is not installed, the codex_bin path is wrong, or local Codex auth is missing. Check these before assuming a protocol problem.
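
When an explicit codex_bin path is in play, a cheap pre-flight check can rule out the wrong-path cause before the constructor runs. The helper below is a hypothetical sketch that uses only the standard library; it does not check the bundled runtime package or the auth session.

```python
import os
import sys
from typing import Optional


def check_codex_bin(path: str) -> Optional[str]:
    """Return a description of the problem with path, or None if it looks runnable."""
    if not os.path.isfile(path):
        return f"not a file: {path}"
    if not os.access(path, os.X_OK):
        return f"not executable: {path}"
    return None


# Demo with paths that exist on any machine running this snippet.
print(check_codex_bin(sys.executable))
print(check_codex_bin(os.path.join(os.getcwd(), "no-such-codex-binary")))
```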

stdout framing. The transport runs over the subprocess's stdin and stdout, so the server process must write only valid JSON-RPC objects to stdout. Anything else written to that stream, such as stray print() output from plugins or scripts that Codex runs, can corrupt the JSON-RPC framing.

Hanging turns. A turn is considered complete only when turn/completed arrives for that turn ID. With thread.run(...) this is handled automatically. With TurnHandle.stream(), the caller must consume notifications until the completion event arrives. Stopping iteration early leaves the turn open and blocks further use of that client instance.
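
If the caller only needs part of the stream, one option is to drain the rest rather than abandon it. The sketch below assumes the event stream is an ordinary iterator that can be resumed after the caller stops reading, which the text above does not confirm for TurnHandle.stream(); drain_to_completion is a hypothetical helper, and the fake events stand in for real notifications.

```python
from types import SimpleNamespace
from typing import Iterator

COMPLETED = "turn/completed"


def drain_to_completion(events: Iterator) -> int:
    """Consume events up to and including turn/completed; return how many were discarded."""
    discarded = 0
    for event in events:
        if event.method == COMPLETED:
            break
        discarded += 1
    return discarded


# Fake event stream standing in for iter(turn.stream()).
events = iter(
    SimpleNamespace(method=method)
    for method in [
        "item/agentMessage/delta",
        "item/agentMessage/delta",
        "item/agentMessage/delta",
        COMPLETED,
    ]
)

first = next(events)                     # read only the first delta
discarded = drain_to_completion(events)  # then drain instead of abandoning the turn
print(first.method, discarded)
```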

Model selection. The model parameter on thread_start and on individual thread.run(...) / thread.turn(...) calls accepts model identifiers recognized by the Codex server (gpt-5, gpt-5.4, and variants). The config dict on thread_start supports model_reasoning_effort for controlling reasoning depth on supported models. codex.models() returns the full list of models the connected server instance reports.

One active turn per client. The current experimental build enforces a single active turn consumer across all threads on one Codex instance. To run turns in parallel, create separate Codex instances.
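
A sketch of the separate-instances approach, using a thread pool where each worker opens and closes its own client. The client factory is passed in as a parameter so the snippet runs without the SDK; run_one and run_parallel are hypothetical names, and whether several Codex subprocesses can safely run side by side on one machine is an assumption to verify in your environment.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, Callable, List, Optional, Sequence


def run_one(make_client: Callable[[], Any], model: str, prompt: str) -> Optional[str]:
    """Run a single prompt on a client that this worker alone owns."""
    with make_client() as client:
        thread = client.thread_start(model=model)
        return thread.run(prompt).final_response


def run_parallel(
    make_client: Callable[[], Any],
    model: str,
    prompts: Sequence[str],
    max_workers: int = 4,
) -> List[Optional[str]]:
    """Run prompts concurrently, one client per prompt; results keep prompt order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(lambda prompt: run_one(make_client, model, prompt), prompts))
```

With the SDK, the call would be along the lines of run_parallel(Codex, "gpt-5", ["Review a.py", "Review b.py"]). Each worker starts its own subprocess, so max_workers also bounds how many Codex processes run at once.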

Closing

Use this SDK when your Python application needs to drive Codex programmatically: scripted code review pipelines, multi-turn agents, or tooling that integrates Codex into a larger workflow. The thread.run(...) path handles the majority of use cases in a few lines; TurnHandle.stream() covers the cases that need real-time output or mid-turn control. How this compares to the Claude Agent SDK as a programming interface is a separate topic for another post.