5 min read
AI assisted

LightRAG Without Apache AGE — Graph Storage in Recursive CTE

Implementing LightRAG's BaseGraphStorage on plain PostgreSQL with RCTE — why a 1-hop-dominant retrieval pattern fits flat SQL

lightrag_pg_not_age implements LightRAG's BaseGraphStorage interface on plain PostgreSQL using Recursive CTE — no Apache AGE, no Cypher wrapper, no graph extension. The retrieval call distribution is 1-hop-lookup dominant, and flat SQL with B-tree indexes turns out to be the most efficient expression of that pattern.


LightRAG's graph abstraction

LightRAG exposes its knowledge graph through BaseGraphStorage — an abstract class with 18 methods. Any conforming implementation serves as the graph layer: NetworkX (the in-process default), Neo4j, Apache AGE, or something custom.

The official PostgreSQL backend is PGGraphStorage. It depends on Apache AGE, which grafts Cypher onto PostgreSQL via a cypher() wrapper function. Every graph operation passes through AGE's parsing and plan generation before PostgreSQL can execute it.

The question lightrag_pg_not_age starts from: can the same 18-method interface be satisfied using plain PostgreSQL, with no extensions beyond a standard postgres:17-alpine image?


Why RCTE fits LightRAG

The answer depends entirely on what those 18 methods do at runtime. LightRAG's actual retrieval call distribution is:

Method Share
get_node 35%
get_node_edges 25%
upsert_node 15%
node_degree 15%
has_node 10%

85% of calls are 1-hop operations — point lookups and direct-edge scans. The remaining 15% is upsert_node, a write, not a traversal. Graph database strengths — variable-depth path queries, shortestPath, cycle detection — are entirely absent from the retrieval critical path.

get_knowledge_graph runs a BFS but is bounded: depth 2, max_nodes 50 cap. It serves visualization, not retrieval. RCTE's weakness — deep BFS frontier explosion — is intentionally unreachable.

get_node with a B-tree index on (workspace, id) is a point lookup; get_node_edges with (workspace, src_id) is a small range scan. Routing either through cypher() adds parse and plan-generation cost with no benefit for these operations.

The benchmark_gdb Phase 1–8 series decomposes the 290x gap: cypher() wrapper cost (~100x, isolated via AgensGraph) plus PostgreSQL's planner limits on graph traversal. AgensGraph removes the wrapper cost but not the planner limit — all Cypher-on-PostgreSQL approaches trail flat SQL on this workload.


Implementation

Two tables, auto-created on initialization:

CREATE TABLE lightrag_graph_nodes (workspace TEXT, id TEXT, properties JSONB, ...);
CREATE TABLE lightrag_graph_edges (workspace TEXT, src_id TEXT, tgt_id TEXT, properties JSONB, ...);

workspace partitions separate LightRAG instances within the same database. properties is JSONB, carrying entity metadata without schema changes as LightRAG's model evolves.

The implementation is lightrag-upstream/lightrag/kg/pg_rcte_impl.pyPgRcteGraphStorage covering all 18 abstract methods, no stubs. Five batch overrides eliminate N+1 patterns. get_knowledge_graph is a single Recursive CTE BFS, bounded by depth-2 and max_nodes-50. Upserts use INSERT ON CONFLICT DO UPDATE.

Configuration: POSTGRES_HOST, POSTGRES_PORT, POSTGRES_USER, POSTGRES_PASSWORD, POSTGRES_DATABASE. Switching an existing deployment is one constructor change — graph_storage="PgRcteGraphStorage". Everything else stays the same.


Benchmarks

The bench/ directory contains four scripts: bench_e2e.py (end-to-end pipeline timing), bench_graph_ops.py (per-backend operation throughput), bench_lightrag_mix.py (the 35/25/15/10/15 call-distribution replay), and compare_three_way.py (side-by-side across backends). Environment: Colima VM, 8 vCPU / 10 GB, each backend container at 2 GB / 2 cores, workers=10.

Synthetic graph (Barabasi-Albert, 8,000 nodes / 39,975 edges)

Backend Seed (s) RPS p50 (ms) p95 (ms)
NetworkX 0.8 47,730 0.2 0.3
RCTE 19.6 12,776 0.7 1.3
Neo4j 85.9 1,684 4.3 13.2
AGE 351.8 175 4.8 213.3

EZIS real LightRAG data (8,050 nodes / 25,844 edges)

Backend Seed (s) RPS p50 (ms) p95 (ms)
NetworkX 0.6 47,113 0.2 0.3
RCTE 14.3 9,169 0.7 1.4
Neo4j 66.5 1,693 4.2 13.2
AGE 208.7 159 4.6 238.6

NetworkX is an in-process reference with no network round-trip. Among the three remote backends: RCTE at 12,776 RPS versus Neo4j at 1,684 RPS is 7.6x; RCTE versus AGE is 58–73x. AGE p95 at 213–238 ms under 10 concurrent workers shows latency collapse under concurrency. RCTE p95 holds at 1.3–1.4 ms.

For the broader multi-engine picture, benchmark_gdb Phase 6 reports RCTE at 22.5K RPS, Neo4j at 14.5K RPS, AGE at 78 RPS under u=50 on a general GraphRAG OLTP mix (Q1–Q7 at 30/30/20/10/3/4/3). The ratio narrows versus the pure LightRAG distribution because the general mix carries more traversal-weighted queries. The Phase 1–8 series is the reference for the full eight-engine comparison and gap decomposition.


When to use it

The target case is LightRAG on PostgreSQL without Apache AGE or AgensGraph. AGE requires a non-standard build or a managed service that ships it — standard postgres:17-alpine does not include it. PgRcteGraphStorage provides full BaseGraphStorage compliance on a stock instance.

The equivalence test suite — 34 functional tests, four-backend correctness comparison on synthetic and real data — confirms that switching backends does not change LightRAG's output. Recommended indexes: (workspace, id) on nodes and (workspace, src_id) on edges.

The limit: workloads with variable-depth traversal on the hot path — shortestPath, multi-hop expansion beyond depth 2, cycle detection — need a native graph database. LightRAG's design avoids those patterns; that is why flat SQL works here.


Closing

The pattern generalizes: measure the actual graph operations a workload runs, then pick the storage representation that matches the measured distribution. "We need a graph database" is a claim that rarely survives contact with benchmarks on lookup-dominant workloads running on standard relational infrastructure.