DB May 20, 2026 8 min read

AI assisted

Replacing Elasticsearch with PostgreSQL

textsearch_ko (MeCab) + pg_textsearch BM25 + pgvector HNSW + DB-side RRF matches Elasticsearch on Korean search quality and runs 2–5× faster on a single node. Eight phases of measurement, head-to-head against ES, Qdrant, and Vespa.

#PostgreSQL #Elasticsearch #Korean #BM25 #pgvector #Hybrid Search #RRF #MeCab #Qdrant #Vespa #RAG

Can PostgreSQL alone replace Elasticsearch for Korean search? I measured it end-to-end across eight phases — from morphological analyzers to hybrid retrieval.

The short answer: yes.

textsearch_ko (MeCab) + pg_textsearch BM25 + pgvector HNSW + DB-side RRF
→ MIRACL NDCG 0.77 @1.79ms / EZIS NDCG 0.86 @0.92ms

Quality matches Elasticsearch, and single-node latency is 2–5× faster. Vector-first systems like Qdrant and Vespa fall structurally behind on Korean text search.

Why this experiment

Korean search is not English search.

Stemming "running" to "run" in English is a handful of rules. Reducing "먹었다" to "먹-" in Korean requires a morphological analyzer. Without one, you cannot link "먹는", "먹고", "먹었던" to the same document, and BM25 quality collapses.

Elasticsearch ships with nori, a Korean morphological analyzer, and solves this out of the box. PostgreSQL's stock full-text search does not support Korean. So "Korean search = Elasticsearch" became the default.

Does it have to be?

The textsearch_ko extension plugs MeCab morphology into tsvector. pg_textsearch adds BM25 scoring. Add pgvector for dense retrieval, fuse BM25 and dense in a single SQL CTE with RRF, and you have a full hybrid pipeline inside PostgreSQL.

The question was whether this assembly actually holds up. "An extension exists" is not the same as "production quality". I measured across eight phases.

What I learned

The tokenizer decides everything

Most of Korean BM25 quality is explained by the tokenizer alone. Phase 1 compared MeCab, Kiwi, and Okt; MeCab won on the throughput-quality tradeoff and became the default for every later phase.

Tokenizer impact on Korean BM25 quality across systems

Systems with morphological analysis (PostgreSQL + MeCab, Elasticsearch + nori) hit NDCG 0.61–0.64 on MIRACL BM25. Vespa, which only does ICU Unicode word-boundary splitting, gets 0.41. Qdrant's multilingual tokenizer — Meilisearch's charabia under the hood, also Unicode-only — gets 0.36.

Without morphology, "데이터베이스" and "데이터" never link. "검색했다" and "검색" become separate tokens. No amount of BM25 tuning rescues a wrong tokenization.

One unexpected finding: nori works well under OR matching (NDCG 0.61) but collapses to 0.13 under AND matching. decompound_mode: mixed over-decomposes compounds, and AND retrieval requires every token to be present, so recall caves. PostgreSQL's textsearch_ko held 0.64 under the same AND condition.

The optimal method flips by domain

Running two datasets with opposite characters in parallel was the most important design choice in the study.

Domain reversal: BM25 vs Dense vs Hybrid across MIRACL and EZIS

On MIRACL (Korean Wikipedia), dense retrieval (0.79) clearly beats BM25 (0.64). Semantic similarity matters more than keyword overlap.

On EZIS (Oracle DB manuals), BM25 (0.92) crushes dense (0.80). Exact-term matches like "ORA-01555" or "DBMS_STATS" matter more than semantic proximity.

The same PostgreSQL stack with the same hybrid configuration produces opposite winners depending on the data. There is no universally best method. Hybrid (RRF) is the safe default precisely because it hedges across this flip.

PostgreSQL is not slower than dedicated engines

Intuitively a specialized engine should outpace a general-purpose database. The numbers say otherwise.

PostgreSQL's DB-side RRF runs BM25 and dense queries inside a single SQL CTE and merges the results. One round trip between the app and the database. ES and Qdrant go through HTTP/JSON. That network overhead can dwarf the query itself.

On MIRACL 10K: PostgreSQL RRF p50 = 1.79ms, ES = 5.18ms, Qdrant = 4.54ms, Vespa = 4.14ms. Single-node, warm-cache, retrieval-only — caveats noted. BGE-M3 embedding inference (~200ms) is excluded; query embeddings were precomputed.

Hybrid search latency comparison across PostgreSQL, ES, Qdrant, Vespa

Quality holds up too

Hybrid search NDCG quality by system

On MIRACL Hybrid, PostgreSQL (0.77) edges out Elasticsearch (0.75). On EZIS, ES (0.88) slightly beats PG (0.86), but I did not test statistical significance. Qdrant (0.69 / 0.84) and Vespa (0.45 / 0.81) bleed from their weak BM25 legs.

The key point: hybrid quality is bounded by BM25 quality. A weak BM25 leg makes hybrid worse than dense alone. Vespa's MIRACL hybrid (0.45) underperforming its dense-only number (0.79) is the proof.

Qdrant has no self-hosted BM25

Qdrant is structurally the strongest of the vector-first engines — quantization, filtering, multi-tenancy, distributed scale are all best-in-class. But its Korean text story is weak.

qdrant/bm25 exists as a server-side model, but it's Qdrant Cloud only. Self-hosted has no equivalent. The built-in TextIndexParams multilingual tokenizer is charabia from Meilisearch — Unicode word-boundary splitting, not morphology. So "먹었다" and "먹는" stay disconnected. Tokenizing externally with MeCab and ingesting as sparse vectors works but is TF × IDF, not true BM25 — no document-length normalization (k1, b), and NDCG sits at 0.36.

Vespa does not ship Korean morphology

Vespa expresses BM25 + ANN hybrid declaratively, very cleanly. But the default tokenizer is ICU, which does not do Korean morphology.

I tried three integration paths. (1) Lucene Linguistics with Nori — lucene-analysis-nori is not bundled, OSGi isolation blocks loading. (2) A custom Linguistics component (vespa-kuromoji-linguistics pattern) — the Java component loads, but the content node (proton, in C++) keeps using its own ICU tokenizer at index time, producing index/query token mismatch. (3) set_language — meaningless without Nori in Lucene Linguistics. Unlike PostgreSQL or Elasticsearch, Vespa's custom tokenizer does not apply consistently across indexing and query.

ICU BM25 generated enough noise that 0.1*bm25 + closeness linear fusion produced 0.45 — worse than dense-only (0.79).

Headline numbers

Cross-system (Phase 8)

System	MIRACL Hybrid	EZIS Hybrid	p50
PostgreSQL (RRF)	0.7683	0.8641	1.79ms
ES 8.17 (nori, retriever.rrf)	0.7501	0.8769	5.18ms
Qdrant 1.15 (MeCab sparse + dense)	0.6924	0.8394	4.54ms
Vespa (ICU + HNSW)	0.4463	0.8125	4.14ms

Inside PostgreSQL (Phase 7)

Method	MIRACL NDCG@10	EZIS NDCG@10	p50
pg_textsearch BM25 (MeCab)	0.6385	0.9162	0.44ms
Dense (BGE-M3 HNSW)	0.7904	0.8041	1.2ms
RRF hybrid (DB-side)	0.7683	0.8641	1.79ms
Bayesian hybrid (DB-side)	0.7272	0.9249	9.55ms

BM25 scaling (Phase 6–7)

BM25 latency scaling across 1K/10K/100K documents

pg_textsearch holds 0.62ms at 100K documents. The hand-rolled pl/pgsql implementation blows up to 85ms. Index size is 18MB vs 501MB — a 27× gap.

Datasets

Dataset	Character	Queries	Corpus	Bias
MIRACL-ko	General Wikipedia	213	10K	Dense-favored
EZIS Oracle Manual	Technical reference	~120	~200	BM25-favored

Running two opposite-character datasets in parallel was the design choice that kept the study honest — it prevents the illusion that any single method is universally best.

Phases

Phase	What	Key finding
0	Prepare MIRACL-ko + EZIS	—
1	Compare analyzers (MeCab vs Kiwi vs Okt)	Kiwi-CoNg wins on quality, MeCab on speed → MeCab chosen for throughput/stability
2	Wire Korean morphology into `tsvector`	`textsearch_ko` connects MeCab → tsvector
3	Compare in-PG BM25 implementations	pl/pgsql and pg_textsearch are the two viable options
4	BM25 vs neural (Dense, SPLADE)	Domain flips winner, hybrid is the safe default
5	Production hardening (incremental, concurrency)	pl/pgsql v2 + BGE-M3 confirmed
6	VectorChord-BM25 scaling (1K/10K/100K)	VectorChord is 24× faster than pl/pgsql
7	pg_textsearch hybrid (RRF, Bayesian)	pg_textsearch fastest at every scale, RRF chosen
8	Cross-system comparison (ES / Qdrant / Vespa)	PG stack matches on quality, 2–5× latency win

Caveats

All latencies are single-node, warm-cache, retrieval-only.
Dense latency excludes BGE-M3 embedding inference (~200ms).
PG numbers reuse Phase 7 measurements; this is cross-phase, not same-time head-to-head.
BM25 query semantics differ across systems: PG uses AND, ES uses OR, Vespa uses weakAnd.
No bootstrap CI or p-value testing for statistical significance.

So what should you use?

For a new project, starting with PostgreSQL alone is the rational default. Tokenizer, BM25, vectors, and hybrid fusion all live inside one transaction, one node, one backup, one permission model. Index-source sync and data-consistency issues disappear.

If you already run Elasticsearch and don't depend on nori AND matching, there's no reason to migrate. Korean quality is on par with PG under nori's OR semantics. The single-node latency win and operational simplicity are real, but not migration-forcing.

Qdrant and Vespa are not the first choice for Korean text search. They are vector-first systems where Korean BM25 has to be bolted on from the outside.

All code and per-phase analysis is published at github.com/ysys143/textsearch.