I implemented three search backends — Elasticsearch, Cloudflare D1 (SQLite), and Static JSON (in-memory) — for a Japanese text search API running on Cloudflare Pages, and compared their performance.

Background

I’ve been running a full-text search API for classical Japanese texts. The existing setup used an external Elasticsearch cluster, but I wanted to explore alternatives for several reasons:

  • Reduce external service dependencies
  • Keep everything within Cloudflare Pages
  • The dataset is small (~1,800 records) — a full-text search engine might be overkill

Dataset

Metric               Value
Records              1,812
Total text (UTF-8)   ~2.5 MB
Average per record   ~1.4 KB

Each record contains classical Japanese text (a few to ~15 lines), a page number, volume name, and IIIF canvas URL.

Three Approaches

1. Elasticsearch (existing)

Executes wildcard queries against an external Elasticsearch cluster via the fetch API.

{
  wildcard: { 'original_text_lines.keyword': `*${query}*` }
}

The index uses an ngram analyzer, but the actual queries use wildcard (*query*), meaning the ngram index provides no benefit.
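
For multiple query variations, the request body can be built along these lines. This is a sketch based on the snippet above: the field name is from the original query, while the `size` default and `minimum_should_match` are assumptions, not the actual implementation.

```javascript
// Build an ES request body that ORs wildcard clauses over all
// query variations. Field name from the snippet above; size and
// minimum_should_match are assumptions.
function buildEsQuery(queries, size = 20) {
  return {
    query: {
      bool: {
        should: queries.map((q) => ({
          wildcard: { 'original_text_lines.keyword': `*${q}*` },
        })),
        minimum_should_match: 1,
      },
    },
    size,
  };
}
```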

2. Cloudflare D1 (SQLite)

Stores data in Cloudflare D1 and uses LIKE for substring matching.

SELECT id, page, original_text, vol_str, canvas
FROM texts
WHERE original_text LIKE '%query%'
ORDER BY page ASC
LIMIT 20 OFFSET 0

Faceted counts use SQL GROUP BY:

SELECT vol_str, COUNT(*) as doc_count
FROM texts
WHERE original_text LIKE '%query%'
GROUP BY vol_str
ORDER BY vol_str ASC

D1’s batch() API allows running search, count, and aggregation queries in a single round-trip.
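
That single round-trip can be sketched as follows. The `prepare`/`bind`/`batch` calls follow the standard D1 binding API, and the table and column names come from the SQL above; the handler signature itself is illustrative, not the actual implementation.

```javascript
// Sketch: run search, count, and facet aggregation in one D1 round-trip.
// Table/column names from the SQL above; the function shape is assumed.
async function searchD1(env, pattern, limit = 20, offset = 0) {
  const like = `%${pattern}%`;
  const [rows, count, facets] = await env.DB.batch([
    env.DB
      .prepare(
        'SELECT id, page, original_text, vol_str, canvas FROM texts ' +
        'WHERE original_text LIKE ?1 ORDER BY page ASC LIMIT ?2 OFFSET ?3'
      )
      .bind(like, limit, offset),
    env.DB
      .prepare('SELECT COUNT(*) AS total FROM texts WHERE original_text LIKE ?1')
      .bind(like),
    env.DB
      .prepare(
        'SELECT vol_str, COUNT(*) AS doc_count FROM texts ' +
        'WHERE original_text LIKE ?1 GROUP BY vol_str ORDER BY vol_str ASC'
      )
      .bind(like),
  ]);
  return {
    hits: rows.results,
    total: count.results[0].total,
    facets: facets.results,
  };
}
```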

3. Static JSON (in-memory)

Bundles all data as a JSON file within the Worker and searches using Array.filter + String.includes.

filtered = data.filter(r =>
  queries.some(q => r.original_text.includes(q))
)

Aggregation is done with a Map for manual counting. The JSON file is ~2.7 MB, well within Cloudflare Pages’ 25 MB bundle limit.
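
The manual counting can be sketched like this, mirroring the GROUP BY query from the D1 section (the function name is illustrative):

```javascript
// Count hits per volume with a Map — the in-memory equivalent of
// GROUP BY vol_str ORDER BY vol_str ASC.
function facetByVolume(hits) {
  const counts = new Map();
  for (const r of hits) {
    counts.set(r.vol_str, (counts.get(r.vol_str) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort(([a], [b]) => a.localeCompare(b)) // matches ORDER BY vol_str ASC
    .map(([vol_str, doc_count]) => ({ vol_str, doc_count }));
}
```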

Benchmark Results

Query: a common Japanese word, averaged over 5 runs with all normalization options enabled (dakuon unification, historical kana conversion, etc.).

Backend         Search Time   Hits   Storage
Static JSON     0.72 ms       111    ~1.0 MB
D1 (SQLite)     17.1 ms       111    ~1.1 MB
Elasticsearch   83.0 ms       105    7.6 MB

Speed Comparison

Static JSON ████ 0.72ms (1x)
D1          ████████████████████████████████████████████ 17.1ms (24x)
ES          ████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 83.0ms (115x)

Static JSON is ~24x faster than D1 and ~115x faster than Elasticsearch.

Storage Comparison

Static JSON ██████ 1.0 MB
D1          ██████ 1.1 MB
ES          ████████████████████████████████████████████████ 7.6 MB

Elasticsearch consumes roughly 7x the storage of the other two backends, largely due to ngram index overhead.

Hit Count Discrepancy

Elasticsearch returns 105 hits vs. 111 for the other two. This is due to subtle behavioral differences between wildcard queries and LIKE/includes matching.

Why Static JSON Is Fastest

  1. No network round-trip — data lives in Worker memory
  2. No SQL parsing — String.includes is a V8-optimized native operation
  3. Linear scan over 1,800 records is trivially fast — JavaScript can compare a few thousand strings in under 1ms

Why D1 Is Slower Than Static JSON

D1 is still 5x faster than ES, but compared to Static JSON it has overhead from:

  • SQL parsing and query plan generation
  • Storage layer I/O
  • LIKE '%query%' forces a full table scan (no index can help)

Why Elasticsearch Is Slowest

  • Network round-trip from Cloudflare Edge to the ES cluster
  • Wildcard query overhead
  • JSON response parsing

That said, 83ms is perfectly usable. For datasets with tens or hundreds of thousands of records, Elasticsearch’s indexing would likely reverse these results.

Implementation Details

Backend Switching

A common search interface allows switching backends via query parameter:

// /api/search?q=query&backend=static
// /api/search?q=query&backend=d1
// /api/search?q=query&backend=elasticsearch
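
A minimal dispatcher over that common interface might look like the following. The three backend functions here are stubs standing in for the real implementations, and the default backend is an assumption.

```javascript
// Map each backend name to a search function with a common signature.
// The stubs below stand in for the real Static JSON / D1 / ES code.
const backends = {
  static: async (q) => ({ backend: 'static', hits: [] }),
  d1: async (q) => ({ backend: 'd1', hits: [] }),
  elasticsearch: async (q) => ({ backend: 'elasticsearch', hits: [] }),
};

async function handleSearch(requestUrl) {
  const params = new URL(requestUrl).searchParams;
  const q = params.get('q') ?? '';
  const name = params.get('backend') ?? 'static'; // assumed default
  const search = backends[name];
  if (!search) throw new Error(`unknown backend: ${name}`);
  return search(q);
}
```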

Japanese Text Normalization

Query normalization (dakuon unification, historical kana conversion, etc.) is backend-agnostic JavaScript processing. Query variations are generated first, then OR-searched across any backend.

Input: いづれ
↓ Transform
Variations: [いつれ, ひつれ, ゐつれ]
↓ Search per backend
Static: queries.some(q => text.includes(q))
D1:     WHERE text LIKE '%いつれ%' OR text LIKE '%ひつれ%' OR ...
ES:     bool.should: [wildcard: *いつれ*, wildcard: *ひつれ*, ...]
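
The variation step can be sketched as a per-character equivalence table expanded by cartesian product. The two entries below are an illustrative fragment; the real rule tables cover far more characters.

```javascript
// Illustrative fragment of the normalization rules; the real tables
// are much larger. Each entry maps a character to the spellings
// treated as equivalent after normalization.
const EQUIV = {
  'い': ['い', 'ひ', 'ゐ'], // historical kana variants of "i"
  'づ': ['つ'],             // dakuon unified to the unvoiced form
};

// Expand a query into every normalized spelling (cartesian product).
function variations(query) {
  let results = [''];
  for (const ch of query) {
    const alts = EQUIV[ch] ?? [ch];
    results = results.flatMap((prefix) => alts.map((a) => prefix + a));
  }
  return results;
}

// variations('いづれ') → ['いつれ', 'ひつれ', 'ゐつれ']
```

Each backend then ORs these variations together, as shown above.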

Benchmark Endpoint

/api/benchmark?q=query&iterations=5 returns comparison results in JSON format.
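
The comparison loop behind that endpoint can be sketched as follows; the backend functions are placeholders for the three real implementations, and the report shape is an assumption.

```javascript
// Sketch: time each backend over N iterations and average the results.
async function runBenchmark(backends, query, iterations = 5) {
  const report = {};
  for (const [name, search] of Object.entries(backends)) {
    let totalMs = 0;
    let hits = 0;
    for (let i = 0; i < iterations; i++) {
      const start = performance.now();
      hits = (await search(query)).length;
      totalMs += performance.now() - start;
    }
    report[name] = { avgMs: totalMs / iterations, hits };
  }
  return report;
}
```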

Conclusion

For ~1,800 records and 2.5MB of text, Elasticsearch was unnecessary. Static JSON (in-memory search) delivers more than sufficient performance.

Choosing the Right Approach

Dataset Size                      Recommended
Up to a few thousand records      Static JSON
Thousands to tens of thousands    D1 (SQLite)
Tens of thousands and above       Elasticsearch or similar

Our Choice

We adopted Static JSON, completely eliminating the Elasticsearch dependency. Search speed improved ~115x, and the entire application now runs within Cloudflare Pages with no external services.
