I implemented three search backends — Elasticsearch, Cloudflare D1 (SQLite), and Static JSON (in-memory) — for a Japanese text search API running on Cloudflare Pages, and compared their performance.

Background

I’ve been running a full-text search API for classical Japanese texts. The existing setup used an external Elasticsearch cluster, but I wanted to explore alternatives for several reasons:

  • Reduce external service dependencies
  • Keep everything within Cloudflare Pages
  • The dataset is small (~1,800 records) — a full-text search engine might be overkill

Dataset

Metric               Value
Records              1,812
Total text (UTF-8)   ~2.5 MB
Average per record   ~1.4 KB

Each record contains classical Japanese text (a few to ~15 lines), a page number, volume name, and IIIF canvas URL.

Three Approaches

1. Elasticsearch (existing)

Executes wildcard queries against an external Elasticsearch cluster via the fetch API.

{
  wildcard: { 'original_text_lines.keyword': `*${query}*` }
}

The index uses an ngram analyzer, but the actual queries use wildcard (*query*), meaning the ngram index provides no benefit.
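
For multiple query variations, the request body can be built along these lines. This is a sketch based on the snippet above: the field name is from the original query, while the `size` default and `minimum_should_match` are assumptions, not the actual implementation.

```javascript
// Build an ES request body that ORs wildcard clauses over all
// query variations. Field name from the snippet above; size and
// minimum_should_match are assumptions.
function buildEsQuery(queries, size = 20) {
  return {
    query: {
      bool: {
        should: queries.map((q) => ({
          wildcard: { 'original_text_lines.keyword': `*${q}*` },
        })),
        minimum_should_match: 1,
      },
    },
    size,
  };
}
```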

2. Cloudflare D1 (SQLite)

Stores data in Cloudflare D1 and uses LIKE for substring matching.

SELECT id, page, original_text, vol_str, canvas
FROM texts
WHERE original_text LIKE '%query%'
ORDER BY page ASC
LIMIT 20 OFFSET 0

Faceted counts use SQL GROUP BY:

SELECT vol_str, COUNT(*) as doc_count
FROM texts
WHERE original_text LIKE '%query%'
GROUP BY vol_str
ORDER BY vol_str ASC

D1’s batch() API allows running search, count, and aggregation queries in a single round-trip.
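
That single round-trip can be sketched as follows. The `prepare`/`bind`/`batch` calls follow the standard D1 binding API, and the table and column names come from the SQL above; the handler signature itself is illustrative, not the actual implementation.

```javascript
// Sketch: run search, count, and facet aggregation in one D1 round-trip.
// Table/column names from the SQL above; the function shape is assumed.
async function searchD1(env, pattern, limit = 20, offset = 0) {
  const like = `%${pattern}%`;
  const [rows, count, facets] = await env.DB.batch([
    env.DB
      .prepare(
        'SELECT id, page, original_text, vol_str, canvas FROM texts ' +
        'WHERE original_text LIKE ?1 ORDER BY page ASC LIMIT ?2 OFFSET ?3'
      )
      .bind(like, limit, offset),
    env.DB
      .prepare('SELECT COUNT(*) AS total FROM texts WHERE original_text LIKE ?1')
      .bind(like),
    env.DB
      .prepare(
        'SELECT vol_str, COUNT(*) AS doc_count FROM texts ' +
        'WHERE original_text LIKE ?1 GROUP BY vol_str ORDER BY vol_str ASC'
      )
      .bind(like),
  ]);
  return {
    hits: rows.results,
    total: count.results[0].total,
    facets: facets.results,
  };
}
```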

3. Static JSON (in-memory)

Bundles all data as a JSON file within the Worker and searches using Array.filter + String.includes.

filtered = data.filter(r =>
  queries.some(q => r.original_text.includes(q))
)

Aggregation is done with a Map for manual counting. The JSON file is ~2.7 MB, well within Cloudflare Pages’ 25 MB bundle limit.
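
The manual counting can be sketched like this, mirroring the GROUP BY query from the D1 section (the function name is illustrative):

```javascript
// Count hits per volume with a Map — the in-memory equivalent of
// GROUP BY vol_str ORDER BY vol_str ASC.
function facetByVolume(hits) {
  const counts = new Map();
  for (const r of hits) {
    counts.set(r.vol_str, (counts.get(r.vol_str) ?? 0) + 1);
  }
  return [...counts.entries()]
    .sort(([a], [b]) => a.localeCompare(b)) // matches ORDER BY vol_str ASC
    .map(([vol_str, doc_count]) => ({ vol_str, doc_count }));
}
```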

Benchmark Results

Query: a common Japanese word, averaged over 5 runs with all normalization options enabled (dakuon unification, historical kana conversion, etc.).

Backend         Search Time   Hits   Storage
Static JSON     0.72 ms       111    ~1.0 MB
D1 (SQLite)     17.1 ms       111    ~1.1 MB
Elasticsearch   83.0 ms       105    7.6 MB

Speed Comparison

Static JSON ████ 0.72ms (1x)
D1          ████████████████████████████████████████████ 17.1ms (24x)
ES          ████████████████████████████████████████████████████████████████████████████████████████████████████████████████ 83.0ms (115x)

Static JSON is ~24x faster than D1 and ~115x faster than Elasticsearch.

Storage Comparison

Static JSON ██████ 1.0 MB
D1          ██████ 1.1 MB
ES          ████████████████████████████████████████████████ 7.6 MB

Elasticsearch consumes roughly 7x the storage of the other two backends, largely due to ngram index overhead.

Hit Count Discrepancy

Elasticsearch returns 105 hits vs. 111 for the other two. This is due to subtle behavioral differences between wildcard queries and LIKE/includes matching.

Why Static JSON Is Fastest

  1. No network round-trip — data lives in Worker memory
  2. No SQL parsing — String.includes is a V8-optimized native operation
  3. Linear scan over 1,800 records is trivially fast — JavaScript can compare a few thousand strings in under 1ms

Why D1 Is Slower Than Static JSON

D1 is still 5x faster than ES, but compared to Static JSON it has overhead from:

  • SQL parsing and query plan generation
  • Storage layer I/O
  • LIKE '%query%' forces a full table scan (no index can help)

Why Elasticsearch Is Slowest

  • Network round-trip from Cloudflare Edge to the ES cluster
  • Wildcard query overhead
  • JSON response parsing

That said, 83ms is perfectly usable. For datasets with tens or hundreds of thousands of records, Elasticsearch’s indexing would likely reverse these results.

Implementation Details

Backend Switching

A common search interface allows switching backends via query parameter:

// /api/search?q=query&backend=static
// /api/search?q=query&backend=d1
// /api/search?q=query&backend=elasticsearch
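
A minimal dispatcher over that common interface might look like the following. The three backend functions here are stubs standing in for the real implementations, and the default backend is an assumption.

```javascript
// Map each backend name to a search function with a common signature.
// The stubs below stand in for the real Static JSON / D1 / ES code.
const backends = {
  static: async (q) => ({ backend: 'static', hits: [] }),
  d1: async (q) => ({ backend: 'd1', hits: [] }),
  elasticsearch: async (q) => ({ backend: 'elasticsearch', hits: [] }),
};

async function handleSearch(requestUrl) {
  const params = new URL(requestUrl).searchParams;
  const q = params.get('q') ?? '';
  const name = params.get('backend') ?? 'static'; // assumed default
  const search = backends[name];
  if (!search) throw new Error(`unknown backend: ${name}`);
  return search(q);
}
```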

Japanese Text Normalization

Query normalization (dakuon unification, historical kana conversion, etc.) is backend-agnostic JavaScript processing. Query variations are generated first, then OR-searched across any backend.

Input: いづれ
↓ Transform
Variations: [いつれ, ひつれ, ゐつれ]
↓ Search per backend
Static: queries.some(q => text.includes(q))
D1:     WHERE text LIKE '%いつれ%' OR text LIKE '%ひつれ%' OR ...
ES:     bool.should: [wildcard: *いつれ*, wildcard: *ひつれ*, ...]
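
The variation step can be sketched as a per-character equivalence table expanded by cartesian product. The two entries below are an illustrative fragment; the real rule tables cover far more characters.

```javascript
// Illustrative fragment of the normalization rules; the real tables
// are much larger. Each entry maps a character to the spellings
// treated as equivalent after normalization.
const EQUIV = {
  'い': ['い', 'ひ', 'ゐ'], // historical kana variants of "i"
  'づ': ['つ'],             // dakuon unified to the unvoiced form
};

// Expand a query into every normalized spelling (cartesian product).
function variations(query) {
  let results = [''];
  for (const ch of query) {
    const alts = EQUIV[ch] ?? [ch];
    results = results.flatMap((prefix) => alts.map((a) => prefix + a));
  }
  return results;
}

// variations('いづれ') → ['いつれ', 'ひつれ', 'ゐつれ']
```

Each backend then ORs these variations together, as shown above.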

Benchmark Endpoint

/api/benchmark?q=query&iterations=5 returns comparison results in JSON format.
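
The comparison loop behind that endpoint can be sketched as follows; the backend functions are placeholders for the three real implementations, and the report shape is an assumption.

```javascript
// Sketch: time each backend over N iterations and average the results.
async function runBenchmark(backends, query, iterations = 5) {
  const report = {};
  for (const [name, search] of Object.entries(backends)) {
    let totalMs = 0;
    let hits = 0;
    for (let i = 0; i < iterations; i++) {
      const start = performance.now();
      hits = (await search(query)).length;
      totalMs += performance.now() - start;
    }
    report[name] = { avgMs: totalMs / iterations, hits };
  }
  return report;
}
```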

Conclusion

For ~1,800 records and 2.5MB of text, Elasticsearch was unnecessary. Static JSON (in-memory search) delivers more than sufficient performance.

Choosing the Right Approach

Dataset Size                      Recommended
Up to a few thousand records      Static JSON
Thousands to tens of thousands    D1 (SQLite)
Tens of thousands and above       Elasticsearch or similar

Our Choice

We adopted Static JSON, completely eliminating the Elasticsearch dependency. Search speed improved ~115x, and the entire application now runs within Cloudflare Pages with no external services.
