Overview
I built an API server for searching the Koui Genji Monogatari (Collated Tale of Genji) text database. These are my notes on it.
https://genji-api.aws.ldas.jp/

Background
The following page publishes the text data of "Koui Genji Monogatari" in a TEI/XML-compliant format.
https://kouigenjimonogatari.github.io/
I indexed this text data in Elasticsearch and built an API that searches it by text segment.
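As a rough illustration of the indexing step, the sketch below turns a TEI fragment into documents that could be sent to Elasticsearch. The field names `vol_str`, `page`, and `original_text_lines` are taken from the search query shown later in this post. Everything else is an assumption made for the example: the sample markup, the use of `<p n="...">` as a page unit, and the use of `<seg>` as a line unit. The real TEI files and the real indexing code may be organized differently.

```python
import xml.etree.ElementTree as ET

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical TEI fragment; the published files may use a different layout.
SAMPLE_TEI = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <text><body>
    <p n="1"><seg>line A</seg><seg>line B</seg></p>
    <p n="2"><seg>line C</seg></p>
  </body></text>
</TEI>"""


def tei_to_documents(xml_text, vol_str):
    """Build one document per <p>, collecting the text of its <seg> children."""
    root = ET.fromstring(xml_text)
    docs = []
    for p in root.iterfind(".//tei:p", TEI_NS):
        lines = [
            "".join(seg.itertext()).strip()
            for seg in p.iterfind("tei:seg", TEI_NS)
        ]
        docs.append(
            {
                "vol_str": vol_str,            # volume label, e.g. "04 夕顔"
                "page": int(p.get("n")),       # assumed: @n holds a page number
                "original_text_lines": lines,  # assumed: one entry per <seg>
            }
        )
    return docs


if __name__ == "__main__":
    for doc in tei_to_documents(SAMPLE_TEI, "04 夕顔"):
        print(doc)
```

Each resulting dictionary could then be passed to whatever bulk-indexing routine is in use.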
Usage
The API documentation, written with OpenAPI and served through Swagger UI, is available at the following URL:
https://genji-api.aws.ldas.jp/
Key Features
Query Expansion
For example, consider a search that uses "夕顔" (Yugao) as the keyword. The input and output formats follow JSON:API.
The following result is returned. The API generates spelling variants of the input keyword "夕顔" (Yugao) and searches for all of them.
{
  "data": [],
  "meta": {
    "query": "夕顔",
    "transformedQueries": [
      "夕顔",
      "ゆうかお",
      "ゆふかお",
      "ゆふかほ",
      "ゆうかほ",
      "夕かお",
      "夕かほ",
      "ゆう顔",
      "ゆふ顔"
    ],
    "transformOptions": {
      "expandRepeatMarks": true,
      "unifyKanjiKana": true,
      "unifyHistoricalKana": true,
      "unifyPhoneticChanges": true,
      "unifyDakuon": true
    },
    "filters": {
      "expandRepeatMarks": true,
      "unifyKanjiKana": true,
      "unifyHistoricalKana": true,
      "unifyPhoneticChanges": true,
      "unifyDakuon": true,
      "vol_str": "04 夕顔"
    },
    "sort": "page",
    "limit": 20,
    "offset": 0,
    "total": 7,
    "aggregations": {
      "vol_str": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "04 夕顔",
            "doc_count": 7
          }
        ]
      }
    }
  }
}
As a result, occurrences of "ゆふかほ", "夕かほ", and "夕顔" in the body text can all be found with a single search.

Each keyword expansion option can be switched on or off individually. For details, see the Swagger UI mentioned above.
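To make the expansion idea concrete, here is a minimal sketch that reproduces the nine variants listed for "夕顔". It is not the server's implementation. The rule list is a hand-picked, hypothetical subset, and the kana rules are written in the expanding direction (う to ふ, お to ほ), which is the reverse of how the normalization tables are published. The function applies each rule at every position, one replacement at a time, and collects every distinct string it reaches.

```python
from collections import deque

# Hypothetical subset of rules, chosen only to illustrate the idea.
RULES = [
    ("夕顔", "ゆうかお"),  # kanji -> kana (whole word)
    ("夕", "ゆう"),        # kanji -> kana (single character)
    ("顔", "かお"),
    ("う", "ふ"),          # modern kana -> historical spelling
    ("お", "ほ"),
]


def expand_query(keyword, rules=RULES, limit=50):
    """Return the keyword plus every variant reachable by single replacements."""
    variants = [keyword]
    queue = deque([keyword])
    while queue:
        current = queue.popleft()
        for src, dst in rules:
            start = current.find(src)
            while start != -1:
                candidate = current[:start] + dst + current[start + len(src):]
                if candidate not in variants:
                    if len(variants) >= limit:
                        return variants  # safety cap on the number of variants
                    variants.append(candidate)
                    queue.append(candidate)
                start = current.find(src, start + 1)
    return variants


if __name__ == "__main__":
    for variant in expand_query("夕顔"):
        print(variant)
```

The `limit` parameter is there because the number of variants can grow quickly for longer keywords, and every variant becomes one clause in the search query.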
The following OR search query is sent to Elasticsearch:
{
  "query": {
    "bool": {
      "should": [
        {
          "wildcard": {
            "original_text_lines.keyword": "*夕顔*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆうかお*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆふかお*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆふかほ*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆうかほ*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*夕かお*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*夕かほ*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆう顔*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆふ顔*"
          }
        }
      ],
      "minimum_should_match": 1,
      "filter": {
        "terms": {
          "vol_str": [
            "04 夕顔"
          ]
        }
      }
    }
  },
  "size": 20,
  "from": 0,
  "sort": [
    {
      "page": {
        "order": "asc"
      }
    }
  ]
}
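A request body of this shape can be assembled from the list of variants with a few lines of code. The sketch below only builds the dictionary and does not talk to Elasticsearch. The function name and its parameters are hypothetical, while the field names and the overall structure follow the query shown above.

```python
import json


def build_search_body(variants, vol_filters=None, size=20, offset=0, sort_field="page"):
    """Build an OR query: one wildcard clause per variant, optional volume filter."""
    # Note: "*" or "?" inside a variant would act as wildcards; they are not
    # escaped in this sketch.
    should = [
        {"wildcard": {"original_text_lines.keyword": f"*{variant}*"}}
        for variant in variants
    ]
    bool_query = {"should": should, "minimum_should_match": 1}
    if vol_filters:
        bool_query["filter"] = {"terms": {"vol_str": list(vol_filters)}}
    return {
        "query": {"bool": bool_query},
        "size": size,
        "from": offset,
        "sort": [{sort_field: {"order": "asc"}}],
    }


if __name__ == "__main__":
    body = build_search_body(["夕顔", "ゆふかほ", "夕かほ"], vol_filters=["04 夕顔"])
    print(json.dumps(body, ensure_ascii=False, indent=2))
```

Because `minimum_should_match` is 1, a document matches as soon as any one variant occurs in it, which is what makes the expansion behave as an OR search.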
The rules used for conversion can be checked at the following URL:
https://genji-api.aws.ldas.jp/normalization/rules
{
"data": {
"type": "normalization-rules",
"attributes": {
"rules": {
"historicalKana": {
"ゐ": "い",
"ゑ": "え",
"を": "お",
"ใฏ": "ใฏ",
"ヰ": "イ",
"ヱ": "エ",
"ヲ": "オ",
"くゎ": "か",
"ぐゎ": "が",
"クヮ": "カ",
"グヮ": "ガ"
},
"dakuon": {
"が": "か",
"ぎ": "き",
"ぐ": "く",
"げ": "け",
"ご": "こ",
"ざ": "さ",
"じ": "し",
"ず": "す",
"ぜ": "せ",
"ぞ": "そ",
"だ": "た",
"ぢ": "ち",
"づ": "つ",
"で": "て",
"ど": "と",
"ば": "は",
"び": "ひ",
"ぶ": "ふ",
"べ": "へ",
"ぼ": "ほ",
"ぱ": "は",
"ぴ": "ひ",
"ぷ": "ふ",
"ぺ": "へ",
"ぽ": "ほ",
"ガ": "カ",
"ギ": "キ",
"グ": "ク",
"ゲ": "ケ",
"ゴ": "コ",
"ザ": "サ",
"ジ": "シ",
"ズ": "ス",
"ゼ": "セ",
"ゾ": "ソ",
"ダ": "タ",
"ヂ": "チ",
"ヅ": "ツ",
"デ": "テ",
"ド": "ト",
"バ": "ハ",
"ビ": "ヒ",
"ブ": "フ",
"ベ": "ヘ",
"ボ": "ホ",
"パ": "ハ",
"ピ": "ヒ",
"プ": "フ",
"ペ": "ヘ",
"ポ": "ホ"
},
"kanjiKana": {
"ๆกๅฃบ": "ใใใคใป",
"ๅธๆจ": "ใฏใฏใใ",
"็ฉบ่": "ใใคใใฟ",
"ๅค้ก": "ใใใใ",
"่ฅ็ดซ": "ใใใใใใ",
"ๆซๆ่ฑ": "ใใใคใใฏใช",
"็ด
่่ณ": "ใใฟใใฎใ",
"่ฑๅฎด": "ใฏใชใฎใใ",
"่ต": "ใใใ",
"่ณขๆจ": "ใใใ",
"่ฑๆฃ้": "ใฏใชใกใใใจ",
"้ ็ฃจ": "ใใพ",
"ๆ็ณ": "ใใใ",
"ๆพชๆจ": "ใฟใใคใใ",
"่ฌ็": "ใใใใต",
"้ขๅฑ": "ใใใ",
"็ตตๅ": "ใใใใ",
"ๆพ้ขจ": "ใพใคใใ",
"่้ฒ": "ใใใใ",
"ๆ้ก": "ใใใใ",
"ๅฐๅฅณ": "ใใจใ",
"็้ฌ": "ใใพใใฅใ",
"ๅ้ณ": "ใฏใคใญ",
"่ก่ถ": "ใใกใใ",
"่ข": "ใปใใ",
"่": "ใปใใ",
"ๅธธๅค": "ใจใใชใค",
"็ฏ็ซ": "ใใใใฒ",
"้ๅ": "ใฎใใ",
"่กๅนธ": "ใฟใใ",
"่ค่ขด": "ใตใกใฏใใพ",
"็ๆจๆฑ": "ใพใใฏใใ",
"ๆข
ๆ": "ใใใใ",
"่ค่ฃ่": "ใตใกใฎใใใฏ",
"่ฅ่ไธ": "ใใใชใใใ",
"่ฅ่ไธ": "ใใใชใ",
"่ฅ่": "ใใใช",
"ๆๆจ": "ใใใใ",
"ๆจช็ฌ": "ใใใตใ",
"้ด่ซ": "ใใใใ",
"ๅค้ง": "ใใใใ",
"ๅพกๆณ": "ใฟใฎใ",
"ๅนป": "ใพใปใใ",
"ๅๅฎฎ": "ใซใใใฟใ",
"็ด
ๆข
": "ใใใฏใ",
"็ซนๆฒณ": "ใใใใ",
"ๆฉๅงซ": "ใฏใใฒใ",
"ๆคๆฌ": "ใใใใใจ",
"็ท่ง": "ใใใพใ",
"ๆฉ่จ": "ใใใใฒ",
"ๅฎฟๆจ": "ใใจใใ",
"ๆฑๅฑ": "ใใใพใ",
"ๆตฎ่": "ใใใตใญ",
"่ป่": "ใใใใ",
"ๆ็ฟ": "ใฆใชใใ",
"ๅคขๆตฎๆฉ": "ใใใฎใใใฏใ",
"้ฒ้ ": "ใใใใใ",
"็": "ใใพ",
"้ฌ": "ใใคใ",
"ๅค": "ใใ",
"้ก": "ใใ",
"็ดซ": "ใใใใ",
"็ด
่": "ใใฟใก",
"ๆฑ้": "ใใใ",
"่คๅฃบ": "ใตใกใคใป",
"ๆๅ
": "ใใใฟใค",
"ๆบๆฐ": "ใใใ",
"็ฉ่ช": "ใใฎใใใ",
"็ดซๅผ้จ": "ใใใใใใใถ",
"ๅ
ๆบๆฐ": "ใฒใใใใใ",
"ๆกๅฃบๅธ": "ใใใคใผใฆใ",
"ๆด่กฃ": "ใใใ",
"ๅพกๆฏๆ": "ใฟใใใฉใใ",
"ๅ
ฅ้": "ใซใ
ใใฉใ",
"ๅคง่ฃ": "ใ ใใใ",
"ไธญๅฎฎ": "ใกใ
ใใใ",
"ๅฅณ้ข": "ใซใใใใ",
"ๅฎฎ": "ใฟใ",
"ๅ": "ใใฟ",
"ไธ": "ใใ",
"ๆฎฟ": "ใจใฎ",
"ๅพกๅ": "ใใพใ",
"ๅงซๅ": "ใฒใใใฟ",
"่ฅๅ": "ใใใใฟ",
"ๅ
่ฃ": "ใ ใใ",
"ๅพกๆ": "ใใใ",
"้": "ใใจ",
"ๅ
ญๆก": "ใใใใใ",
"ไบๆก": "ใซใใใ",
"ไธๆก": "ใใใใใ",
"ๅๆก": "ใใใใ",
"ไบๆก": "ใใใใ",
"ไธๆก": "ใใกใใใ",
"ๅ
ซๆก": "ใฏใกใใใ",
"ไนๆก": "ใใใใ",
"ๅๆก": "ใใ
ใใใใ"
},
"kanaKanji": {
"ใใใคใป": "ๆกๅฃบ",
"ใฏใฏใใ": "ๅธๆจ",
"ใใคใใฟ": "็ฉบ่",
"ใใใใ": "ๅค้ก",
"ใใใใใใ": "่ฅ็ดซ",
"ใใใคใใฏใช": "ๆซๆ่ฑ",
"ใใฟใใฎใ": "็ด
่่ณ",
"ใฏใชใฎใใ": "่ฑๅฎด",
"ใใใ": "่ต",
"ใใใ": "่ณขๆจ",
"ใฏใชใกใใใจ": "่ฑๆฃ้",
"ใใพ": "้ ็ฃจ",
"ใใใ": "ๆ็ณ",
"ใฟใใคใใ": "ๆพชๆจ",
"ใใใใต": "่ฌ็",
"ใใใ": "้ขๅฑ",
"ใใใใ": "็ตตๅ",
"ใพใคใใ": "ๆพ้ขจ",
"ใใใใ": "่้ฒ",
"ใใใใ": "ๆ้ก",
"ใใจใ": "ๅฐๅฅณ",
"ใใพใใฅใ": "็้ฌ",
"ใฏใคใญ": "ๅ้ณ",
"ใใกใใ": "่ก่ถ",
"ใปใใ": "่",
"ใจใใชใค": "ๅธธๅค",
"ใใใใฒ": "็ฏ็ซ",
"ใฎใใ": "้ๅ",
"ใฟใใ": "่กๅนธ",
"ใตใกใฏใใพ": "่ค่ขด",
"ใพใใฏใใ": "็ๆจๆฑ",
"ใใใใ": "ๆข
ๆ",
"ใตใกใฎใใใฏ": "่ค่ฃ่",
"ใใใชใใใ": "่ฅ่ไธ",
"ใใใชใ": "่ฅ่ไธ",
"ใใใช": "่ฅ่",
"ใใใใ": "ๆๆจ",
"ใใใตใ": "ๆจช็ฌ",
"ใใใใ": "้ด่ซ",
"ใใใใ": "ๅค้ง",
"ใฟใฎใ": "ๅพกๆณ",
"ใพใปใใ": "ๅนป",
"ใซใใใฟใ": "ๅๅฎฎ",
"ใใใฏใ": "็ด
ๆข
",
"ใใใใ": "็ซนๆฒณ",
"ใฏใใฒใ": "ๆฉๅงซ",
"ใใใใใจ": "ๆคๆฌ",
"ใใใพใ": "็ท่ง",
"ใใใใฒ": "ๆฉ่จ",
"ใใจใใ": "ๅฎฟๆจ",
"ใใใพใ": "ๆฑๅฑ",
"ใใใตใญ": "ๆตฎ่",
"ใใใใ": "่ป่",
"ใฆใชใใ": "ๆ็ฟ",
"ใใใฎใใใฏใ": "ๅคขๆตฎๆฉ",
"ใใใใใ": "้ฒ้ ",
"ใใพ": "็",
"ใใคใ": "้ฌ",
"ใใ": "ๅค",
"ใใ": "้ก",
"ใใใใ": "็ดซ",
"ใใฟใก": "็ด
่",
"ใใใ": "ๆฑ้",
"ใตใกใคใป": "่คๅฃบ",
"ใใใฟใค": "ๆๅ
",
"ใใใ": "ๆบๆฐ",
"ใใฎใใใ": "็ฉ่ช",
"ใใใใใใใถ": "็ดซๅผ้จ",
"ใฒใใใใใ": "ๅ
ๆบๆฐ",
"ใใใคใผใฆใ": "ๆกๅฃบๅธ",
"ใใใ": "ๆด่กฃ",
"ใฟใใใฉใใ": "ๅพกๆฏๆ",
"ใซใ
ใใฉใ": "ๅ
ฅ้",
"ใ ใใใ": "ๅคง่ฃ",
"ใกใ
ใใใ": "ไธญๅฎฎ",
"ใซใใใใ": "ๅฅณ้ข",
"ใฟใ": "ๅฎฎ",
"ใใฟ": "ๅ",
"ใใ": "ไธ",
"ใจใฎ": "ๆฎฟ",
"ใใพใ": "ๅพกๅ",
"ใฒใใใฟ": "ๅงซๅ",
"ใใใใฟ": "่ฅๅ",
"ใ ใใ": "ๅ
่ฃ",
"ใใใ": "ๅพกๆ",
"ใใจ": "้",
"ใใใใใ": "ๅ
ญๆก",
"ใซใใใ": "ไบๆก",
"ใใใใใ": "ไธๆก",
"ใใใใ": "ๅๆก",
"ใใใใ": "ไบๆก",
"ใใกใใใ": "ไธๆก",
"ใฏใกใใใ": "ๅ
ซๆก",
"ใใใใ": "ไนๆก",
"ใใ
ใใใใ": "ๅๆก"
},
"phoneticChange": {
"ふ": "う",
"む": "ん",
"つ": "っ",
"は": "わ",
"へ": "え",
"を": "お",
"ひ": "い",
"く": "う",
"ぬ": "ん",
"フ": "ウ",
"ム": "ン",
"ツ": "ッ",
"ハ": "ワ",
"ヘ": "エ",
"ヲ": "オ",
"ヒ": "イ",
"ク": "ウ",
"ヌ": "ン"
}
},
"stats": {
"historicalKanaRules": 11,
"dakuonRules": 50,
"kanjiKanaRules": 96,
"kanaKanjiRules": 95,
"phoneticChangeRules": 18,
"totalRules": 270
},
"options": {
"unifyHistoricalKana": "Historical kana unification (ใ->ใ, ใ->ใ)",
"unifyDakuon": "Voiced consonant unification (ใ->ใ, ใ->ใ)",
"unifyKanjiKana": "Kanji-kana unification (玉->たま)",
"unifyPhoneticChanges": "Phonetic change unification (ふ->う, は->わ)"
},
"description": {
"historicalKana": "Unify historical kana usage to modern kana usage",
"dakuon": "Unify voiced and semi-voiced consonants to voiceless consonants",
"kanjiKana": "Convert kanji to corresponding kana",
"kanaKanji": "Convert kana to corresponding kanji",
"phoneticChange": "Unify phonetic changes (particles, etc.)"
}
}
},
"meta": {
"version": "1.0.0",
"lastUpdated": "2025-06-25T07:08:42.608Z"
}
}
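The response is easy to consume from a script. The sketch below fetches the rules, checks that the per-table counts under `stats` match the tables themselves (in the response above, 11 + 50 + 96 + 95 + 18 = 270), and applies one table to a string. The helper names are hypothetical, and the simple sequential replacement in `apply_table` is an assumption made for illustration. It is not necessarily how the server applies its rules.

```python
import json
from urllib.request import urlopen

RULES_URL = "https://genji-api.aws.ldas.jp/normalization/rules"


def fetch_rules(url=RULES_URL):
    """Download and decode the normalization rules (needs network access)."""
    with urlopen(url) as resp:
        return json.load(resp)


def check_stats(payload):
    """Return True if every '<name>Rules' count and 'totalRules' match the tables."""
    attrs = payload["data"]["attributes"]
    rules, stats = attrs["rules"], attrs["stats"]
    counts = {name: len(table) for name, table in rules.items()}
    per_table_ok = all(stats.get(f"{name}Rules") == n for name, n in counts.items())
    return per_table_ok and sum(counts.values()) == stats["totalRules"]


def apply_table(text, table):
    """Apply one rule table, longest keys first so multi-character rules win."""
    for src in sorted(table, key=len, reverse=True):
        text = text.replace(src, table[src])
    return text


if __name__ == "__main__":
    payload = fetch_rules()
    print("stats consistent:", check_stats(payload))
    dakuon = payload["data"]["attributes"]["rules"]["dakuon"]
    print(apply_table("たまかづら", dakuon))
```

Sequential replacement is safe for a table such as `dakuon`, where no replacement value is itself a key. A table whose outputs can feed other rules would need a single-pass approach instead.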
Summary
Some parts are still incomplete, but this post has shown one example of building a search API server with a mechanism for absorbing orthographic variation in the original text.
I hope this serves as a useful reference.


