Overview

These are my notes on building an API server for searching the Koui Genji Monogatari (Collated Tale of Genji) text database.

https://genji-api.aws.ldas.jp/

Background

The following page publishes the text data of "Koui Genji Monogatari" in a TEI/XML-compliant format.

https://kouigenjimonogatari.github.io/

I indexed this text data in Elasticsearch and built an API on top of it for searching the text segment by segment.

Usage

The API documentation, described with OpenAPI and browsable through Swagger UI, is available at the following URL:

https://genji-api.aws.ldas.jp/

Key Features

Query Expansion

For example, the following URL searches for the keyword "Yugao". The input and output formats follow JSON:API.

https://genji-api.aws.ldas.jp/search?q=夕顔&page[limit]=20&page[offset]=0&sort=page&filter[expandRepeatMarks]=true&filter[unifyKanjiKana]=true&filter[unifyHistoricalKana]=true&filter[unifyPhoneticChanges]=true&filter[unifyDakuon]=true&filter[vol_str]=04 夕顔
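A request URL like the one above can also be assembled programmatically. The following is a minimal sketch using only the Python standard library; the helper name `build_search_url` and its defaults are assumptions made for illustration, and only the parameter names are taken from the URL above. Note that `urlencode` percent-encodes the brackets and the Japanese text and writes the space as `+`, which is the usual query-string encoding.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://genji-api.aws.ldas.jp/search"

# Expansion switches that appear in the example URL.
EXPANSION_FILTERS = [
    "expandRepeatMarks",
    "unifyKanjiKana",
    "unifyHistoricalKana",
    "unifyPhoneticChanges",
    "unifyDakuon",
]


def build_search_url(keyword, volume=None, limit=20, offset=0):
    """Hypothetical helper: build a /search URL with every expansion filter enabled."""
    params = {
        "q": keyword,
        "page[limit]": limit,
        "page[offset]": offset,
        "sort": "page",
    }
    for name in EXPANSION_FILTERS:
        params[f"filter[{name}]"] = "true"
    if volume is not None:
        # Restrict the search to one volume, e.g. "04 夕顔".
        params["filter[vol_str]"] = volume
    return f"{BASE_URL}?{urlencode(params)}"


url = build_search_url("夕顔", volume="04 夕顔")
print(url)
# Decoding the query string again recovers the original keyword.
print(parse_qs(urlsplit(url).query)["q"])
```

Passing the resulting URL to any HTTP client should be equivalent to opening the example URL in a browser.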

The API returns the following response. Variants are generated from the input keyword "夕顔" (Yugao), and the search is run against all of them.

{
  "data": [],
  "meta": {
    "query": "夕顔",
    "transformedQueries": [
      "夕顔",
      "ゆうかお",
      "ゆふかお",
      "ゆふかほ",
      "ゆうかほ",
      "夕かお",
      "夕かほ",
      "ゆう顔",
      "ゆふ顔"
    ],
    "transformOptions": {
      "expandRepeatMarks": true,
      "unifyKanjiKana": true,
      "unifyHistoricalKana": true,
      "unifyPhoneticChanges": true,
      "unifyDakuon": true
    },
    "filters": {
      "expandRepeatMarks": true,
      "unifyKanjiKana": true,
      "unifyHistoricalKana": true,
      "unifyPhoneticChanges": true,
      "unifyDakuon": true,
      "vol_str": "04 夕顔"
    },
    "sort": "page",
    "limit": 20,
    "offset": 0,
    "total": 7,
    "aggregations": {
      "vol_str": {
        "doc_count_error_upper_bound": 0,
        "sum_other_doc_count": 0,
        "buckets": [
          {
            "key": "04 夕顔",
            "doc_count": 7
          }
        ]
      }
    }
  }
}

As a result, occurrences of "ゆふかほ", "夕かほ", and "夕顔" in the body text are all found by a single search.

Each query expansion option can be switched on or off individually. For details, see the Swagger UI mentioned above.
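To make the idea of query expansion concrete, here is a minimal sketch that reproduces the nine variants listed in `transformedQueries`. It is not the server's actual implementation: the `ALTERNATIVES` table and the `expand_query` function are hypothetical and hand-written for this one keyword, and the variants come out in a different order than in the API response.

```python
from itertools import product

# Hypothetical per-character alternatives, written by hand for this example.
# Each kanji maps to itself, its modern kana reading, and a historical kana spelling.
ALTERNATIVES = {
    "夕": ["夕", "ゆう", "ゆふ"],
    "顔": ["顔", "かお", "かほ"],
}


def expand_query(keyword):
    """Return every combination of per-character alternatives, without duplicates."""
    # Characters with no known alternative are kept as they are.
    choices = [ALTERNATIVES.get(ch, [ch]) for ch in keyword]
    variants = []
    for combo in product(*choices):
        candidate = "".join(combo)
        if candidate not in variants:
            variants.append(candidate)
    return variants


variants = expand_query("夕顔")
print(len(variants))
print(variants)
```

With three alternatives for each of the two characters, the Cartesian product yields 3 x 3 = 9 strings, which matches the number of entries in the response above.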

The following OR search query is sent to Elasticsearch:

{
  "query": {
    "bool": {
      "should": [
        {
          "wildcard": {
            "original_text_lines.keyword": "*夕顔*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆうかお*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆふかお*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆふかほ*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆうかほ*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*夕かお*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*夕かほ*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆう顔*"
          }
        },
        {
          "wildcard": {
            "original_text_lines.keyword": "*ゆふ顔*"
          }
        }
      ],
      "minimum_should_match": 1,
      "filter": {
        "terms": {
          "vol_str": [
            "04 夕顔"
          ]
        }
      }
    }
  },
  "size": 20,
  "from": 0,
  "sort": [
    {
      "page": {
        "order": "asc"
      }
    }
  ]
}
        }
      }
    }
  },
  "size": 20,
  "from": 0,
  "sort": [
    {
      "page": {
        "order": "asc"
      }
    }
  ]
}
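A request body of this shape can be generated from the list of expanded keywords. The sketch below builds the same structure as a plain Python dictionary; the function name `build_es_query` and its parameters are assumptions for illustration, while the field names are copied from the query above. It does not escape `*` or `?` inside a variant, which a real implementation would probably need to handle.

```python
import json


def build_es_query(variants, vol_str=None, size=20, offset=0,
                   field="original_text_lines.keyword"):
    """Hypothetical helper: OR together one wildcard clause per variant."""
    should = [{"wildcard": {field: f"*{variant}*"}} for variant in variants]
    bool_query = {
        "should": should,
        # At least one wildcard clause has to match for a document to be a hit.
        "minimum_should_match": 1,
    }
    if vol_str is not None:
        # The volume filter narrows the result set without affecting scoring.
        bool_query["filter"] = {"terms": {"vol_str": [vol_str]}}
    return {
        "query": {"bool": bool_query},
        "size": size,
        "from": offset,
        "sort": [{"page": {"order": "asc"}}],
    }


body = build_es_query(["夕顔", "ゆふかほ"], vol_str="04 夕顔")
print(json.dumps(body, ensure_ascii=False, indent=2))
```

Because every clause sits under `should` with `minimum_should_match` set to 1, a line matches as soon as it contains any one of the variants.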

The rules used for conversion can be checked at the following URL:

https://genji-api.aws.ldas.jp/normalization/rules

{
  "data": {
    "type": "normalization-rules",
    "attributes": {
      "rules": {
        "historicalKana": {
          "ゐ": "い",
          "ゑ": "え",
          "を": "お",
          "ワ": "ワ",
          "ヰ": "イ",
          "ヱ": "エ",
          "ヲ": "オ",
          "くゎ": "か",
          "ぐゎ": "が",
          "クヮ": "カ",
          "グヮ": "ガ"
        },
        "dakuon": {
          "が": "か",
          "ぎ": "き",
          "ぐ": "く",
          "げ": "け",
          "ご": "こ",
          "ざ": "さ",
          "じ": "し",
          "ず": "す",
          "ぜ": "せ",
          "ぞ": "そ",
          "だ": "た",
          "ぢ": "ち",
          "づ": "つ",
          "で": "て",
          "ど": "と",
          "ば": "は",
          "び": "ひ",
          "ぶ": "ふ",
          "べ": "へ",
          "ぼ": "ほ",
          "ぱ": "は",
          "ぴ": "ひ",
          "ぷ": "ふ",
          "ぺ": "へ",
          "ぽ": "ほ",
          "ガ": "カ",
          "ギ": "キ",
          "グ": "ク",
          "ゲ": "ケ",
          "ゴ": "コ",
          "ザ": "サ",
          "ジ": "シ",
          "ズ": "ス",
          "ゼ": "セ",
          "ゾ": "ソ",
          "ダ": "タ",
          "ヂ": "チ",
          "ヅ": "ツ",
          "デ": "テ",
          "ド": "ト",
          "バ": "ハ",
          "ビ": "ヒ",
          "ブ": "フ",
          "ベ": "ヘ",
          "ボ": "ホ",
          "パ": "ハ",
          "ピ": "ヒ",
          "プ": "フ",
          "ペ": "ヘ",
          "ポ": "ホ"
        },
        "kanjiKana": {
          "桐壺": "きりつほ",
          "帚木": "ははきぎ",
          "空蝉": "うつせみ",
          "夕顔": "ゆうがお",
          "若紫": "わかむらさき",
          "末摘花": "すえつむはな",
          "紅葉賀": "もみじのが",
          "花宴": "はなのえん",
          "葵": "あおい",
          "賢木": "さかき",
          "花散里": "はなちるさと",
          "須磨": "すま",
          "明石": "あかし",
          "澪標": "みおつくし",
          "蓬生": "よもきふ",
          "関屋": "せきや",
          "絵合": "えあわせ",
          "松風": "まつかせ",
          "薄雲": "うすくも",
          "朝顔": "あさかお",
          "少女": "おとめ",
          "玉鬘": "たまかづら",
          "初音": "はつね",
          "胡蝶": "こちよう",
          "螢": "ほたる",
          "蛍": "ほたる",
          "常夏": "とこなつ",
          "篝火": "かかりひ",
          "野分": "のわき",
          "行幸": "みゆき",
          "藤袴": "ふちはかま",
          "真木柱": "まきはしら",
          "梅枝": "うめかえ",
          "藤裏葉": "ふちのうらは",
          "若菜上": "わかなじょう",
          "若菜下": "わかなげ",
          "若菜": "わかな",
          "柏木": "かしわき",
          "横笛": "よこふえ",
          "鈴虫": "すすむし",
          "夕霧": "ゆうきり",
          "御法": "みのり",
          "幻": "まほろし",
          "匂宮": "におうみや",
          "紅梅": "こうはい",
          "竹河": "たけかわ",
          "橋姫": "はしひめ",
          "椎本": "しいかもと",
          "総角": "あけまき",
          "早蕨": "さわらひ",
          "宿木": "やとりき",
          "東屋": "あすまや",
          "浮舟": "うきふね",
          "蜻蛉": "かけろう",
          "手習": "てならい",
          "夢浮橋": "ゆめのうきはし",
          "雲隠": "くもかくれ",
          "玉": "たま",
          "鬘": "かつら",
          "夕": "ゆう",
          "顔": "かお",
          "紫": "むらさき",
          "紅葉": "もみち",
          "朱雀": "すさく",
          "藤壺": "ふちつほ",
          "惟光": "これみつ",
          "源氏": "げんじ",
          "物語": "ものがたり",
          "紫式部": "むらさきしきぶ",
          "光源氏": "ひかるげんじ",
          "桐壺帝": "きりつぼてい",
          "更衣": "こうい",
          "御息所": "みやすどころ",
          "入道": "にゅうどう",
          "大臣": "だいじん",
          "中宮": "ちゅうぐう",
          "女院": "にょういん",
          "宮": "みや",
          "君": "きみ",
          "上": "うえ",
          "殿": "との",
          "御前": "おまえ",
          "姫君": "ひめぎみ",
          "若君": "わかぎみ",
          "内裏": "だいり",
          "御所": "ごしょ",
          "里": "さと",
          "六条": "ろくじょう",
          "二条": "にじょう",
          "三条": "さんじょう",
          "四条": "しじょう",
          "五条": "ごじょう",
          "七条": "しちじょう",
          "八条": "はちじょう",
          "九条": "くじょう",
          "十条": "じゅうじょう"
        },
        "kanaKanji": {
          "きりつほ": "桐壺",
          "ははきぎ": "帚木",
          "うつせみ": "空蝉",
          "ゆうがお": "夕顔",
          "わかむらさき": "若紫",
          "すえつむはな": "末摘花",
          "もみじのが": "紅葉賀",
          "はなのえん": "花宴",
          "あおい": "葵",
          "さかき": "賢木",
          "はなちるさと": "花散里",
          "すま": "須磨",
          "あかし": "明石",
          "みおつくし": "澪標",
          "よもきふ": "蓬生",
          "せきや": "関屋",
          "えあわせ": "絵合",
          "まつかせ": "松風",
          "うすくも": "薄雲",
          "あさかお": "朝顔",
          "おとめ": "少女",
          "たまかづら": "玉鬘",
          "はつね": "初音",
          "こちよう": "胡蝶",
          "ほたる": "蛍",
          "とこなつ": "常夏",
          "かかりひ": "篝火",
          "のわき": "野分",
          "みゆき": "行幸",
          "ふちはかま": "藤袴",
          "まきはしら": "真木柱",
          "うめかえ": "梅枝",
          "ふちのうらは": "藤裏葉",
          "わかなじょう": "若菜上",
          "わかなげ": "若菜下",
          "わかな": "若菜",
          "かしわき": "柏木",
          "よこふえ": "横笛",
          "すすむし": "鈴虫",
          "ゆうきり": "夕霧",
          "みのり": "御法",
          "まほろし": "幻",
          "におうみや": "匂宮",
          "こうはい": "紅梅",
          "たけかわ": "竹河",
          "はしひめ": "橋姫",
          "しいかもと": "椎本",
          "あけまき": "総角",
          "さわらひ": "早蕨",
          "やとりき": "宿木",
          "あすまや": "東屋",
          "うきふね": "浮舟",
          "かけろう": "蜻蛉",
          "てならい": "手習",
          "ゆめのうきはし": "夢浮橋",
          "くもかくれ": "雲隠",
          "たま": "玉",
          "かつら": "鬘",
          "ゆう": "夕",
          "かお": "顔",
          "むらさき": "紫",
          "もみち": "紅葉",
          "すさく": "朱雀",
          "ふちつほ": "藤壺",
          "これみつ": "惟光",
          "げんじ": "源氏",
          "ものがたり": "物語",
          "むらさきしきぶ": "紫式部",
          "ひかるげんじ": "光源氏",
          "きりつぼてい": "桐壺帝",
          "こうい": "更衣",
          "みやすどころ": "御息所",
          "にゅうどう": "入道",
          "だいじん": "大臣",
          "ちゅうぐう": "中宮",
          "にょういん": "女院",
          "みや": "宮",
          "きみ": "君",
          "うえ": "上",
          "との": "殿",
          "おまえ": "御前",
          "ひめぎみ": "姫君",
          "わかぎみ": "若君",
          "だいり": "内裏",
          "ごしょ": "御所",
          "さと": "里",
          "ろくじょう": "六条",
          "にじょう": "二条",
          "さんじょう": "三条",
          "しじょう": "四条",
          "ごじょう": "五条",
          "しちじょう": "七条",
          "はちじょう": "八条",
          "くじょう": "九条",
          "じゅうじょう": "十条"
        },
        "phoneticChange": {
          "ふ": "う",
          "む": "ん",
          "つ": "っ",
          "は": "わ",
          "へ": "え",
          "を": "お",
          "ひ": "い",
          "く": "う",
          "ぬ": "ん",
          "フ": "ウ",
          "ム": "ン",
          "ツ": "ッ",
          "ハ": "ワ",
          "ヘ": "エ",
          "ヲ": "オ",
          "ヒ": "イ",
          "ク": "ウ",
          "ヌ": "ン"
        }
      },
      "stats": {
        "historicalKanaRules": 11,
        "dakuonRules": 50,
        "kanjiKanaRules": 96,
        "kanaKanjiRules": 95,
        "phoneticChangeRules": 18,
        "totalRules": 270
      },
      "options": {
        "unifyHistoricalKana": "Historical kana unification (ゑ->え, ゐ->い)",
        "unifyDakuon": "Voiced consonant unification (が->か, ず->す)",
        "unifyKanjiKana": "Kanji-kana unification (玉->たま)",
        "unifyPhoneticChanges": "Phonetic change unification (ふ->う, は->わ)"
      },
      "description": {
        "historicalKana": "Unify historical kana usage to modern kana usage",
        "dakuon": "Unify voiced and semi-voiced consonants to voiceless consonants",
        "kanjiKana": "Convert kanji to corresponding kana",
        "kanaKanji": "Convert kana to corresponding kanji",
        "phoneticChange": "Unify phonetic changes (particles, etc.)"
      }
    }
  },
  "meta": {
    "version": "1.0.0",
    "lastUpdated": "2025-06-25T07:08:42.608Z"
  }
}
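As an illustration of how a single-character rule group such as `dakuon` could be applied, the sketch below maps voiced and semi-voiced hiragana to their voiceless counterparts with `str.translate`. The function name `unify_dakuon` is an assumption, and the table covers only the hiragana half of the rule group; how the server applies its rules internally is not described here.

```python
# Hiragana half of the "dakuon" rule group: voiced and semi-voiced kana
# paired position by position with their voiceless counterparts.
VOICED = "がぎぐげござじずぜぞだぢづでどばびぶべぼぱぴぷぺぽ"
VOICELESS = "かきくけこさしすせそたちつてとはひふへほはひふへほ"

# str.maketrans builds a one-character-to-one-character mapping table.
DAKUON_TABLE = str.maketrans(VOICED, VOICELESS)


def unify_dakuon(text):
    """Replace voiced and semi-voiced hiragana with their voiceless forms."""
    return text.translate(DAKUON_TABLE)


print(unify_dakuon("ゆうがお"))
print(unify_dakuon("きりつぼ"))
```

Applying the same normalization to both the query and the indexed text is one common way to make "ゆうがお" and "ゆうかお" compare equal. Rules whose key is longer than one character cannot go through `str.translate` and would need `str.replace` or a similar approach.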

Summary

Although some parts are still incomplete, this article introduced an example of building a search API server with a mechanism for absorbing orthographic variation in the original text.

I hope this serves as a useful reference.