Overview

Omeka-S is a powerful digital archive system, but Japanese full-text search barely works by default. This article explains how to achieve Japanese full-text search by installing the MroongaSearch module.

Background: Why the MroongaSearch Module is Needed

Omeka-S’s standard full-text search (FullTextSearch module) relies on InnoDB FULLTEXT indexes, which handle Japanese text poorly.

Example of Japanese word search:

Data: "Studying artificial intelligence at the University of Tokyo"
       (東京大学で人工知能を研究する)
Search term: "artificial intelligence" (人工知能)
Result: No hits

Since InnoDB’s full-text search assumes space-delimited languages like English, the following problems occur with Japanese:

  • Word search is impossible: The entire string is treated as a single word
  • Partial matching does not work: FULLTEXT indexes cannot properly process Japanese
  • Zero search results: Users cannot find anything
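The failure can be reproduced with a deliberately simplified model of a delimiter-based parser. The sketch below is Python, not InnoDB's actual parser, and the helper name `delimiter_tokens` is hypothetical. It only illustrates why a sentence without spaces ends up indexed as one token.

```python
import re

def delimiter_tokens(text: str) -> list[str]:
    """Split on whitespace and basic ASCII punctuation, roughly as a
    space-delimited full-text parser would (simplified assumption)."""
    return [t for t in re.split(r"[\s.,;:!?()]+", text) if t]

english = "Studying artificial intelligence at the University of Tokyo"
japanese = "東京大学で人工知能を研究する"

print(delimiter_tokens(english))   # eight separate words
print(delimiter_tokens(japanese))  # the whole sentence as a single token

# A word lookup succeeds only if the term is one of the indexed tokens.
print("intelligence" in delimiter_tokens(english))  # True
print("人工知能" in delimiter_tokens(japanese))      # False
```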

The MroongaSearch Module Solution

The MroongaSearch module solves this problem in two stages:

1. Fallback Feature (Active Immediately After Installation)

Important: Simply installing the MroongaSearch module enables Japanese search to work without any special configuration.

Data: "東京大学で人工知能を研究する"
Search term: "人工知能"

[Without MroongaSearch module]
→ Zero results

[With MroongaSearch module (even without Mroonga configured)]
→ Falls back to LIKE '%人工知能%'
→ Search results are returned!

The MroongaSearch module’s fallback feature:

  • Automatically detects CJK (Japanese, Chinese, Korean) single-word searches
  • Automatically falls back to LIKE '%term%' search
  • Works even when Mroonga is not configured
  • Without this, Japanese full-text search simply does not work properly
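As a rough illustration of the detection step described above, here is a minimal Python sketch. It is not the module's actual code (the module is an Omeka-S module, and its exact rules are not given in this article). The character ranges and the function names `is_cjk_single_word` and `like_pattern` are assumptions made for this example.

```python
import re

# Hiragana, Katakana, CJK Unified Ideographs, and Hangul syllables.
# The ranges the module really checks are not stated in this article;
# these are an assumption for illustration.
CJK_PATTERN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7a3]")

def is_cjk_single_word(query: str) -> bool:
    """True when the query has no internal whitespace and contains a CJK character."""
    term = query.strip()
    if not term or re.search(r"\s", term):
        return False
    return bool(CJK_PATTERN.search(term))

def like_pattern(term: str) -> str:
    """Wrap the term in % wildcards, escaping LIKE metacharacters first."""
    escaped = term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"

for q in ["人工知能", "artificial intelligence", "東京 大学"]:
    if is_cjk_single_word(q):
        print(q, "-> LIKE", like_pattern(q))
    else:
        print(q, "-> no fallback in this sketch")
```

Escaping `%` and `_` before wrapping the term keeps a user-supplied wildcard character from widening the match.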

2. Mroonga Integration (Requires MariaDB Configuration)

Additionally, configuring the Mroonga plugin in MariaDB enables:

  • Precise word search through morphological analysis
  • High-speed full-text search (hundreds of times faster than LIKE)
  • Strict AND/OR search control

What is the MroongaSearch Module?

MroongaSearch is a full-text search enhancement module for Omeka-S.

Key Features

  1. Automatic fallback feature

    • Enables CJK search even without Mroonga configured
    • Automatic switching to LIKE search
    • Ready to use immediately without configuration
  2. Mroonga integration

    • Precise search through morphological analysis
    • TokenMecab support
    • High-speed index search
  3. Diagnostics page

    • Plugin status check
    • Table engine display
    • Tokenizer information
    • Manual engine switching
  4. Strict AND/OR search

    • More precise search logic than standard FullTextSearch

Developers

  • Kentaro Fukuchi (initial version)
  • Kazufumi Fukuda (feature extensions)
  • Toshihito Waki (current maintainer)

Setup Procedure

Step 1: Installing the MroongaSearch Module

cd /path/to/omeka-s/modules
git clone https://github.com/wakitosh/MroongaSearch.git

Activate the module from the Omeka-S admin panel.

This alone enables Japanese search to work! (LIKE search fallback)

For faster and more precise search, configure the Mroonga plugin in MariaDB.

Step 2: Configuring the Mroonga Plugin in MariaDB

For Docker Environments

Directory structure:

omeka-s-docker/
├── Dockerfile
├── docker-compose.yml
└── mariadb/
    ├── Dockerfile
    └── init.sql

mariadb/Dockerfile:

FROM mariadb:latest

# Install Mroonga plugin and MeCab for Japanese tokenization
RUN apt-get update && \
    apt-get install -y \
    mariadb-plugin-mroonga \
    groonga-tokenizer-mecab \
    mecab \
    mecab-ipadic-utf8 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Enable Mroonga plugin on startup
RUN echo "plugin_load_add = ha_mroonga" >> /etc/mysql/mariadb.conf.d/50-server.cnf

mariadb/init.sql:

-- Install the Mroonga storage engine plugin
INSTALL SONAME 'ha_mroonga';

-- Install Mroonga UDF functions
CREATE FUNCTION IF NOT EXISTS mroonga_snippet_html RETURNS STRING SONAME 'ha_mroonga.so';
CREATE FUNCTION IF NOT EXISTS mroonga_command RETURNS STRING SONAME 'ha_mroonga.so';
CREATE FUNCTION IF NOT EXISTS mroonga_escape RETURNS STRING SONAME 'ha_mroonga.so';

docker-compose.yml (mariadb section):

services:
  mariadb:
    build:
      context: ./mariadb
      dockerfile: Dockerfile
    restart: always
    volumes:
      - mariadb:/var/lib/mysql
      - ./mariadb/init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      MYSQL_ROOT_PASSWORD: your_password
      MYSQL_DATABASE: omeka
      MYSQL_USER: omeka
      MYSQL_PASSWORD: omeka

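Rebuilding the container (scripts in /docker-entrypoint-initdb.d run only when the data volume is empty, so init.sql is not re-applied to an existing database):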

docker compose down
docker compose build mariadb
docker compose up -d

Step 3: Verifying the Setup

1. Checking the Mroonga Plugin

docker exec <container-name> mariadb -u root -p<password> \
  -e "SHOW PLUGINS" | grep -i mroonga

Expected output:

Mroonga	ACTIVE	STORAGE ENGINE	ha_mroonga.so	GPL

2. Checking TokenMecab

docker exec <container-name> mariadb -u root -p<password> \
  -e "SELECT mroonga_command('tokenizer_list')"

Expected output (excerpt):

[{"name":"TokenMecab"},{"name":"TokenBigram"}, ...]

If TokenMecab is included, the setup is correct.
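If this check needs to be scripted, the JSON can be inspected programmatically. The following Python sketch assumes the complete output of `tokenizer_list` has already been captured as a string. The function name `has_tokenizer` and the sample string are hypothetical.

```python
import json

def has_tokenizer(tokenizer_list_json: str, name: str = "TokenMecab") -> bool:
    """Return True if the tokenizer_list JSON contains an entry with this name."""
    entries = json.loads(tokenizer_list_json)
    return any(entry.get("name") == name for entry in entries)

# Sample shaped like the excerpt above (real output lists more tokenizers).
sample = '[{"name":"TokenMecab"},{"name":"TokenBigram"}]'

print(has_tokenizer(sample))                 # True
print(has_tokenizer(sample, "TokenNgram"))   # False for this sample
```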

3. Checking the MroongaSearch Diagnostics Page

In the Omeka-S admin panel:

Modules → MroongaSearch → Configure → Diagnostics

Displayed information:

  • Plugin status: ACTIVE / NOT ACTIVE
  • Table engine: InnoDB / Mroonga
  • Tokenizer: TokenMecab / None
  • Mroonga effective: YES / NO

If “Mroonga effective: NO”:

  • The plugin is ACTIVE, but the table engine remains InnoDB
  • Fallback search (LIKE) is used
  • It works, but is slow

To set “Mroonga effective: YES”:

  • Manually switch the engine to Mroonga from the diagnostics page

  • Or change it directly via SQL:
ALTER TABLE omeka.fulltext_search
  ENGINE=Mroonga
  COMMENT='table "ms_fulltext" tokenizer "TokenMecab"';
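The relationship between the diagnostics values can be summarized in a few lines. This Python sketch is only a reading of the behavior described above (plugin ACTIVE plus table engine Mroonga means "effective"). The `Diagnostics` class and function names are hypothetical, and the module's real check may consider more than these two values.

```python
from dataclasses import dataclass

@dataclass
class Diagnostics:
    plugin_status: str  # "ACTIVE" or "NOT ACTIVE"
    table_engine: str   # "InnoDB" or "Mroonga"

def mroonga_effective(d: Diagnostics) -> bool:
    """Assumed rule: the plugin must be active AND the table must use Mroonga."""
    return d.plugin_status == "ACTIVE" and d.table_engine.lower() == "mroonga"

def search_mode(d: Diagnostics) -> str:
    """Describe which search path applies under the assumed rule."""
    return "Mroonga full-text search" if mroonga_effective(d) else "LIKE fallback"

print(search_mode(Diagnostics("ACTIVE", "InnoDB")))      # LIKE fallback
print(search_mode(Diagnostics("ACTIVE", "Mroonga")))     # Mroonga full-text search
print(search_mode(Diagnostics("NOT ACTIVE", "InnoDB")))  # LIKE fallback
```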

4. Re-indexing

Run re-indexing from the diagnostics page or the Omeka-S admin panel.

How Search Works

Without Mroonga Configured (Fallback)

Search term: "人工知能" (CJK single word)

MroongaSearch module evaluation:
→ Detects CJK characters in the search term
→ Detects that Mroonga is not configured
→ Falls back to LIKE '%人工知能%'
→ Search results are returned
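The last step of this flow, the LIKE query itself, can be tried without a database server. The Python sketch below uses an in-memory SQLite database purely as a stand-in for MariaDB. The table and column names follow the queries shown in this article, the second sample row is made up for the example, and `fallback_search` is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fulltext_search (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO fulltext_search (text) VALUES (?)",
    [("東京大学で人工知能を研究する",), ("京都で歴史資料を整理する",)],
)

def fallback_search(term: str) -> list[str]:
    """Run the LIKE '%term%' fallback with a bound parameter."""
    rows = conn.execute(
        "SELECT text FROM fulltext_search WHERE text LIKE ? ORDER BY id",
        (f"%{term}%",),
    )
    return [row[0] for row in rows]

print(fallback_search("人工知能"))  # the first row matches
print(fallback_search("機械学習"))  # no row contains this term
```

Binding the pattern as a parameter instead of concatenating it into the SQL string avoids injection problems with user-supplied search terms.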

With Mroonga + TokenMecab Configured

Data: "東京大学で人工知能を研究する"
Morphological analysis with TokenMecab:
→ "東京" / "大学" / "で" / "人工" / "知能" / "を" / "研究" / "する"

Search term: "人工知能"
→ Matches on "人工" AND "知能" (fast)

Search term: "東京"
→ Matches on "東京"

Search term: "研究"
→ Matches on "研究"
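Why token-based lookup is fast can be seen from a toy inverted index. The Python sketch below hard-codes the token list from the segmentation shown above instead of calling MeCab, and it ignores details a real engine handles, such as token positions and scoring. The function names are hypothetical.

```python
from collections import defaultdict

# Token list copied from the segmentation above; a real setup would
# obtain it from MeCab rather than hard-coding it.
documents = {
    1: ["東京", "大学", "で", "人工", "知能", "を", "研究", "する"],
}

def build_index(docs: dict[int, list[str]]) -> dict[str, set[int]]:
    """Map each token to the set of document ids that contain it."""
    index: dict[str, set[int]] = defaultdict(set)
    for doc_id, tokens in docs.items():
        for token in tokens:
            index[token].add(doc_id)
    return index

def search_all(index: dict[str, set[int]], query_tokens: list[str]) -> set[int]:
    """Return ids of documents containing every query token (AND search)."""
    if not query_tokens:
        return set()
    postings = [index.get(token, set()) for token in query_tokens]
    return set.intersection(*postings)

index = build_index(documents)
print(search_all(index, ["人工", "知能"]))  # {1}
print(search_all(index, ["東京"]))          # {1}
print(search_all(index, ["歴史"]))          # set()
```

Each query token costs one dictionary lookup regardless of how many documents are stored, which is the key difference from scanning every row with LIKE.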

Substring Search Also Works

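Substring search is also possible. A query that cuts across word boundaries, as in the example below, relies on an N-gram tokenizer such as TokenBigram or on the module's LIKE fallback rather than on TokenMecab's word tokens: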

Search term: "工知"
→ Matches "人工知能"

This allows users to get results even when they do not know the exact word.

Morphological Analysis with TokenMecab

What is Morphological Analysis?

Since Japanese does not have space delimiters like English, sentences need to be segmented into words.

Example:

Input: "東京大学で勉強する" (Studying at the University of Tokyo)
↓ Segmented by TokenMecab
Output: "東京" / "大学" / "で" / "勉強" / "する"

This enables searching by individual words such as “東京” (Tokyo) or “大学” (university).

Limitations of Morphological Analysis

TokenMecab is powerful, but may not work as expected in the following cases:

1. Proper Nouns (New Words Not in the Dictionary)

"鬼滅の刃" → "鬼" / "滅" / "の" / "刃"
(Not recognized as a work title)

2. Compound Words and Technical Terms

"機械学習" → "機械" / "学習"
(Splitting may change the meaning)

3. Coined Words and Neologisms

"エモい" → "エモ" / "い" or treated as unknown

4. Multiple Segmentation Patterns

"子供服" → "子供" / "服" or "子" / "供" / "服"

Solutions

  • User dictionary: Add custom words to the MeCab dictionary
  • TokenBigram combination: Supplement partial matching with 2-character N-grams
  • Fallback: MroongaSearch automatically uses LIKE search as well
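For the user dictionary option, MeCab reads entries from a CSV file that is then compiled with `mecab-dict-index`. The Python sketch below generates one row in the IPADIC column layout (surface, left ID, right ID, cost, part of speech fields, base form, reading, pronunciation). The helper name and the cost value of 10 are assumptions for illustration, and whether TokenMecab picks up the compiled dictionary depends on the MeCab configuration (for example the `userdic` setting in mecabrc) of the environment.

```python
import csv
import io

def ipadic_noun_row(surface: str, reading: str, cost: int = 10) -> list[str]:
    """Build one IPADIC-style user dictionary row for a proper noun.

    The left and right context IDs are left empty so that the
    dictionary compiler can assign them (assumed workflow).
    """
    return [
        surface, "", "", str(cost),
        "名詞", "固有名詞", "一般", "*", "*", "*",
        surface, reading, reading,
    ]

buffer = io.StringIO()
writer = csv.writer(buffer, lineterminator="\n")
writer.writerow(ipadic_noun_row("鬼滅の刃", "キメツノヤイバ"))
print(buffer.getvalue(), end="")
# 鬼滅の刃,,,10,名詞,固有名詞,一般,*,*,*,鬼滅の刃,キメツノヤイバ,キメツノヤイバ
```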

Available Tokenizers

| Tokenizer | Description | Use Case |
|---|---|---|
| TokenMecab | Morphological analysis | Japanese search (recommended) |
| TokenBigram | 2-character splitting | Emphasis on partial matching |
| TokenUnigram | 1-character splitting | Exact matching only |
| TokenDelimit | Delimiter-based splitting | English, etc. |
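The difference between 2-character and 1-character splitting is easy to see in code. This Python sketch shows plain character N-grams only. Groonga's real TokenBigram and TokenUnigram apply additional rules (for example for alphanumeric runs) that are not modeled here, and the function name `ngrams` is hypothetical.

```python
def ngrams(text: str, n: int) -> list[str]:
    """Return every run of n consecutive characters in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

print(ngrams("人工知能", 2))  # ['人工', '工知', '知能']
print(ngrams("人工知能", 1))  # ['人', '工', '知', '能']

# A substring that crosses a word boundary is still an indexed bigram.
print("工知" in ngrams("人工知能", 2))  # True
```

Because every adjacent character pair is indexed, partial matches work without a dictionary, at the cost of a larger index and more false positives than word-level tokens.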

Performance Comparison

LIKE Search (Fallback)

SELECT * FROM fulltext_search WHERE text LIKE '%人工知能%';

  • Full table scan
  • Latency proportional to data volume
  • However, search results are returned (zero without the module)

Mroonga Full-Text Search

SELECT * FROM fulltext_search
WHERE MATCH(text) AGAINST('人工知能' IN BOOLEAN MODE);

  • Uses index
  • High-speed search (hundreds of times faster than LIKE)
  • Scalable

Summary

Importance of the MroongaSearch Module

  1. Essential: The MroongaSearch module is required for Japanese full-text search in Omeka-S
  2. Immediate effect: Searchable via fallback immediately after installation
  3. Incremental improvement: Further speed improvement with Mroonga configuration

| Level | Configuration | Search Behavior | Performance |
|---|---|---|---|
| Minimum | MroongaSearch module only | LIKE search fallback | Slow (but works) |
| Recommended | MroongaSearch module + Mroonga + TokenMecab | Morphological analysis search | Fast |

Benefits

  1. Japanese search enabled: Works immediately via fallback
  2. Improved precision: Word-level search via TokenMecab
  3. Speed improvement: Optimization via Groonga engine
  4. Flexibility: Both morphological search and partial matching

Conclusion: The MroongaSearch module is essential when handling Japanese content in Omeka-S.

Test Environment

  • Omeka-S: 4.1.1
  • MroongaSearch: latest
  • MariaDB: latest (11.x)
  • Docker Compose
  • macOS (Darwin 24.6.0)

If you found this article helpful, please star the GitHub repository!