Achieving Japanese Full-Text Search with the MroongaSearch Module for Omeka-S

Overview

Omeka-S is a powerful digital archive system, but Japanese full-text search barely works by default. This article explains how to achieve Japanese full-text search by installing the MroongaSearch module.

Background: Why the MroongaSearch Module is Needed

Problems with Omeka-S Standard Search

Omeka-S’s standard full-text search (FullTextSearch module) relies on InnoDB FULLTEXT indexes, which handle Japanese text poorly:

Example of Japanese word search:

Data: "Studying artificial intelligence at the University of Tokyo"
       (東京大学で人工知能を研究する)
Search term: "artificial intelligence" (人工知能)
Result: No hits

Since InnoDB’s full-text search assumes space-delimited languages like English, the following problems occur with Japanese:

Word search is impossible: The entire string is treated as a single word
Partial matching does not work: FULLTEXT indexes cannot properly process Japanese
Zero search results: Users cannot find anything
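
The effect of delimiter-based tokenization can be reproduced with a few lines of Python. This is a simplified model for illustration only, not InnoDB's actual parser, and the helper name `whitespace_tokens` is hypothetical.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Split text on whitespace, as a delimiter-based parser would."""
    return text.split()


english = "Studying artificial intelligence at the University of Tokyo"
japanese = "東京大学で人工知能を研究する"

english_tokens = whitespace_tokens(english)
japanese_tokens = whitespace_tokens(japanese)

# English yields one token per word, so "intelligence" can be looked up directly.
print("intelligence" in english_tokens)

# Japanese has no spaces, so the whole sentence becomes a single token.
print(len(japanese_tokens))

# The search term is not a token on its own, so a token lookup misses it.
print("人工知能" in japanese_tokens)
```

The last check is the "No hits" case shown above: the index only knows the full sentence as one word.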

The MroongaSearch Module Solution

The MroongaSearch module solves this problem in two stages:

1. Fallback Feature (Active Immediately After Installation)

Important: Simply installing the MroongaSearch module enables Japanese search to work without any special configuration.

Data: "東京大学で人工知能を研究する"
Search term: "人工知能"

[Without MroongaSearch module]
→ Zero results

[With MroongaSearch module (even without Mroonga configured)]
→ Falls back to LIKE '%人工知能%'
→ Search results are returned!

The MroongaSearch module’s fallback feature:

Automatically detects CJK (Japanese, Chinese, Korean) single-word searches
Automatically falls back to LIKE '%term%' search
Works even when Mroonga is not configured
Without this, Japanese full-text search simply does not work properly
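
The detection and fallback steps listed above can be sketched roughly as follows. This is a sketch under assumptions, not the module's actual code (which is written in PHP): the function names `is_cjk_single_word` and `like_pattern` are hypothetical, and the Unicode ranges are one plausible choice.

```python
import re

# Hiragana, Katakana, CJK Unified Ideographs, and Hangul syllables.
# Assumption: the real module may check a different set of ranges.
CJK_PATTERN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")


def is_cjk_single_word(term: str) -> bool:
    """True when the term contains a CJK character and no whitespace."""
    stripped = term.strip()
    if not stripped or re.search(r"\s", stripped):
        return False
    return bool(CJK_PATTERN.search(stripped))


def like_pattern(term: str) -> str:
    """Wrap the term in % wildcards, escaping LIKE metacharacters first."""
    escaped = (
        term.strip()
        .replace("\\", "\\\\")
        .replace("%", "\\%")
        .replace("_", "\\_")
    )
    return f"%{escaped}%"


if is_cjk_single_word("人工知能"):
    print(like_pattern("人工知能"))
```

Escaping `%` and `_` matters because a user-supplied term would otherwise be read as wildcards; the resulting pattern should still be passed to the database as a bound parameter.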

2. High-Speed, High-Precision Search with Mroonga + TokenMecab (Recommended)

Additionally, configuring the Mroonga plugin in MariaDB enables:

Precise word search through morphological analysis
High-speed, index-based full-text search (potentially hundreds of times faster than LIKE on large datasets)
Strict AND/OR search control

What is the MroongaSearch Module?

MroongaSearch is a full-text search enhancement module for Omeka-S.

Key Features

Automatic fallback feature
- Enables CJK search even without Mroonga configured
- Automatic switching to LIKE search
- Ready to use immediately without configuration
Mroonga integration
- Precise search through morphological analysis
- TokenMecab support
- High-speed index search
Diagnostics page
- Plugin status check
- Table engine display
- Tokenizer information
- Manual engine switching
Strict AND/OR search
- More precise search logic than standard FullTextSearch

Developers

Kentaro Fukuchi (initial version)
Kazufumi Fukuda (feature extensions)
Toshihito Waki (current maintainer)

Setup Procedure

Step 1: Installing the MroongaSearch Module

cd /path/to/omeka-s/modules
git clone https://github.com/wakitosh/MroongaSearch.git

Activate the module from the Omeka-S admin panel.

This alone enables Japanese search to work! (LIKE search fallback)

Step 2: Building the Mroonga Environment (Recommended)

For faster and more precise search, configure the Mroonga plugin in MariaDB.

For Docker Environments

Directory structure:

omeka-s-docker/
├── Dockerfile
├── docker-compose.yml
└── mariadb/
    ├── Dockerfile
    └── init.sql

mariadb/Dockerfile:

FROM mariadb:latest

# Install Mroonga plugin and MeCab for Japanese tokenization
RUN apt-get update && \
    apt-get install -y \
    mariadb-plugin-mroonga \
    groonga-tokenizer-mecab \
    mecab \
    mecab-ipadic-utf8 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Enable Mroonga plugin on startup
RUN echo "plugin_load_add = ha_mroonga" >> /etc/mysql/mariadb.conf.d/50-server.cnf

mariadb/init.sql:

-- Install the Mroonga storage engine plugin
INSTALL SONAME 'ha_mroonga';

-- Install Mroonga UDF functions
CREATE FUNCTION IF NOT EXISTS mroonga_snippet_html RETURNS STRING SONAME 'ha_mroonga.so';
CREATE FUNCTION IF NOT EXISTS mroonga_command RETURNS STRING SONAME 'ha_mroonga.so';
CREATE FUNCTION IF NOT EXISTS mroonga_escape RETURNS STRING SONAME 'ha_mroonga.so';

docker-compose.yml (mariadb section):

services:
  mariadb:
    build:
      context: ./mariadb
      dockerfile: Dockerfile
    restart: always
    volumes:
      - mariadb:/var/lib/mysql
      - ./mariadb/init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      MYSQL_ROOT_PASSWORD: your_password
      MYSQL_DATABASE: omeka
      MYSQL_USER: omeka
      MYSQL_PASSWORD: omeka

Rebuilding the container:

docker compose down
docker compose build mariadb
docker compose up -d

Step 3: Verifying the Setup

1. Checking the Mroonga Plugin

docker exec <container-name> mariadb -u root -p<password> \
  -e "SHOW PLUGINS" | grep -i mroonga

Expected output:

Mroonga	ACTIVE	STORAGE ENGINE	ha_mroonga.so	GPL

2. Checking TokenMecab

docker exec <container-name> mariadb -u root -p<password> \
  -e "SELECT mroonga_command('tokenizer_list')"

Expected output (excerpt):

[{"name":"TokenMecab"},{"name":"TokenBigram"}, ...]

If TokenMecab is included, the setup is correct.
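
If you want to automate this check, the JSON can be parsed instead of read by eye. The sketch below assumes the response is an array of objects with a `name` key, as in the excerpt above; `has_tokenizer` is a hypothetical helper, and you would pass it only the JSON part of the command output.

```python
import json


def has_tokenizer(tokenizer_list_json: str, name: str) -> bool:
    """Return True if `name` appears in a tokenizer_list JSON array."""
    entries = json.loads(tokenizer_list_json)
    return any(entry.get("name") == name for entry in entries)


# Sample shaped like the excerpt above; a real response lists more tokenizers.
sample = '[{"name":"TokenMecab"},{"name":"TokenBigram"}]'
print(has_tokenizer(sample, "TokenMecab"))
```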

3. Checking the MroongaSearch Diagnostics Page

In the Omeka-S admin panel:

Modules → MroongaSearch → Configure → Diagnostics

Displayed information:

Plugin status: ACTIVE / NOT ACTIVE
Table engine: InnoDB / Mroonga
Tokenizer: TokenMecab / None
Mroonga effective: YES / NO

If “Mroonga effective: NO”:

The plugin is ACTIVE, but the table engine remains InnoDB
Fallback search (LIKE) is used
It works, but is slow

To change this to “Mroonga effective: YES”:

Manually switch the engine to Mroonga from the diagnostics page

Or change it directly via SQL:

ALTER TABLE omeka.fulltext_search
  ENGINE=Mroonga
  COMMENT='table "ms_fulltext" tokenizer "TokenMecab"';

4. Re-indexing

Run re-indexing from the diagnostics page or the Omeka-S admin panel.

How Search Works

Without Mroonga Configured (Fallback)

Search term: "人工知能" (CJK single word)

MroongaSearch module evaluation:
→ CJK characters detected
→ Mroonga not configured detected
→ Falls back to LIKE '%人工知能%'
→ Search results returned
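
The LIKE step itself is easy to try out locally. The sketch below uses Python's built-in SQLite as a stand-in for MariaDB, with a cut-down `fulltext_search` table; the second row is invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE fulltext_search (id INTEGER PRIMARY KEY, text TEXT)")
conn.executemany(
    "INSERT INTO fulltext_search (text) VALUES (?)",
    [
        ("東京大学で人工知能を研究する",),
        ("Studying history in Kyoto",),  # invented row that should not match
    ],
)

term = "人工知能"
# The pattern is bound as a parameter rather than concatenated into the SQL.
rows = conn.execute(
    "SELECT id, text FROM fulltext_search WHERE text LIKE ?",
    (f"%{term}%",),
).fetchall()
print(rows)
```

Only the first row comes back, because the pattern matches anywhere inside the string without needing word boundaries.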

With Mroonga + TokenMecab Configured

Data: "東京大学で人工知能を研究する"
Morphological analysis with TokenMecab:
→ "東京" / "大学" / "で" / "人工" / "知能" / "を" / "研究" / "する"

Search term: "人工知能"
→ Matches on "人工" AND "知能" (fast)

Search term: "東京"
→ Matches on "東京"

Search term: "研究"
→ Matches on "研究"
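
Conceptually, token-based matching works like a lookup in an inverted index. The toy model below is not how Groonga is implemented: the segmentation of document 1 is copied from the example above rather than produced by MeCab, document 2 is invented, and `search_all` is a hypothetical helper.

```python
from collections import defaultdict

documents = {
    1: ["東京", "大学", "で", "人工", "知能", "を", "研究", "する"],
    2: ["京都", "大学", "で", "歴史", "を", "研究", "する"],  # invented sample
}

# Inverted index: token -> set of ids of documents containing it.
index: dict[str, set[int]] = defaultdict(set)
for doc_id, tokens in documents.items():
    for token in tokens:
        index[token].add(doc_id)


def search_all(query_tokens: list[str]) -> set[int]:
    """Return ids of documents that contain every query token (AND)."""
    postings = [index.get(token, set()) for token in query_tokens]
    return set.intersection(*postings) if postings else set()


print(search_all(["人工", "知能"]))
print(search_all(["研究"]))
```

Each lookup touches only the entries for the query tokens, which is why this approach stays fast as the table grows. A real engine also tracks token positions so that a phrase is matched in order, which this sketch omits.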

Substring Search Also Works

Mroonga supports not only morphological analysis but also substring search:

Search term: "工知"
→ Matches "人工知能"

This allows users to get results even when they do not know the exact word.

Morphological Analysis with TokenMecab

What is Morphological Analysis?

Since Japanese does not have space delimiters like English, sentences need to be segmented into words.

Example:

Input: "東京大学で勉強する" (Studying at the University of Tokyo)
↓ Segmented by TokenMecab
Output: "東京" / "大学" / "で" / "勉強" / "する"

This enables searching by individual words such as “Tokyo” or “university.”
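
To get a feel for dictionary-based segmentation, here is a toy segmenter. It uses greedy longest-match, which is a much simpler technique than the cost-based analysis MeCab performs; the `segment` function and the five-word dictionary are assumptions made for this example.

```python
def segment(text: str, dictionary: set[str]) -> list[str]:
    """Greedy longest-match segmentation.

    Characters that start no dictionary word become one-character tokens.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    tokens = []
    pos = 0
    while pos < len(text):
        # Try the longest candidate first, then shrink down to one character.
        for length in range(min(max_len, len(text) - pos), 0, -1):
            candidate = text[pos:pos + length]
            if length == 1 or candidate in dictionary:
                tokens.append(candidate)
                pos += length
                break
    return tokens


toy_dictionary = {"東京", "大学", "で", "勉強", "する"}
print(segment("東京大学で勉強する", toy_dictionary))
```

With this dictionary the output matches the segmentation shown above. A word missing from the dictionary falls apart into single characters, so the result is only as good as the dictionary.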

Limitations of Morphological Analysis

TokenMecab is powerful, but may not work as expected in the following cases:

1. Proper Nouns (New Words Not in the Dictionary)

"鬼滅の刃" → "鬼" / "滅" / "の" / "刃"
(Not recognized as a work title)

2. Compound Words and Technical Terms

"機械学習" → "機械" / "学習"
(Splitting may change the meaning)

3. Coined Words and Neologisms

"エモい" → "エモ" / "い" or treated as unknown

4. Multiple Segmentation Patterns

"子供服" → "子供" / "服" or "子" / "供" / "服"

Solutions

User dictionary: Add custom words to the MeCab dictionary
TokenBigram combination: Supplement partial matching with 2-character N-grams
Fallback: MroongaSearch automatically uses LIKE search as well

Available Tokenizers

Tokenizer	Description	Use Case
TokenMecab	Morphological analysis	Japanese search (recommended)
TokenBigram	2-character splitting	Emphasis on partial matching
TokenUnigram	1-character splitting	Exact matching only
TokenDelimit	Delimiter-based splitting	English, etc.
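
The N-gram tokenizers are the easiest to picture. The sketch below shows plain fixed-width splitting; Groonga's TokenBigram applies extra rules (for example for alphanumeric text) that are left out here, and `ngrams` is a hypothetical helper.

```python
def ngrams(text: str, n: int) -> list[str]:
    """Return every overlapping n-character window of text."""
    if len(text) < n:
        # Too short for a full window: keep the text as a single token.
        return [text] if text else []
    return [text[i:i + n] for i in range(len(text) - n + 1)]


print(ngrams("人工知能", 2))  # 2-character splitting
print(ngrams("人工知能", 1))  # 1-character splitting
```

Because every adjacent pair of characters becomes a token, a fragment such as "工知" is itself an index entry. That is why bigram indexing favors partial matching, at the cost of more index entries and more noise than morphological analysis.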

Performance Comparison

LIKE Search (Fallback)

SELECT * FROM fulltext_search WHERE text LIKE '%人工知能%';

Full table scan (a leading % wildcard prevents index use)
Latency grows in proportion to data volume
Still returns results, whereas the standard search returns none

Mroonga Full-Text Search

SELECT * FROM fulltext_search
WHERE MATCH(text) AGAINST('人工知能' IN BOOLEAN MODE);

Uses index
High-speed search (potentially hundreds of times faster than LIKE on large datasets)
Scalable

Summary

Importance of the MroongaSearch Module

Essential: The MroongaSearch module is required for Japanese full-text search in Omeka-S
Immediate effect: Searchable via fallback immediately after installation
Incremental improvement: Further speed improvement with Mroonga configuration

Recommended Setup

Level	Configuration	Search Behavior	Performance
Minimum	MroongaSearch module only	LIKE search fallback	Slow (but works)
Recommended	MroongaSearch module + Mroonga + TokenMecab	Morphological analysis search	Fast

Benefits

Japanese search enabled: Works immediately via fallback
Improved precision: Word-level search via TokenMecab
Speed improvement: Optimization via Groonga engine
Flexibility: Both morphological search and partial matching

Conclusion: The MroongaSearch module is essential when handling Japanese content in Omeka-S.

References

Test Environment

Omeka-S: 4.1.1
MroongaSearch: latest
MariaDB: latest (11.x)
Docker Compose
macOS (Darwin 24.6.0)

If you found this article helpful, please star the GitHub repository!
