Overview
Omeka-S is a powerful digital archive system, but Japanese full-text search barely works by default. This article explains how to achieve Japanese full-text search by installing the MroongaSearch module.
Background: Why the MroongaSearch Module is Needed
Problems with Omeka-S Standard Search
Omeka-S’s standard full-text search (FullTextSearch module) uses the InnoDB engine, which has the following critical issues:
Example of Japanese word search:
Data: "Studying artificial intelligence at the University of Tokyo"
(東京大学で人工知能を研究する)
Search term: "artificial intelligence" (人工知能)
Result: No hits
Since InnoDB’s full-text search assumes space-delimited languages like English, the following problems occur with Japanese:
- Word search is impossible: The entire string is treated as a single word
- Partial matching does not work: FULLTEXT indexes cannot properly process Japanese
- Zero search results: Users cannot find anything
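The effect of space-delimited tokenization can be reproduced with a minimal Python sketch. It only mimics the principle of splitting on whitespace and is not InnoDB's actual parser; `whitespace_tokens` and `word_search` are hypothetical helper names introduced for illustration.

```python
def whitespace_tokens(text: str) -> list[str]:
    """Split text on whitespace, as a space-delimited tokenizer would."""
    return text.split()


def word_search(term: str, text: str) -> bool:
    """Return True if term equals one of the whitespace-delimited tokens."""
    return term in whitespace_tokens(text)


english = "Studying artificial intelligence at the University of Tokyo"
japanese = "東京大学で人工知能を研究する"

print(whitespace_tokens(japanese))           # the whole sentence is one token
print(word_search("intelligence", english))  # True
print(word_search("人工知能", japanese))       # False
```

Because the Japanese sentence contains no spaces, it is indexed as one long "word", so a lookup for any shorter term inside it finds nothing.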
The MroongaSearch Module Solution
The MroongaSearch module solves this problem in two stages:
1. Fallback Feature (Active Immediately After Installation)
Important: Simply installing and activating the MroongaSearch module makes Japanese search work without any special configuration.
Data: "東京大学で人工知能を研究する"
Search term: "人工知能"
[Without MroongaSearch module]
→ Zero results
[With MroongaSearch module (even without Mroonga configured)]
→ Falls back to LIKE '%人工知能%'
→ Search results are returned!
The MroongaSearch module’s fallback feature:
- Automatically detects CJK (Japanese, Chinese, Korean) single-word searches
- Automatically falls back to LIKE '%term%' search
- Works even when Mroonga is not configured
- Without this, Japanese full-text search simply does not work properly
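The fallback decision described above can be sketched in Python. This is not the module's actual implementation (the module itself is written in PHP); the character ranges, the function names, and the wildcard escaping are assumptions chosen for illustration.

```python
import re

# Hiragana, Katakana, CJK Unified Ideographs, and Hangul syllables.
# A simplified set of ranges, assumed here for illustration only.
CJK_PATTERN = re.compile(r"[\u3040-\u30ff\u4e00-\u9fff\uac00-\ud7af]")


def is_cjk_single_word(term: str) -> bool:
    """True if the term contains no whitespace and at least one CJK character."""
    stripped = term.strip()
    if not stripped or re.search(r"\s", stripped):
        return False
    return CJK_PATTERN.search(stripped) is not None


def escape_like(term: str) -> str:
    """Escape LIKE wildcards so that they are matched literally."""
    return term.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")


def choose_strategy(term: str, mroonga_effective: bool) -> tuple[str, str]:
    """Return (strategy, parameter): a LIKE pattern or the raw full-text term."""
    if not mroonga_effective and is_cjk_single_word(term):
        return ("like", f"%{escape_like(term.strip())}%")
    return ("fulltext", term)


print(choose_strategy("人工知能", mroonga_effective=False))  # ('like', '%人工知能%')
print(choose_strategy("人工知能", mroonga_effective=True))   # ('fulltext', '人工知能')
```

The returned pattern is meant to be bound as a query parameter rather than concatenated into the SQL string.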
2. High-Speed, High-Precision Search with Mroonga + TokenMecab (Recommended)
Additionally, configuring the Mroonga plugin in MariaDB enables:
- Precise word search through morphological analysis
- High-speed, index-based full-text search (potentially hundreds of times faster than LIKE, depending on data volume)
- Strict AND/OR search control
What is the MroongaSearch Module?
MroongaSearch is a full-text search enhancement module for Omeka-S.
Key Features
Automatic fallback feature
- Enables CJK search even without Mroonga configured
- Automatic switching to LIKE search
- Ready to use immediately without configuration
Mroonga integration
- Precise search through morphological analysis
- TokenMecab support
- High-speed index search
Diagnostics page
- Plugin status check
- Table engine display
- Tokenizer information
- Manual engine switching
Strict AND/OR search
- More precise search logic than standard FullTextSearch
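As a rough illustration of AND/OR control, the sketch below builds a boolean-mode search string. In MySQL/MariaDB boolean mode a leading `+` marks a term as required, which gives AND semantics, while unprefixed terms are optional, which gives OR semantics. Whether the module builds its queries exactly this way is an assumption; `boolean_query` is a hypothetical helper, and it does not escape special characters inside terms.

```python
def boolean_query(terms: list[str], operator: str = "AND") -> str:
    """Build a BOOLEAN MODE search string from a list of terms."""
    cleaned = [t.strip() for t in terms if t.strip()]
    if operator == "AND":
        # "+term" means the term must be present in the row.
        return " ".join(f"+{t}" for t in cleaned)
    if operator == "OR":
        # Unprefixed terms are optional; any one of them may match.
        return " ".join(cleaned)
    raise ValueError(f"unsupported operator: {operator}")


print(boolean_query(["東京", "研究"], "AND"))  # +東京 +研究
print(boolean_query(["東京", "研究"], "OR"))   # 東京 研究
```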
Developers
- Kentaro Fukuchi (initial version)
- Kazufumi Fukuda (feature extensions)
- Toshihito Waki (current maintainer)
Setup Procedure
Step 1: Installing the MroongaSearch Module
cd /path/to/omeka-s/modules
git clone https://github.com/wakitosh/MroongaSearch.git
Activate the module from the Omeka-S admin panel.
This alone enables Japanese search to work! (LIKE search fallback)
Step 2: Building the Mroonga Environment (Recommended)
For faster and more precise search, configure the Mroonga plugin in MariaDB.
For Docker Environments
Directory structure:
omeka-s-docker/
├── Dockerfile
├── docker-compose.yml
└── mariadb/
    ├── Dockerfile
    └── init.sql
mariadb/Dockerfile:
FROM mariadb:latest
# Install Mroonga plugin and MeCab for Japanese tokenization
RUN apt-get update && \
    apt-get install -y \
        mariadb-plugin-mroonga \
        groonga-tokenizer-mecab \
        mecab \
        mecab-ipadic-utf8 && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*
# Enable Mroonga plugin on startup
RUN echo "plugin_load_add = ha_mroonga" >> /etc/mysql/mariadb.conf.d/50-server.cnf
mariadb/init.sql:
-- Install the Mroonga storage engine plugin
INSTALL SONAME 'ha_mroonga';
-- Register the Mroonga UDFs (user-defined functions)
CREATE FUNCTION IF NOT EXISTS mroonga_snippet_html RETURNS STRING SONAME 'ha_mroonga.so';
CREATE FUNCTION IF NOT EXISTS mroonga_command RETURNS STRING SONAME 'ha_mroonga.so';
CREATE FUNCTION IF NOT EXISTS mroonga_escape RETURNS STRING SONAME 'ha_mroonga.so';
docker-compose.yml (mariadb section):
services:
  mariadb:
    build:
      context: ./mariadb
      dockerfile: Dockerfile
    restart: always
    volumes:
      - mariadb:/var/lib/mysql
      - ./mariadb/init.sql:/docker-entrypoint-initdb.d/init.sql
    environment:
      MYSQL_ROOT_PASSWORD: your_password
      MYSQL_DATABASE: omeka
      MYSQL_USER: omeka
      MYSQL_PASSWORD: omeka
Rebuilding the container:
docker compose down
docker compose build mariadb
docker compose up -d
Step 3: Verifying the Setup
1. Checking the Mroonga Plugin
docker exec <container-name> mariadb -u root -p<password> \
-e "SHOW PLUGINS" | grep -i mroonga
Expected output:
Mroonga ACTIVE STORAGE ENGINE ha_mroonga.so GPL
2. Checking TokenMecab
docker exec <container-name> mariadb -u root -p<password> \
-e "SELECT mroonga_command('tokenizer_list')"
Expected output (excerpt):
[{"name":"TokenMecab"},{"name":"TokenBigram"}, ...]
If TokenMecab is included, the setup is correct.
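If the check is to be scripted, the JSON returned by `tokenizer_list` can be parsed instead of inspected by eye. The following Python sketch assumes the output is a JSON array of objects with a `name` key, as in the excerpt above; `has_tokenizer` is a hypothetical helper name, and the sample string is a shortened stand-in for real output.

```python
import json


def has_tokenizer(tokenizer_list_json: str, name: str = "TokenMecab") -> bool:
    """Return True if the named tokenizer appears in the tokenizer_list output."""
    entries = json.loads(tokenizer_list_json)
    return any(entry.get("name") == name for entry in entries)


# Shortened sample modeled on the excerpt above (not real captured output).
sample = '[{"name":"TokenMecab"},{"name":"TokenBigram"}]'
print(has_tokenizer(sample))                  # True
print(has_tokenizer(sample, "TokenUnigram"))  # False
```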
3. Checking the MroongaSearch Diagnostics Page
In the Omeka-S admin panel:
Modules → MroongaSearch → Configure → Diagnostics
Displayed information:
- Plugin status: ACTIVE / NOT ACTIVE
- Table engine: InnoDB / Mroonga
- Tokenizer: TokenMecab / None
- Mroonga effective: YES / NO
If “Mroonga effective: NO”:
- The plugin is ACTIVE, but the table engine remains InnoDB
- Fallback search (LIKE) is used
- It works, but is slow

To set “Mroonga effective: YES”:
- Manually switch the engine to Mroonga from the diagnostics page
- Or change it directly via SQL:
ALTER TABLE omeka.fulltext_search
ENGINE=Mroonga
COMMENT='table "ms_fulltext" tokenizer "TokenMecab"';
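The relationship between the diagnostics fields and the search path can be summarized in a small Python sketch. It only restates the rule described in this step (plugin ACTIVE and table engine Mroonga means Mroonga is effective; anything else means LIKE fallback). The function names are hypothetical, and the module's real checks may consider more conditions, such as the tokenizer.

```python
def mroonga_effective(plugin_status: str, table_engine: str) -> bool:
    """Mroonga is effective only if the plugin is active and the table uses it."""
    return plugin_status.upper() == "ACTIVE" and table_engine.lower() == "mroonga"


def search_mode(plugin_status: str, table_engine: str) -> str:
    """Return which search path would be used under the rule above."""
    if mroonga_effective(plugin_status, table_engine):
        return "mroonga"
    return "like-fallback"


print(search_mode("ACTIVE", "InnoDB"))   # like-fallback
print(search_mode("ACTIVE", "Mroonga"))  # mroonga
```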
4. Re-indexing
Run re-indexing from the diagnostics page or the Omeka-S admin panel.

How Search Works
Without Mroonga Configured (Fallback)
Search term: "人工知能" (CJK single word)
MroongaSearch module evaluation:
→ CJK characters detected
→ Mroonga not configured detected
→ Falls back to LIKE '%人工知能%'
→ Search results returned
With Mroonga + TokenMecab Configured
Data: "東京大学で人工知能を研究する"
Morphological analysis with TokenMecab (illustrative; the actual segmentation depends on the MeCab dictionary in use):
→ "東京" / "大学" / "で" / "人工" / "知能" / "を" / "研究" / "する"
Search term: "人工知能"
→ Matches on "人工" AND "知能" (fast)
Search term: "東京"
→ Matches on "東京"
Search term: "研究"
→ Matches on "研究"
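The token-based matching above can be modeled with a tiny inverted index in Python. The segmentation is hard-coded from the example rather than produced by MeCab, and the lookup ignores token positions, which a real phrase search would also check. All names here are hypothetical.

```python
# Document tokens copied from the example above (not produced by MeCab here).
DOCS = {
    1: ["東京", "大学", "で", "人工", "知能", "を", "研究", "する"],
}


def build_index(docs: dict[int, list[str]]) -> dict[str, set[int]]:
    """Map each token to the set of document ids that contain it."""
    index: dict[str, set[int]] = {}
    for doc_id, tokens in docs.items():
        for token in tokens:
            index.setdefault(token, set()).add(doc_id)
    return index


def search(index: dict[str, set[int]], query_tokens: list[str]) -> set[int]:
    """Return ids of documents that contain every query token (AND)."""
    postings = [index.get(token, set()) for token in query_tokens]
    return set.intersection(*postings) if postings else set()


index = build_index(DOCS)
print(search(index, ["人工", "知能"]))  # {1}
print(search(index, ["東京"]))          # {1}
print(search(index, ["京都"]))          # set()
```

Each lookup touches only the postings of the query tokens instead of scanning every row, which is why an index-based search scales better than LIKE.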
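Substring Search Also Works
Besides word-level matching, substrings that cross token boundaries can also be found. Because TokenMecab indexes whole tokens, this relies on the LIKE fallback or on TokenBigram (see Solutions below) rather than on morphological analysis itself:
Search term: "工知"
→ Matches "人工知能"
This allows users to get results even when they do not know the exact word.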
Morphological Analysis with TokenMecab
What is Morphological Analysis?
Since Japanese does not have space delimiters like English, sentences need to be segmented into words.
Example:
Input: "東京大学で勉強する" (Studying at the University of Tokyo)
↓ Segmented by TokenMecab
Output: "東京" / "大学" / "で" / "勉強" / "する"
This enables searching by individual words such as “東京” (Tokyo) or “大学” (university).
Limitations of Morphological Analysis
TokenMecab is powerful, but may not work as expected in the following cases:
1. Proper Nouns (New Words Not in the Dictionary)
"鬼滅の刃" → "鬼" / "滅" / "の" / "刃"
(Not recognized as a work title)
2. Compound Words and Technical Terms
"機械学習" → "機械" / "学習"
(Splitting may change the meaning)
3. Coined Words and Neologisms
"エモい" → "エモ" / "い" or treated as unknown
4. Multiple Segmentation Patterns
"子供服" → "子供" / "服" or "子" / "供" / "服"
Solutions
- User dictionary: Add custom words to the MeCab dictionary
- TokenBigram combination: Supplement partial matching with 2-character N-grams
- Fallback: MroongaSearch automatically uses LIKE search as well
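To show how 2-character N-grams supplement partial matching, here is a minimal Python sketch of bigram tokenization. It models the idea only and is not Groonga's TokenBigram implementation, which applies additional rules; `bigrams` and `bigram_match` are hypothetical names.

```python
def bigrams(text: str) -> list[str]:
    """Split text into overlapping 2-character tokens."""
    return [text[i:i + 2] for i in range(len(text) - 1)]


def bigram_match(term: str, text: str) -> bool:
    """True if the term's bigrams occur consecutively in the text's bigrams."""
    term_grams = bigrams(term)
    if not term_grams:
        return False  # single-character terms need separate handling
    text_grams = bigrams(text)
    n = len(term_grams)
    return any(
        text_grams[i:i + n] == term_grams
        for i in range(len(text_grams) - n + 1)
    )


doc = "東京大学で人工知能を研究する"
print(bigrams("人工知能"))        # ['人工', '工知', '知能']
print(bigram_match("工知", doc))  # True
print(bigram_match("知工", doc))  # False
```

Since every adjacent character pair is indexed, a term such as "工知" that straddles the MeCab tokens "人工" and "知能" can still be found.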
Available Tokenizers
| Tokenizer | Description | Use Case |
|---|---|---|
| TokenMecab | Morphological analysis | Japanese search (recommended) |
| TokenBigram | 2-character splitting | Emphasis on partial matching |
| TokenUnigram | 1-character splitting | Partial matching down to single characters |
| TokenDelimit | Delimiter-based splitting | English, etc. |
Performance Comparison
LIKE Search (Fallback)
SELECT * FROM fulltext_search WHERE text LIKE '%人工知能%';
- Full row scan
- Latency proportional to data volume
- However, search results are returned (zero without the module)
Mroonga Full-Text Search
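-- MATCH must list the columns of the FULLTEXT index, which Omeka-S defines on (title, text)
SELECT * FROM fulltext_search
WHERE MATCH(title, text) AGAINST('人工知能' IN BOOLEAN MODE);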
- Uses index
- High-speed search (potentially hundreds of times faster than LIKE, depending on data volume)
- Scalable
Summary
Importance of the MroongaSearch Module
- Essential: The MroongaSearch module is required for Japanese full-text search in Omeka-S
- Immediate effect: Searchable via fallback immediately after installation
- Incremental improvement: Further speed improvement with Mroonga configuration
Recommended Setup
| Level | Configuration | Search Behavior | Performance |
|---|---|---|---|
| Minimum | MroongaSearch module only | LIKE search fallback | Slow (but works) |
| Recommended | MroongaSearch module + Mroonga + TokenMecab | Morphological analysis search | Fast |
Benefits
- Japanese search enabled: Works immediately via fallback
- Improved precision: Word-level search via TokenMecab
- Speed improvement: Optimization via Groonga engine
- Flexibility: Both morphological search and partial matching
Conclusion: The MroongaSearch module is essential when handling Japanese content in Omeka-S.
References
- MroongaSearch GitHub
- Mroonga official site
- MeCab official site
- Omeka-S official documentation
- MariaDB Mroonga plugin
Test Environment
- Omeka-S: 4.1.1
- MroongaSearch: latest
- MariaDB: latest (11.x)
- Docker Compose
- macOS (Darwin 24.6.0)