Overview

“Advanced Search adapter for Solr” is an Omeka S module that provides an advanced search adapter for Apache Solr. This enables you to leverage the full power of a search engine within Omeka. It provides features such as relevance-based (score) search, instant search, facets, autocomplete, and suggestions for both general users and administrators.

https://github.com/Daniel-KM/Omeka-S-module-SearchSolr

Setting Up Apache Solr

Apache Solr can be installed on a server different from the one where Omeka S is installed.

Set up Apache Solr in an environment where Java can be installed. For Ubuntu, the following site was helpful:

https://tecadmin.net/how-to-install-apache-solr-on-ubuntu-22-04/

Apache Solr can be installed and started with commands like the following:

# Install Java
sudo apt update && sudo apt install -y default-jdk
# Download
wget https://dlcdn.apache.org/solr/solr/9.3.0/solr-9.3.0.tgz
# Extract
tar xzf solr-9.3.0.tgz solr-9.3.0/bin/install_solr_service.sh --strip-components=2
# Install
sudo bash ./install_solr_service.sh solr-9.3.0.tgz
# Start
sudo systemctl start solr

Also, create a core named mycol1:

sudo su - solr -c "/opt/solr/bin/solr create -c mycol1 -n data_driven_schema_configs"
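
To confirm that the core was created, one option is to ask Solr's CoreAdmin API for its status. The following is a minimal sketch, assuming Solr listens on its default port 8983 on the same machine. The SOLR_URL variable and the core_status function name are illustrative and not part of the original setup.

```shell
# Assumption: Solr runs locally on its default port 8983.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Print the CoreAdmin STATUS response for the given core as JSON.
core_status() {
  curl -s "$SOLR_URL/admin/cores?action=STATUS&core=$1&wt=json"
}

# Example: core_status mycol1
```

If the core exists, the response should contain an entry for mycol1 with details such as its instance directory.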

Installing the Module

From here, work on the server where Omeka S is installed.

Download and install the module from the following page:

https://github.com/Daniel-KM/Omeka-S-module-SearchSolr/releases

During installation, an alert may appear indicating that AdvancedSearch is required, as shown below.

In that case, install and activate the following module first, then try installing Advanced Search adapter for Solr again.

https://omeka.org/s/modules/AdvancedSearch/

Connecting to Apache Solr

Access the following screen from Modules > Search manager on the left side of the admin panel.

</admin/search-manager>

Click the pencil button for the default core under “Solr cores” to access the following page. On this page, enter the IP address or hostname of the server where Apache Solr is installed in the “IP or hostname” form. Also, enter the core you created earlier (in this case, mycol1) in “Solr core.”

</admin/search-manager/solr/core/1/edit>

If configured correctly, the “Status” will show OK as in the screen below. This means Omeka S and Apache Solr are now connected.
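
If the status is not OK, it can help to test the connection from the Omeka S server on the command line. The sketch below calls the core's ping handler. The host name solr.example.org is a placeholder for the value entered in “IP or hostname,” port 8983 is Solr's default, and the solr_ping function name is illustrative.

```shell
# Assumption: replace solr.example.org with your Solr host; 8983 is the default port.
SOLR_URL="${SOLR_URL:-http://solr.example.org:8983/solr}"

# Call the ping handler of the given core; a healthy core answers with "status":"OK".
solr_ping() {
  curl -s "$SOLR_URL/$1/admin/ping?wt=json"
}

# Example: solr_ping mycol1
```

If this request fails while the same request succeeds on the Solr server itself, a firewall or network setting between the two servers is a likely cause.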

Creating Indexes and Pages

Here, we will create indexes and pages to display the search screen in Omeka S.

In Search admin

Access the following screen again:

</admin/search-manager>

Add search engine

On the above screen, press the “Add new search engine” button at the top right. On the following screen, enter an appropriate name and select “Solr” for the “Adapter” field.

Then, on the following screen, click the “Reindex” icon. A reindex block appears on the right side of the screen; press the “Confirm reindex” button there.

This synchronizes Omeka S and Apache Solr. After reindexing is complete, checking the Apache Solr admin panel shows that documents (in this case, 3) have been registered.
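
The document count can also be checked without opening the Solr admin panel. The following sketch sends a match-all query that returns no rows, so only the total is reported. It assumes Solr's default port on the local machine; SOLR_URL and count_docs are illustrative names.

```shell
# Assumption: Solr runs locally on its default port 8983.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Match every document in the core but return no rows;
# the numFound value in the JSON response is the total document count.
count_docs() {
  curl -s "$SOLR_URL/$1/select?q=*:*&rows=0&wt=json"
}

# Example: count_docs mycol1
```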

Create a page

Next, create a page. Access the Search manager admin screen again and press the “Add new page” button at the top right.

After navigating to the following screen, fill in the required fields. Here is an example:

Field          Value
Name           page1
Path           find
Search engine  engine1 (the name of the engine you created earlier)
Form           Main

Also, for “Availability on sites,” select “Make available in all sites” for now.

In admin or site settings

Next, add the created page to an Omeka S site. Select a specific site from the list of created sites and choose “Navigation.” On the following screen, select “Advanced search page” from “Custom Links” on the right side, and choose the page name you created earlier (in this case, page1).

As a result, you can access the search page that displays Apache Solr query results at the path you configured earlier.

https://omekas.aws.ldas.jp/omeka4/s/default/find

However, since facets and other settings have not been configured yet at this point, we will proceed with various settings below.

Configuration

Facet

First, let me explain how to configure facets.

From the Search manager page, click the pencil icon for the page you created earlier (in this case, page1).

On the following screen, select the “Settings” tab at the top of the screen.

</admin/search-manager/config/1/configure>

Then, edit the items displayed under “Facets.”

Copy and paste the necessary rows from “Available facets” into “List of facets.” Here, we will add the following facet:

dcterms_subject_ss = Subject =

As a result, facets are displayed on the Omeka S site page created earlier, enabling filtering based on values.
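
To see the values and counts that Solr holds for this field, you can request the facet directly from Solr. This is a sketch using Solr's standard facet parameters, not the query that the module itself sends, which may differ. SOLR_URL and facet_counts are illustrative names, and the default local port is assumed.

```shell
# Assumption: Solr runs locally on its default port 8983.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Request facet counts for one field without returning any documents.
# $1 = core name, $2 = field to facet on
facet_counts() {
  curl -s "$SOLR_URL/$1/select?q=*:*&rows=0&facet=true&facet.field=$2&wt=json"
}

# Example: facet_counts mycol1 dcterms_subject_ss
```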

Filters

Next, here is how to configure filters for specifying search conditions.

Similar to the facets above, copy and paste the necessary rows from “Available filters” to Filters. Here, we will add the following filter:

dcterms_subject_ss = Subject = =

</admin/search-manager/config/1/configure>

As a result, a “Keyword/Subject” form has been added as shown below.

Advanced filters

“Advanced filters” are forms that allow users to dynamically change filter conditions. For example, let’s add “Subject” and “Date” to “Advanced filters.”

The following form appears on the site’s search page.

Here is an example of searching where the “Date” field “contains” the value “11.” Only items with the date “2020-11-24” were retrieved.

Other

Let’s try registering in the following format:

dcterms_subject_ss = Subject = Select = baby|medical

This creates a select box that can be used as follows.

Sort

https://omekas.aws.ldas.jp/omeka4/s/default/find

Japanese Language Support

Introduction

In the configuration above, for example, there were three types for the title alone. These represent differences in how data is indexed in Apache Solr.

  • dcterms_title_s = Title
  • dcterms_title_txt = Title
  • dcterms_title_txt_ja = Title

Take the string “横から見たオムツ姿の赤ちゃんのイラスト” (Illustration of a baby in a diaper seen from the side) as an example:

Field                  Type          Terms
dcterms_title_s        string        横から見たオムツ姿の赤ちゃんのイラスト
dcterms_title_txt      text_general  横,か,ら,見,た,オムツ,姿,の,赤,ち,ゃ,ん,の,イラスト
dcterms_title_txt_ja   text_ja       横,見る,オムツ,姿,赤ちゃん,イラスト
dcterms_title_txt_cjk  text_cjk      横か,から,ら見,見た,たオ,オム,ムツ,ツ姿,姿の,の赤,赤ち,ちゃ,ゃん,んの,のイ,イラ,ラス,スト

  • *_txt applies the StandardTokenizerFactory tokenizer, where consecutive katakana are indexed as one term and other characters are indexed character by character. (I am not entirely confident about this.)
  • *_txt_ja applies the JapaneseTokenizerFactory tokenizer, which indexes by morphemes.
  • *_txt_cjk applies the CJKBigramFilterFactory filter, which indexes in two-character units.

Due to these differences, you need to configure how Omeka S fields are handled in Apache Solr according to your purposes.
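
To check how a field type splits a given string before deciding on a mapping, Solr's field analysis handler can be queried from the command line. The sketch below assumes the handler is reachable at its usual path under the core and that Solr runs on the default local port. SOLR_URL and analyze_text are illustrative names.

```shell
# Assumption: Solr runs locally on its default port 8983.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Ask Solr to run its index-time analysis chain on a piece of text.
# $1 = core name, $2 = field type (e.g. text_general, text_ja, text_cjk), $3 = text
# -G sends the URL-encoded parameters as a GET query string,
# which keeps non-ASCII text such as Japanese intact.
analyze_text() {
  curl -s -G "$SOLR_URL/$1/analysis/field" \
    --data-urlencode "analysis.fieldtype=$2" \
    --data-urlencode "analysis.fieldvalue=$3" \
    --data-urlencode "wt=json"
}

# Example: analyze_text mycol1 text_cjk "横から見たオムツ姿の赤ちゃんのイラスト"
```

Running it once per field type makes it possible to compare the resulting terms with the table above.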

CJK Filter

solr map

For example, let’s add *_txt_cjk, which indexes title values in two-character units.

On the Search manager screen, select “Map Omeka metadata and Solr fields” for the Solr core, then select “Resource (or Item)” and press the “Add new map” button at the top right.

On the following screen, select “*_txt_cjk” for “Solr field.”

After that, perform reindexing.

Filters

Then, referring to the earlier Filters configuration, add “dcterms_title_txt_cjk” as follows.

</admin/search-manager/config/1/configure>

Site

As a result, the following differences emerge.

Searching with dcterms_title_s=イラスト yields 0 results because the values are indexed as complete strings like “横から見たオムツ姿の赤ちゃんのイラスト,” and there is no exact match.

On the other hand, searching with dcterms_title_txt_cjk=イラスト yields 2 results because the values are indexed as bigrams like “イラ,ラス,スト,” and the search term “イラスト” is processed similarly, matching items containing the string “イラスト.”
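
The same comparison can be tried against Solr directly. The following sketch sends the search text as a quoted phrase restricted to one field. This is an assumption about a reasonable equivalent query; the query that the module actually builds may differ, so the counts are not guaranteed to be identical. SOLR_URL and search_phrase are illustrative names.

```shell
# Assumption: Solr runs locally on its default port 8983.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# Search one field for a quoted phrase and return only the hit count (numFound).
# $1 = core name, $2 = field name, $3 = search text
search_phrase() {
  curl -s -G "$SOLR_URL/$1/select" \
    --data-urlencode "q=$2:\"$3\"" \
    --data-urlencode "rows=0" \
    --data-urlencode "wt=json"
}

# Examples:
#   search_phrase mycol1 dcterms_title_s "イラスト"
#   search_phrase mycol1 dcterms_title_txt_cjk "イラスト"
```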

Reference

Let’s check how _txt_cjk and _txt_ja each index data.

_txt_cjk

Target Strings

  • 横から見たオムツ姿の赤ちゃんのイラスト
  • 【タイトルを更新】赤ちゃんの胸囲の測定のイラスト
  • iiif presentation api v3のマニフェスト

Results

</solr/#/mycol1/schema?field=dcterms_title_txt_cjk>

Term frequency  Terms
3               スト
2               イラ, ゃん, んの, のイ, ラス, 赤ち, ちゃ
1               ら見, の赤, たオ, フェ, ムツ, を更, presentation, の胸, オム, の測, から, ェス, タイ, ツ姿, トル, ニフ, 測定, マニ, のマ, 更新, ルを, 囲の, 定の, 姿の, イト, 横か, v3, 胸囲, 見た, iiif, api

Apart from the Latin-script words (iiif, presentation, api, v3), we can confirm that the text is indexed as two-character segments.

For example, “スト” has a frequency of 3 because it occurs in all three titles: two contain “イラスト” and one contains “マニフェスト.”
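
A similar list of top terms can be retrieved on the command line through Solr's Luke request handler. This is a sketch under the assumption that the handler is available at its usual path under the core and that Solr runs on the default local port. SOLR_URL and top_terms are illustrative names.

```shell
# Assumption: Solr runs locally on its default port 8983.
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"

# List the most frequent indexed terms of one field.
# $1 = core name, $2 = field name, $3 = number of terms to return
top_terms() {
  curl -s "$SOLR_URL/$1/admin/luke?fl=$2&numTerms=$3&wt=json"
}

# Example: top_terms mycol1 dcterms_title_txt_cjk 50
```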

_txt_ja

Target Strings

  • 横から見たオムツ姿の赤ちゃんのイラスト
  • 【タイトルを更新】赤ちゃんの胸囲の測定のイラスト
  • iiif presentation api v3のマニフェスト

Results

</solr/#/mycol1/schema?field=dcterms_title_txt_ja>

Term frequency  Terms
2               赤ちゃん, イラスト
1               iiif, v, オムツ, マニフェスト, タイトル, 姿, 更新, presentation, 測定, 胸囲, 見る, api, 3

Particles and other function words are excluded, and the text is indexed as morphemes like “赤ちゃん” and “イラスト.”

Comparison

Searching with dcterms_title_txt_ja=スト returns 0 results because the terms are indexed as morphemes like “イラスト” and “マニフェスト,” and none of them equals “スト.”

On the other hand, searching with dcterms_title_txt_cjk=スト returns 3 results. However, searching with dcterms_title_txt_cjk=イラスト returns only 2 results. This is because only data that contains all three bigrams “イラ,” “ラス,” and “スト” in the index will match, so items containing the string “イラスト” match, while items containing “マニフェスト” do not.

Summary

I have introduced how to connect Omeka S with Apache Solr. This can be a useful option when you need advanced search capabilities including morphological analysis in Omeka S.

This may contain some inaccuracies, but I hope it serves as a useful reference.