Overview

When files with Japanese filenames are ingested into Archivematica with the default settings, a filename such as “ユースケース公募提案書.docx” (Use Case Call for Proposals.docx) is converted as follows:

yu-suke-suGong_Mu_Ti_An_Shu_.docx

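The katakana is romanized letter by letter, while the kanji are given Chinese-based readings rather than Japanese ones. This article explains how to customize this filename conversion.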

Details

The filename conversion is performed in the following file:

https://github.com/artefactual/archivematica/blob/qa/1.x/src/MCPClient/lib/clientScripts/change_names.py

Specifically, the conversion is done here:

decoded_name = unidecode(basename)

A Google Colab notebook for trying out unidecode is available here:

https://colab.research.google.com/github/nakamura196/000_tools/blob/main/unidecodeを試す.ipynb
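
The underscores in the default result come from a second step after unidecode. The following is a minimal sketch of that step, assuming that change_name passes the unidecode output through the ALLOWED_CHARS pattern defined in the same file. The decoded_name literal is the output unidecode is assumed to return for this filename (each kanji reading followed by a space); it is hard-coded so the sketch runs without unidecode installed, and replace_disallowed is a hypothetical helper name.

```python
import re

# Same pattern and replacement character as defined in change_names.py:
# anything that is not a letter, digit, or one of - _ . ( ) gets replaced.
ALLOWED_CHARS = re.compile(r"[^a-zA-Z0-9\-_.\(\)]")
REPLACEMENT_CHAR = "_"


def replace_disallowed(decoded_name):
    """Replace every character outside the allowed set with "_"."""
    return ALLOWED_CHARS.sub(REPLACEMENT_CHAR, decoded_name)


# Assumed unidecode output for "ユースケース公募提案書.docx" (not captured
# from a real run).
decoded_name = "yu-suke-suGong Mu Ti An Shu .docx"
print(replace_disallowed(decoded_name))
```

Under these assumptions, the spaces that follow each kanji reading become the underscores seen in the converted filename.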

Customization

Here we use pykakasi, which converts kana and kanji to romaji using Japanese readings, in place of unidecode.

https://codeberg.org/miurahr/pykakasi

We assume that Archivematica is running via Docker. Please refer to the following article:

First, add pykakasi to the following file:

https://github.com/artefactual/archivematica/blob/qa/1.x/requirements-dev.txt

Then modify change_names.py itself as follows:

https://github.com/artefactual/archivematica/blob/qa/1.x/src/MCPClient/lib/clientScripts/change_names.py

import os
import re
import shutil

from unidecode import unidecode

import pykakasi

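# Initialize pykakasi
kakasi = pykakasi.kakasi()

# Convert hiragana, katakana, and kanji to romaji.
# Note: setMode()/getConverter()/do() are pykakasi's older API; recent
# releases deprecate them in favor of kakasi.convert().
kakasi.setMode("H", "a")  # Hiragana to romaji
kakasi.setMode("K", "a")  # Katakana to romaji
kakasi.setMode("J", "a")  # Kanji to romaji
kakasi.setMode("r", "Hepburn")  # Use Hepburn romanization

# Create the converter once at import time so change_name() can reuse it
converter = kakasi.getConverter()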

VERSION = "1.10." + "$Id$".split(" ")[1]

# Letters, digits and a few punctuation characters
ALLOWED_CHARS = re.compile(r"[^a-zA-Z0-9\-_.\(\)]")
REPLACEMENT_CHAR = "_"


def change_name(basename):
    if basename == "":
        raise ValueError("change_name received an empty filename.")
    # Original: decoded_name = unidecode(basename)
    # Replaced with pykakasi so that kanji are given Japanese readings
    decoded_name = converter.do(basename)
...

After applying the above modifications and rebuilding Archivematica, the filename is now converted as follows:

yuusukeesukouboteiansho.docx
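
The modification only swaps the transliteration step, so it can be exercised outside Docker before rebuilding. The sketch below is one way to do that, not Archivematica's actual code: change_name_with is a hypothetical variant that takes the transliterator as a parameter, stub_romaji is a stand-in for converter.do that only knows the example filename, and the final substitution through ALLOWED_CHARS is assumed to match what the elided part of change_name does.

```python
import re

ALLOWED_CHARS = re.compile(r"[^a-zA-Z0-9\-_.\(\)]")
REPLACEMENT_CHAR = "_"


def change_name_with(transliterate, basename):
    """Hypothetical variant of change_name with a pluggable transliterator."""
    if basename == "":
        raise ValueError("change_name received an empty filename.")
    decoded_name = transliterate(basename)
    # Assumed final step: replace anything outside the allowed set.
    return ALLOWED_CHARS.sub(REPLACEMENT_CHAR, decoded_name)


def stub_romaji(text):
    """Stand-in for converter.do; it only handles the example filename."""
    return text.replace("ユースケース公募提案書", "yuusukeesukouboteiansho")


print(change_name_with(stub_romaji, "ユースケース公募提案書.docx"))
print(change_name_with(stub_romaji, "ユースケース公募提案書 (v2).docx"))
```

With pykakasi installed, converter.do (or unidecode, for comparison) can be passed in place of stub_romaji.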

Summary

The METS file records both the original and the converted filename, as follows:

<premis:eventOutcomeDetailNote>Original name="%transferDirectory%objects/ユースケース公募提案書.docx"; new name="%transferDirectory%objects/yuusukeesukouboteiansho.docx"</premis:eventOutcomeDetailNote>
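
Since the note has a regular shape, the original name can be recovered from it programmatically. The following is a minimal sketch, assuming that every name-change note follows the pattern of the single example above; NOTE_PATTERN and parse_name_change are hypothetical names, and the note text is assumed to have already been extracted from the METS XML.

```python
import re

# Assumed shape of the note, based only on the example in this article.
NOTE_PATTERN = re.compile(r'Original name="(?P<original>.*)"; new name="(?P<new>.*)"')


def parse_name_change(note):
    """Return (original, new) paths from a name-change note, or None."""
    match = NOTE_PATTERN.fullmatch(note.strip())
    if match is None:
        return None
    return match.group("original"), match.group("new")


note = (
    'Original name="%transferDirectory%objects/ユースケース公募提案書.docx"; '
    'new name="%transferDirectory%objects/yuusukeesukouboteiansho.docx"'
)
original, new = parse_name_change(note)
print(original)
print(new)
```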

Because the original name is preserved in the METS file, you may not need to worry much about the filename conversion rules, but I hope this serves as a useful reference.