Overview
When files with Japanese filenames are ingested into Archivematica with the default settings, a filename such as “ユースケース公募提案書.docx” (Use Case Call for Proposals.docx) is converted as follows:
yu-suke-suGong_Mu_Ti_An_Shu_.docx
This article explains how to customize this filename conversion.
Details
The filename conversion is performed in the following file:
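Specifically, the conversion is done by this line, which passes the filename through unidecode:
decoded_name = unidecode(basename)
unidecode transliterates each character without looking at context, and for kanji it does not use a Japanese reading. That is why the kanji part of the name comes out as Gong_Mu_Ti_An_Shu_ instead of something a Japanese reader would recognize.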
A Google Colab notebook that runs unidecode on this filename is available here:
https://colab.research.google.com/github/nakamura196/000_tools/blob/main/unidecodeを試す.ipynb
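To see where the underscores in the default result come from, the step that follows transliteration can be looked at in isolation. The snippet below is a minimal sketch, not Archivematica's actual code: sanitize is a hypothetical helper, the pattern and replacement character are the ones that appear in the module quoted later in this article, and the input string is what unidecode is assumed to return for the example filename (each kanji followed by a space).

```python
import re

# Same pattern and replacement character as in the module quoted in this article.
ALLOWED_CHARS = re.compile(r"[^a-zA-Z0-9\-_.\(\)]")
REPLACEMENT_CHAR = "_"


def sanitize(decoded_name):
    """Replace every character outside the allowed set (hypothetical helper)."""
    return ALLOWED_CHARS.sub(REPLACEMENT_CHAR, decoded_name)


# Assumed unidecode output for "ユースケース公募提案書.docx"
decoded = "yu-suke-suGong Mu Ti An Shu .docx"
print(sanitize(decoded))  # yu-suke-suGong_Mu_Ti_An_Shu_.docx
```

Each space left behind by the transliteration falls outside the allowed set, so it becomes an underscore.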
Customization
Here we replace unidecode with pykakasi, a library that converts Japanese text (kanji, hiragana, and katakana) to romaji.
https://codeberg.org/miurahr/pykakasi
We assume that Archivematica is running via Docker. Please refer to the following article:
First, add pykakasi to the following file:
https://github.com/artefactual/archivematica/blob/qa/1.x/requirements-dev.txt
Then modify the file that performs the conversion (the one described under Details) so that it uses pykakasi instead of unidecode:
import os
import re
import shutil

from unidecode import unidecode
import pykakasi

# Initialize pykakasi
kakasi = pykakasi.kakasi()

# Convert all Japanese scripts to romaji
kakasi.setMode("H", "a")  # Hiragana to romaji
kakasi.setMode("K", "a")  # Katakana to romaji
kakasi.setMode("J", "a")  # Kanji to romaji
kakasi.setMode("r", "Hepburn")  # Use Hepburn romanization

# Create the converter
converter = kakasi.getConverter()

VERSION = "1.10." + "$Id$".split(" ")[1]

# Letters, digits and a few punctuation characters
ALLOWED_CHARS = re.compile(r"[^a-zA-Z0-9\-_.\(\)]")
REPLACEMENT_CHAR = "_"


def change_name(basename):
    if basename == "":
        raise ValueError("change_name received an empty filename.")
    # decoded_name = unidecode(basename)  # original transliteration, now disabled
    decoded_name = converter.do(basename)
    ...
After applying the above modifications and rebuilding Archivematica, the filename is now converted as follows:
yuusukeesukouboteiansho.docx
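As far as I know, the setMode/getConverter calls used above are pykakasi's older interface, and newer releases document a convert() method that returns a list of dictionaries, one per token, each with a "hepburn" key. The following is a minimal sketch of the same romanization written against that interface, assuming pykakasi 2.x or later is installed. The names join_hepburn and romanize are hypothetical helpers, not part of Archivematica or pykakasi, and I have not confirmed that the result is identical to the output shown above.

```python
def join_hepburn(tokens):
    """Concatenate the 'hepburn' field of each token dictionary."""
    return "".join(token["hepburn"] for token in tokens)


def romanize(basename):
    """Romanize a filename with pykakasi's convert() (assumes pykakasi 2.x or later)."""
    # Imported lazily so that join_hepburn stays usable without the package.
    import pykakasi

    return join_hepburn(pykakasi.kakasi().convert(basename))
```

Inside change_name, the line decoded_name = converter.do(basename) could then be replaced by decoded_name = romanize(basename). Creating the kakasi object once at module level, as the code above does with converter, would avoid rebuilding it on every call.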

Summary
The METS file records each filename conversion in a PREMIS event, with both the original and the new name:
<premis:eventOutcomeDetailNote>Original name="%transferDirectory%objects/ユースケース公募提案書.docx"; new name="%transferDirectory%objects/yuusukeesukouboteiansho.docx"</premis:eventOutcomeDetailNote>
Because the original name is preserved in the METS file, you may not need to worry much about the filename conversion rules, but I hope this serves as a useful reference if you want more readable romanized names.
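The original name can be recovered from such a note with a small amount of code. This is a minimal sketch that assumes the note always has the shape shown in the example above. The function name parse_name_change and the regular expression are my own, and filenames that themselves contain double quotes are not handled.

```python
import re

# Matches: Original name="..."; new name="..."
NOTE_PATTERN = re.compile(r'Original name="(?P<original>.*?)"; new name="(?P<new>.*?)"$')


def parse_name_change(note):
    """Return (original, new) paths from an eventOutcomeDetailNote, or None."""
    match = NOTE_PATTERN.search(note.strip())
    if match is None:
        return None
    return match.group("original"), match.group("new")


note = (
    'Original name="%transferDirectory%objects/ユースケース公募提案書.docx"; '
    'new name="%transferDirectory%objects/yuusukeesukouboteiansho.docx"'
)
original, new = parse_name_change(note)
print(original)  # %transferDirectory%objects/ユースケース公募提案書.docx
print(new)       # %transferDirectory%objects/yuusukeesukouboteiansho.docx
```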