Digital Archive Systems Tech Blog

Latest Articles

BDRC Tibetan OCR: Introduction and Implementation Examples of a Tibetan OCR Tool

Overview
Digitizing Tibetan manuscripts is an important challenge in digital humanities. Precious Buddhist scriptures and historical documents are preserved in libraries around the world, but most have not been converted to text data. Manual transcription requires enormous time and cost, and few researchers have the necessary expertise. This article introduces BDRC Tibetan OCR, an open-source Tibetan OCR system developed by the Buddhist Digital Resource Center (BDRC). ...

November 16, 2025 · Updated: November 16, 2025 · 8 min · Nakamura

Marker Position Offset Issue in Cesium 1.135.0 and Its Solution

Problem Overview
In a React application using Cesium.js 1.135.0, billboard markers using the CLAMP_TO_GROUND setting were observed to become inaccurately positioned after camera movement or zoom operations.
Environment
- Cesium.js: 1.135.0 (issue occurred) -> 1.134.0 (issue resolved)
- Framework: Next.js 16.0.1 + React 19.2.0
- Terrain Data: Cesium World Terrain (Cesium.Terrain.fromWorldTerrain())
- Marker Settings:
  - clampToGround: true
  - heightReference: Cesium.HeightReference.CLAMP_TO_GROUND
Symptoms
1. Click a specific marker to zoom in
2. Change the camera viewpoint (rotate/pan)
3. Markers in the distance appear floating above the terrain or displayed at inaccurate positions
This issue is thought to occur due to inconsistency between the high-resolution terrain data (LOD: Level of Detail) loaded when the camera moves closer and the low-resolution terrain data referenced by distant markers. ...

November 14, 2025 · Updated: November 14, 2025 · 3 min · Nakamura

Protoweb: A Time Machine to Experience the Internet of the 90s

The modern internet is fast and sophisticated, but many people may feel nostalgic for the atmosphere of the early days of the internet. Protoweb is a community-driven public service that revives the internet experience of the 1990s in the present day.
What is Protoweb?
Protoweb is a proxy server service that hosts websites from the early days of the internet and recreates the browsing experience of that era. By preserving and restoring historical websites, it provides an environment where you can experience the internet as it was around 1995. ...

November 13, 2025 · Updated: November 13, 2025 · 8 min · Nakamura

Practicing Long-Term Digital Preservation with OCFL - An Introductory Guide

Introduction
Long-term preservation of digital data is an important challenge for libraries, archives, and research institutions. Various factors such as changes in data formats, software obsolescence, and the evolution of storage technologies threaten the sustainability of digital information. In this article, I introduce OCFL (Oxford Common File Layout), one solution to this challenge, covering its concepts, significance, and implementation examples.
What is OCFL
OCFL (Oxford Common File Layout) is a specification for preserving digital information in a structured, transparent, and predictable manner. It was developed primarily by the Bodleian Library at the University of Oxford and Stanford University Libraries, and has now evolved as a community-driven open standard. ...

November 6, 2025 · Updated: November 6, 2025 · 7 min · Nakamura

Using a Hex Editor on Mac: HexEd.it as an Alternative to HxD

Introduction
At a workshop I attended at iPRES (International Conference on Digital Preservation), the hex editor “HxD” was used for hands-on exercises in digital preservation. It is an essential tool in digital archive practice for tasks such as analyzing binary file structures and verifying file formats. However, HxD is Windows-only and cannot be used on Mac. After the workshop, I searched for alternative tools to perform similar work in a Mac environment and found a web-based hex editor. ...

November 5, 2025 · Updated: November 5, 2025 · 6 min · Nakamura

Finding Hidden File Format Issues with DROID: An Essential Tool for Digital Preservation

If you are responsible for digital archives or long-term preservation, you have surely wondered, “Is this file really in the format its extension suggests?” This time, I introduce “DROID,” a powerful tool that resolves such doubts, along with actual analysis results.
What is DROID?
DROID (Digital Record Object Identification) is a file format identification tool developed by The National Archives (UK). It identifies the true format by analyzing not just the file extension but the internal structure (signature) of the file. ...

November 3, 2025 · Updated: November 3, 2025 · 5 min · Nakamura

Development of an IIIF Image Coordinate Editor with Auto-Navigation

Overview
The editor developed in this project is a web-based tool for recording and managing arbitrary coordinates on IIIF-compatible high-resolution images. It is designed as a general-purpose coordinate recording tool that can specify images via URL parameters and be used across various research projects.
https://youtu.be/UqPo5Xrkin8
Technology Stack
- OpenSeadragon: IIIF image viewer library (v4.1)
- SVG Overlay: For marker display
- localStorage: Data persistence
- Vanilla JavaScript: Framework-free implementation
Technical Features
1. Image Specification via URL Parameters
The tool’s most distinctive feature is the ability to specify any IIIF image via URL parameters: ...

October 29, 2025 · Updated: October 29, 2025 · 7 min · Nakamura

Odeuropa Visualization: A Platform for Visualizing Scent Data Using SKOS Vocabularies and SPARQL

Introduction
Odeuropa is a project that studies the history of scents in Europe, collecting and analyzing representations of scents depicted in paintings, literature, and other historical sources. This article introduces the implementation of a web application for visualizing scent data based on the SKOS (Simple Knowledge Organization System) vocabulary, utilizing Odeuropa’s SPARQL endpoint.
https://odeuropa-seven.vercel.app/ja/
Project Overview
Technology Stack
- Frontend: Next.js 15 (App Router)
- UI: Material-UI v5
- Internationalization: next-intl
- Data Retrieval: SPARQL queries (Odeuropa SPARQL endpoint)
- Language: TypeScript
- Hosting: Static Site Generation (SSG)
Main Features
1. Scent Search (/odeuropa-sources)
This is the core feature of the application, allowing users to search and browse smell perception events collected by the Odeuropa project. ...

October 24, 2025 · Updated: October 24, 2025 · 7 min · Nakamura

How to Build an Independent Author Database in Omeka S

Introduction
A common challenge in digital archives for museums and libraries is the need to “properly manage the relationship between works and their creators.” Especially when a single author has created multiple works, or when multiple authors have collaborated on a single work, it is important to clearly express these relationships and make them searchable. This article explains how to build an independent author database in Omeka S and link it with works. ...

October 20, 2025 · Updated: October 20, 2025 · 4 min · Nakamura

Complete Guide to Annotation Coordinate Conversion in Leaflet-IIIF

Overview
This article explains how to accurately display annotation coordinates (in xywh format) from IIIF (International Image Interoperability Framework) Presentation API v3 manifests on a map viewer using Leaflet-IIIF. While this problem may seem simple at first glance, accurate coordinate conversion is not possible without understanding the inner workings of Leaflet-IIIF.
Background
Annotation Format in IIIF Manifests
In IIIF Presentation API v3, the target region of an annotation is specified in xywh format as follows: ...

October 19, 2025 · Updated: October 19, 2025 · 6 min · Nakamura

Complete Guide to Migrating an Omeka-S Docker Environment to Another Server

Introduction
This article explains the procedure for migrating an Omeka-S environment set up with Docker Compose, including volume data, to a different server. You can proceed with the migration safely while maintaining data integrity.
Environment
- Source server: Ubuntu 22.04
- Target server: Ubuntu 22.04 (fresh setup)
- Stack: Omeka-S + MariaDB + phpMyAdmin + Traefik + Mailpit
Migration Flow
1. Backup on the source server
2. Download to local machine
3. Set up Docker environment on the target server
4. Restore data and start up
Step 1: Backup on the Source Server
1.1 Check Current Environment
    # Check running containers
    docker ps
    # Check Docker volumes
    docker volume ls
Example output: ...

October 16, 2025 · Updated: October 16, 2025 · 6 min · Nakamura

Distinguishing Between RDFS and SHACL: Understanding the Relationship Between range and propertyShape

Overview
When working with data in RDF (Resource Description Framework), two mechanisms come into play: “RDFS (RDF Schema)” and “SHACL (Shapes Constraint Language)”. Both can define constraints on properties and classes, but their purposes and behaviors are completely different. This article answers the following commonly confused questions:
- What is the difference between rdfs:domain / rdfs:range and SHACL’s sh:class / sh:datatype?
- Is it acceptable to set SHACL constraints that differ from the RDFS range?
- Is it problematic to specify a datatype (xsd:string) in SHACL when the range is a class (foaf:Person)?
1. The Fundamental Difference Between RDFS and SHACL
RDFS: For Inference
RDFS is a declaration that says “if this property is used, the following knowledge can be derived.” ...

October 15, 2025 · Updated: October 15, 2025 · 8 min · Nakamura

Developing an RDF Metadata Management System Integrating GakuNin RDM and Dydra

Overview
This article describes the development of a metadata management system for research data that integrates GakuNin RDM (Research Data Management) with the Dydra RDF database. This system can handle file management for research projects and the registration and search of Dublin Core metadata in a unified manner.
System Overview
Architecture
┌─────────────────┐
│   Next.js 14    │
│  (App Router)   │
└────────┬────────┘
         │
    ┌────┴────┐
    │         │
┌───▼───┐  ┌──▼─────┐
│GakuNin│  │ Dydra  │
│  RDM  │  │  RDF   │
│  API  │  │   DB   │
└───────┘  └────────┘
Technology stack: ...

October 14, 2025 · Updated: October 14, 2025 · 8 min · Nakamura

Investigating the Vocabulary Hierarchy of Odeuropa Explorer

Overview
Odeuropa Explorer is a fascinating project that digitizes European olfactory heritage. Funded by the EU’s Horizon 2020 research program, it provides a platform for cross-searching and exploring historical scent experiences. The project classifies scent-related information into three main categories:
- Smell sources: Objects and substances that emit odors
- Fragrant Spaces: Places and spaces associated with scents
- Gestures and Allegories: Gestures and allegorical expressions related to scents
This article reports the results of investigating what hierarchical structures these vocabularies have, using SKOS (Simple Knowledge Organization System) data published in the Odeuropa vocabularies repository. ...

October 13, 2025 · Updated: October 13, 2025 · 12 min · Nakamura

Guide to Registering RDF Data to Dydra via API

Background
Dydra is a cloud-based RDF database service that provides a SPARQL endpoint and REST API. This article explains how to programmatically register RDF data using the Dydra API.
Prerequisites
- A Dydra account
- An API key
- A Node.js environment (v16 or later recommended, when using Node.js)
Note: The code examples in this article use the following sample values:
- Account name: your-account
- Repository name: your-repository
- API key: your_api_key_here
When using them in practice, replace these with your own Dydra account information. ...

October 10, 2025 · Updated: October 10, 2025 · 7 min · Nakamura

Declarative Multi-Format Conversion with TEI Processing Model

Introduction
TEI (Text Encoding Initiative) is a widely used standard for digitizing humanities texts. This article introduces a case study of using the Processing Model feature introduced in TEI P5 to achieve conversion from TEI XML to multiple formats (HTML, LaTeX/PDF, EPUB3).
https://www.tei-c.org/Vault/P5/3.0.0/doc/tei-p5-doc/en/html/TD.html#TDPM
The case study uses texts published in the “Koui Genji Monogatari” (Collated Tale of Genji) as its example.
https://kouigenjimonogatari.github.io/
Background
Previously, conversion processes were performed individually, as introduced in the following articles. ...

October 8, 2025 · Updated: October 8, 2025 · 6 min · Nakamura

How to Control the Viewing Direction of Mirador from External Parameters

Overview
This article explains the implementation for dynamically specifying the viewingDirection of the Mirador viewer via URL parameters. This feature allows the same manifest to be displayed left-to-right or right-to-left.
Implementation
1. Retrieving URL Parameters
Retrieve the viewingDirection parameter from the URL and set a default value:
    // Get viewing direction from URL parameters
    const urlParams = new URLSearchParams(window.location.search);
    const viewingDirection = urlParams.get('viewingDirection') || 'right-to-left';
In this implementation, when no parameter is specified, 'right-to-left' is used as the default. ...

October 6, 2025 · Updated: October 6, 2025 · 2 min · Nakamura

Odeuropa: The World of Linked Data for Extracting Scents from Historical Documents

Overview
Odeuropa is a unique project that extracts descriptions of “scents” from European historical documents and structures them as Linked Data. This article explores the actual data through the SPARQL endpoint, revealing its structure and design philosophy.
What is Odeuropa?
- Project name: Odeuropa (Odeurs d’Europe = Scents of Europe)
- Database URL: https://data.odeuropa.eu/
- SPARQL endpoint: https://data.odeuropa.eu/repositories/odeuropa
- Web interface: https://explorer.odeuropa.eu/
Data Model Overview
Odeuropa uses an extended ontology specialized for scents, built on top of CIDOC-CRM (Conceptual Reference Model for Cultural Heritage). ...

October 4, 2025 · Updated: October 4, 2025 · 5 min · Nakamura

Achieving Japanese Full-Text Search with the MroongaSearch Module for Omeka-S

Overview
Omeka-S is a powerful digital archive system, but Japanese full-text search barely works by default. This article explains how to achieve Japanese full-text search by installing the MroongaSearch module.
Background: Why the MroongaSearch Module is Needed
Problems with Omeka-S Standard Search
Omeka-S’s standard full-text search (FullTextSearch module) uses the InnoDB engine, which has the following critical issues:
Example of Japanese word search:
- Data: "Studying artificial intelligence at the University of Tokyo" (東京大学で人工知能を研究する)
- Search term: "artificial intelligence" (人工知能)
- Result: No hits
Since InnoDB’s full-text search assumes space-delimited languages like English, the following problems occur with Japanese: ...

October 2, 2025 · Updated: October 2, 2025 · 6 min · Nakamura

Azure OpenAI GPT-4 vs Document Intelligence: Comparative Evaluation of Japanese Vertical Text OCR

Overview
We performed OCR processing on Japanese vertical-writing manuscript paper using two OCR services provided by Microsoft Azure (Azure OpenAI GPT-4 Vision and Azure Document Intelligence), and conducted a detailed comparative evaluation of the results.
Test Image
- Image Source: Canva template (400-character manuscript paper)
- URL: https://www.canva.com/ja_jp/templates/EAFbqUoH7P8/
- Image Characteristics:
  - 20x20 grid, 400-character manuscript paper
  - Vertical writing layout
  - Light grid lines (cells)
  - Distinction between title and body sections
Ground Truth
原稿のタイトル佐藤ちあき原稿用紙に書くテキストが入ります。作文や小論文を作ったり、小説を書いたりなどにご活用ください。このテキストを使用する場合は、日本語の全角を使うことでマスにあった文字を打つことができます。手書きで使用したい場合は、このテキストを削除し、印刷してご使用ください。
1. Recognition Results by Azure OpenAI GPT-4.1
Recognized Text
原稿のタイトル佐藤　ちあき原稿用紙に書くテキストが入ります。作文や小論文を作ったり、小説を書いたりなどにご活用ください。このテキストを使用する場合は、日本語の全角を使うことでマスにあった文字を打つことができます。手書きで使用したい場合は、このテキストを削除し、印刷してご使用ください。
Evaluation
GPT-4.1 demonstrated the following characteristics with vertical-writing manuscript paper: ...

September 29, 2025 · Updated: September 29, 2025 · 4 min · Nakamura