Pitfalls of Converting TEI XML Standoff Annotations to Inline, and a DOM-Based Solution

Digital Engishiki is a project that encodes the Engishiki (a collection of supplementary regulations for the ritsuryō legal system, completed in 927 CE) in TEI (Text Encoding Initiative) XML, making it browsable and searchable on the web. Led by the National Museum of Japanese History, the project provides TEI markup for critical editions, modern Japanese translations, and English translations, served through a Nuxt.js (Vue.js)-based viewer. During development, we encountered a bug where converting TEI XML standoff annotations to inline annotations caused the XML document structure to collapse. This article records the cause and the DOM-based solution. ...
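
The article's own fix is not reproduced here. As a minimal sketch of the general DOM-based idea, assuming each standoff annotation is a character range (start, end) counted over the text content of a container element, the Python below (standard-library xml.dom.minidom) splits text nodes and wraps the range in a new element instead of splicing tag strings into serialized XML. The element names and the sample sentence are hypothetical.

```python
from xml.dom import minidom


def text_nodes(node):
    """Yield the descendant text nodes of `node` in document order."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            yield child
        elif child.hasChildNodes():
            yield from text_nodes(child)


def wrap_range(doc, container, start, end, tag, attrs=None):
    """Wrap characters [start, end) of `container`'s text content in <tag>.

    Simplification: the range must lie inside a single text node. A range
    that crosses an element boundary raises ValueError instead of producing
    overlapping (ill-formed) markup.
    """
    offset = 0
    for node in list(text_nodes(container)):
        length = len(node.data)
        if offset <= start and end <= offset + length:
            target = node.splitText(start - offset)  # `node` keeps the text before the range
            target.splitText(end - start)            # `target` now holds exactly the range
            wrapper = doc.createElement(tag)
            for name, value in (attrs or {}).items():
                wrapper.setAttribute(name, value)
            target.parentNode.replaceChild(wrapper, target)
            wrapper.appendChild(target)
            return wrapper
        offset += length
    raise ValueError(f"range {start}-{end} crosses an element boundary")


doc = minidom.parseString("<p>Tribute from Mino province</p>")
p = doc.documentElement
wrap_range(doc, p, 13, 17, "placeName")
wrap_range(doc, p, 18, 26, "term")
print(p.toxml())  # <p>Tribute from <placeName>Mino</placeName> <term>province</term></p>
```

Because offsets are counted over text nodes only, wrapping one range leaves the offsets of later annotations unchanged, whereas inserting tag strings into the serialized text shifts every subsequent string offset. Ranges that cross an element boundary would need the wrapper split into several pieces; the sketch refuses them rather than emit broken XML.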

Fixing 6 GitHub Issues in Parallel with Claude Code: Worktrees and Agents

Introduction

We develop a web-based viewer for historical sources structured in TEI/XML, built with Nuxt 2 + Vue 2 + Vuetify. This article describes how we used Claude Code’s worktree and agent features to address 6 GitHub Issues in parallel.

Issues Addressed

| Group | Count | Description | Priority |
|---|---|---|---|
| A | 3 | Text viewer: nested element display bugs | High |
| B | 1 | Legend page: indentation not reflected | Medium |
| C | 1 | Analytics page: broken links | High |
| D | 1 | Keyword search crash | High |

Approach: Worktrees × Parallel Agents

Claude Code can run multiple agents in parallel, each in an isolated git worktree. We grouped the issues into 4 categories and launched 4 agents simultaneously. ...
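
In the article, Claude Code sets up the worktrees itself. The Python sketch below only shows the plain-git equivalent of that isolation: one branch and one working directory per issue group, so parallel edits never touch the same checkout. The branch names and the sibling-directory layout are hypothetical.

```python
def worktree_commands(groups, repo_name="viewer"):
    """Build one `git worktree add -b <branch> <path>` command per issue group.

    Each worktree gets its own new branch and its own directory next to the
    main checkout, so agents working in parallel never share a working tree.
    """
    commands = []
    for group in groups:
        slug = group.lower()
        branch = f"fix/group-{slug}"           # hypothetical branch naming
        path = f"../{repo_name}-group-{slug}"  # hypothetical sibling directory
        commands.append(["git", "worktree", "add", "-b", branch, path])
    return commands


for cmd in worktree_commands(["A", "B", "C", "D"]):
    print(" ".join(cmd))
```

Each command can be run with `subprocess.run(cmd, check=True)` from the repository root; once a branch is merged, `git worktree remove <path>` cleans up its directory.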

TEI Publisher: A Platform for Publishing TEI XML Digital Editions

Introduction

TEI (Text Encoding Initiative) is an XML markup language widely adopted as the international standard for digitizing humanities texts. It can describe many kinds of textual material, including classical texts, letters, inscriptions, and dictionaries, in a structured format. However, publishing TEI XML-encoded texts on the web in a readable form requires considerable technical expertise. This article introduces TEI Publisher, a platform that makes it easy to publish TEI XML digital editions. ...

Fast TEI/XML Deployment on Vercel: Automating XSLT Transformation with saxon-js

Introduction

A common architecture in Digital Humanities is to transform TEI (Text Encoding Initiative) XML data into HTML using XSLT and publish it on the web. Traditionally, the standard approach has been client-side XSLT transformation in the browser (via <?xml-stylesheet?> or JavaScript’s XSLTProcessor), but it comes with several challenges:

- The browser executes the XSLT transformation on every page load, resulting in slow rendering
- Poor SEO and web crawler support
- Inconsistent XSLT implementations across browsers

This article shows how to run the XML-to-HTML transformation at build time on Vercel and serve pre-generated static HTML. ...

Fast TEI/XML Deployment on Vercel: Automating XSLT Transforms with saxon-js

Introduction

A common architecture in Digital Humanities is to encode texts in TEI (Text Encoding Initiative) XML and transform them to HTML via XSLT for web publication. Traditionally, this transformation is done client-side in the browser (using <?xml-stylesheet?> or JavaScript’s XSLTProcessor), but this approach has several drawbacks:

- The browser must run the XSLT transformation on every page load, slowing down rendering
- Poor SEO / crawler support
- Browser-specific XSLT implementation differences

This article describes how to run XSLT transforms at build time on Vercel and serve pre-built HTML as a static site. ...

Migrating to DTS (Distributed Text Services) 1.0 ― Updating a TEI/XML Text API

Introduction

In February 2026, version 1.0 of the Distributed Text Services (DTS) specification, a standard API for accessing text collections, was officially released. This article documents the changes required to migrate the DTS API of the Kouigenji Monogatari Text Database from 1-alpha to 1.0.

https://github.com/distributed-text-services/specifications/releases/tag/v1.0

What is DTS?

DTS defines a standard API for accessing text collections such as TEI/XML. It consists of four endpoints:

| Endpoint | Purpose |
|---|---|
| Entry Point | Returns URLs for each API endpoint |
| Collection | Inter-text navigation (listing collections and resources) |
| Navigation | Intra-text navigation (exploring citation structures) |
| Document | Retrieving text content (full or partial TEI/XML) |

Target Project

A TypeScript/Express.js implementation of DTS for the Kouigenji Monogatari Text Database. ...

Adding a CETEIcean-Powered TEI Preview to the DOCX → TEI/XML Converter

Introduction

In a previous post, I introduced a DOCX → TEI/XML Converter, a browser-based tool that converts Word documents to TEI/XML using the TEI Garage API. After publishing it, I received feedback from users who wanted to visually verify that the converted tags work as expected. With only the syntax-highlighted XML view, it was difficult to confirm how headings, notes, lists, and tables would actually render. To address this, I added a TEI preview feature using CETEIcean. ...

5x Faster XSLT Processing: Migrating from Saxon-JS to Saxon-HE

TL;DR

By switching from npx xslt3 (Saxon-JS) to Java Saxon-HE for TEI XML → HTML transformation, build time dropped from 1m48s to 23s (~5x speedup).

Background

Kōi Genji Monogatari Text DB is a digital edition of The Tale of Genji with 54 TEI XML files (one per chapter). The build script (Python) invoked npx xslt3 54 times to transform each XML file into HTML.

```shell
python3 scripts/prebuild.py xsl  # XSLT for all 54 chapters
```

This was the slowest step in the entire build pipeline. ...
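
The excerpt does not say how Saxon-HE was invoked. One plausible source of the speedup, offered as an assumption rather than the article's finding, is process start-up: 54 `npx xslt3` calls start Node.js and compile the stylesheet 54 times, while Saxon-HE's command line accepts a directory for `-s:` (with `-o:` then naming an output directory), so a single JVM can transform every chapter. The sketch only builds the argument lists; the jar file name, stylesheet name, and directory layout are hypothetical.

```python
from pathlib import Path


def per_file_commands(sources, xsl, out_dir):
    """One `npx xslt3` process per chapter (the original build pattern)."""
    return [
        ["npx", "xslt3", f"-xsl:{xsl}", f"-s:{src}",
         f"-o:{Path(out_dir) / (Path(src).stem + '.html')}"]
        for src in sources
    ]


def single_jvm_command(src_dir, xsl, out_dir, jar="saxon-he.jar"):
    """One Saxon-HE process: `-s:` names a directory, so every file in it is
    transformed in the same JVM, and `-o:` must then name a directory."""
    return ["java", "-jar", jar, f"-s:{src_dir}", f"-xsl:{xsl}", f"-o:{out_dir}"]


chapters = [f"xml/{n:02d}.xml" for n in range(1, 55)]  # hypothetical: 54 chapter files
print(len(per_file_commands(chapters, "tei2html.xsl", "html")))  # 54 processes
print(single_jvm_command("xml", "tei2html.xsl", "html"))         # 1 process
```

Either list can be passed to `subprocess.run(..., check=True)` from a build script like the Python one above.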

Building a DOCX to TEI/XML Conversion Tool in the Browser Using the TEI Garage API

Introduction

TEI (Text Encoding Initiative) is an international standard for digitally structuring texts in the humanities. It is used in libraries, museums, and academic research, but writing TEI/XML directly requires knowledge of markup, which raises the barrier to entry. This is where conversion tools from Microsoft Word (.docx) to TEI/XML come in. A well-known example is TEI Garage (formerly OxGarage), but its multi-purpose nature makes the UI somewhat complex. In this post, I present a simple browser-based tool I built specifically for DOCX to TEI/XML conversion. ...

Exporting Web Annotations via the Hypothes.is API and Converting to TEI/XML

Introduction

Hypothes.is is an open-source annotation tool that lets you add highlights and comments to web pages. It is easy to use through a browser extension or by embedding its JavaScript, but you may want to back up the annotations you have accumulated or reuse them in other formats such as TEI/XML. This article shows how to export annotations using the Hypothes.is API and convert them to TEI/XML.

Obtaining an API Key

1. Log in to Hypothes.is
2. Go to Developer settings
3. Generate an API key with “Generate your API token”
4. Save the obtained key in a .env file. ...
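
As a sketch of the two steps, assuming the public Hypothes.is search endpoint (`GET /api/search`, authenticated with an `Authorization: Bearer` header, at most 200 rows per page) and the annotation JSON fields `uri`, `text`, and `target[].selector[]`, the code below builds the request and maps each row to a TEI `<note>`. The `<div type="annotations">`/`<note>`/`<quote>` layout is a hypothetical mapping, not necessarily the one used in the article, and the network call is left out so the conversion can be tested offline.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

API_ROOT = "https://api.hypothes.is/api"
TEI_NS = "http://www.tei-c.org/ns/1.0"
ET.register_namespace("", TEI_NS)  # serialize TEI as the default namespace


def search_request(username, token, limit=200):
    """Return (url, headers) for one page of a hypothes.is user's annotations."""
    query = urlencode({"user": f"acct:{username}@hypothes.is", "limit": limit})
    return f"{API_ROOT}/search?{query}", {"Authorization": f"Bearer {token}"}


def quoted_text(annotation):
    """Return the highlighted passage from the first TextQuoteSelector, or ''."""
    for target in annotation.get("target", []):
        for selector in target.get("selector", []):
            if selector.get("type") == "TextQuoteSelector":
                return selector.get("exact", "")
    return ""


def rows_to_tei(rows):
    """Convert the `rows` of a search response into a TEI <div> of <note>s."""
    div = ET.Element(f"{{{TEI_NS}}}div", {"type": "annotations"})
    for row in rows:
        note = ET.SubElement(div, f"{{{TEI_NS}}}note", {"target": row.get("uri", "")})
        quote = quoted_text(row)
        if quote:
            ET.SubElement(note, f"{{{TEI_NS}}}quote").text = quote
        ET.SubElement(note, f"{{{TEI_NS}}}p").text = row.get("text", "")
    return div
```

To fetch a page, pass the URL and headers to `urllib.request.Request`, read the token from the environment (for example a hypothetical `HYPOTHESIS_API_KEY` entry loaded from the .env file), and feed `json.load(response)["rows"]` to `rows_to_tei`; `ET.tostring(div, encoding="unicode")` then gives the XML string.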

Trying "oitei" - An Automatic Conversion Tool from OpenITI mARkdown to TEI XML

Introduction

In the OpenITI (Open Islamicate Texts Initiative) project, which handles historical texts from the Islamicate world, texts can be tagged using a lightweight notation called mARkdown instead of TEI/XML. While TEI/XML is a powerful international standard for structuring texts, it is awkward for right-to-left (RTL) languages like Arabic: mixing XML tags into RTL text causes display problems in editors. mARkdown was designed to solve this problem. In this article, we try running oitei, a Python tool that automatically converts mARkdown texts to TEI XML. ...

ODD Editing Tips: Part 1

Restricting an Element to Specific Attributes Only

By default in TEI, elements inherit many attribute classes (att.global, att.datable, etc.), which makes numerous attributes available. If you want to allow only specific attributes, configure the element as follows.

Example: Allowing Only xml:id and corresp on persName

```xml
<elementSpec ident="persName" mode="change">
  <classes mode="change">
    <!-- Remove attribute classes (model classes are kept) -->
    <memberOf key="att.global" mode="delete"/>
    <memberOf key="att.cmc" mode="delete"/>
    <memberOf key="att.datable" mode="delete"/>
    <memberOf key="att.editLike" mode="delete"/>
    <memberOf key="att.personal" mode="delete"/>
    <memberOf key="att.typed" mode="delete"/>
  </classes>
  <attList>
    <attDef ident="xml:id" mode="add" usage="opt">
      <desc>Unique identifier of the element</desc>
      <datatype>
        <dataRef name="ID"/>
      </datatype>
    </attDef>
    <attDef ident="corresp" mode="add" usage="opt">
      <desc>Link to related person information</desc>
      <datatype>
        <dataRef key="teidata.pointer"/>
      </datatype>
    </attDef>
  </attList>
</elementSpec>
```

Key Points

- Use <classes mode="change">: if you use mode="replace" and leave it empty, the model classes are deleted as well, making the element itself unusable.
- Delete attribute classes individually: remove unnecessary attribute classes with <memberOf key="att.xxx" mode="delete"/>.
- Add the required attributes: define the attributes you want to allow with <attDef ident="xxx" mode="add">.

Notes

- You can check which attribute classes an element belongs to in the TEI Guidelines.
- Deleting att.global also removes xml:id, xml:lang, etc., so add them back individually as needed.

Adding Attributes to an Element

When adding a new attribute while keeping the existing attribute classes: ...

Constraint Design for IIIF-Compatible Facsimile Description Using TEI ODD

Introduction

When describing metadata for digital images in TEI (Text Encoding Initiative), the facsimile element is used. In IIIF (International Image Interoperability Framework) compatible digital archives in particular, it is important to describe references to manifests, canvases, and the Image API properly. This article introduces how to define the constraints needed for facsimile descriptions as a schema using ODD (One Document Does it all).

Guidelines Followed

This ODD is based on the “Linking with IIIF Images” specification introduced in the Japanese TEI guidelines: ...

ODD Chain Tutorial

A tutorial for learning how to customize schemas using the TEI ODD “chain” feature.

What is an ODD Chain

There are two approaches to ODD chains:

1. Inheritance (Vertical Chain)

Uses the source attribute to reference a parent ODD and inherit its customizations.

```
TEI_all → Base ODD → Derived ODD → Further derivations...
```

2. Combination (Horizontal Chain)

Uses specGrp and specGrpRef to combine multiple ODDs.

```
Header ODD ──┬──→ Combined schema
Body ODD ────┘
```

Directory Structure

```
tutorials/
├── 01-inheritance/           # Inheritance examples
│   ├── base.odd              # Base ODD
│   └── derived.odd           # Derived ODD inheriting from base.odd
├── 02-chain/                 # Combination examples
│   ├── header-specs.odd      # Header-related customizations
│   ├── text-specs.odd        # Body text-related customizations
│   ├── main.odd              # Main ODD for integration
│   └── merge-specs.xsl       # XSLT for expanding specGrpRef
├── output/                   # Generated files
│   ├── base.rng              # Generated from 01 base ODD
│   ├── base.html             # HTML documentation for the above
│   ├── derived.rng           # Generated from 01 derived ODD
│   ├── derived.html          # HTML documentation for the above
│   ├── combined.rng          # Generated from 02 combined ODD
│   ├── combined.html         # HTML documentation for the above
│   └── intermediate/         # Intermediate files
│       ├── base.compiled.odd
│       ├── derived.compiled.odd
│       ├── combined.merged.odd
│       └── combined.compiled.odd
├── build.sh                  # Build script
└── README.md                 # This file
```

Prerequisites

- Saxon (XSLT 2.0 processor)
- TEI Stylesheets (installed at ../scripts/Stylesheets)

Build Instructions

```shell
cd tutorials
./build.sh
```

Generated Files

| Source ODD | RNG | HTML |
|---|---|---|
| 01-inheritance/base.odd | output/base.rng | output/base.html |
| 01-inheritance/derived.odd | output/derived.rng | output/derived.html |
| 02-chain/main.odd (after combination) | output/combined.rng | output/combined.html |

File Descriptions

01-inheritance (Inheritance)

base.odd

The base ODD containing minimal modules and basic customizations. ...
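
The tutorial expands specGrpRef with its own merge-specs.xsl, whose contents are not shown here. As a rough Python analogue (an assumption about what such a merge step does, not a port of that stylesheet), the sketch below replaces each `<specGrpRef target="file#id"/>` with copies of the children of the `<specGrp>` it points to. The `load` callback, which maps a file name to a parsed root element, is hypothetical, and references inside the inserted specifications are not expanded recursively.

```python
import copy
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def find_spec_grp(root, grp_id):
    """Return the <specGrp> whose xml:id is `grp_id`."""
    for grp in root.iter(f"{TEI}specGrp"):
        if grp.get(XML_ID) == grp_id:
            return grp
    raise KeyError(f"specGrp {grp_id!r} not found")


def expand_spec_grp_refs(root, load):
    """Replace every <specGrpRef target="file#id"/> under `root` in place."""
    for parent in list(root.iter()):
        # Walk children backwards so insertions don't shift indices still to visit.
        for index, child in reversed(list(enumerate(parent))):
            if child.tag != f"{TEI}specGrpRef":
                continue
            file_part, _, grp_id = child.get("target", "").partition("#")
            source = load(file_part) if file_part else root
            grp = find_spec_grp(source, grp_id)
            parent.remove(child)
            for offset, spec in enumerate(grp):
                parent.insert(index + offset, copy.deepcopy(spec))
    return root


main = ET.fromstring(
    '<schemaSpec xmlns="http://www.tei-c.org/ns/1.0" ident="combined">'
    '<moduleRef key="tei"/><specGrpRef target="header-specs.odd#header"/></schemaSpec>')
header = ET.fromstring(
    '<TEI xmlns="http://www.tei-c.org/ns/1.0"><specGrp xml:id="header">'
    '<elementSpec ident="title" mode="change"/></specGrp></TEI>')
expand_spec_grp_refs(main, {"header-specs.odd": header}.__getitem__)
print([child.get("key") or child.get("ident") for child in main])  # ['tei', 'title']
```

The result is a single schemaSpec that the TEI Stylesheets can compile like any other ODD, which mirrors the role of combined.merged.odd in the directory layout above.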

Customizing the TEI Classical Text Viewer to Display Illegible Sections (gap)

Introduction

When digitizing East Asian classical texts, it has become common to mark them up in XML following the TEI (Text Encoding Initiative) guidelines. The “TEI Classical Text Viewer” developed by the International Institute for Digital Humanities is a convenient tool for displaying such TEI/XML files in a browser.

Official site: https://tei.dhii.jp/teiviewer4eaj
Web version: https://candra.dhii.jp/nagasaki/tei/tei_viewer/

For this post, I customized the viewer to display <gap> tags, which mark illegible sections. This article introduces the customization method. ...
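
The viewer itself is JavaScript, and its internals are not reproduced here. The Python sketch below only illustrates one possible display rule, an assumption rather than the customization described in the article: a `<gap>` with `unit="char"` and a `quantity` becomes that many □ characters, and any other `<gap>` becomes a bracketed ellipsis. The sample line is invented.

```python
import xml.etree.ElementTree as ET

TEI = "{http://www.tei-c.org/ns/1.0}"


def gap_placeholder(gap):
    """Placeholder text for a TEI <gap> (hypothetical display rule)."""
    quantity = gap.get("quantity")
    if quantity and gap.get("unit") == "char":
        return "□" * int(float(quantity))  # one box per missing character
    return "[…]"                           # extent unknown or not given in characters


def display_text(elem):
    """Flatten an element to display text, rendering every <gap> as a placeholder."""
    parts = [elem.text or ""]
    for child in elem:
        if child.tag == f"{TEI}gap":
            parts.append(gap_placeholder(child))
        else:
            parts.append(display_text(child))
        parts.append(child.tail or "")  # text that follows the child element
    return "".join(parts)


line = ET.fromstring(
    '<l xmlns="http://www.tei-c.org/ns/1.0">天平'
    '<gap reason="illegible" quantity="2" unit="char"/>年'
    '<gap reason="damage"/></l>')
print(display_text(line))  # 天平□□年[…]
```

The `reason` attribute is ignored here; a viewer could also use it (for example, to distinguish illegible text from physical damage) in a tooltip or CSS class.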

Declarative Multi-Format Conversion with TEI Processing Model

Introduction

TEI (Text Encoding Initiative) is a widely used standard for digitizing humanities texts. This article presents a case study that uses the Processing Model feature introduced in TEI P5 to convert TEI XML into multiple formats (HTML, LaTeX/PDF, EPUB3).

https://www.tei-c.org/Vault/P5/3.0.0/doc/tei-p5-doc/en/html/TD.html#TDPM

The target project uses texts published in the “Koui Genji Monogatari” (Collated Tale of Genji) as an example.

https://kouigenjimonogatari.github.io/

Background

Previously, each conversion was implemented separately, as described in the following articles. ...

Guide to Publishing TEI/XML Files on GitHub

Introduction

This article explains how to upload TEI (Text Encoding Initiative) format XML files to GitHub and create URLs that anyone can access. TEI/XML is an international standard format for structurally describing texts such as historical documents and literary works. By using GitHub, you can share your research data with researchers around the world.

What You Need

- A computer (Windows, Mac, or Linux)
- An Internet connection
- TEI/XML files (ones you already have)
- An email address (for creating a GitHub account)

About Sample Files

If you don’t have TEI/XML files, you can use the following TEI/XML file from the Koui Genji Monogatari for practice: ...
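
Once a file is pushed, GitHub serves it at predictable addresses. As a small sketch, assuming a public repository and the standard `github.com/<owner>/<repo>/blob/<branch>/<path>` page and `raw.githubusercontent.com/<owner>/<repo>/<branch>/<path>` raw-file URL patterns, the function below builds both. The owner, repository, and file path in the example are hypothetical.

```python
from urllib.parse import quote


def github_urls(owner, repo, branch, path):
    """Return (page_url, raw_url) for a file in a public GitHub repository."""
    encoded = quote(path)  # percent-encode spaces/non-ASCII, keep '/' separators
    page_url = f"https://github.com/{owner}/{repo}/blob/{branch}/{encoded}"
    raw_url = f"https://raw.githubusercontent.com/{owner}/{repo}/{branch}/{encoded}"
    return page_url, raw_url


page, raw = github_urls("your-name", "tei-files", "main", "xml/01.xml")
print(raw)  # https://raw.githubusercontent.com/your-name/tei-files/main/xml/01.xml
```

The page URL is what you share with people; the raw URL is what you hand to tools (viewers, XSLT processors) that need the XML itself rather than GitHub's HTML page.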

TEI ODD File Customization: A Case Study with NDL Classical Book OCR

Overview

TEI (Text Encoding Initiative) is an international standard for digitizing and sharing texts in humanities research. This article describes the process of customizing a TEI ODD file to match the output format of the NDL Classical Book OCR-Lite application. ODD (One Document Does it all) is a mechanism for customizing TEI schemas, allowing you to define your own schema containing only the elements and attributes you need.

Background: Developing the NDL Classical Book OCR-Lite Application

We are developing an application that outputs the results of NDL Classical Book OCR-Lite in TEI/XML format. The application is designed to perform OCR on Japanese classical books and output the results in standard TEI format. ...

Converting ODD to RNG/HTML Using the TEI Garage API

Introduction

Generating schemas (RNG) and documentation (HTML) from TEI (Text Encoding Initiative) ODD (One Document Does it all) files is an important step in TEI projects. This article examines how the TEI Garage API, which Roma (the TEI ODD editor) uses internally, works, and shows how to call the API directly from scripts to convert ODD files.

What Is TEI Garage?

TEI Garage is a web service provided by the TEI community that converts between various formats. For ODD file processing in particular, it provides the following features: ...

Implementation Guide for TEI XML Schema Combining RELAX NG and Schematron

Note: An AI wrote this article after manual verification.

Introduction

When editing TEI (Text Encoding Initiative) XML, you may need to validate complex business rules in addition to the structure of elements and attributes. This article explains how to combine RELAX NG (RNG) and Schematron to achieve both structural and content validation, using challenges encountered in an actual project as examples.

The Problem to Solve

When editing classical Japanese literary texts in TEI XML, the following requirements arose: ...