TL;DR

By switching from npx xslt3 (Saxon-JS) to Java Saxon-HE for the TEI XML → HTML transformation, build time dropped from 1m48s to 23s (a ~4.7x speedup).

Background

Kōi Genji Monogatari Text DB is a digital edition of The Tale of Genji with 54 TEI XML files (one per chapter). The build script (Python) invoked npx xslt3 54 times to transform each XML into HTML.

python3 scripts/prebuild.py xsl   # XSLT for all 54 chapters

This was the slowest step in the entire build pipeline.

Benchmarks

Per-file comparison

Chapter          Chars    npx xslt3 (JS)   saxon (Java)   Speedup
01 Kiritsubo     11,240   1.8s             1.1s           1.6x
34 Wakana-jō     46,230   4.9s             0.4s           12x

The larger the file, the bigger the improvement. The Java timings also include JVM startup (up to ~1s, though the 0.4s run shows it can be lower), so the gap in actual transformation time is larger still.
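The post does not say how the per-file timings were taken. One simple way to reproduce this kind of measurement is to wrap each command in a wall-clock timer. The helper name `time_command` is hypothetical, and the demo command is a no-op Python process standing in for a real XSLT invocation.

```python
import subprocess
import sys
import time


def time_command(cmd):
    """Run cmd once and return its wall-clock duration in seconds.

    Hypothetical helper for illustration; not part of the original build script.
    """
    start = time.perf_counter()
    # Discard the tool's output so terminal I/O does not skew the timing.
    subprocess.run(cmd, check=True,
                   stdout=subprocess.DEVNULL, stderr=subprocess.DEVNULL)
    return time.perf_counter() - start


if __name__ == "__main__":
    # Stand-in command: replace with a real saxon or npx xslt3 invocation.
    elapsed = time_command([sys.executable, "-c", "pass"])
    print(f"{elapsed:.2f}s")
```

A single run includes process startup (Node or the JVM), so repeating the measurement a few times gives a more stable figure.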

Total (all 54 chapters)

npx xslt3 (Saxon-JS):  1m48s
saxon (Saxon-HE):      23s
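As a quick check of the arithmetic, the totals above can be converted to seconds and divided. The only inputs are the two totals and the chapter count from this post; the variable names are illustrative.

```python
def to_seconds(minutes, seconds):
    """Convert a minutes/seconds pair to total seconds."""
    return minutes * 60 + seconds


CHAPTERS = 54

saxon_js_total = to_seconds(1, 48)   # 108 s with npx xslt3
saxon_he_total = to_seconds(0, 23)   # 23 s with Saxon-HE

speedup = saxon_js_total / saxon_he_total
avg_js = saxon_js_total / CHAPTERS   # average seconds per chapter, Saxon-JS
avg_he = saxon_he_total / CHAPTERS   # average seconds per chapter, Saxon-HE

print(f"speedup: {speedup:.1f}x")            # speedup: 4.7x
print(f"avg per chapter: {avg_js:.2f}s vs {avg_he:.2f}s")  # 2.00s vs 0.43s
```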

Migration

Local (macOS)

brew install saxon

Build script

Added a helper that falls back gracefully: SAXON_JAR environment variable → saxon command → npx xslt3.

import os
import shutil

def xslt_cmd(xsl, src, dst):
    """Return XSLT command, preferring Saxon-HE over npx xslt3."""
    args = [f'-xsl:{xsl}', f'-s:{src}', f'-o:{dst}']
    # 1. Explicit jar path from the environment (used in CI)
    saxon_jar = os.environ.get('SAXON_JAR')
    if saxon_jar:
        return ['java', '-jar', saxon_jar, *args]
    # 2. `saxon` wrapper on PATH (e.g. installed via Homebrew)
    if shutil.which('saxon'):
        return ['saxon', *args]
    # 3. Fall back to Saxon-JS through npx
    return ['npx', 'xslt3', *args]
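For context, here is a minimal sketch of how such a helper might be driven once per chapter. The function `transform_all`, the injectable `run` parameter, and the directory layout (`*.xml` in, `<stem>.html` out) are assumptions for illustration; the actual loop in scripts/prebuild.py is not shown in this post.

```python
import os
import shutil
import subprocess
from pathlib import Path


def xslt_cmd(xsl, src, dst):
    """Same fallback order as the helper above: SAXON_JAR, saxon, npx xslt3."""
    args = [f'-xsl:{xsl}', f'-s:{src}', f'-o:{dst}']
    saxon_jar = os.environ.get('SAXON_JAR')
    if saxon_jar:
        return ['java', '-jar', saxon_jar, *args]
    if shutil.which('saxon'):
        return ['saxon', *args]
    return ['npx', 'xslt3', *args]


def transform_all(xsl, src_dir, dst_dir, run=subprocess.run):
    """Transform every *.xml in src_dir into dst_dir/<stem>.html.

    Hypothetical driver; `run` is injectable so the loop can be
    exercised without an XSLT processor installed.
    """
    dst_dir = Path(dst_dir)
    dst_dir.mkdir(parents=True, exist_ok=True)
    outputs = []
    for src in sorted(Path(src_dir).glob('*.xml')):
        dst = dst_dir / f'{src.stem}.html'
        # One process per file; check=True aborts the build on the first failure.
        run(xslt_cmd(xsl, src, dst), check=True)
        outputs.append(dst)
    return outputs
```

Because each file still spawns its own process, per-process startup is paid 54 times whichever processor is selected.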

GitHub Actions

Replaced Node.js + xslt3 with Java + Saxon-HE jar:

- name: Set up Java
  uses: actions/setup-java@v4
  with:
    distribution: 'temurin'
    java-version: '21'

- name: Download Saxon-HE
  run: |
    curl -fsSL -o /tmp/saxon-he.jar \
      https://repo1.maven.org/maven2/net/sf/saxon/Saxon-HE/12.5/Saxon-HE-12.5.jar

- name: Run prebuild
  env:
    SAXON_JAR: /tmp/saxon-he.jar
  run: python3 scripts/prebuild.py tei xsl waka stats

Output differences

The HTML content is equivalent. Only formatting differs (whitespace, and the case of the doctype declaration: <!DOCTYPE html> vs <!DOCTYPE HTML>), with no visible effect in the browser.
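One way to check this kind of equivalence is to normalize both outputs before comparing them. The sketch below is a crude, assumption-laden approach rather than the method used for this project: `normalize_html` is a hypothetical helper, and it treats all inter-tag whitespace as insignificant, which is not true for every HTML document.

```python
import re


def normalize_html(text):
    """Reduce formatting-only differences before comparing two HTML outputs.

    Hypothetical helper: handles doctype case and whitespace only.
    """
    # Make the doctype declaration case-insensitive.
    text = re.sub(r'<!DOCTYPE\s+html\s*>', '<!DOCTYPE html>', text,
                  flags=re.IGNORECASE)
    # Collapse every run of whitespace to a single space.
    text = re.sub(r'\s+', ' ', text)
    # Drop whitespace that sits only between tags (indentation, newlines).
    text = re.sub(r'>\s+<', '><', text)
    return text.strip()


indented = "<!DOCTYPE html>\n<html>\n  <body><p>Genji</p></body>\n</html>\n"
compact = "<!DOCTYPE HTML><html><body><p>Genji</p></body></html>"

print(normalize_html(indented) == normalize_html(compact))  # True
```

A stricter comparison would parse both files and compare the resulting trees, since whitespace between inline elements can affect rendering.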

Takeaways

  • Saxon-JS (JavaScript) is convenient but slows dramatically with larger files
  • Java Saxon-HE is free (MPL 2.0) and, on macOS, installs with a single brew install saxon
  • A fallback-based approach keeps the build working in environments without Saxon