TL;DR
By switching the TEI XML → HTML transformation from npx xslt3 (Saxon-JS) to Java Saxon-HE, the XSLT step of the build dropped from 1m48s to 23s (roughly 5x faster).
Background
Kōi Genji Monogatari Text DB is a digital edition of The Tale of Genji made up of 54 TEI XML files, one per chapter. The Python build script invoked npx xslt3 once per file, 54 times in total, to transform each XML file into HTML.

```shell
python3 scripts/prebuild.py xsl  # XSLT for all 54 chapters
```
This was the slowest step in the entire build pipeline.
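The per-chapter pattern described above looks roughly like the sketch below. It is hypothetical: the real `scripts/prebuild.py` is not shown, so the function names (`transform_chapters`, `npx_xslt3_cmd`), the directory arguments, and the `*.xml` glob are assumptions. What it does show is that every chapter starts a separate process, so process startup is paid 54 times.

```python
import subprocess
from pathlib import Path
from typing import Callable, List

# Hypothetical builder signature: (xsl, src, dst) -> argv for one transformation.
CommandBuilder = Callable[[str, str, str], List[str]]


def npx_xslt3_cmd(xsl: str, src: str, dst: str) -> List[str]:
    """One Saxon-JS invocation via npx, covering a single chapter."""
    return ['npx', 'xslt3', f'-xsl:{xsl}', f'-s:{src}', f'-o:{dst}']


def transform_chapters(xsl: Path, src_dir: Path, out_dir: Path,
                       make_cmd: CommandBuilder = npx_xslt3_cmd) -> int:
    """Transform every *.xml in src_dir to out_dir/<stem>.html; return the file count."""
    out_dir.mkdir(parents=True, exist_ok=True)
    count = 0
    for src in sorted(src_dir.glob('*.xml')):
        dst = out_dir / f'{src.stem}.html'
        # A fresh process per chapter: its startup cost is paid on every iteration.
        subprocess.run(make_cmd(str(xsl), str(src), str(dst)), check=True)
        count += 1
    return count
```

Because the command builder is a parameter, swapping XSLT processors only changes which builder is passed in; the loop itself stays the same.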
Benchmarks
Per-file comparison
| Chapter | Characters | npx xslt3 (Saxon-JS) | saxon (Saxon-HE, Java) | Speedup |
|---|---|---|---|---|
| 01 Kiritsubo | 11,240 | 1.8s | 1.1s | 1.6x |
| 34 Wakana-jō | 46,230 | 4.9s | 0.4s | 12x |
The larger the file, the bigger the gain. On small files, JVM startup (roughly 1s) accounts for much of the Java time, so the actual transformation is considerably faster than the wall-clock ratio alone suggests.
Total (all 54 chapters)
- npx xslt3 (Saxon-JS): 1m48s
- saxon (Saxon-HE): 23s
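As a quick check of the arithmetic, the snippet below recomputes the ratios from the figures reported above (1m48s is 108s). The total works out to about 4.7x, and the per-chapter average falls from 2.0s to about 0.43s. The names `TIMINGS` and `speedup` are only for illustration.

```python
CHAPTERS = 54

# Wall-clock times reported above, in seconds: (npx xslt3, saxon).
TIMINGS = {
    '01 Kiritsubo': (1.8, 1.1),
    '34 Wakana-jō': (4.9, 0.4),
    'All 54 chapters': (108.0, 23.0),  # 1m48s = 108s
}


def speedup(before_s: float, after_s: float) -> float:
    """How many times faster the new run is (old time / new time)."""
    return before_s / after_s


for label, (js_s, java_s) in TIMINGS.items():
    print(f'{label}: {speedup(js_s, java_s):.1f}x')

js_total, java_total = TIMINGS['All 54 chapters']
print(f'Average per chapter: {js_total / CHAPTERS:.2f}s -> {java_total / CHAPTERS:.2f}s')
```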
Migration
Local (macOS)
```shell
brew install saxon
```
Build script
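I added a helper that uses the first available backend, in this order: the `SAXON_JAR` environment variable, then a `saxon` command on PATH, then `npx xslt3`.

```python
import os
import shutil

def xslt_cmd(xsl, src, dst):
    """Return XSLT command, preferring Saxon-HE over npx xslt3."""
    # 1. Explicit Saxon-HE jar (set in CI)
    saxon_jar = os.environ.get('SAXON_JAR')
    if saxon_jar:
        return ['java', '-jar', saxon_jar, f'-xsl:{xsl}', f'-s:{src}', f'-o:{dst}']
    # 2. `saxon` command on PATH (e.g. installed via Homebrew)
    if shutil.which('saxon'):
        return ['saxon', f'-xsl:{xsl}', f'-s:{src}', f'-o:{dst}']
    # 3. Fall back to Saxon-JS via npx
    return ['npx', 'xslt3', f'-xsl:{xsl}', f'-s:{src}', f'-o:{dst}']
```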
GitHub Actions
In CI, I replaced the Node.js + xslt3 setup with Java and the Saxon-HE jar:

```yaml
- name: Set up Java
  uses: actions/setup-java@v4
  with:
    distribution: 'temurin'
    java-version: '21'

- name: Download Saxon-HE
  run: |
    # -f makes curl fail on an HTTP error instead of saving the error page as the jar
    curl -fsSL -o /tmp/saxon-he.jar \
      https://repo1.maven.org/maven2/net/sf/saxon/Saxon-HE/12.5/Saxon-HE-12.5.jar

- name: Run prebuild
  env:
    SAXON_JAR: /tmp/saxon-he.jar
  run: python3 scripts/prebuild.py tei xsl waka stats
```
Output differences
The HTML content is equivalent. Only formatting differs (whitespace, and `<!DOCTYPE html>` vs `<!DOCTYPE HTML>`), with no visible impact in the browser.
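One way to confirm this across all 54 chapters is to erase just those two kinds of difference and compare what remains. The sketch below is hypothetical: it assumes the two outputs were written to separate directories with matching file names, and its whitespace collapsing is deliberately coarse (it would also hide meaningful whitespace changes, for example inside `<pre>`).

```python
import re
from pathlib import Path

_DOCTYPE = re.compile(r'<!DOCTYPE\s+html\s*>', re.IGNORECASE)


def normalize_html(text: str) -> str:
    """Erase only the formatting differences observed between the two outputs."""
    text = _DOCTYPE.sub('<!DOCTYPE html>', text)  # unify DOCTYPE case
    text = re.sub(r'>\s+<', '><', text)  # drop whitespace-only runs between tags
    return re.sub(r'\s+', ' ', text).strip()  # collapse any remaining whitespace


def diff_dirs(js_dir: Path, java_dir: Path) -> list:
    """Return names of HTML files that differ after normalization or are missing."""
    mismatches = []
    for js_file in sorted(js_dir.glob('*.html')):
        java_file = java_dir / js_file.name
        if not java_file.exists():
            mismatches.append(js_file.name)
            continue
        a = normalize_html(js_file.read_text(encoding='utf-8'))
        b = normalize_html(java_file.read_text(encoding='utf-8'))
        if a != b:
            mismatches.append(js_file.name)
    return mismatches
```

An empty list from `diff_dirs` means every chapter matched once DOCTYPE case and whitespace were ignored.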
Takeaways
- Saxon-JS (JavaScript) is convenient but slows down dramatically on larger files
- Java Saxon-HE is free (MPL 2.0) and trivial to install via `brew install saxon`
- A fallback-based approach keeps the build working in environments without Saxon-HE