This article is co-authored with generative AI. While I have cross-checked facts against official documentation where possible, errors may remain. Please verify primary sources before making important decisions.
Background and disclaimer
The ROIS-DS Center for Open Data in the Humanities (hereafter "CODH") website at codh.rois.ac.jp is currently suspended for long-term maintenance (official notice from ROIS-DS dated 2026-02-24, no announced reopen date).
Digital Tale of Genji had a feature we call "patapata face comparison" for comparing illustrated faces across manuscripts, and it directly called CODH's vdiff.js (a tool that overlays two images and visualizes their differences). With CODH offline, the feature stopped working entirely. As a stop-gap until CODH service resumes, I extracted the vdiff.js distribution from the Wayback Machine and put a mirror on our own host. This post is the write-up of that procedure.
This post is strictly about a stop-gap until CODH service resumes.
- vdiff.js is published under the MIT license (CODH / core contributor: Jun HOMMA (@2SC1815J)) and re-distribution is permitted (with attribution preserved)
- However, freezing it on a particular Wayback snapshot means you don't get any of CODH's later bug fixes or improvements
- After CODH resumes, the plan is to switch URLs back to the upstream and retire the mirror
CODH publishes other browser-based tools too: IIIF Curation Viewer / Manager / Editor / Player / Board, Soan, and so on. The procedure here applies to those as well, but their tool-specific gotchas (large numbers of dependencies, runtime dependencies invisible in <script> tags, etc.) are covered in a separate post: Setting up a dedicated GitHub Pages repository for a temporary CODH-tool mirror.
Framework-specific gotchas on the consumer site (Nuxt and similar), such as the static generator's crawler overwriting static/<tool>/index.html with a 404 app shell, or stale Service Worker caches delaying rollouts, are in yet another post: Embedded apps under static/ get overwritten with the 404 page by Nuxt 2 generate, and how to retire the old Service Worker.
The symptom
In Digital Tale of Genji's case, the link from https://genji.dl.itc.u-tokyo.ac.jp/picture/face/ (the patapata face comparison page) that launched image comparison looked like:
<a href="http://codh.rois.ac.jp/software/vdiffjs/demo/?img1=...&img2=...">
It called CODH's vdiff.js demo app directly. With the CODH host unreachable, the entire feature became unusable.
Overall plan
- Use the Wayback Machine id_ flag to fetch raw bytes, avoiding wombat injection
- The ZIP distribution (vdiffjs_latest.zip) wasn't archived in Wayback, so mirror demo/index.html and the JS / CSS / images it references individually
- Put it under our public directory as vdiff/, served at /vdiff/?...
- Replace existing <a href="http://codh.rois.ac.jp/software/vdiffjs/demo/?..."> with <a href="/vdiff/?...">
1. Fetching "raw" assets from the Wayback Machine
The Wayback Machine does archive CSS/JS. However, the standard playback URL (/web/<timestamp>/<orig_url>) injects a URL-rewriting shim called wombat at the top of every JavaScript file. In an environment where the rewrite assumptions don't hold (i.e., on your own host), it can break things. Concretely, the vdiff.js wrapper JS hit this issue: the UI rendered, but thumbnails and the comparison preview never appeared.
To avoid this, append the id_ flag (identity / raw bytes) to the URL, and add --compressed so curl auto-decompresses any Content-Encoding: gzip (id_ returns the original capture's Content-Encoding as-is, so if the original was gzip, curl needs to inflate it):
TS=20250822143200
URL=https://web.archive.org/web/${TS}id_/http://codh.rois.ac.jp/software/vdiffjs/demo/vdiffjs/vdiff.bundle.min.js
# This gives you the raw JS, with no wombat injection
curl -sL --compressed --max-time 60 "$URL" -o vdiff.bundle.min.js
Wayback URL flags:
| Flag | Use |
|---|---|
| (none) | Normal playback. HTML gets a banner, JS gets wombat injection |
| if_ | iframe hint. In practice it's treated like mp_, so the HTML banner and JS wombat injection are usually still there |
| id_ | identity. No rewriting whatsoever: the original bytes are returned (HTML/JS/CSS all served raw) |
For restoring distributables like JavaScript / CSS / images / fonts, id_ is the right choice.
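To avoid retyping the URL pattern for every asset, the id_ form can be wrapped in a tiny helper. This is a minimal sketch: wayback_raw_url is a name I made up for illustration, and it only concatenates strings in the /web/<timestamp>id_/<orig_url> shape described above.

```shell
# wayback_raw_url is a hypothetical helper name, not part of any existing tool.
# It builds the raw-bytes (id_) playback URL from a timestamp and the original URL.
wayback_raw_url() {
  # $1: Wayback timestamp (YYYYMMDDhhmmss), $2: original URL
  printf 'https://web.archive.org/web/%sid_/%s\n' "$1" "$2"
}

# Usage (needs network access, so it is left commented out):
# curl -sL --compressed --max-time 60 \
#   "$(wayback_raw_url 20250822143200 http://codh.rois.ac.jp/software/vdiffjs/demo/vdiffjs/vdiff.bundle.min.js)" \
#   -o vdiff.bundle.min.js
```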
The CDX API is handy when picking a snapshot:
curl -sL "http://web.archive.org/cdx/search/cdx?url=codh.rois.ac.jp/software/vdiffjs/demo/vdiffjs/vdiff.bundle.min.js&output=json&limit=20"
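If you drop output=json, the CDX API returns plain text with one capture per line and space-separated fields (urlkey, timestamp, original, mimetype, statuscode, digest, length). Under that assumption, picking the newest capture that returned 200 can be sketched with awk. latest_ok_timestamp is a hypothetical name for this example.

```shell
# latest_ok_timestamp is a hypothetical helper for this sketch.
# It reads plain-text CDX lines on stdin and prints the newest timestamp
# (field 2) among captures whose status code (field 5) is 200.
latest_ok_timestamp() {
  awk '$5 == "200" && $2 > best { best = $2 }
       END { if (best != "") print best }'
}

# Usage (needs network access, so it is left commented out):
# curl -sL "http://web.archive.org/cdx/search/cdx?url=codh.rois.ac.jp/software/vdiffjs/demo/vdiffjs/vdiff.bundle.min.js" \
#   | latest_ok_timestamp
```

The printed timestamp can then be used as TS in the download commands. If no capture has status 200, the function prints nothing.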
2. Analyzing the demo app's structure
Pull the archived index.html of the vdiff.js demo (https://codh.rois.ac.jp/software/vdiffjs/demo/) and enumerate its external resource references:
# First fetch the HTML
curl -sL --compressed "https://web.archive.org/web/2025id_/http://codh.rois.ac.jp/software/vdiffjs/demo/" -o demo.html
# Enumerate referenced assets (skip external CDNs, GA, Wayback itself)
grep -oE '(href|src)="[^"]*"' demo.html \
| grep -ivE "googletagmanager|google-analytics|web-static|web.archive|fonts.googleapis|favicon"
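The grep output above still carries the href="..." / src="..." wrappers. To get a clean list of paths to feed into a download loop, something like the following could work. It is a rough sketch: relative_assets is a hypothetical name, and it assumes attributes are double-quoted, as the grep pattern above already does.

```shell
# relative_assets is a hypothetical helper for this sketch.
# It reads HTML on stdin and prints the href/src values that are not absolute
# URLs, protocol-relative URLs, fragments, or empty, one per line, deduplicated.
relative_assets() {
  grep -oE '(href|src)="[^"]*"' \
    | sed -E 's/^(href|src)="//; s/"$//' \
    | grep -vE '^([a-zA-Z][a-zA-Z0-9+.-]*:|//|#|$)' \
    | LC_ALL=C sort -u
}

# Usage:
# relative_assets < demo.html
```

Absolute URLs (CDNs, analytics) are dropped by the scheme check, so the separate exclusion list is not needed for them. Anything the wrapper JS loads at runtime will not show up here and has to be found by hand.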
The minimal vdiff.js install is 6 files (1 HTML + 4 JS/CSS + OpenCV):
demo/
├── index.html
├── vdiffjs/
│   ├── vdiff.bundle.css
│   └── vdiff.bundle.min.js
├── vdiffjs-wrapper.js
├── vdiffjs-wrapper.css
└── opencv/
    └── opencv-4.5.1.js  # ~7.3MB
The spin.js / jquery.spin.js loads from jsDelivr can stay as-is (jsDelivr is still alive). Only mirror those if you want full offline operation.
3. Tips for bulk downloading
When you loop over the referenced assets, you will start hitting Wayback's rate limit partway through, and curl will report an HTTP code of 000 (meaning the connection itself failed).
- Add a small sleep between requests (1 to 3 seconds)
- Collect failures into a separate list and retry later
- Stretch --max-time to about 90
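# Assumes BASE is the id_ URL prefix of the demo directory, e.g.
#   https://web.archive.org/web/${TS}id_/http://codh.rois.ac.jp/software/vdiffjs/demo
# and ASSETS is a bash array of relative paths such as vdiffjs/vdiff.bundle.min.js
FAILED=()  # collect failures here for a later retry pass
for asset in "${ASSETS[@]}"; do
  mkdir -p "$(dirname "$asset")"
  # %{http_code} prints 000 when no HTTP response was received at all
  http_code=$(curl -sL --compressed --max-time 90 -w "%{http_code}" \
    "${BASE}/${asset}" -o "${asset}")
  if [[ "$http_code" != "200" ]]; then
    echo "FAIL ($http_code): $asset"
    rm -f "${asset}"  # don't leave a partial or error body behind
    FAILED+=("$asset")
  fi
  sleep 2
done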
For vdiff.js alone there are only six files, so this finishes quickly. The pattern pays off for tools with more dependencies.
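For the "retry later" part, one option is a small wrapper that re-runs a command a few times with a growing pause. This is only a sketch under my own assumptions: retry and RETRY_DELAY are hypothetical names, and the pause simply grows linearly with the attempt number.

```shell
# retry and RETRY_DELAY are hypothetical names for this sketch.
# retry MAX CMD [ARGS...] runs CMD until it succeeds or MAX attempts are used up.
# The pause between attempts is RETRY_DELAY (default 2) seconds times the attempt number.
retry() {
  _retry_max=$1
  shift
  _retry_n=1
  while :; do
    "$@" && return 0
    [ "$_retry_n" -ge "$_retry_max" ] && return 1
    sleep "$(( ${RETRY_DELAY:-2} * _retry_n ))"
    _retry_n=$((_retry_n + 1))
  done
}

# Usage (needs network access, so it is left commented out).
# -f makes curl exit non-zero on HTTP errors, which is what retry looks at:
# retry 3 curl -fsSL --compressed --max-time 90 "$URL" -o vdiff.bundle.min.js
```

Because the helper only checks the exit status, it relies on curl's -f flag rather than on parsing the HTTP code by hand.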
4. Placement and link rewriting
Copy the downloaded demo/ directory into your site's public directory as vdiff/. Most static hosts serve index.html for a directory URL, so /vdiff/?... then works without further configuration.
Then rewrite the consumer-site links. In Digital Tale of Genji's getVDiffUrl() (a Vue/Nuxt 2 class component method):
  getVDiffUrl(n1, n2, n3) {
    ...
-   return `http://codh.rois.ac.jp/software/vdiffjs/demo/?img1=${...}&img2=${...}`
+   return `/vdiff/?img1=${...}&img2=${...}`
  }
The vdiff wrapper (vdiffjs-wrapper.js) parses location.search directly, so the query-string construction logic doesn't need changes.
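If the CODH demo URL is hard-coded in more places than one method (static HTML, Markdown content, and so on), the same prefix swap can be expressed as a sed filter. This is a sketch, not what I ran on the Genji site: rewrite_vdiff_links is a hypothetical name, and it assumes the mirror lives at /vdiff/ as described above.

```shell
# rewrite_vdiff_links is a hypothetical helper for this sketch.
# It reads text on stdin and replaces the CODH demo URL prefix (http or https)
# with the local /vdiff/ path, leaving the query string untouched.
rewrite_vdiff_links() {
  sed -E 's#https?://codh\.rois\.ac\.jp/software/vdiffjs/demo/#/vdiff/#g'
}

# Usage:
# rewrite_vdiff_links < page.html > page.rewritten.html
# To list files that still point at the upstream demo:
# grep -rn 'codh.rois.ac.jp/software/vdiffjs/demo' .
```

Keeping the rewrite to a single prefix makes it easy to reverse once CODH is back: swap the two sides of the substitution.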
5. License and attribution
vdiff.js is published under the MIT license (Core contributor: Jun HOMMA (@2SC1815J)). Re-distribution requires preserving the license and copyright, so I leave the comment header in each file untouched:
/*
* vdiff.js - JavaScript-based visual differencing tool
* http://codh.rois.ac.jp/software/vdiffjs/
* Copyright 2021 Center for Open Data in the Humanities, Research Organization of Information and Systems
* Released under the MIT license
*/
The original distributor is CODH, so it's also good practice to put a note in your consumer site's UI text along the lines of: "This is a temporary mirror of vdiff.js by the ROIS-DS Center for Open Data in the Humanities (CODH), served from a Wayback Machine snapshot during CODH's maintenance period."
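Before publishing the mirror, it may be worth checking that the license line survived the download. The sketch below assumes the notice sits within the first 2000 bytes of a file, which is my own guess at a safe window, and has_mit_notice is a hypothetical name.

```shell
# has_mit_notice is a hypothetical helper for this sketch.
# It succeeds if the first 2000 bytes of the file contain the MIT license line
# from the comment header. The 2000-byte window is an assumption.
has_mit_notice() {
  head -c 2000 "$1" | grep -q 'Released under the MIT license'
}

# Usage:
# for f in vdiff/vdiffjs/*.js vdiff/vdiffjs/*.css; do
#   has_mit_notice "$f" || echo "license header missing: $f"
# done
```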
Closing
This mirror is purely a stop-gap until CODH service resumes. Once it does, I'll point links back to the upstream and retire the mirror.
The fact that this kind of stop-gap is even possible is thanks to CODH and core contributor Jun HOMMA (@2SC1815J), who have published quality tools and datasets openly for years. My sincere thanks.
References
- About the CODH (Center for Open Data in the Humanities) website | ROIS-DS (official long-term-maintenance notice)
- vdiff.js - JavaScript-based visual differencing tool | CODH
- Setting up a dedicated GitHub Pages repository for a temporary CODH-tool mirror (a follow-up post that mirrors vdiff-seq, IIIF Curation tools, and Soan in a single GitHub Pages repo)
- Wayback Machine
- @2SC1815J on GitHub
