This post documents the steps taken to improve tile delivery performance on a Cantaloupe IIIF image server running on AWS, starting from an initial cold tile fetch time of 8.8 seconds.
Environment
- AWS EC2 us-east-1, 2 vCPU (t3.large), 7.6 GB RAM
- Cantaloupe: islandora/cantaloupe:6.3.12 (Cantaloupe 5.0.7, final release)
- Source storage: S3Source
- Reverse proxy: Traefik with Let’s Encrypt TLS
- Test image: clioimg/shyuga.tif (46825×28127 px)
Initial State
Cold tile requests were taking 8.8 seconds. Examining the setup revealed several contributing factors:
| Item | State |
|---|---|
| JVM heap limit | 512 MB (default) |
| FilesystemCache | 386 MB / 13,950 files (active) |
| Cache volume | Not persisted — started with docker-compose.yml instead of docker-compose.prod.yml |
| Source image format | Strip TIFF (created with ImageMagick convert) — requires reading ~9 chunks (~2 MB each) from S3 per tile |
Fix 1: Convert to Pyramid Tiled TIFF
The largest single improvement came from converting the source TIFF from strip layout to pyramid tiled format. A strip TIFF has no internal resolution levels; Cantaloupe must read multiple sequential chunks from S3 to reconstruct any given tile region. A pyramid tiled TIFF stores pre-computed resolution levels with 256×256 tiles, so each tile request reads only the relevant portion of the file.
vips tiffsave input.tif output.tif --tile --pyramid --tile-width 256 --tile-height 256 --compression jpeg --Q 85
Output size: 219 MB (Q85, pyramid tiled TIFF).
| Metric | Strip TIFF (before) | ptif Q85 (after) |
|---|---|---|
| First tile (cold) | 8.8s | 0.7–1.3s |
| Second tile (cold, different region) | 1.7s | 0.7s |
| Cached tile | 0.04s | 0.01s |
| S3 chunk reads | ~9 | few |
| Image processing time | 1073ms | 510ms |
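The gap between the two layouts can be sanity-checked with back-of-envelope arithmetic. This sketch is my own illustration, not part of the measurements; it estimates the uncompressed data a full-resolution 256×256 region drags in under each layout:

```python
# Back-of-envelope: bytes touched to render one 256×256 full-resolution
# region. Illustrative arithmetic, not measured values.

width, height = 46825, 28127   # test image dimensions
bpp = 3                        # RGB, bytes per pixel (uncompressed)

# Strip TIFF: strips span the full image width, so any 256-px-tall
# region pulls in 256 complete rows regardless of horizontal position.
strip_bytes = width * 256 * bpp

# Tiled TIFF: only the 256×256 tile itself is needed.
tile_bytes = 256 * 256 * bpp

print(f"strip layout: {strip_bytes / 1e6:.1f} MB uncompressed")
print(f"tiled layout: {tile_bytes / 1e6:.2f} MB uncompressed")
print(f"ratio: {strip_bytes // tile_bytes}x")
```

Even before compression, the strip layout touches roughly 180× more data per tile, which is consistent with the multi-chunk S3 reads observed above.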
Fix 2: Increase JVM Heap
The container memory limit is 2 GB, but Cantaloupe was running with the JVM default heap of 512 MB. The islandora/cantaloupe:6.3.12 startup script supports CANTALOUPE_HEAP_MIN and CANTALOUPE_HEAP_MAX environment variables directly:
CANTALOUPE_HEAP_MIN: "512m"
CANTALOUPE_HEAP_MAX: "1536m"
Note that JAVA_OPTS is not referenced by the startup script and has no effect. In the older 2.0.10 image, these environment variables were not available, requiring a custom startup script to be mounted.
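In compose terms, the heap variables sit next to the container memory limit. A sketch only, with the service name and surrounding structure assumed rather than taken from the actual file:

```yaml
services:
  cantaloupe:
    image: islandora/cantaloupe:6.3.12
    environment:
      CANTALOUPE_HEAP_MIN: "512m"
      CANTALOUPE_HEAP_MAX: "1536m"  # leaves headroom below the 2 GB container limit
    mem_limit: 2g
```

Keeping the max heap below the container limit matters: if the JVM heap plus native overhead exceeds the cgroup limit, the container is OOM-killed rather than throwing an OutOfMemoryError.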
Fix 3: Persist the Cache Volume
docker-compose.prod.yml had cantaloupe_cache:/data configured correctly, but the service had been started with the plain docker-compose.yml, which omitted the named volume. As a result, the on-disk tile cache was lost on every container restart. Switching to the production compose file ensured the FilesystemCache persisted across restarts.
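The relevant difference between the two compose files is just the named volume; roughly (service block structure assumed, volume name and mount path as stated above):

```yaml
services:
  cantaloupe:
    volumes:
      - cantaloupe_cache:/data  # FilesystemCache survives container restarts

volumes:
  cantaloupe_cache:
```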
Fix 4: Upgrade the Cantaloupe Image
The server was initially running islandora/cantaloupe:2.0.10 (Cantaloupe 5.0.5). Upgrading to islandora/cantaloupe:6.3.12 (Cantaloupe 5.0.7) produced a measurable improvement in image processing time:
| Metric | 2.0.10 (5.0.5) | 6.3.12 (5.0.7) |
|---|---|---|
| First tile (cold) | 1.3s | 0.84s |
| Image processing time | 510ms | 302ms |
Fix 5: CloudFront CDN
The server is in us-east-1; most users access it from Japan (~146ms RTT). Adding CloudFront with a Tokyo edge location reduces perceived latency significantly for cache hits.
Before: User (Japan) ──146ms RTT──→ EC2 (us-east-1) → Cantaloupe → S3
After: User (Japan) ──few ms──→ CloudFront Tokyo edge ──→ EC2 → Cantaloupe → S3
↑ cache hit: served here
CloudFront configuration:
- Origin: origin domain on port 443, HTTPS only (via Traefik)
- Cache TTL: min 1 day, default 30 days, max 1 year
- Price Class: PriceClass_200
- HTTP/2 + HTTP/3 enabled
- ACM certificate: issued in us-east-1 (CloudFront requires certificates in that region)
info.json @id issue
Cantaloupe generates the @id field in info.json from the incoming Host header. When CloudFront forwards a request to the origin, the Host header is the origin domain rather than the CloudFront domain. This causes Mirador to read the @id value and then bypass CloudFront, sending subsequent tile requests directly to the origin.
The fix is to set CANTALOUPE_BASE_URI to the CloudFront domain:
CANTALOUPE_BASE_URI: "https://<public-domain>"
With this set, Cantaloupe always embeds the CloudFront domain in @id, and Mirador routes all requests through the CDN.
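To see why the wrong @id matters: a IIIF viewer builds every subsequent image URL from that field, so whatever host it names receives all tile traffic. A minimal sketch (the domains are placeholders; the URL template is the standard IIIF Image API pattern):

```python
# A IIIF viewer derives tile URLs from info.json's @id field.
# If @id carries the origin host, every tile request bypasses the CDN.

def tile_url(info: dict, region: str, size: str) -> str:
    # IIIF Image API 2.x pattern: {@id}/{region}/{size}/{rotation}/{quality}.{format}
    return f"{info['@id']}/{region}/{size}/0/default.jpg"

# Without CANTALOUPE_BASE_URI: @id reflects the Host header seen at the origin.
bad = {"@id": "https://origin.example.com/iiif/2/shyuga.tif"}
# With CANTALOUPE_BASE_URI set to the public (CloudFront) domain:
good = {"@id": "https://cdn.example.com/iiif/2/shyuga.tif"}

print(tile_url(bad, "0,0,256,256", "256,"))   # tile traffic would hit the origin directly
print(tile_url(good, "0,0,256,256", "256,"))  # tile traffic goes through CloudFront
```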
Parallel tile fetch results (Mirador simulation, 20 tiles)
| Scenario | Total time |
|---|---|
| CloudFront miss (cold) | 6.2s |
| CloudFront hit (cached) | 1.3s (from US); ~0.1–0.2s from Japan |
| Direct localhost (cold) | 4.4s |
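The gap between one cold tile (0.84s) and 20 cold tiles (6.2s) is consistent with a simple batching model: the browser opens a limited number of parallel connections per host, and the 2-vCPU server can only process a couple of tiles at once. A rough model, where the concurrency and per-tile time are assumptions rather than measurements:

```python
import math

def parallel_fetch_time(tiles: int, concurrency: int, per_tile_s: float) -> float:
    # Tiles are fetched in waves of `concurrency`; each wave takes one tile-time.
    return math.ceil(tiles / concurrency) * per_tile_s

# 20 tiles, ~6 parallel connections (a typical browser per-host limit),
# ~1.5s per cold tile under CPU contention on 2 vCPUs:
print(parallel_fetch_time(20, 6, 1.5))  # ~6s, in the ballpark of the measured 6.2s
```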
What Didn’t Help
CacheStrategy: Switching processor.stream_retrieval_strategy from StreamStrategy to CacheStrategy (downloads the full source file locally before processing) made cold requests slower — 4.8s vs. 0.84s. When the source is already a tiled TIFF on S3, downloading the entire file is unnecessary overhead. Reverted.
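The reverted configuration, expressed as a cantaloupe.properties line (key name as cited above):

```properties
# Stream source bytes on demand instead of downloading the whole file first.
processor.stream_retrieval_strategy = StreamStrategy
```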
TurboJpegProcessor: The config was set to use TurboJpegProcessor, but it falls back to Java2dProcessor in practice. In the 2.0.10 image, the bundled Java bindings are incompatible with the OS-level libjpeg-turbo 2.1.5 API:
Failed to initialize TurboJpegProcessor
(error: 'org.libjpegturbo.turbojpeg.TJScalingFactor[] org.libjpegturbo.turbojpeg.TJ.getScalingFactors()')
In the 6.3.12 image, the library compatibility issue is resolved, but TurboJpegProcessor does not support TIFF sources, so Java2dProcessor is used regardless.
Potential Further Improvements
Cache pre-warm: pre-requesting all tiles for frequently accessed images would ensure most user requests are served from CloudFront edge. Once cached there, cold-start latency becomes a non-issue for those images.
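Pre-warming can be as simple as enumerating the tile grid and requesting each region once through CloudFront. A sketch of the URL generation only (base URL and identifier are placeholders; the region pattern follows the IIIF Image API):

```python
def tile_urls(base: str, width: int, height: int, tile: int = 256):
    """Yield IIIF Image API URLs for every full-resolution tile of an image."""
    for y in range(0, height, tile):
        for x in range(0, width, tile):
            w = min(tile, width - x)   # edge tiles are narrower/shorter
            h = min(tile, height - y)
            yield f"{base}/{x},{y},{w},{h}/{w},/0/default.jpg"

# For the 46825×28127 test image this is 183×110 = 20,130 full-resolution tiles;
# pre-warming the lower pyramid levels first covers the common zoom range cheaply.
urls = list(tile_urls("https://cdn.example.com/iiif/2/shyuga.tif", 46825, 28127))
print(len(urls))  # 20130
```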
Larger instance: at 2 vCPU (t3.large), CPU reaches ~122% (out of the 200% available across 2 vCPUs) during Mirador parallel tile requests. Upgrading to t3.xlarge (4 vCPU / 16 GB) could roughly double cold parallel throughput.
Tokyo region migration: moving EC2 and S3 to ap-northeast-1 would reduce origin latency on CloudFront misses. With a high CDN cache hit rate, the practical impact is limited.
Final Architecture
User → CloudFront (public domain)
↓ on miss
Traefik (origin domain:443, Let's Encrypt TLS)
↓
Cantaloupe (islandora/cantaloupe:6.3.12, JVM heap 1536 MB)
↓
S3 (us-east-1)
Summary
| Metric | Before | After |
|---|---|---|
| First tile (cold) | 8.8s | 0.84s |
| Mirador 20-tile parallel (cold) | est. 15s+ | 6.2s (CloudFront miss) |
| Cached tile (from Japan) | ~292ms RTT + processing | ~0.1–0.2s (edge) |
The pyramid TIFF conversion had the largest effect on raw Cantaloupe processing time. CloudFront had the largest effect on perceived latency for users outside the origin region.