Introduction
Deep Zoom is a tiled-image technology for smoothly zooming and panning high-resolution images on websites. Sometimes you need to reconstruct the original high-resolution image from the tiles generated by tools such as Microsoft Deep Zoom Composer.
This article explains how to restore the original high-resolution image as a TIFF file from image data published in Deep Zoom format.
How Deep Zoom Images Work
Tile Structure
Deep Zoom images divide a single large image into multiple small tile images and store them in a pyramid structure:
- Level 0: Lowest resolution (usually 1 tile)
- Level N: Highest resolution (equivalent to the original image resolution)
- Width and height double from each level to the next
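The pyramid arithmetic can be sketched as follows. This is a minimal illustration, assuming the usual Deep Zoom convention that the top level is ceil(log2(max(width, height))) and that each lower level halves the dimensions, rounding up. The function names and the tile size of 254 are illustrative assumptions, not taken from the original scripts.

```python
import math

def max_level(width, height):
    """Highest pyramid level: ceil(log2(max(width, height))), using integers only."""
    return (max(width, height) - 1).bit_length()

def level_size(width, height, level):
    """Pixel dimensions at a given level (halved per step down, rounded up)."""
    scale = 2 ** (max_level(width, height) - level)
    return math.ceil(width / scale), math.ceil(height / scale)

def tile_grid(width, height, level, tile_size=254):
    """Number of tile columns and rows at a given level (tile_size is assumed)."""
    w, h = level_size(width, height, level)
    return math.ceil(w / tile_size), math.ceil(h / tile_size)

top = max_level(7760, 10328)
cols, rows = tile_grid(7760, 10328, top)
print(top, cols, rows, cols * rows)
```

With the assumed tile size of 254, the 7760 x 10328 image gives a 31 x 41 grid at the top level, that is 1,271 tiles, which agrees with the tile count reported for that image in the results table.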
File Structure
dzc_output.xml              # Metadata
dzc_output_files/
├── 0/
│   └── 0_0.jpg             # The only tile at level 0
├── 1/
│   ├── 0_0.jpg
│   └── ...
└── 16/                     # Highest level
    ├── 0_0.jpg
    ├── 0_1.jpg
    └── ...                 # Tens of thousands of tiles
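Given this layout, tile URLs can be derived from the metadata URL. The sketch below assumes the common convention that the tile directory is the descriptor name with its extension replaced by `_files/`, and that tiles are named `<column>_<row>.<format>`. The helper names and the example.org URL are hypothetical, and URLs with query strings are not handled.

```python
def tiles_base_url(descriptor_url):
    """Derive the tile directory URL from the descriptor URL (foo.xml -> foo_files/)."""
    stem, _, _ext = descriptor_url.rpartition(".")
    return f"{stem}_files/"

def tile_url(descriptor_url, level, col, row, format_ext):
    """Build the URL of a single tile at the given level and grid position."""
    return f"{tiles_base_url(descriptor_url)}{level}/{col}_{row}.{format_ext}"

# Hypothetical server location, used only for illustration
print(tile_url("https://example.org/images/dzc_output.xml", 16, 0, 1, "jpg"))
```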
Implementation Challenges and Solutions
Challenge 1: XML Metadata Namespace Differences
Deep Zoom has multiple versions with different XML namespaces:
- http://schemas.microsoft.com/deepzoom/2008
- http://schemas.microsoft.com/deepzoom/2009
Solution: Implement a flexible XML parser supporting multiple namespaces
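import requests
import xml.etree.ElementTree as ET

def fetch_xml_info(xml_url):
    response = requests.get(xml_url, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)
    # Try each known namespace, then fall back to no namespace
    size_tags = [
        '{http://schemas.microsoft.com/deepzoom/2008}Size',
        '{http://schemas.microsoft.com/deepzoom/2009}Size',
        'Size'
    ]
    size_elem = None
    for tag in size_tags:
        size_elem = root.find('.//' + tag)
        if size_elem is not None:
            break
    if size_elem is None:
        raise ValueError("No Size element found in Deep Zoom XML")
    # Image dimensions; other descriptor attributes can be added here as needed
    config = {
        'width': int(size_elem.attrib['Width']),
        'height': int(size_elem.attrib['Height']),
    }
    return config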
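As an alternative to listing namespaces one by one, the namespace prefix can be stripped from each tag before comparing. The sketch below works on an inline sample rather than a downloaded file. It assumes a standard descriptor whose root `Image` element carries `TileSize`, `Overlap` and `Format` attributes; `SAMPLE_DZI`, `local_name` and `parse_descriptor` are illustrative names, and the sample values are made up.

```python
import xml.etree.ElementTree as ET

# Made-up sample descriptor, for illustration only
SAMPLE_DZI = """<Image TileSize="254" Overlap="1" Format="jpg"
       xmlns="http://schemas.microsoft.com/deepzoom/2008">
  <Size Width="7760" Height="10328"/>
</Image>"""

def local_name(tag):
    """Strip a '{namespace}' prefix from an ElementTree tag, if present."""
    return tag.rsplit("}", 1)[-1]

def parse_descriptor(xml_text):
    """Return the descriptor fields as a dict, whatever namespace the file uses."""
    root = ET.fromstring(xml_text)
    size = next((el for el in root.iter() if local_name(el.tag) == "Size"), None)
    if size is None:
        raise ValueError("no Size element found in descriptor")
    return {
        "width": int(size.attrib["Width"]),
        "height": int(size.attrib["Height"]),
        "tile_size": int(root.attrib["TileSize"]),
        "overlap": int(root.attrib["Overlap"]),
        "format": root.attrib["Format"],
    }

print(parse_descriptor(SAMPLE_DZI))
```

Because the comparison ignores the namespace, the same function would accept a 2009-namespace or namespace-free descriptor without changes.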
Challenge 2: Auto-Detection of Maximum Level
The maximum level implied by the XML metadata (it follows from the image dimensions) may differ from the levels actually available on the server.
Solution: Send HEAD requests to verify existence
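import requests

def find_actual_max_level(base_url, format_ext):
    """Detect the highest level that actually exists on the server"""
    # base_url is expected to end with '/', e.g. '.../dzc_output_files/'
    for level in range(20, -1, -1):
        url = f"{base_url}{level}/0_0.{format_ext}"
        try:
            response = requests.head(url, timeout=10)
            if response.status_code == 200:
                return level
        except requests.RequestException:
            # Network error for this level; try the next lower one
            continue
    return None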
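The downward scan can also start from the level computed from the image dimensions rather than from a fixed upper bound. The sketch below separates the scan from the network call so it can be run offline: `tile_exists` is a hypothetical callable which, in real use, would wrap the HEAD request, and the set of available levels is invented for the demonstration.

```python
def find_max_level(expected_level, tile_exists):
    """Scan downward from expected_level; return the first level whose 0_0 tile exists."""
    for level in range(expected_level, -1, -1):
        if tile_exists(level):
            return level
    return None

# Offline stand-in for the HEAD request: pretend the server only holds levels 0..14
available = set(range(15))
print(find_max_level(16, lambda level: level in available))
```

Starting at the expected level avoids probing levels that cannot exist for the given image size.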
Challenge 3: Efficient Download of Large Numbers of Tiles
High-resolution images require downloading tens of thousands of tiles (e.g., 29,146 tiles).
Solution: Parallel download using ThreadPoolExecutor
import requests
from concurrent.futures import ThreadPoolExecutor, as_completed
from tqdm import tqdm

def download_tiles(tiles_list, base_url, level, format_ext):
    # One session shared by all workers so HTTP connections are reused
    session = requests.Session()
    downloaded_tiles = []
    with ThreadPoolExecutor(max_workers=10) as executor:
        # Map each future back to its (col, row) grid position
        futures = {
            executor.submit(download_tile, base_url, level,
                            col, row, format_ext, session): (col, row)
            for col, row in tiles_list
        }
        with tqdm(total=len(futures)) as pbar:
            for future in as_completed(futures):
                result = future.result()
                if result:
                    downloaded_tiles.append(result)
                pbar.update(1)
    return downloaded_tiles
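With tens of thousands of requests, a few transient failures are likely, and a silently skipped tile leaves a hole in the final image. One possible extension, not part of the original scripts, is to retry each tile a few times and report the positions that still failed. In the sketch below `fetch` is a hypothetical callable standing in for the real tile download, and `flaky_fetch` simulates a server that fails the first request for every tile.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor, as_completed

def fetch_with_retry(fetch, col, row, retries=3, backoff=0.5):
    """Call fetch(col, row) up to `retries` times, sleeping longer after each failure."""
    for attempt in range(retries):
        try:
            return col, row, fetch(col, row)
        except Exception:
            if attempt < retries - 1:
                time.sleep(backoff * (2 ** attempt))
    return None

def download_all(tiles, fetch, max_workers=10, retries=3, backoff=0.5):
    """Fetch all tiles in parallel; return (results, positions that failed every attempt)."""
    results, failed = [], []
    with ThreadPoolExecutor(max_workers=max_workers) as executor:
        futures = {
            executor.submit(fetch_with_retry, fetch, col, row, retries, backoff): (col, row)
            for col, row in tiles
        }
        for future in as_completed(futures):
            result = future.result()
            if result is None:
                failed.append(futures[future])
            else:
                results.append(result)
    return results, failed

# Simulated download: the first request for each tile fails, the second succeeds
_seen, _lock = set(), threading.Lock()

def flaky_fetch(col, row):
    with _lock:
        first_time = (col, row) not in _seen
        _seen.add((col, row))
    if first_time:
        raise IOError("simulated transient failure")
    return f"tile-{col}-{row}"

tiles = [(col, row) for col in range(3) for row in range(2)]
results, failed = download_all(tiles, flaky_fetch, max_workers=4, backoff=0.0)
print(len(results), len(failed))
```

Returning the failed positions separately makes it possible to rerun only those tiles, or to abort before assembling an incomplete image.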
Challenge 4: Handling Tile Overlap
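Deep Zoom tiles overlap their neighbours by a few pixels (the Overlap value in the XML) so that no seams show during display. The TileSize value excludes this overlap: tile origins advance by TileSize, and every tile except those in the first column or row starts Overlap pixels earlier.
Solution: Coordinate calculation considering overlap
from PIL import Image

def reconstruct_image(tiles, width, height, tile_size, overlap):
    canvas = Image.new('RGB', (width, height))
    for col, row, tile_img in tiles:
        # Tiles outside the first column/row carry `overlap` extra leading pixels
        x = col * tile_size - (overlap if col > 0 else 0)
        y = row * tile_size - (overlap if row > 0 else 0)
        canvas.paste(tile_img, (x, y))
    return canvas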
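Under the standard Deep Zoom layout, where TileSize excludes the overlap and each tile is extended by Overlap pixels on every side that has a neighbour, the region a tile covers on the canvas can be worked out as below. This is a sketch under that assumption; `tile_box` is an illustrative helper and the values TileSize=254, Overlap=1 are assumed rather than read from a real descriptor.

```python
def tile_box(col, row, tile_size, overlap, width, height):
    """Return (left, top, right, bottom) of a tile on the level canvas, overlap included."""
    left = col * tile_size - (overlap if col > 0 else 0)
    top = row * tile_size - (overlap if row > 0 else 0)
    right = min((col + 1) * tile_size + overlap, width)
    bottom = min((row + 1) * tile_size + overlap, height)
    return left, top, right, bottom

# Assumed values: TileSize=254, Overlap=1, level size 7760 x 10328
print(tile_box(0, 0, 254, 1, 7760, 10328))    # first tile: overlap on right/bottom only
print(tile_box(1, 1, 254, 1, 7760, 10328))    # interior tile: overlap on all four sides
print(tile_box(30, 40, 254, 1, 7760, 10328))  # last tile: clipped at the image edge
```

Comparing the size of each downloaded tile with the box computed this way is a cheap check that the assumed tile size and overlap match what the server actually delivers.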
Challenge 5: Saving Large Images
Restored images can be several GB in size, potentially exceeding the standard TIFF 4GB limit.
Solution: Save in BigTIFF format
import os
import numpy as np
import tifffile

def save_bigtiff(image, output_path):
    # First save a PNG copy (PNG has no 4 GB file size limitation)
    png_file = os.path.splitext(output_path)[0] + '.png'
    image.save(png_file, format='PNG', compress_level=6)
    # Then write the BigTIFF with the tifffile library
    img_array = np.array(image)
    tifffile.imwrite(
        output_path,
        img_array,
        bigtiff=True,           # Enable BigTIFF
        compression='deflate',  # Compression
        tile=(256, 256)         # Tiling
    )
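Whether BigTIFF is needed can be estimated before saving. The sketch below takes a conservative view: compression may shrink the file, but the uncompressed pixel data is the worst case. It assumes 8-bit RGB data, and the function names are illustrative.

```python
CLASSIC_TIFF_LIMIT = 2 ** 32  # classic TIFF uses 32-bit offsets, so at most 4 GiB

def uncompressed_size(width, height, channels=3, bytes_per_sample=1):
    """Raw pixel data size in bytes (8-bit RGB by default)."""
    return width * height * channels * bytes_per_sample

def needs_bigtiff(width, height, channels=3, bytes_per_sample=1):
    """True when the raw pixel data alone would not fit in a classic TIFF."""
    return uncompressed_size(width, height, channels, bytes_per_sample) >= CLASSIC_TIFF_LIMIT

for w, h in [(62533, 29734), (7760, 10328)]:
    gib = uncompressed_size(w, h) / 2 ** 30
    print(f"{w} x {h}: {gib:.2f} GiB raw, BigTIFF needed: {needs_bigtiff(w, h)}")
```

The same figure approximates the memory needed to hold the full canvas: about 5.2 GiB for the 62533 x 29734 image, compared with roughly 0.2 GiB for the 7760 x 10328 one.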
Implementation Results
Processing Performance
| Image Size | Tile Count | Download Time | Final File Size |
|---|---|---|---|
| 62533 x 29734 | 28,899 | Approx. 12 min | 3.7GB (TIFF) |
| 62588 x 29800 | 29,146 | Approx. 12 min | 3.4GB (TIFF) |
| 7760 x 10328 | 1,271 | Approx. 2 min | 72MB (TIFF) |
Parallel Processing Effectiveness
- Parallelism: 10 threads
- Average download speed: Approx. 40 tiles/sec
- Efficient use of network bandwidth
Technology Stack
# Main libraries
import requests # HTTP communication
from PIL import Image # Image processing
import numpy as np # Array operations
import tifffile # BigTIFF support
from concurrent.futures import ThreadPoolExecutor, as_completed  # Parallel download
from tqdm import tqdm # Progress bar
Code Structure
download_deepzoom.py # Single image processing
batch_download_deepzoom.py # Batch processing
├─ fetch_xml_info() # XML parsing
├─ find_actual_max_level() # Level auto-detection
├─ download_tile() # Tile download
├─ reconstruct_image() # Image restoration
└─ save_bigtiff() # BigTIFF saving
Optimization Points
1. Session Reuse
session = requests.Session()
# Reuse HTTP connections within the same session
response = session.get(url)
2. Error Handling
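from io import BytesIO
from PIL import Image

def download_tile(base_url, level, col, row, format_ext, session):
    url = f"{base_url}{level}/{col}_{row}.{format_ext}"
    try:
        response = session.get(url, timeout=30)
        if response.status_code == 200:
            # Return the grid position with the tile, as reconstruct_image() expects
            return col, row, Image.open(BytesIO(response.content))
    except Exception as e:
        print(f"Failed to download tile: {e}")
    return None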
3. Memory Efficiency
- Process tile by tile to minimize memory usage
- Create the large canvas only once
Use Cases
Digital Archives
- High-resolution image preservation of historical documents and artworks
- Complete restoration of map data
- Digital preservation of cultural assets
Data Migration
- Image data conversion during platform migration
- Complete image acquisition for backup purposes
- Image usage in offline environments
Summary
Restoring Deep Zoom images involved the following technical challenges, each of which was addressed:
- Handling XML namespace differences
- Auto-detection of actual maximum levels
- Parallel download of large numbers of tiles
- Image restoration considering overlap
- Large image storage in BigTIFF format
Using this method, the tiles of ultra-high-resolution images of roughly 60,000 x 30,000 pixels were downloaded in approximately 12 minutes and reassembled into complete images.
References
- Microsoft Deep Zoom Specification
- PIL/Pillow Documentation
- tifffile Library Documentation
- Python concurrent.futures
Note: The techniques described in this article should only be used on image data for which you have appropriate permissions. Please ensure compliance with copyright and licensing requirements.