Complete Restoration of Deep Zoom Images: Converting Tile Images to BigTIFF

Introduction

Deep Zoom lets websites display very large images with smooth zooming by serving them as a pyramid of small tiles. Sometimes you need to recover the original high-resolution image from that tiled data, for example from output generated by Microsoft Deep Zoom Composer.

This article explains how to reconstruct the original high-resolution image from data published in Deep Zoom format and save it as a TIFF, using BigTIFF where the file size requires it.

How Deep Zoom Images Work

Tile Structure

Deep Zoom images divide a single large image into multiple small tile images and store them in a pyramid structure:

Level 0: Lowest resolution (usually 1 tile)
Level N: Highest resolution (equivalent to the original image resolution)
Width and height double from each level to the next, so N = ceil(log2(max(width, height)))
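
As a rough illustration of the pyramid, the sketch below computes the highest level and the image size at each level from the full-resolution dimensions. The function names are hypothetical, and the rounding rule (halve and round up at each step) follows the usual Deep Zoom convention rather than anything stated in this article.

```python
import math

def max_level(width, height):
    """Highest pyramid level: the level at which the image has full resolution."""
    return math.ceil(math.log2(max(width, height)))

def level_size(width, height, level):
    """Image dimensions at `level`, halving (rounding up) per step below the top."""
    scale = 2 ** (max_level(width, height) - level)
    return math.ceil(width / scale), math.ceil(height / scale)

if __name__ == "__main__":
    w, h = 62588, 29800  # one of the image sizes reported later in this article
    top = max_level(w, h)
    print(top)                      # 16
    print(level_size(w, h, top))    # (62588, 29800)
    print(level_size(w, h, 0))      # (1, 1)
```

For this image the top level is 16, which matches the highest directory shown in the file structure below.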

File Structure

dzc_output.xml              # Metadata
dzc_output_files/
  ├── 0/
  │   └── 0_0.jpg          # The only tile at level 0
  ├── 1/
  │   ├── 0_0.jpg
  │   └── ...
  └── 16/                   # Highest level
      ├── 0_0.jpg
      ├── 0_1.jpg
      └── ...               # Tens of thousands of tiles
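
The layout above implies a simple mapping from the descriptor URL to any tile URL. The helpers below are a minimal sketch of that mapping. The function names and the example host are hypothetical, and the code assumes the descriptor ends in .xml (or .dzi) with the tiles in a sibling _files directory, as in the tree shown.

```python
def tiles_base_url(descriptor_url):
    """Turn '.../name.xml' (or '.../name.dzi') into '.../name_files/'."""
    for suffix in (".xml", ".dzi"):
        if descriptor_url.endswith(suffix):
            return descriptor_url[: -len(suffix)] + "_files/"
    raise ValueError(f"Unexpected descriptor URL: {descriptor_url}")

def tile_url(base_url, level, col, row, format_ext):
    """Tiles are stored as <level>/<col>_<row>.<ext> below the _files directory."""
    return f"{base_url}{level}/{col}_{row}.{format_ext}"

if __name__ == "__main__":
    base = tiles_base_url("https://example.org/images/dzc_output.xml")
    print(tile_url(base, 16, 0, 1, "jpg"))
    # https://example.org/images/dzc_output_files/16/0_1.jpg
```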

Implementation Challenges and Solutions

Challenge 1: XML Metadata Namespace Differences

Deep Zoom has multiple versions with different XML namespaces:

http://schemas.microsoft.com/deepzoom/2008
http://schemas.microsoft.com/deepzoom/2009

Solution: Implement a flexible XML parser supporting multiple namespaces

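import xml.etree.ElementTree as ET
import requests

def fetch_xml_info(xml_url):
    """Download the Deep Zoom descriptor and return its key attributes."""
    response = requests.get(xml_url, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)

    # Try each known namespace, then fall back to no namespace
    size_tags = [
        '{http://schemas.microsoft.com/deepzoom/2008}Size',
        '{http://schemas.microsoft.com/deepzoom/2009}Size',
        'Size'
    ]

    size_elem = None
    for tag in size_tags:
        size_elem = root.find('.//' + tag)
        if size_elem is not None:
            break
    if size_elem is None:
        raise ValueError(f"No Size element found in {xml_url}")

    # TileSize, Overlap and Format are attributes of the root Image element
    config = {
        'width': int(size_elem.attrib['Width']),
        'height': int(size_elem.attrib['Height']),
        'tile_size': int(root.attrib['TileSize']),
        'overlap': int(root.attrib['Overlap']),
        'format': root.attrib['Format'],
    }
    return config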
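
To see the namespace handling in isolation, the following sketch parses an inline descriptor instead of fetching one. The sample XML and the helper names are made up for illustration, and it takes a slightly different approach from the code above: rather than trying a fixed list of namespaces, it strips whatever namespace is present.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<Image TileSize="254" Overlap="1" Format="jpg"
       xmlns="http://schemas.microsoft.com/deepzoom/2008">
  <Size Width="7760" Height="10328"/>
</Image>"""

def local_name(tag):
    """Drop a leading '{namespace}' from an ElementTree tag name."""
    return tag.rsplit("}", 1)[-1]

def parse_descriptor(xml_text):
    """Return the image parameters regardless of which namespace is declared."""
    root = ET.fromstring(xml_text)
    size = next(el for el in root.iter() if local_name(el.tag) == "Size")
    return {
        "width": int(size.attrib["Width"]),
        "height": int(size.attrib["Height"]),
        "tile_size": int(root.attrib["TileSize"]),
        "overlap": int(root.attrib["Overlap"]),
        "format": root.attrib["Format"],
    }

if __name__ == "__main__":
    print(parse_descriptor(SAMPLE_XML))
```

Stripping the namespace keeps working if a descriptor declares a namespace that is not in the list, at the cost of being less strict about what it accepts.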

Challenge 2: Auto-Detection of Maximum Level

The maximum level implied by the XML metadata may differ from the levels actually available on the server.

Solution: Send HEAD requests to verify existence

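def find_actual_max_level(base_url, format_ext):
    """Detect the highest level that actually exists on the server"""
    # Probe from level 20 down (2^20 is about one million pixels per side);
    # the first level whose 0_0 tile exists is the maximum
    for level in range(20, -1, -1):
        url = f"{base_url}{level}/0_0.{format_ext}"
        try:
            response = requests.head(url, timeout=10)
            if response.status_code == 200:
                return level
        except requests.RequestException:
            # Treat network errors as "not found" and keep probing
            continue
    return None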
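
The probing logic can be separated from the network call, which makes it easy to try out without a server. In this sketch, tile_exists is a hypothetical callable that returns True when the 0_0 tile of a level is present. In real use it would wrap a HEAD request like the one above.

```python
def find_max_level(tile_exists, highest=20):
    """Return the highest level for which tile_exists(level) is true, else None."""
    for level in range(highest, -1, -1):
        if tile_exists(level):
            return level
    return None

if __name__ == "__main__":
    # Simulate a server that only publishes levels 0 through 14
    available = set(range(15))
    print(find_max_level(lambda level: level in available))  # 14
```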

Challenge 3: Efficient Download of Large Numbers of Tiles

High-resolution images require downloading tens of thousands of tiles (e.g., 29,146 tiles).
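
The tile count follows directly from the image size and the tile size: one tile per tile_size-wide column and row, rounding up. The sketch below builds the (col, row) list that a downloader would iterate over. The function name is hypothetical and the tile size of 254 is an assumption. It is a common Deep Zoom setting, and it reproduces the 29,146 figure for a 62588 x 29800 image.

```python
import math

def list_tiles(width, height, tile_size):
    """Return every (col, row) pair needed to cover a width x height level."""
    cols = math.ceil(width / tile_size)
    rows = math.ceil(height / tile_size)
    return [(col, row) for row in range(rows) for col in range(cols)]

if __name__ == "__main__":
    tiles = list_tiles(62588, 29800, 254)  # tile_size=254 is an assumed value
    print(len(tiles))   # 29146
    print(tiles[:3])    # [(0, 0), (1, 0), (2, 0)]
```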

Solution: Parallel download using ThreadPoolExecutor

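from concurrent.futures import ThreadPoolExecutor, as_completed

def download_tiles(tiles_list, base_url, level, format_ext):
    """Download all tiles of one level in parallel; failed tiles are skipped."""
    downloaded_tiles = []

    # One shared session lets the worker threads reuse HTTP connections
    with requests.Session() as session, \
            ThreadPoolExecutor(max_workers=10) as executor:
        futures = {
            executor.submit(download_tile, base_url, level,
                            col, row, format_ext, session): (col, row)
            for col, row in tiles_list
        }

        # as_completed yields futures in completion order, not submission order
        with tqdm(total=len(futures)) as pbar:
            for future in as_completed(futures):
                result = future.result()
                if result:
                    downloaded_tiles.append(result)
                pbar.update(1)

    return downloaded_tiles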
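
With tens of thousands of requests, a few transient failures are likely, and the function above skips any tile that fails. One common mitigation is to retry each download a few times with a growing delay. The wrapper below is a generic sketch of that idea, not part of the original scripts. The names with_retries, attempts and base_delay are hypothetical, and fetch stands in for a call that returns None on failure.

```python
import time

def with_retries(fetch, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch() until it returns a non-None result or attempts run out."""
    for attempt in range(attempts):
        result = fetch()
        if result is not None:
            return result
        if attempt < attempts - 1:
            # Exponential backoff: base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** attempt)
    return None

if __name__ == "__main__":
    outcomes = iter([None, None, "tile-bytes"])  # fail twice, then succeed
    print(with_retries(lambda: next(outcomes), sleep=lambda s: None))  # tile-bytes
```

Passing the sleep function as a parameter is only there to keep the example quick to run. In a worker thread the default time.sleep is fine.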

Challenge 4: Handling Tile Overlap

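Adjacent Deep Zoom tiles share a border of overlapping pixels so that no seams appear during display. The tile size given in the metadata does not include this border, so tiles are spaced tile_size pixels apart and each one starts overlap pixels early (except in the first column and row).

Solution: Coordinate calculation considering overlap

def reconstruct_image(tiles, width, height, tile_size, overlap):
    # Create the full-size canvas once, then paste every tile into it
    canvas = Image.new('RGB', (width, height))
    for col, row, tile_img in tiles:
        # Tiles are spaced tile_size apart; all but the first column/row
        # start `overlap` pixels early because of their leading border
        x = max(col * tile_size - overlap, 0)
        y = max(row * tile_size - overlap, 0)
        canvas.paste(tile_img, (x, y))
    return canvas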
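
To make the overlap arithmetic concrete, the sketch below computes where a tile lands on the canvas and how large it should be. It follows the usual Deep Zoom convention that the nominal tile size excludes the overlap border and that tiles are clipped at the image edge. The function name and the sample values (tile size 254, overlap 1) are assumptions for illustration.

```python
def tile_bounds(col, row, tile_size, overlap, width, height):
    """Return (x, y, w, h) of a tile on the canvas, overlap border included."""
    x = max(col * tile_size - overlap, 0)
    y = max(row * tile_size - overlap, 0)
    # The far edge extends `overlap` pixels past the nominal tile, clipped to the image
    right = min((col + 1) * tile_size + overlap, width)
    bottom = min((row + 1) * tile_size + overlap, height)
    return x, y, right - x, bottom - y

if __name__ == "__main__":
    size = (7760, 10328)  # an image size from the results table
    print(tile_bounds(0, 0, 254, 1, *size))    # (0, 0, 255, 255)
    print(tile_bounds(1, 1, 254, 1, *size))    # (253, 253, 256, 256)
    print(tile_bounds(30, 40, 254, 1, *size))  # (7619, 10159, 141, 169)
```

Comparing the expected width and height with the size of each downloaded tile is a cheap sanity check that the tile size and overlap were read correctly.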

Challenge 5: Saving Large Images

Restored images can be several GB in size. Classic TIFF stores file offsets as 32-bit values, so a file cannot exceed 4 GB, whereas BigTIFF uses 64-bit offsets and removes that limit.

Solution: Save in BigTIFF format

import os
import numpy as np
import tifffile

def save_bigtiff(image, output_path):
    # First save as PNG (no file size limitation)
    # splitext handles both .tif and .tiff extensions
    png_file = os.path.splitext(output_path)[0] + '.png'
    image.save(png_file, format='PNG', compress_level=6)

    # Convert to BigTIFF using the tifffile library
    img_array = np.array(image)  # Full uncompressed copy of the image in memory
    tifffile.imwrite(
        output_path,
        img_array,
        bigtiff=True,            # 64-bit offsets, so files may exceed 4 GB
        compression='deflate',   # Lossless compression
        tile=(256, 256)          # Store pixels as 256 x 256 tiles
    )
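
Whether BigTIFF is actually required can be estimated up front from the pixel dimensions. The sketch below compares the uncompressed size of an 8-bit RGB image with the 4 GiB ceiling of classic TIFF. The function names are hypothetical, and because compression usually shrinks the file, the result is a conservative upper bound rather than a prediction of the final size.

```python
CLASSIC_TIFF_LIMIT = 2 ** 32  # bytes addressable with 32-bit offsets

def raw_size_bytes(width, height, channels=3, bytes_per_sample=1):
    """Uncompressed pixel data size for the given dimensions."""
    return width * height * channels * bytes_per_sample

def needs_bigtiff(width, height, channels=3, bytes_per_sample=1):
    """True if the uncompressed data alone would not fit in a classic TIFF."""
    return raw_size_bytes(width, height, channels, bytes_per_sample) >= CLASSIC_TIFF_LIMIT

if __name__ == "__main__":
    print(raw_size_bytes(62588, 29800))   # 5595367200
    print(needs_bigtiff(62588, 29800))    # True
    print(needs_bigtiff(7760, 10328))     # False
```

The same figure is roughly the RAM needed to hold the full canvas in memory, about 5.2 GiB for the largest image discussed here.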

Implementation Results

Processing Performance

Image Size (px)	Tile Count	Download Time	Final File Size
62533 x 29734	28,899	Approx. 12 min	3.7 GB (TIFF)
62588 x 29800	29,146	Approx. 12 min	3.4 GB (TIFF)
7760 x 10328	1,271	Approx. 2 min	72 MB (TIFF)

Parallel Processing Effectiveness

Parallelism: 10 threads
Average download speed: Approx. 40 tiles/sec (29,146 tiles in about 12 minutes)
Requests run concurrently, so per-tile network latency overlaps instead of adding up

Technology Stack

# Main libraries
import requests          # HTTP communication
from PIL import Image    # Image processing
import numpy as np       # Array operations
import tifffile          # BigTIFF support
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm    # Progress bar

Code Structure

download_deepzoom.py           # Single image processing
batch_download_deepzoom.py     # Batch processing
├─ fetch_xml_info()           # XML parsing
├─ find_actual_max_level()    # Level auto-detection
├─ download_tile()            # Tile download
├─ reconstruct_image()        # Image restoration
└─ save_bigtiff()             # BigTIFF saving

Optimization Points

1. Session Reuse

session = requests.Session()
# Reuse HTTP connections within the same session
response = session.get(url)

2. Error Handling

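def download_tile(base_url, level, col, row, format_ext, session):
    """Fetch one tile; return (col, row, image), or None on failure."""
    url = f"{base_url}{level}/{col}_{row}.{format_ext}"
    try:
        response = session.get(url, timeout=30)
        if response.status_code == 200:
            return col, row, Image.open(BytesIO(response.content))
        print(f"Tile {col}_{row}: HTTP {response.status_code}")
    except Exception as e:
        print(f"Failed to download tile {col}_{row}: {e}")
    return None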

3. Memory Efficiency

Process tile by tile to minimize memory usage
Create the large canvas only once

Use Cases

Digital Archives

High-resolution image preservation of historical documents and artworks
Complete restoration of map data
Digital preservation of cultural assets

Data Migration

Image data conversion during platform migration
Complete image acquisition for backup purposes
Image usage in offline environments

Summary

The restoration of Deep Zoom images involved the following technical challenges, which were successfully addressed:

Handling XML namespace differences
Auto-detection of actual maximum levels
Parallel download of large numbers of tiles
Image restoration considering overlap
Large image storage in BigTIFF format

Using this method, the tiles for ultra-high-resolution images of roughly 60,000 x 30,000 pixels were downloaded in approximately 12 minutes and reassembled into a single TIFF.

Note: The techniques described in this article should only be used on image data for which you have appropriate permissions. Please ensure compliance with copyright and licensing requirements.
