Introduction

Deep Zoom is a technology for smoothly zooming into and displaying high-resolution images on websites. In some cases you need to restore the original high-resolution image from the tiled image data generated by tools such as Microsoft Deep Zoom Composer.

This article explains the technology for restoring original high-resolution TIFF images from image data published in Deep Zoom format.

How Deep Zoom Images Work

Tile Structure

Deep Zoom images divide a single large image into multiple small tile images and store them in a pyramid structure:

  • Level 0: Lowest resolution (usually 1 tile)
  • Level N: Highest resolution (equivalent to the original image resolution)
  • Resolution doubles at each level
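
The level count follows directly from the image dimensions: the highest level is ceil(log2(max(width, height))), and each step down halves both dimensions (rounding up). A minimal sketch of this relationship (the function name is illustrative, not part of the article's code):

```python
import math

def level_dimensions(width, height):
    """Yield (level, width, height) for every pyramid level, smallest first."""
    max_level = math.ceil(math.log2(max(width, height)))
    for level in range(max_level + 1):
        scale = 2 ** (max_level - level)  # each step down halves the image
        yield level, math.ceil(width / scale), math.ceil(height / scale)
```

For a 62533 x 29734 pixel image this gives a maximum level of 16, matching the 16/ directory in the file structure below.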

File Structure

dzc_output.xml              # Metadata
dzc_output_files/
  ├── 0/
  │   └── 0_0.jpg          # The only tile at level 0
  ├── 1/
  │   ├── 0_0.jpg
  │   └── ...
  └── 16/                   # Highest level
      ├── 0_0.jpg
      ├── 0_1.jpg
      └── ...               # Tens of thousands of tiles
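
Each level is cut into a grid of tiles, and the grid size and each tile's path follow from the level dimensions and the tile size declared in the XML metadata. A sketch, assuming the tile size of 254 that Deep Zoom Composer commonly uses (the actual value must be read from the XML):

```python
import math

def tile_grid(level_width, level_height, tile_size=254):
    """Number of tile columns and rows at one level (254 is an assumed default)."""
    return math.ceil(level_width / tile_size), math.ceil(level_height / tile_size)

def tile_path(level, col, row, ext="jpg"):
    """Relative tile path under the *_files/ directory: <level>/<col>_<row>.<ext>"""
    return f"{level}/{col}_{row}.{ext}"
```

For the 7760 x 10328 image in the results table below, this yields a 31 x 41 grid, i.e. 1,271 tiles at the top level.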

Implementation Challenges and Solutions

Challenge 1: XML Metadata Namespace Differences

Deep Zoom has multiple versions with different XML namespaces:

  • http://schemas.microsoft.com/deepzoom/2008
  • http://schemas.microsoft.com/deepzoom/2009

Solution: Implement a flexible XML parser supporting multiple namespaces

import requests
import xml.etree.ElementTree as ET

def fetch_xml_info(xml_url):
    response = requests.get(xml_url, timeout=30)
    response.raise_for_status()
    root = ET.fromstring(response.content)

    # Try multiple namespaces
    namespaces = [
        '{http://schemas.microsoft.com/deepzoom/2008}Size',
        '{http://schemas.microsoft.com/deepzoom/2009}Size',
        'Size'
    ]

    image_elem = None
    for ns in namespaces:
        image_elem = root.find('.//' + ns)
        if image_elem is not None:
            break

    if image_elem is None:
        raise ValueError('Size element not found in Deep Zoom XML')

    width = int(image_elem.attrib['Width'])
    height = int(image_elem.attrib['Height'])
    return {'width': width, 'height': height}

Challenge 2: Auto-Detection of Maximum Level

The maximum level stated in the XML may differ from the levels actually available on the server.

Solution: Send HEAD requests to verify existence

def find_actual_max_level(base_url, format_ext):
    """Detect the actual maximum level that exists on the server"""
    for level in range(20, -1, -1):
        url = f"{base_url}{level}/0_0.{format_ext}"
        try:
            response = requests.head(url, timeout=10)
            if response.status_code == 200:
                return level
        except requests.RequestException:
            continue
    return None

Challenge 3: Efficient Download of Large Numbers of Tiles

High-resolution images require downloading tens of thousands of tiles (e.g., 29,146 tiles).

Solution: Parallel download using ThreadPoolExecutor

from concurrent.futures import ThreadPoolExecutor, as_completed

import requests
from tqdm import tqdm

def download_tiles(tiles_list, base_url, level, format_ext):
    session = requests.Session()
    downloaded_tiles = []

    with ThreadPoolExecutor(max_workers=10) as executor:
        futures = {
            executor.submit(download_tile, base_url, level,
                          col, row, format_ext, session): (col, row)
            for col, row in tiles_list
        }

        with tqdm(total=len(futures)) as pbar:
            for future in as_completed(futures):
                result = future.result()
                if result:
                    downloaded_tiles.append(result)
                pbar.update(1)

    return downloaded_tiles
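
The download_tile helper referenced above (and in the code structure later on) is not shown in full in the article; a minimal version consistent with the error-handling snippet in the optimization section might look like this:

```python
from io import BytesIO

import requests
from PIL import Image

def download_tile(base_url, level, col, row, format_ext, session):
    """Fetch one tile; return (col, row, PIL.Image) or None on failure."""
    url = f"{base_url}{level}/{col}_{row}.{format_ext}"
    try:
        response = session.get(url, timeout=30)
        if response.status_code == 200:
            return col, row, Image.open(BytesIO(response.content))
    except requests.RequestException as e:
        print(f"Failed to download tile {col}_{row}: {e}")
    return None
```

Returning the (col, row) pair alongside the image lets reconstruct_image place each tile without tracking order, which is why as_completed can consume the futures in whatever order they finish.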

Challenge 4: Handling Tile Overlap

Deep Zoom tiles include extra overlapping pixels along their shared edges so that adjacent tiles blend seamlessly when displayed.

Solution: Coordinate calculation considering overlap

from PIL import Image

def reconstruct_image(tiles, width, height, tile_size, overlap):
    canvas = Image.new('RGB', (width, height))
    for col, row, tile_img in tiles:
        # Placement coordinates considering overlap: tile origins advance by
        # tile_size, and interior tiles carry `overlap` extra pixels on their
        # leading edges, so shift those tiles back by `overlap`
        x = col * tile_size - (overlap if col > 0 else 0)
        y = row * tile_size - (overlap if row > 0 else 0)
        canvas.paste(tile_img, (x, y))
    return canvas

Challenge 5: Saving Large Images

Restored images can be several GB in size, potentially exceeding the standard TIFF 4GB limit.

Solution: Save in BigTIFF format

import numpy as np
import tifffile

def save_bigtiff(image, output_path):
    # First save as PNG as a safety copy (PNG is not subject to the 4GB limit)
    png_file = output_path.replace('.tif', '.png')
    image.save(png_file, format='PNG', compress_level=6)

    # Convert to BigTIFF using the tifffile library
    img_array = np.array(image)
    tifffile.imwrite(
        output_path,
        img_array,
        bigtiff=True,            # Enable BigTIFF (64-bit offsets, files over 4GB)
        compression='deflate',   # Lossless compression
        tile=(256, 256)          # Tiled layout for efficient partial reads
    )

Implementation Results

Processing Performance

Image Size        Tile Count   Download Time    Final File Size
62533 x 29734     28,899       Approx. 12 min   3.7GB (TIFF)
62588 x 29800     29,146       Approx. 12 min   3.4GB (TIFF)
7760 x 10328      1,271        Approx. 2 min    72MB (TIFF)

Parallel Processing Effectiveness

  • Parallelism: 10 threads
  • Average download speed: Approx. 40 tiles/sec
  • Efficient use of network bandwidth
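
As a sanity check, the reported speed is consistent with the results table: roughly 29,000 tiles in about 12 minutes works out to around 40 tiles per second.

```python
tiles = 28_899          # tiles in the first run from the results table
seconds = 12 * 60       # approx. 12 minutes
print(round(tiles / seconds, 1))  # → 40.1 tiles/sec
```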

Technology Stack

# Main libraries
import requests          # HTTP communication
from PIL import Image    # Image processing
import numpy as np       # Array operations
import tifffile          # BigTIFF support
from concurrent.futures import ThreadPoolExecutor
from tqdm import tqdm    # Progress bar

Code Structure

download_deepzoom.py           # Single image processing
batch_download_deepzoom.py     # Batch processing
├─ fetch_xml_info()           # XML parsing
├─ find_actual_max_level()    # Level auto-detection
├─ download_tile()            # Tile download
├─ reconstruct_image()        # Image restoration
└─ save_bigtiff()             # BigTIFF saving

Optimization Points

1. Session Reuse

session = requests.Session()
# Reuse HTTP connections within the same session
response = session.get(url)

2. Error Handling

from io import BytesIO

try:
    response = session.get(url, timeout=30)
    if response.status_code == 200:
        return Image.open(BytesIO(response.content))
except requests.RequestException as e:
    print(f"Failed to download tile: {e}")
    return None

3. Memory Efficiency

  • Process tile by tile to minimize memory usage
  • Create the large canvas only once

Use Cases

Digital Archives

  • High-resolution image preservation of historical documents and artworks
  • Complete restoration of map data
  • Digital preservation of cultural assets

Data Migration

  • Image data conversion during platform migration
  • Complete image acquisition for backup purposes
  • Image usage in offline environments

Summary

The restoration of Deep Zoom images involved the following technical challenges, which were successfully addressed:

  1. Handling XML namespace differences
  2. Auto-detection of actual maximum levels
  3. Parallel download of large numbers of tiles
  4. Image restoration considering overlap
  5. Large image storage in BigTIFF format

Using this method, ultra-high-resolution images of 60,000 x 30,000 pixels could be completely restored in approximately 12 minutes.

References

  • Microsoft Deep Zoom Specification
  • PIL/Pillow Documentation
  • tifffile Library Documentation
  • Python concurrent.futures

Note: The techniques described in this article should only be used on image data for which you have appropriate permissions. Please ensure compliance with copyright and licensing requirements.