This article was co-written with a generative AI. I cross-checked the facts against primary sources where I could, but errors may remain. Please consult the primary sources before relying on it for important decisions.

What I did

IIIF 3D Viewer is a Next.js app for viewing glTF/GLB 3D models and their annotations as delivered by IIIF Manifests. Until now it ran on IIIF Presentation API 3.0 manifests with a project-local extension — a custom 3DSelector for points / polygons and a camPos field on the selector for the recommended camera state.

I aligned the viewer with the IIIF 3D Technical Specification Group draft (Presentation API 4 / temp-draft-4). Concretely:

  • The internal pipeline now treats Scene / PointSelector / WKTSelector / PerspectiveCamera (the shapes used by the TSG examples) as the canonical v4 form
  • Legacy Presentation 3 + custom-extension manifests are converted to v4 in-memory at the fetch boundary
  • The bundled manifests/*.json samples are rewritten in v4 form

Presentation API 4 is still a draft, but matching the structure used by the TSG examples (e.g. 9_commenting_annotations) is a small step toward future standardisation and interoperability with other viewers.

Legacy form vs. v4 draft

For reference, here is the diff between the two shapes.

Manifest skeleton

Aspect          | Legacy (P3 + custom)                                             | v4 / 3D TSG
@context        | http://iiif.io/api/presentation/3/context.json                   | http://iiif.io/api/presentation/4/context.json
Top-level child | Manifest → items → Canvas                                        | Manifest → items → Scene
Model placement | painting Annotation attached to the Canvas (no spatial position) | painting Annotation + PointSelector on the Scene
Camera          | selector.camPos inside the annotation                            | standalone PerspectiveCamera Annotation
Polygon         | custom area: [x,y,z,...] flat array                              | WKTSelector with POLYGON Z ((...))
Georeferencing  | custom GeoJSON-T (motivation: "georeferencing")                  | not in v4 (kept as a project extension)

Commenting annotation (point)

Legacy:

{
  "motivation": "commenting",
  "body": { "value": "<p></p>", "label": "North Sea", "type": "TextualBody" },
  "target": {
    "source": ".../canvas-p1",
    "selector": {
      "type": "3DSelector",
      "value": [-0.2557, 0.7615, -0.5854],
      "camPos": [-0.378, 0.874, -0.911]
    }
  }
}

v4:

{
  "motivation": ["commenting"],
  "bodyValue": "Right pterygoid hamulus",
  "target": {
    "type": "SpecificResource",
    "source": [{ "id": ".../scene1", "type": "Scene" }],
    "selector": [{ "type": "PointSelector", "x": 0.040, "y": 0.063, "z": -0.066 }]
  }
}

The visible differences: motivation appears as an array; target is a SpecificResource; both source and selector are arrays; and PointSelector carries its position as object properties {x, y, z} rather than a tuple value: [x, y, z]. The array form for motivation isn’t explicitly required by the v4 prose — it’s the convention used consistently across the TSG examples.
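The tuple-to-object move is mechanical. A sketch of a helper for it (the converter excerpt later in this article names a buildPointSelector function; the body here is an assumption):

```typescript
type PointSelectorV4 = { type: 'PointSelector'; x: number; y: number; z: number };

// Assumed implementation: legacy [x, y, z] tuple → v4 PointSelector object.
const buildPointSelector = ([x, y, z]: [number, number, number]): PointSelectorV4 => ({
  type: 'PointSelector',
  x,
  y,
  z,
});
```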

Polygon

The legacy form had an ad-hoc area: [x,y,z,...] flat list of vertex coordinates. v4 expresses polygons as Well-Known Text strings. The naming is not yet settled: the temp-draft-4 prose defines this selector as PolygonZSelector, while the TSG’s example manifests (e.g. whale_comment_point_polygon.json) use WKTSelector. The viewer follows the example form and uses WKTSelector.

"selector": [
  {
    "type": "WKTSelector",
    "value": "POLYGON Z ((0 0.18 -0.23, -0.03 0.16 -0.23, -0.015 0.12 -0.23, 0.006 0.12 -0.23, 0.027 0.16 -0.23))"
  }
]
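The flat-array-to-WKT stitch can be sketched as follows (the converter excerpt later names this buildWktPolygon; the body here is an assumed implementation):

```typescript
// Assumed implementation: stitch a flat [x,y,z, x,y,z, ...] vertex list
// into a POLYGON Z WKT string.
const buildWktPolygon = (area: number[]): string => {
  const points: string[] = [];
  for (let i = 0; i + 2 < area.length; i += 3) {
    points.push(`${area[i]} ${area[i + 1]} ${area[i + 2]}`);
  }
  return `POLYGON Z ((${points.join(', ')}))`;
};
```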

Camera

Previously the recommended camera position for an annotation lived alongside the surface point, as camPos on the same selector object. In v4 it becomes a separate Annotation whose body references a PerspectiveCamera and whose target is a PointSelector on the Scene.

{
  "motivation": ["painting"],
  "body": {
    "type": "SpecificResource",
    "source": [{ "type": "PerspectiveCamera" }]
  },
  "target": {
    "type": "SpecificResource",
    "source": [{ "id": ".../scene1", "type": "Scene" }],
    "selector": [{ "type": "PointSelector", "x": -0.378, "y": 0.874, "z": -0.911 }]
  }
}

The PerspectiveCamera itself carries properties like fieldOfView / near / far, and the surrounding SpecificResource (the annotation body) can hold a sibling transform array (e.g. RotateTransform) next to source, per the draft examples. For now this viewer emits a minimal PerspectiveCamera and ties it back to the originating commenting annotation by id.
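For illustration, a fuller camera body might look like this. The property values and the RotateTransform entry are invented for the example; this viewer does not emit them:

```typescript
// Illustrative only — values invented; the viewer currently emits a
// minimal { type: 'PerspectiveCamera' } without these properties.
const cameraBody = {
  type: 'SpecificResource',
  source: [{ type: 'PerspectiveCamera', fieldOfView: 45, near: 0.1, far: 1000 }],
  transform: [{ type: 'RotateTransform', x: 0, y: 180, z: 0 }],
};
```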

How the annotation and the camera are linked

In the legacy form one selector carried both value (a point on the model surface) and camPos (the recommended camera position), so the pairing was implicit. In v4 they live in separate annotations, so the linkage has to be made explicit.

This viewer uses the following convention:

Scene "scene-p1"
├─ AnnotationPage "page/comments"
│   └─ Annotation  id: "anno-1"               motivation: ["commenting"]
│        body:    TextualBody { label: "North Sea" }
│        target:  PointSelector { x, y, z }   ← point on the model surface
│        cameraAnnotation: "anno-1/camera" ─┐
│                                            │ id reference
└─ AnnotationPage "page/cameras"             │
    └─ Annotation  id: "anno-1/camera" ←─────┘ motivation: ["painting"]
         body:    SpecificResource → { type: "PerspectiveCamera" }
         target:  PointSelector { x, y, z }   ← recommended camera position

  • The commenting annotation carries the marker position (the point on the model surface) in its target.selector
  • The matching camera position is the target.selector of a separate annotation
  • The two are linked by a custom cameraAnnotation property and by the ${commentingId}/camera naming convention
  • The parser first indexes the PerspectiveCamera annotations as id → [x, y, z], then looks each commenting annotation’s cameraAnnotation (or fallback ${id}/camera) up against that index

cameraAnnotation is not a v4 spec term — it is a property local to this viewer. As the draft evolves a more standard linkage (mutual source references, refinedBy, etc.) could replace it.
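Under the convention above, the lookup reduces to a map read with a naming fallback. A sketch (resolveCamera and the simplified types are hypothetical, not the viewer's actual code):

```typescript
type Vec3 = [number, number, number];

// Hypothetical sketch: `cameras` is the prebuilt id → position index;
// fall back to the `${id}/camera` naming convention when the
// cameraAnnotation property is absent.
const resolveCamera = (
  cameras: Map<string, Vec3>,
  anno: { id: string; cameraAnnotation?: string },
): Vec3 | undefined => cameras.get(anno.cameraAnnotation ?? `${anno.id}/camera`);
```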

Migration approach

I wanted the viewer code to deal with v4 only, while still reading legacy manifests produced by the existing editor. So the conversion happens at the fetch boundary.

fetchManifest(url)
  → convertToV4(raw)        // P3 + custom → v4 (in-memory)
  → parseManifestV4(v4)     // v4 → internal Annotation[] / GeoFeature[]
  → React components

The internal Annotation shape is unchanged — only the IIIF ↔ internal boundary moves to v4.

File layout

New:

  • src/types/iiif.d.ts — TypeScript types for v4 (ManifestV4, SceneV4, PointSelectorV4, WKTSelectorV4, SpecificResourceV4, AnnotationV4, …)
  • src/lib/services/manifestConverter.ts — convertToV4(unknown): ManifestV4
  • src/lib/services/manifestParser.ts — parseManifestV4(ManifestV4) returning { modelUrl, annotations, geoFeatures }

Edited:

  • ViewerContent.tsx / GeoRefContent.tsx — replace the in-line manifest parsing with convertToV4 → parseManifestV4
  • Annotations.tsx — marker selection now branches on 'PointSelector' instead of '3DSelector'
  • manifestAtom — the type changes from Manifest (@iiif/presentation-3) to ManifestV4
  • public/manifests/sample-manifest*.json — rewritten in v4 form

Inside the converter

The converter takes a manifest of unknown shape and returns a v4 plain object. If @context already points to Presentation 4 it is returned untouched; otherwise it walks the canvases, rewrites selectors, and lifts camPos into separate PerspectiveCamera annotations.
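The pass-through guard might look like this (a sketch; the context constant and the property access are assumptions about the real code):

```typescript
const V4_CONTEXT = 'http://iiif.io/api/presentation/4/context.json';

// Assumed guard: a manifest whose @context already points at
// Presentation 4 is returned untouched by the converter.
const isV4 = (raw: unknown): boolean => {
  const ctx = (raw as { '@context'?: string | string[] } | null)?.['@context'];
  return Array.isArray(ctx) ? ctx.includes(V4_CONTEXT) : ctx === V4_CONTEXT;
};
```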

The commenting annotation case looks like this (excerpt; the real code also passes through bodyValue and seeAlso on the source annotation, omitted here for brevity):

const convertCommentingAnnotation = (
  anno: LegacyAnnotation,
  sceneId: string,
): { annotation: AnnotationV4; camera: AnnotationV4 | null } => {
  const target = (anno.target ?? {}) as LegacyTargetObject;
  const selector: LegacySelector = (typeof target === 'object' && target.selector) || {};
  const value = asTriple(selector.value);
  const camPos = asTriple(selector.camPos);
  const area = selector.area && selector.area.length >= 9 ? selector.area : null;

  const v4Selector = area
    ? { type: 'WKTSelector' as const, value: buildWktPolygon(area) }
    : value
      ? buildPointSelector(value)
      : null;

  const annotationId =
    anno.id ?? `${sceneId}/anno/${Math.random().toString(36).slice(2, 10)}`;

  const v4Target: SpecificResourceV4 = {
    type: 'SpecificResource',
    source: buildSceneSource(sceneId),
    ...(v4Selector ? { selector: [v4Selector] } : {}),
  };

  const annotation: AnnotationV4 = {
    id: annotationId,
    type: 'Annotation',
    motivation: normalizeMotivation(anno.motivation).length
      ? normalizeMotivation(anno.motivation)
      : ['commenting'],
    target: v4Target,
    ...(anno.body !== undefined ? { body: anno.body as AnnotationV4['body'] } : {}),
    ...(camPos ? { cameraAnnotation: `${annotationId}/camera` } : {}),
  };

  if (!camPos) return { annotation, camera: null };

  const camera: AnnotationV4 = {
    id: `${annotationId}/camera`,
    type: 'Annotation',
    motivation: ['painting'],
    body: {
      type: 'SpecificResource',
      source: [{ type: 'PerspectiveCamera' }],
    },
    target: {
      type: 'SpecificResource',
      source: buildSceneSource(sceneId),
      selector: [buildPointSelector(camPos)],
    },
  };

  return { annotation, camera };
};

A few notes:

  • The legacy area is a flat list of 3D vertex coordinates, so it gets stitched into a POLYGON Z((x y z, x y z, ...)) WKT string
  • camPos is hoisted into a standalone Annotation. The originating annotation keeps a cameraAnnotation property pointing at the camera annotation by id, using a ${annotationId}/camera naming convention
  • motivation is normalised to an array, and target is rebuilt as a SpecificResource with array-valued source / selector
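The motivation normalisation is a one-liner; an assumed shape for the normalizeMotivation helper referenced in the excerpt:

```typescript
// Assumed implementation: coerce string | string[] | undefined to an array.
const normalizeMotivation = (m: string | string[] | undefined): string[] =>
  m === undefined ? [] : Array.isArray(m) ? m : [m];
```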

Annotations with the custom "georeferencing" motivation (carrying a GeoJSON-T body) have no v4 standard counterpart, so the body is preserved as-is and only the target is rewritten in v4 form.

A smoke test confirms the pattern: feeding a minimal legacy manifest through the converter produces the expected v4 skeleton:

{
  "@context": "http://iiif.io/api/presentation/4/context.json",
  "items": [{
    "type": "Scene",
    "items": [{ "type": "AnnotationPage", "items": [/* model painting */] }],
    "annotations": [
      { "type": "AnnotationPage", "items": [/* commenting */] },
      { "type": "AnnotationPage", "items": [/* PerspectiveCamera */] }
    ]
  }]
}

Canvas becomes Scene, 3DSelector becomes PointSelector, and camPos is moved out into its own AnnotationPage as expected.

Inside the parser

Going the other way — v4 manifest into the viewer’s internal structures — the parser pulls the model URL from the painting / Model body in Scene.items, then flattens Scene.annotations and routes each entry into one of three buckets: commenting, georeferencing, or PerspectiveCamera.

PerspectiveCamera annotations are indexed up front as Map<string, [x,y,z]>, so each commenting annotation can look up its camera by cameraAnnotation (or, as a fallback, the ${id}/camera naming convention). For WKTSelector the parser does a forgiving read of POLYGON Z((...)) and stores the vertices in the internal area flat array, with position set to the centroid for use as the marker anchor.

const parseWktPolygonZ = (wkt: string): number[] => {
  const open = wkt.indexOf('((');
  const close = wkt.indexOf('))', open);
  if (open < 0 || close < 0) return [];
  const ring = wkt.slice(open + 2, close);
  const out: number[] = [];
  for (const raw of ring.split(',')) {
    const [x, y, z] = raw.trim().split(/\s+/).map(Number);
    if ([x, y, z].every((n) => Number.isFinite(n))) out.push(x, y, z);
  }
  return out;
};

It is a deliberately sloppy WKT parser, sufficient to round-trip the strings the converter emits.
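The centroid used as the marker anchor can be computed as the plain vertex mean (a sketch under that assumption; the real code may weight differently):

```typescript
// Mean of the vertices in a flat [x,y,z, x,y,z, ...] list — used as the
// marker anchor for polygon annotations (assumed implementation).
const centroid = (area: number[]): [number, number, number] => {
  const n = area.length / 3;
  let cx = 0, cy = 0, cz = 0;
  for (let i = 0; i < area.length; i += 3) {
    cx += area[i]; cy += area[i + 1]; cz += area[i + 2];
  }
  return [cx / n, cy / n, cz / n];
};
```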

Sample manifest

public/manifests/sample-manifest-with-annotations.json is now v4. The georeferencing block keeps its GeoJSON-T body (still a project extension) but its target is rebuilt as a v4 SpecificResource. Commenting annotations use motivation: ["commenting"], a PointSelector { x, y, z } selector, and a cameraAnnotation reference into a separate PerspectiveCamera annotation.

There are two ways to write the body for a commenting annotation: the TSG examples shown earlier use a plain bodyValue: "..." string, while the sample manifest uses Web Annotation’s body: { type: "TextualBody", value, label } so that the existing UI can keep separating the HTML value from the label. The parser accepts both shapes.
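A sketch of accepting both shapes (readBody and the simplified types are hypothetical, not the viewer's actual function):

```typescript
// Hypothetical reader: prefer a plain bodyValue string; otherwise pull
// value/label out of a TextualBody-style body object.
const readBody = (anno: {
  bodyValue?: string;
  body?: { value?: string; label?: string };
}): { value: string; label?: string } =>
  typeof anno.bodyValue === 'string'
    ? { value: anno.bodyValue }
    : { value: anno.body?.value ?? '', label: anno.body?.label };
```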

Open questions

  • Presentation API 4 is still temp-draft-4, and the TSG’s commenting examples (9_commenting_annotations) lean on plain-text bodyValue. The body shape and camera-annotation conventions may shift before the spec is final
  • v4 doesn’t (yet) cover Linked Data enrichment of annotation bodies (e.g. URIs to Wikidata items) or 3D extensions of the IIIF Georeference Extension, so those remain project-local for now

References