Exporting Web Annotations via the Hypothes.is API and Converting to TEI/XML

Introduction

Hypothes.is is an open-source annotation tool for adding highlights and comments to web pages. It is easy to use through a browser extension or an embedded JavaScript client, but you may also want to back up the annotations you have accumulated or reuse them in another format such as TEI/XML.

This article shows how to export annotations with the Hypothes.is API and convert them to TEI/XML.

Obtaining an API Key

Log in to Hypothes.is
Go to Developer settings
Click “Generate your API token” to create an API key

Save the obtained key in a .env file.

cp .env.example .env
# Edit .env to set the API key

HYPOTHESIS_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Exporting Annotations

API Basics

The base URL for the Hypothes.is API is https://api.hypothes.is/api. Authentication is done via the Authorization: Bearer <API_KEY> header.

Key endpoints (paths are relative to https://api.hypothes.is):

Endpoint	Purpose
`GET /api/profile`	Get authenticated user’s profile
`GET /api/search`	Search annotations
`GET /api/annotations/{id}`	Get individual annotation
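
As a small illustration of the base URL and header described above, the following sketch composes an authenticated search request without sending it. The helper name build_request and the dummy token are assumptions made for this example only; the user and limit query parameters are the ones the script uses later.

```python
import urllib.parse
import urllib.request

API_BASE = "https://api.hypothes.is/api"


def build_request(endpoint, api_key, params=None):
    """Return an unsent Request for an API endpoint (hypothetical helper)."""
    url = f"{API_BASE}/{endpoint}"
    if params:
        # urlencode percent-escapes characters such as ":" and "@"
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {api_key}")
    return req


# "dummy-token" is a placeholder, not a real API key.
req = build_request(
    "search",
    "dummy-token",
    {"user": "acct:your_username@hypothes.is", "limit": 200},
)
print(req.full_url)
print(req.get_header("Authorization"))
```

Printing the request before sending it is a cheap way to confirm that the query string is encoded as expected and that the token is attached.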

Script

Everything from export to TEI/XML conversion is handled by a single script, hypothes_export.py.

https://github.com/nakamura196/hypothes-export/blob/main/hypothes_export.py

The main parts of the script are excerpted and explained below.

Loading .env and API Calls

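import json
import os
import urllib.parse
import urllib.request
from pathlib import Path


def load_env():
    """Load KEY=VALUE pairs from the .env file next to this script into os.environ."""
    env_path = Path(__file__).parent / ".env"
    with open(env_path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and lines without "="
            if line and not line.startswith("#") and "=" in line:
                k, v = line.split("=", 1)
                os.environ[k.strip()] = v.strip()


def api_get(endpoint, params=None):
    """Send an authenticated GET request and return the decoded JSON response."""
    api_key = os.environ["HYPOTHESIS_API_KEY"]
    url = f"https://api.hypothes.is/api/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())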

Fetching All Annotations (with Pagination)

The Search API returns at most 200 results per request, so the script fetches all annotations by advancing the offset in steps of 200 until it reaches the reported total.

def fetch_all_annotations():
    profile = api_get("profile")
    user = profile["userid"]

    all_annotations = []
    limit = 200
    offset = 0

    result = api_get("search", {"user": user, "limit": limit, "offset": 0})
    total = result["total"]
    all_annotations.extend(result["rows"])
    offset += limit

    while offset < total:
        result = api_get("search", {"user": user, "limit": limit, "offset": offset})
        all_annotations.extend(result["rows"])
        offset += limit

    return all_annotations
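
The same offset loop can also be written as a generator that receives the page-fetching function as a parameter, which makes the pagination logic testable without network access. This is a sketch rather than the script's own code: iter_rows and fake_search are hypothetical names, and the fake response only mimics the total/rows shape used in the function above.

```python
def iter_rows(fetch_page, limit=200):
    """Yield every row from an offset-paginated search endpoint.

    fetch_page(offset, limit) must return a dict with "total" and "rows".
    """
    offset = 0
    while True:
        result = fetch_page(offset, limit)
        rows = result["rows"]
        yield from rows
        offset += limit
        # Stop at the reported total, or early if a page comes back empty.
        if not rows or offset >= result["total"]:
            break


# Stand-in for the real API: 450 fake annotations served in pages.
FAKE_DATA = [{"id": f"ann-{i}"} for i in range(450)]
calls = []


def fake_search(offset, limit):
    """Record the requested offset and return one page of fake data."""
    calls.append(offset)
    return {"total": len(FAKE_DATA), "rows": FAKE_DATA[offset:offset + limit]}


annotations = list(iter_rows(fake_search))
print(len(annotations), calls)
```

With the real script, fetch_page could presumably be a small wrapper around api_get("search", ...) that passes the user, limit, and offset parameters through.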

Execution

# Output JSON + TEI/XML
python hypothes_export.py

# Output JSON only
python hypothes_export.py --json-only

# Convert from existing JSON to TEI/XML only
python hypothes_export.py --tei-only

Example output:

User: acct:your_username@hypothes.is
Total: 6 annotations
Saved JSON: output/annotations.json (6 annotations)
Saved TEI/XML: output/annotations.xml

Annotation Data Structure

Each annotation in the exported JSON has a structure based on the W3C Web Annotation Data Model.

{
  "id": "a1lBUhPdEfG-Lk8iV7GT3w",
  "created": "2026-02-27T13:08:33.427772+00:00",
  "user": "acct:your_username@hypothes.is",
  "uri": "https://example.com/page",
  "text": "Is this correct?",
  "tags": ["memo"],
  "target": [
    {
      "source": "https://example.com/page",
      "selector": [
        {
          "type": "RangeSelector",
          "startContainer": "/main[1]/div[1]/p[1]",
          "startOffset": 335,
          "endContainer": "/main[1]/div[1]/p[1]/span[4]",
          "endOffset": 0
        },
        {
          "type": "TextPositionSelector",
          "start": 1663,
          "end": 1667
        },
        {
          "type": "TextQuoteSelector",
          "exact": "此詩乃是",
          "prefix": "人樂太平無事日　　　鶯花無限日高眠 \n　　　　　　　　",
          "suffix": "宋太祖朝中一個名儒姓邵諱尭堯夫道號康節先生所作為"
        }
      ]
    }
  ]
}

Three Types of Selectors

Hypothes.is records the text position of annotation targets using three types of selectors.

Selector	Mechanism	Robustness
RangeSelector	Specifies position using XPath on the DOM	Fair - Vulnerable to HTML structure changes
TextPositionSelector	Specifies by character offset position	Fair - Shifts with text additions/deletions
TextQuoteSelector	Specifies by target text + surrounding context	Excellent - Can re-anchor via fuzzy match

When the source document changes, Hypothes.is tries these selectors in turn as fallbacks. TextQuoteSelector is the most robust because it fuzzy-matches the quoted text together with its prefix and suffix, but if the target text itself is deleted or heavily modified, the annotation becomes “orphaned.”
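
To make the fallback idea concrete, here is a minimal sketch of re-anchoring against plain text. It is not the algorithm Hypothes.is actually uses: the function name reanchor, the 0.8 similarity threshold, and the sliding-window fuzzy match are all assumptions for illustration. RangeSelector is skipped because it needs a DOM, and only the prefix (not the suffix) is used to disambiguate, to keep the example short.

```python
import difflib


def reanchor(text, selectors, min_ratio=0.8):
    """Return (start, end) of the annotated span in text, or None if orphaned."""
    by_type = {s["type"]: s for s in selectors}
    quote = by_type.get("TextQuoteSelector")
    pos = by_type.get("TextPositionSelector")
    if quote is None:
        return None
    exact = quote["exact"]

    # 1. Character offsets: usable only if the text at that position is unchanged.
    if pos and text[pos["start"]:pos["end"]] == exact:
        return pos["start"], pos["end"]

    # 2. Exact quote search; the prefix disambiguates repeated quotes.
    prefix = quote.get("prefix", "")
    hits = [i for i in range(len(text)) if text.startswith(exact, i)]
    for i in hits:
        if prefix and text[:i].endswith(prefix):
            return i, i + len(exact)
    if hits:
        return hits[0], hits[0] + len(exact)

    # 3. Fuzzy match: slide a window of the quote's length and keep the best ratio.
    best_ratio, best_start = 0.0, None
    for i in range(len(text) - len(exact) + 1):
        window = text[i:i + len(exact)]
        ratio = difflib.SequenceMatcher(None, exact, window).ratio()
        if ratio > best_ratio:
            best_ratio, best_start = ratio, i
    if best_start is not None and best_ratio >= min_ratio:
        return best_start, best_start + len(exact)
    return None  # orphaned


selectors = [
    {"type": "TextPositionSelector", "start": 15, "end": 30},
    {"type": "TextQuoteSelector", "exact": "annotation tool",
     "prefix": "open-source ", "suffix": " for the web"},
]
original = "An open-source annotation tool for the web"
shifted = "Hypothes.is: " + original                    # text inserted before the quote
typo = "An open-source anotation tool for the web"    # the quote itself was edited

print(reanchor(original, selectors))
print(reanchor(shifted, selectors))
print(reanchor(typo, selectors))
print(reanchor("Completely unrelated content here", selectors))
```

The window scan is quadratic and only suitable for short texts; it is here to show why a quote with context survives edits that break pure offsets.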

Conversion to TEI/XML

The exported JSON is converted to TEI/XML format.

Mapping Strategy

Hypothes.is	TEI/XML
Target document (URI, title)	`<sourceDesc><bibl>`
Group by document	`<div>`
Each annotation	`<ab>`
Highlighted text (`TextQuoteSelector.exact`)	`<quote>`
Comment body	`<note type="annotation">`
Tags	`<note type="tag">`

Conversion Logic

The quoted text is extracted from the TextQuoteSelector and mapped to TEI elements.

def get_text_quote(annotation):
    """Get exact/prefix/suffix from TextQuoteSelector"""
    for target in annotation.get("target", []):
        for sel in target.get("selector", []):
            if sel.get("type") == "TextQuoteSelector":
                return sel
    return None

Annotations are grouped by URI and output in the structure <div> -> <ab> -> <quote> / <note>. See the source code for details.
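
The grouping and nesting described above can be sketched with the standard library's xml.etree.ElementTree. This is an illustrative reconstruction from the mapping table, not the script's actual code: build_body is a hypothetical name, the teiHeader and the head element are omitted, and the corresp and when attributes on note are left out for brevity. get_text_quote is repeated from above so the block runs on its own.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Serialize TEI elements without a prefix (default namespace).
ET.register_namespace("", TEI_NS)


def get_text_quote(annotation):
    """Get exact/prefix/suffix from TextQuoteSelector"""
    for target in annotation.get("target", []):
        for sel in target.get("selector", []):
            if sel.get("type") == "TextQuoteSelector":
                return sel
    return None


def build_body(annotations):
    """Build a TEI <body>: one <div> per URI, one <ab> per annotation."""
    by_uri = defaultdict(list)
    for ann in annotations:
        by_uri[ann["uri"]].append(ann)

    body = ET.Element(f"{{{TEI_NS}}}body")
    for idx, group in enumerate(by_uri.values()):
        div = ET.SubElement(body, f"{{{TEI_NS}}}div", {"corresp": f"#src-{idx}"})
        for ann in group:
            ab = ET.SubElement(div, f"{{{TEI_NS}}}ab", {XML_ID: f"ann-{ann['id']}"})
            quote = get_text_quote(ann)
            if quote and quote.get("exact"):
                ET.SubElement(ab, f"{{{TEI_NS}}}quote").text = quote["exact"]
            if ann.get("text"):
                note = ET.SubElement(ab, f"{{{TEI_NS}}}note", {"type": "annotation"})
                note.text = ann["text"]
            for tag in ann.get("tags", []):
                ET.SubElement(ab, f"{{{TEI_NS}}}note", {"type": "tag"}).text = tag
    return body


# A trimmed version of the sample annotation shown earlier.
sample = [{
    "id": "a1lBUhPdEfG-Lk8iV7GT3w",
    "uri": "https://example.com/page",
    "text": "Is this correct?",
    "tags": ["memo"],
    "target": [{"selector": [{"type": "TextQuoteSelector", "exact": "此詩乃是"}]}],
}]
xml_text = ET.tostring(build_body(sample), encoding="unicode")
print(xml_text)
```

Building the tree with an XML library rather than string concatenation means that characters such as & and < in comments are escaped automatically.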

Output Example

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Hypothes.is Annotations Export</title>
      </titleStmt>
      <publicationStmt>
        <p>Exported from Hypothes.is API</p>
      </publicationStmt>
      <sourceDesc>
        <bibl xml:id="src-0">
          <title>巻首題：新刻全像水滸傳</title>
          <ref target="https://example.com/page">https://example.com/page</ref>
        </bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div corresp="#src-0">
        <head>巻首題：新刻全像水滸傳</head>
        <ab xml:id="ann-a1lBUhPdEfG-Lk8iV7GT3w">
          <quote>此詩乃是</quote>
          <note type="annotation"
                corresp="https://hypothes.is/a/a1lBUhPdEfG-Lk8iV7GT3w"
                when="2026-02-27T13:08:33.427772+00:00">
            Is this correct?
          </note>
        </ab>
      </div>
    </body>
  </text>
</TEI>

Source Document Changes and Annotation Consistency

Hypothes.is uses a “standoff annotation” approach: annotations are stored separately from the source document. As a result, when the source document changes, annotation positions may shift.

Minor changes: Often re-anchored via TextQuoteSelector fuzzy matching
Major changes: Annotations become “orphaned” and are no longer linked to their target locations

Exporting to TEI/XML records the highlighted text in <quote> elements, so the correspondence with the source document is at least preserved as a record.

Summary

The Hypothes.is API allows programmatic retrieval of your annotations
TextQuoteSelector’s exact/prefix/suffix are the most important fields for identifying the annotated text
Converting to TEI/XML enables storage and utilization in a format widely used in humanities research
However, be aware of anchoring shifts due to source document changes

The source code is published on GitHub.
