
Introduction
Hypothes.is is an open-source annotation tool for adding highlights and comments to web pages. It is easy to use through a browser extension or by embedding its JavaScript client in a page, but at some point you may want to back up the annotations you have accumulated or reuse them in another format such as TEI/XML.
This article shows how to export annotations through the Hypothes.is API and convert them to TEI/XML.
Obtaining an API Key
- Log in to Hypothes.is
- Go to Developer settings
- Click “Generate your API token” to generate an API key
Save the key in a .env file.
cp .env.example .env
# Edit .env to set the API key
HYPOTHESIS_API_KEY=xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Exporting Annotations
API Basics
The base URL for the Hypothes.is API is https://api.hypothes.is/api. Authentication is done via the Authorization: Bearer <API_KEY> header.
Key endpoints:
| Endpoint | Purpose |
|---|---|
| GET /api/profile | Get authenticated user’s profile |
| GET /api/search | Search annotations |
| GET /api/annotations/{id} | Get individual annotation |
Script
The whole pipeline, from export through TEI/XML conversion, is consolidated in a single script, hypothes_export.py.
https://github.com/nakamura196/hypothes-export/blob/main/hypothes_export.py
Below, the main processing is excerpted and explained.
Loading .env and API Calls
import json
import os
import urllib.parse
import urllib.request
from pathlib import Path

def load_env():
    # Read KEY=VALUE pairs from the .env file next to this script
    env_path = Path(__file__).parent / ".env"
    with open(env_path) as f:
        for line in f:
            line = line.strip()
            # Skip blank lines, comments, and lines without "="
            if line and not line.startswith("#") and "=" in line:
                k, v = line.split("=", 1)
                os.environ[k.strip()] = v.strip()

def api_get(endpoint, params=None):
    # Send an authenticated GET request and return the decoded JSON
    api_key = os.environ["HYPOTHESIS_API_KEY"]
    url = f"https://api.hypothes.is/api/{endpoint}"
    if params:
        url += "?" + urllib.parse.urlencode(params)
    req = urllib.request.Request(url)
    req.add_header("Authorization", f"Bearer {api_key}")
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read().decode())
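The loader above expects a .env file to exist and overwrites variables that are already set in the environment. A slightly more defensive variant might look like the following sketch. The names `parse_env_lines` and `load_env_safe` are hypothetical and not part of the published script, and stripping surrounding quotes is an assumption about how the .env file is written.

```python
import os
from pathlib import Path


def parse_env_lines(lines):
    """Parse KEY=VALUE lines, skipping blanks and comments (hypothetical helper)."""
    values = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.strip()
        # Assumption: a value wrapped in matching quotes should lose them.
        if len(value) >= 2 and value[0] == value[-1] and value[0] in "\"'":
            value = value[1:-1]
        values[key.strip()] = value
    return values


def load_env_safe(env_path):
    """Load a .env file if it exists; variables already in the environment win."""
    path = Path(env_path)
    if not path.exists():
        return {}
    with open(path, encoding="utf-8") as f:
        values = parse_env_lines(f)
    for key, value in values.items():
        os.environ.setdefault(key, value)
    return values
```

Keeping the parsing separate from the file access makes the rules easy to check without touching the real environment.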
Fetching All Annotations (with Pagination)
The Search API returns a maximum of 200 results per request, so all annotations are fetched by incrementing the offset.
def fetch_all_annotations():
    profile = api_get("profile")
    user = profile["userid"]
    all_annotations = []
    limit = 200
    offset = 0
    result = api_get("search", {"user": user, "limit": limit, "offset": 0})
    total = result["total"]
    all_annotations.extend(result["rows"])
    offset += limit
    while offset < total:
        result = api_get("search", {"user": user, "limit": limit, "offset": offset})
        all_annotations.extend(result["rows"])
        offset += limit
    return all_annotations
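Offset-based paging is fine for a few hundred annotations. For much larger collections, the Hypothes.is API documentation describes a `search_after` parameter that is used together with `sort` and `order`. The sketch below assumes that parameter behaves as documented. `fetch_all_by_cursor` and its `fetch_page` argument are hypothetical names, not part of the published script; `fetch_page` stands in for something like `lambda p: api_get("search", p)` so the paging logic can be exercised without network access.

```python
def fetch_all_by_cursor(fetch_page, user, limit=200):
    """Collect a user's annotations with search_after paging (sketch).

    fetch_page: callable taking a params dict and returning the decoded
    search response, i.e. a dict with a "rows" list.
    """
    annotations = []
    params = {"user": user, "limit": limit, "sort": "created", "order": "asc"}
    while True:
        rows = fetch_page(dict(params)).get("rows", [])
        annotations.extend(rows)
        if len(rows) < limit:
            break  # a short (or empty) page means nothing is left
        # Resume after the sort key of the last annotation received.
        params["search_after"] = rows[-1]["created"]
    return annotations
```

One caveat of this simple version: if two annotations share exactly the same `created` timestamp at a page boundary, one of them could be skipped.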
Execution
# Output JSON + TEI/XML
python hypothes_export.py
# Output JSON only
python hypothes_export.py --json-only
# Convert from existing JSON to TEI/XML only
python hypothes_export.py --tei-only
Example output:
User: acct:your_username@hypothes.is
Total: 6 annotations
Saved JSON: output/annotations.json (6 annotations)
Saved TEI/XML: output/annotations.xml
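The two flags can be read as switches over a two-step pipeline (export JSON, then convert to TEI/XML). As a rough illustration of how such flags might be parsed with `argparse`, here is a minimal sketch. `build_parser` and `plan_steps` are hypothetical names, and treating the flags as mutually exclusive is an assumption; the actual script may handle them differently.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Export Hypothes.is annotations")
    # Assumption: the two flags cannot be combined.
    mode = parser.add_mutually_exclusive_group()
    mode.add_argument("--json-only", action="store_true",
                      help="write JSON and skip the TEI/XML conversion")
    mode.add_argument("--tei-only", action="store_true",
                      help="convert an existing JSON file to TEI/XML")
    return parser


def plan_steps(args):
    """Return which of the two steps should run for the given flags."""
    return {"export_json": not args.tei_only, "convert_tei": not args.json_only}
```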
Annotation Data Structure
Each annotation in the exported JSON has a structure based on the W3C Web Annotation Data Model.
{
  "id": "a1lBUhPdEfG-Lk8iV7GT3w",
  "created": "2026-02-27T13:08:33.427772+00:00",
  "user": "acct:your_username@hypothes.is",
  "uri": "https://example.com/page",
  "text": "Is this correct?",
  "tags": ["memo"],
  "target": [
    {
      "source": "https://example.com/page",
      "selector": [
        {
          "type": "RangeSelector",
          "startContainer": "/main[1]/div[1]/p[1]",
          "startOffset": 335,
          "endContainer": "/main[1]/div[1]/p[1]/span[4]",
          "endOffset": 0
        },
        {
          "type": "TextPositionSelector",
          "start": 1663,
          "end": 1667
        },
        {
          "type": "TextQuoteSelector",
          "exact": "此詩乃是",
          "prefix": "人樂太平無事日 鶯花無限日高眠 \n ",
          "suffix": "宋太祖朝中一個名儒姓邵諱尭堯夫道號康節先生所作為"
        }
      ]
    }
  ]
}
Three Types of Selectors
Hypothes.is records the text position of annotation targets using three types of selectors.
| Selector | Mechanism | Robustness |
|---|---|---|
| RangeSelector | Specifies position using XPath on the DOM | Fair - Vulnerable to HTML structure changes |
| TextPositionSelector | Specifies by character offset position | Fair - Shifts with text additions/deletions |
| TextQuoteSelector | Specifies by target text + surrounding context | Excellent - Can re-anchor via fuzzy match |
When the source document changes, Hypothes.is attempts these selectors as fallbacks in sequence. TextQuoteSelector performs fuzzy matching including prefix/suffix, making it the most robust, but if the target text itself is deleted or significantly modified, the annotation becomes “orphaned.”
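To make the fallback idea concrete, the following sketch re-anchors a quote in plain text: it first trusts the stored character offsets, then searches for the quoted text with matching prefix and suffix, and finally accepts the first exact hit. This is a simplified illustration, not the algorithm Hypothes.is itself uses: it relies on exact string search rather than fuzzy matching, it ignores RangeSelector, and the function name `reanchor` is hypothetical.

```python
def reanchor(text, position, quote):
    """Return (start, end) of the quoted text in `text`, or None if orphaned.

    position: dict shaped like a TextPositionSelector, or None
    quote: dict shaped like a TextQuoteSelector (exact, prefix, suffix)
    """
    exact = quote["exact"]
    # 1. Trust the stored offsets only if they still point at the quoted text.
    if position and text[position["start"]:position["end"]] == exact:
        return position["start"], position["end"]
    # 2. Otherwise search for the quote, preferring a hit with matching context.
    prefix = quote.get("prefix", "")
    suffix = quote.get("suffix", "")
    first_hit = None
    start = text.find(exact)
    while start != -1:
        end = start + len(exact)
        if first_hit is None:
            first_hit = (start, end)
        if text[:start].endswith(prefix) and text[end:].startswith(suffix):
            return start, end
        start = text.find(exact, start + 1)
    # 3. Fall back to the first exact hit; None means the annotation is orphaned.
    return first_hit
```

The prefix and suffix matter when the same phrase occurs more than once in a document, because they let the search pick the occurrence that was originally highlighted.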
Conversion to TEI/XML
The exported JSON is converted to TEI/XML format.
Mapping Strategy
| Hypothes.is | TEI/XML |
|---|---|
| Target document (URI, title) | <sourceDesc><bibl> |
| Group by document | <div> |
| Each annotation | <ab> |
| Highlighted text (TextQuoteSelector.exact) | <quote> |
| Comment body | <note type="annotation"> |
| Tags | <note type="tag"> |
Conversion Logic
Quote text is extracted from TextQuoteSelector and mapped to TEI elements.
def get_text_quote(annotation):
    """Get exact/prefix/suffix from TextQuoteSelector"""
    for target in annotation.get("target", []):
        for sel in target.get("selector", []):
            if sel.get("type") == "TextQuoteSelector":
                return sel
    return None
Annotations are grouped by URI and output in the structure <div> -> <ab> -> <quote> / <note>. See the source code for details.
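For readers who do not want to open the repository, the grouping and nesting could be sketched roughly as follows with `xml.etree.ElementTree`. This is an illustration of the mapping table under simplifying assumptions, not the script's actual code: `build_body` and `first_text_quote` are hypothetical names, and the sketch omits the header, `xml:id` attributes, and timestamps.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

TEI_NS = "http://www.tei-c.org/ns/1.0"


def first_text_quote(annotation):
    """Return the first TextQuoteSelector of an annotation, or None."""
    for target in annotation.get("target", []):
        for sel in target.get("selector", []):
            if sel.get("type") == "TextQuoteSelector":
                return sel
    return None


def build_body(annotations):
    """Build a TEI <body>: one <div> per URI, one <ab> per annotation."""
    by_uri = defaultdict(list)
    for ann in annotations:
        by_uri[ann["uri"]].append(ann)

    body = ET.Element(f"{{{TEI_NS}}}body")
    for index, group in enumerate(by_uri.values()):
        div = ET.SubElement(body, f"{{{TEI_NS}}}div", {"corresp": f"#src-{index}"})
        for ann in group:
            ab = ET.SubElement(div, f"{{{TEI_NS}}}ab")
            quote = first_text_quote(ann)
            if quote:
                ET.SubElement(ab, f"{{{TEI_NS}}}quote").text = quote["exact"]
            if ann.get("text"):
                note = ET.SubElement(ab, f"{{{TEI_NS}}}note", {"type": "annotation"})
                note.text = ann["text"]
            for tag in ann.get("tags", []):
                ET.SubElement(ab, f"{{{TEI_NS}}}note", {"type": "tag"}).text = tag
    return body
```

Calling `ET.register_namespace("", TEI_NS)` before `ET.tostring(body, encoding="unicode")` serializes the result with TEI as the default namespace instead of an auto-generated prefix.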
Output Example
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>Hypothes.is Annotations Export</title>
      </titleStmt>
      <publicationStmt>
        <p>Exported from Hypothes.is API</p>
      </publicationStmt>
      <sourceDesc>
        <bibl xml:id="src-0">
          <title>巻首題:新刻全像水滸傳</title>
          <ref target="https://example.com/page">https://example.com/page</ref>
        </bibl>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body>
      <div corresp="#src-0">
        <head>巻首題:新刻全像水滸傳</head>
        <ab xml:id="ann-a1lBUhPdEfG">
          <quote>此詩乃是</quote>
          <note type="annotation"
                corresp="https://hypothes.is/a/a1lBUhPdEfG"
                when="2026-02-27T13:08:33.427772+00:00">
            Is this correct?
          </note>
        </ab>
      </div>
    </body>
  </text>
</TEI>
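Once the annotations are in TEI/XML, they can be read back with any namespace-aware XML library. The sketch below uses the standard library's `xml.etree.ElementTree` to list each highlighted passage with its comment. It assumes a file shaped like the example above; `list_quotes` is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

TEI = {"tei": "http://www.tei-c.org/ns/1.0"}


def list_quotes(tei_xml):
    """Return a list of (quote, comment) pairs from a TEI/XML string."""
    root = ET.fromstring(tei_xml)
    pairs = []
    for ab in root.iterfind(".//tei:ab", TEI):
        quote = ab.findtext("tei:quote", default="", namespaces=TEI)
        note = ab.findtext("tei:note[@type='annotation']", default="", namespaces=TEI)
        # Strip the indentation whitespace around the note text.
        pairs.append((quote.strip(), note.strip()))
    return pairs
```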
Source Document Changes and Annotation Consistency
Hypothes.is uses a “standoff annotation” approach: annotations are stored separately from the source document. As a result, when the source document changes, annotation positions may shift.
- Minor changes: Often re-anchored via TextQuoteSelector fuzzy matching
- Major changes: Annotations become “orphaned” and are no longer linked to their target locations
By exporting to TEI/XML, the highlighted target text is recorded in <quote> elements, so the correspondence with the source document is at least preserved as a record.
Summary
- The Hypothes.is API allows programmatic retrieval of your annotations
- TextQuoteSelector’s exact/prefix/suffix are most important for identifying annotation target text
- Converting to TEI/XML enables storage and utilization in a format widely used in humanities research
- However, be aware of anchoring shifts due to source document changes
The source code is published on GitHub: https://github.com/nakamura196/hypothes-export