Overview

This is a memo on creating a DTS (Distributed Text Services) API for TEI/XML files published by the Koui Genji Monogatari Text DB.

Background

The Koui Genji Monogatari Text DB is available at:

https://kouigenjimonogatari.github.io/

It publishes TEI/XML files.

Developed DTS

The developed DTS is available at:

https://dts-typescript.vercel.app/api/dts

It is built with Express.js and deployed on Vercel.

For more information about DTS, please refer to:

https://zenn.dev/nakamura196/articles/4233fe80b3e76d

MyCapytain Library

The following article introduced a library for using DTS from Python:

https://zenn.dev/nakamura196/articles/1f52f460025274

Let’s try using the developed DTS with this library.

Create the resolver

The following lines create the resolver:

from MyCapytain.resolvers.dts.api_v1 import HttpDtsResolver

resolver = HttpDtsResolver("https://dts-typescript.vercel.app/api/dts")

Retrieve metadata: let’s visit the catalog

The following code finds every readable text that the API exposes:

# We get the root collection
root = resolver.getMetadata()
# Then we dynamically retrieve all the readableDescendants: this automatically browses the API
# until no unvisited texts remain. Be careful with this one on huge repositories.
readable_collections = root.readableDescendants
print("We found %s collections that can be parsed" % len(readable_collections))
We found 54 collections that can be parsed

Printing the full tree

# Note that we can also walk the catalog and print it as a tree.
# If you are not familiar with recursion, the next lines might be a bit complicated
def show_tree(collection, char_number=1):
    for subcollection_id, subcollection in collection.children.items():
        print(char_number*"--" + " " + subcollection.id)
        show_tree(subcollection, char_number+1)

print(root.id)
show_tree(root)
default
-- urn:kouigenjimonogatari
---- urn:kouigenjimonogatari.1
---- urn:kouigenjimonogatari.2
---- urn:kouigenjimonogatari.3
---- urn:kouigenjimonogatari.4
---- urn:kouigenjimonogatari.5
---- urn:kouigenjimonogatari.6
---- urn:kouigenjimonogatari.7
---- urn:kouigenjimonogatari.8
---- urn:kouigenjimonogatari.9
---- urn:kouigenjimonogatari.10
---- urn:kouigenjimonogatari.11
---- urn:kouigenjimonogatari.12
---- urn:kouigenjimonogatari.13
---- urn:kouigenjimonogatari.14
---- urn:kouigenjimonogatari.15
---- urn:kouigenjimonogatari.16
---- urn:kouigenjimonogatari.17
---- urn:kouigenjimonogatari.18
---- urn:kouigenjimonogatari.19
---- urn:kouigenjimonogatari.20
---- urn:kouigenjimonogatari.21
---- urn:kouigenjimonogatari.22
---- urn:kouigenjimonogatari.23
---- urn:kouigenjimonogatari.24
---- urn:kouigenjimonogatari.25
---- urn:kouigenjimonogatari.26
---- urn:kouigenjimonogatari.27
---- urn:kouigenjimonogatari.28
---- urn:kouigenjimonogatari.29
---- urn:kouigenjimonogatari.30
---- urn:kouigenjimonogatari.31
---- urn:kouigenjimonogatari.32
---- urn:kouigenjimonogatari.33
---- urn:kouigenjimonogatari.34
---- urn:kouigenjimonogatari.35
---- urn:kouigenjimonogatari.36
---- urn:kouigenjimonogatari.37
---- urn:kouigenjimonogatari.38
---- urn:kouigenjimonogatari.39
---- urn:kouigenjimonogatari.40
---- urn:kouigenjimonogatari.41
---- urn:kouigenjimonogatari.42
---- urn:kouigenjimonogatari.43
---- urn:kouigenjimonogatari.44
---- urn:kouigenjimonogatari.45
---- urn:kouigenjimonogatari.46
---- urn:kouigenjimonogatari.47
---- urn:kouigenjimonogatari.48
---- urn:kouigenjimonogatari.49
---- urn:kouigenjimonogatari.50
---- urn:kouigenjimonogatari.51
---- urn:kouigenjimonogatari.52
---- urn:kouigenjimonogatari.53
---- urn:kouigenjimonogatari.54
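
The recursion in show_tree can be tried without a network connection. The following is a minimal sketch, assuming only that a collection exposes an `id` and a `children` dictionary, as the code above uses them. `StubCollection` and `tree_lines` are hypothetical names introduced for illustration and are not part of MyCapytain.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class StubCollection:
    """Hypothetical stand-in exposing only `id` and `children`."""
    id: str
    children: Dict[str, "StubCollection"] = field(default_factory=dict)


def tree_lines(collection, depth: int = 1) -> List[str]:
    """Return one indented line per descendant, visiting children depth-first."""
    lines = []
    for child in collection.children.values():
        # Each level of nesting adds one more "--" to the prefix.
        lines.append(depth * "--" + " " + child.id)
        lines.extend(tree_lines(child, depth + 1))
    return lines


# A tiny two-level catalog shaped like the output above.
leaf1 = StubCollection("urn:kouigenjimonogatari.1")
leaf2 = StubCollection("urn:kouigenjimonogatari.2")
work = StubCollection("urn:kouigenjimonogatari", {leaf1.id: leaf1, leaf2.id: leaf2})
root_stub = StubCollection("default", {work.id: work})

print(root_stub.id)
print("\n".join(tree_lines(root_stub)))
```

Returning the lines instead of printing them makes the traversal easy to check or to write to a file.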

Printing details about a specific one

# Let's get a random one!
from random import randint
# randint includes both endpoints, so the index must be between 0 and len - 1
rand_index = randint(0, len(readable_collections) - 1)
collection = readable_collections[rand_index]

# Now let's print some information
label = collection.get_label()

text_id = collection.id
print("Treating `"+label+"` with id " + text_id)
Treating `総角` with id urn:kouigenjimonogatari.47
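
Because `randint` includes both of its endpoints, the largest valid index for a sequence is its length minus one. `random.choice` avoids the index arithmetic altogether. The following is a minimal sketch, assuming a plain list of identifiers as a hypothetical stand-in for the collections retrieved above; `pick_random` and `sample_ids` are names introduced for illustration.

```python
import random


def pick_random(items, rng=random):
    """Return one element of a non-empty sequence, chosen uniformly."""
    # random.choice handles the bounds itself, so no index arithmetic is needed.
    return rng.choice(items)


# Hypothetical stand-in for the list of readable collections.
sample_ids = ["urn:kouigenjimonogatari.%d" % n for n in range(1, 55)]

# A seeded generator keeps the pick reproducible between runs.
picked = pick_random(sample_ids, random.Random(0))
print(picked in sample_ids)
```

The same helper should work on any sequence that supports `len()` and indexing.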

What about more detailed information, such as the citation scheme?

def recursive_printing_citation_scheme(citation, char_number=1):
    for subcitation in citation.children:
        print(char_number*"--" + " " + subcitation.name)
        recursive_printing_citation_scheme(subcitation, char_number+1)

print("Maximum citation depth : ", collection.citation.depth)
print("Citation System")
recursive_printing_citation_scheme(collection.citation)
Maximum citation depth :  1
Citation System
-- line

Let’s get some references!

reffs = resolver.getReffs(collection.id)
print(reffs)
# Nice!
<DtsReferenceSet (<DtsReference <https://w3id.org/kouigenjimonogatari/api/items/1587-01.json> [line]>, <DtsReference <https://w3id.org/kouigenjimonogatari/api/items/1587-02.json> [line]>, <DtsReference <https://w3id.org/kouigenjimonogatari/api/items/1587-03.json> [line]>, <DtsReference <https://w3id.org/kouigenjimonogatari/api/items/1587-04.json> [line]>, <DtsReference ...

Let’s get a random passage!

# Let's get a random one!
from random import randint
# The index needs to be between 0 and the number of references minus one
rand_index = randint(0, len(reffs)-1)
reff = reffs[rand_index]

passage = resolver.getTextualNode(collection.id, reff)
print(passage.id, passage.reference)

# Let's see the XML here
# For that, we need to get the mimetype right :
from MyCapytain.common.constants import Mimetypes
print(passage.export(Mimetypes.XML.TEI))
urn:kouigenjimonogatari.47 <DtsReference <https://w3id.org/kouigenjimonogatari/api/items/1640-06.json> [line]>
<TEI xmlns="http://www.tei-c.org/ns/1.0"><dts:fragment xmlns:dts="https://w3id.org/dts/api#">
...
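
The exported XML wraps the passage in a `dts:fragment` element inside a `TEI` root, as the output above shows. The following is a minimal sketch of pulling the plain text out of such a response with the standard library. The `sample` string is a hypothetical placeholder, not real output from the API, and `fragment_text` is a name introduced for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace URIs taken from the output above.
NS = {
    "tei": "http://www.tei-c.org/ns/1.0",
    "dts": "https://w3id.org/dts/api#",
}


def fragment_text(xml_string: str) -> str:
    """Return the concatenated text inside the first dts:fragment, or ''."""
    root = ET.fromstring(xml_string)
    fragment = root.find("dts:fragment", NS)
    if fragment is None:
        return ""
    # itertext() walks every descendant, so nested markup is flattened.
    return "".join(fragment.itertext()).strip()


# Hypothetical sample shaped like the response; the <seg> content is a placeholder.
sample = (
    '<TEI xmlns="http://www.tei-c.org/ns/1.0">'
    '<dts:fragment xmlns:dts="https://w3id.org/dts/api#">'
    "<seg>placeholder line text</seg>"
    "</dts:fragment></TEI>"
)

print(fragment_text(sample))
```

With a real passage, the string passed in would be the result of `passage.export(Mimetypes.XML.TEI)`.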

Discussion

As shown above, we were able to build a DTS that supports the basic operations of the MyCapytain library.

While the examples above used Python, the DTS can also be accessed from a browser. For example, the following URL retrieves the first line of Kiritsubo:

https://dts-typescript.vercel.app/api/dts/document?id=urn:kouigenjimonogatari.1&ref=https://w3id.org/kouigenjimonogatari/api/items/0005-01.json
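
The same URL can be assembled programmatically. The following is a minimal sketch that uses only the `document` endpoint and the `id` and `ref` parameters visible in the URL above. `build_document_url` is a hypothetical helper name, and the actual HTTP request is left out so the sketch runs offline.

```python
from urllib.parse import urlencode

BASE_URL = "https://dts-typescript.vercel.app/api/dts"


def build_document_url(collection_id: str, ref: str, base_url: str = BASE_URL) -> str:
    """Build a document URL for one passage of one collection."""
    # safe=":/" leaves colons and slashes unescaped, matching the URL shown above.
    query = urlencode({"id": collection_id, "ref": ref}, safe=":/")
    return base_url + "/document?" + query


url = build_document_url(
    "urn:kouigenjimonogatari.1",
    "https://w3id.org/kouigenjimonogatari/api/items/0005-01.json",
)
print(url)
```

The resulting string can be passed to any HTTP client, for example `urllib.request.urlopen`, to fetch the passage.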

Notes

This DTS was developed with reference to the following API:

https://texts.alpheios.net/api/dts

However, it has not been confirmed whether the above API conforms to the latest guidelines:

https://distributed-text-services.github.io/specifications/

Therefore, please note that the DTS API developed here may also have areas that do not conform to the above guidelines.

Summary

We hope this serves as a useful reference for understanding DTS.