Overview

I needed to create an XML file that conforms to a specific schema, and to verify that the file actually matches it.

To do this, I tried jingtrang, a Python package that provides the Jing and Trang tools for working with RELAX NG schemas. Here are my notes:

https://pypi.org/project/jingtrang/

I also prepared a Google Colab notebook:

https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/jingtrangを試す.ipynb

Trying Validation

# Install the library
pip install jingtrang

# Download the RNG file (using tei_all)
wget https://raw.githubusercontent.com/nakamura196/test2021/main/tei_all.rng

# Prepare the XML file to validate (download a Kōi Genji Monogatari text)
wget https://kouigenjimonogatari.github.io/tei/01.xml
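Before running pyjing, it can be worth confirming that the required commands are on PATH. The following is a minimal POSIX shell sketch; `require_tools` is a hypothetical helper name. It assumes that pyjing runs the bundled Jing validator (a Java program) through a Java runtime, so `java` has to be installed alongside `pyjing`.

```shell
# Hypothetical helper: report every named command that is not on PATH.
# Returns 0 if all commands are found, 1 if any is missing.
# Assumption: pyjing drives the Java-based Jing validator, so `java` is needed too.
require_tools() {
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing: $cmd" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `require_tools java pyjing wget` prints one `missing:` line per absent command and returns a non-zero status if any were reported.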

Passing Example

Running the following produced no output, which means the file is valid against the schema (Jing only prints errors):

pyjing tei_all.rng 01.xml
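Because pyjing prints nothing for a valid file, a script is better off checking its exit status than its output. A minimal sketch, with `validate_xml` as a hypothetical helper name; it assumes pyjing exits with a non-zero status when Jing reports errors, which is worth confirming on your own setup.

```shell
# Hypothetical helper: validate one XML file against an RNG schema with pyjing.
# Assumption: pyjing returns a non-zero exit status when validation fails.
validate_xml() {
  schema="$1"
  xml="$2"
  if pyjing "$schema" "$xml"; then
    echo "valid: $xml"
  else
    echo "invalid: $xml" >&2
    return 1
  fi
}
```

With the files above, `validate_xml tei_all.rng 01.xml` would print `valid: 01.xml` under that assumption.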

Failing Example

On the other hand, I prepared the following XML file, ng.xml, which does not conform to the TEI schema:

<a>bbb</a>
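To reproduce this, the file can be created from the shell. A small sketch using a here-document; the file name ng.xml matches the pyjing command that follows.

```shell
# Write the non-conforming sample document to ng.xml.
# The quoted 'EOF' delimiter stops the shell from expanding anything inside.
cat > ng.xml <<'EOF'
<a>bbb</a>
EOF
```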

Running pyjing on it reported that an a element is not allowed here and that a TEI or teiCorpus element is expected instead. This shows that the schema conformance check works:

pyjing tei_all.rng ng.xml
/content/ng.xml:1:4: error: element "a" not allowed here; expected element "TEI" or "teiCorpus" (with xmlns="http://www.tei-c.org/ns/1.0")
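Each error line above has the shape FILE:LINE:COLUMN: error: MESSAGE. When a file produces many errors, it can help to split them into fields. A minimal sketch with sed; `parse_jing_error` is a hypothetical helper, and it assumes every error line follows exactly this shape (other kinds of output lines are simply ignored).

```shell
# Hypothetical helper: split one "FILE:LINE:COL: error: MESSAGE" line into
# "|"-separated fields (file|line|column|message).
# Lines that do not match the pattern produce no output.
parse_jing_error() {
  printf '%s\n' "$1" |
    sed -n 's/^\(.*\):\([0-9][0-9]*\):\([0-9][0-9]*\): error: \(.*\)$/\1|\2|\3|\4/p'
}
```

Applied to the error above, it yields `/content/ng.xml|1|4|element "a" not allowed here; ...`. To process a whole run, something like `pyjing tei_all.rng ng.xml 2>&1 | while IFS= read -r l; do parse_jing_error "$l"; done` should work; `2>&1` is there in case the messages go to standard error.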

Summary

Using jingtrang, I was able to validate an XML file against a RELAX NG schema.

However, my actual need was to validate against a schema other than TEI, so I plan to write a separate article on how to create and configure that RNG file.