Overview
I recently needed to create an XML file that conforms to a specific schema, and to verify that the file actually matches that schema.
For this, I tried jingtrang, a Python package for validating against RELAX NG schemas (it wraps the Jing and Trang tools). Here are my notes:
https://pypi.org/project/jingtrang/
I also prepared a Google Colab notebook:
https://colab.research.google.com/github/nakamura196/ndl_ocr/blob/main/jingtrangを試す.ipynb
Trying Validation
# Install the library
pip install jingtrang
# Download the RNG schema file (using tei_all)
wget https://raw.githubusercontent.com/nakamura196/test2021/main/tei_all.rng
# Prepare the XML file to validate (download a Kōi Genji Monogatari (校異源氏物語) text)
wget https://kouigenjimonogatari.github.io/tei/01.xml
Passing Example
Running the following command produced no output, meaning the file is valid against the schema:
pyjing tei_all.rng 01.xml
Failing Example
Next, I prepared the following XML file (saved as ng.xml), which does not conform to the TEI schema:
<a>bbb</a>
Running pyjing on it produced the result below. The message reports that an "a" element is not allowed at that position and that a "TEI" or "teiCorpus" element is expected instead, which shows that the schema conformance check works:
pyjing tei_all.rng ng.xml
/content/ng.xml:1:4: error: element "a" not allowed here; expected element "TEI" or "teiCorpus" (with xmlns="http://www.tei-c.org/ns/1.0")
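To run the same check from Python (for example, inside the Colab notebook), one option is to call the pyjing command through subprocess and treat empty output as success, which matches the behavior observed above. The sketch below also splits each reported line of the form path:line:column: error: message into its parts. It assumes pyjing is on PATH; validate, parse_jing_messages and JingMessage are hypothetical helpers, not part of jingtrang. Because I am not assuming which output stream pyjing writes to, both stdout and stderr are collected.

```python
from __future__ import annotations

import re
import subprocess
from dataclasses import dataclass

# Matches lines such as:
# /content/ng.xml:1:4: error: element "a" not allowed here; ...
MESSAGE_RE = re.compile(
    r"^(?P<path>.+?):(?P<line>\d+):(?P<column>\d+): (?P<level>\w+): (?P<message>.*)$"
)


@dataclass
class JingMessage:
    path: str
    line: int
    column: int
    level: str  # e.g. "error"
    message: str


def parse_jing_messages(output: str) -> list[JingMessage]:
    """Turn pyjing output into structured messages; other lines are ignored."""
    messages = []
    for raw in output.splitlines():
        match = MESSAGE_RE.match(raw.strip())
        if match:
            messages.append(
                JingMessage(
                    path=match["path"],
                    line=int(match["line"]),
                    column=int(match["column"]),
                    level=match["level"],
                    message=match["message"],
                )
            )
    return messages


def validate(schema: str, xml_file: str) -> list[JingMessage]:
    """Run pyjing (assumed to be on PATH) and return the reported problems.

    An empty list means pyjing printed nothing, i.e. the file passed.
    """
    result = subprocess.run(
        ["pyjing", schema, xml_file], capture_output=True, text=True
    )
    # Which stream pyjing writes to is not assumed, so both are collected.
    output = "\n".join([result.stdout, result.stderr]).strip()
    messages = parse_jing_messages(output)
    if output and not messages:
        # Output that is not in path:line:column form (e.g. a Java error)
        # must not be mistaken for a successful validation.
        raise RuntimeError(output)
    return messages
```

With the files from this article, validate("tei_all.rng", "01.xml") should return an empty list, and validate("tei_all.rng", "ng.xml") should return one message at line 1, column 4.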
Summary
Validation against a RELAX NG schema with jingtrang worked as expected.
However, what I actually need is to validate against a schema other than TEI, so I plan to write a separate article on how to create the RNG file for it.
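As a rough preview of that step, the smallest RELAX NG schema I can think of for the failing example above is a single element pattern that accepts only an "a" element containing text. The sketch below writes such a schema to a hypothetical file a.rng and checks with the standard library that the schema itself is well-formed XML. The file name and the write_schema helper are illustrative assumptions, not the schema I actually need.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

RNG_NS = "http://relaxng.org/ns/structure/1.0"

# Hypothetical minimal RELAX NG schema (XML syntax): the document must be
# a single <a> element whose content is text, e.g. <a>bbb</a>.
MINIMAL_SCHEMA = f"""<?xml version="1.0" encoding="UTF-8"?>
<element name="a" xmlns="{RNG_NS}">
  <text/>
</element>
"""


def write_schema(path: str = "a.rng") -> Path:
    """Write the minimal schema to disk and return its path."""
    out = Path(path)
    out.write_text(MINIMAL_SCHEMA, encoding="utf-8")
    return out


def check_well_formed(path: Path) -> ET.Element:
    """Parse the schema file; raises ET.ParseError if it is not well-formed XML."""
    return ET.parse(path).getroot()
```

After calling write_schema(), running pyjing a.rng ng.xml should print nothing if the schema is right, while 01.xml would now fail because its root element is TEI rather than a.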