
Overview
The XML Schema Definitions (XSD) for the JPCOAR Schema are published in the following repository. Many thanks to everyone who created the schema and made it available.
https://github.com/JPCOAR/schema
This article is a memo of my attempt to validate XML files against the schema above. (This is my first time doing this kind of validation, so some of the terminology or information may be inaccurate. Apologies in advance.)
I have also prepared a Google Colab notebook.
Preparation
Clone the repository
cd /content/
git clone https://github.com/JPCOAR/schema.git
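To see which schema files the clone contains (for example, to find the 1.0 path loaded below, or the Version 2.0 files mentioned at the end), a short sketch like the following lists every .xsd file under the clone. It assumes the repository was cloned to /content/schema as above; list_xsd_files is a hypothetical helper name, not part of any library.

```python
from pathlib import Path
from typing import List, Union


def list_xsd_files(repo_root: Union[str, Path]) -> List[Path]:
    """Return every .xsd file under repo_root (recursively), relative to it, sorted."""
    root = Path(repo_root)
    return sorted(p.relative_to(root) for p in root.rglob("*.xsd"))


if __name__ == "__main__":
    clone = Path("/content/schema")  # assumed clone location, as in this memo
    if clone.is_dir():
        for rel in list_xsd_files(clone):
            print(rel)  # e.g. 1.0/jpcoar_scm.xsd
    else:
        print(f"{clone} not found; run the git clone step first")
```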
Install the library
pip install xsd-validator
Load the XSD file (v1)
from xsd_validator import XsdValidator
validator = XsdValidator('/content/schema/1.0/jpcoar_scm.xsd')
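As the NG example below shows, assert_valid raises an exception when a file does not conform, which stops the notebook cell. To check several files in one go, the call can be wrapped as in this sketch. It relies only on the assert_valid method used in this memo; it catches a broad Exception because I have not confirmed where the library's exception class can be imported from, and check_files is a hypothetical helper name.

```python
from typing import Dict, Iterable, Optional


def check_files(validator, paths: Iterable[str]) -> Dict[str, Optional[str]]:
    """Validate each path with validator.assert_valid and collect the outcome.

    Returns a dict mapping each path to None if it is valid, or to the
    error message string if assert_valid raised. Exception is caught broadly
    because the exact exception class is an assumption here.
    """
    results: Dict[str, Optional[str]] = {}
    for path in paths:
        try:
            validator.assert_valid(path)
        except Exception as exc:  # broad on purpose; see docstring
            results[path] = str(exc)
        else:
            results[path] = None
    return results


# Usage (assumes `validator` was created as above and the files exist):
# for path, error in check_files(validator, ["/content/ok.xml", "/content/ng.xml"]).items():
#     print(path, "OK" if error is None else error)
```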
Trying v1
OK Example
Save the following as /content/ok.xml, then validate it.
<?xml version="1.0" ?>
<jpcoar:jpcoar
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:jpcoar="https://github.com/JPCOAR/schema/blob/master/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://github.com/JPCOAR/schema/blob/master/1.0/jpcoar_scm.xsd">
  <dc:title>Validating XML files with the JPCOAR Schema</dc:title>
  <dc:type rdf:resource="http://purl.org/coar/resource_type/c_6501">article</dc:type>
</jpcoar:jpcoar>
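One way to create /content/ok.xml from a notebook cell is to write the XML above from a Python string. The sketch below (save_xml is a hypothetical helper) first parses the text with the standard library's xml.etree.ElementTree, so a well-formedness mistake such as a missing closing tag is caught before the schema validator runs. It does not check schema rules such as element order; that is still the XSD validator's job.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def save_xml(text: str, path: str) -> Path:
    """Check that text is well-formed XML, then write it to path as UTF-8.

    Leading/trailing whitespace is stripped because an XML declaration must be
    the very first thing in the file (triple-quoted strings often start with a
    newline). ET.fromstring raises ET.ParseError on malformed input, in which
    case nothing is written.
    """
    data = text.strip().encode("utf-8")
    ET.fromstring(data)  # raises ET.ParseError if not well-formed
    out = Path(path)
    out.write_bytes(data)
    return out


# Usage (assumes the OK example above is pasted into ok_text):
# save_xml(ok_text, "/content/ok.xml")
# validator.assert_valid("/content/ok.xml")
```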
validator.assert_valid("/content/ok.xml")
# No errors
NG Example
Does placing jpcoar:subject after dc:type cause an error? Save the following as /content/ng.xml.
<?xml version="1.0" ?>
<jpcoar:jpcoar
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:jpcoar="https://github.com/JPCOAR/schema/blob/master/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://github.com/JPCOAR/schema/blob/master/1.0/jpcoar_scm.xsd">
  <dc:title>Validating XML files with the JPCOAR Schema</dc:title>
  <dc:type rdf:resource="http://purl.org/coar/resource_type/c_6501">article</dc:type>
  <jpcoar:subject subjectScheme="Other">Test</jpcoar:subject>
</jpcoar:jpcoar>
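The validator's error message names elements as {"namespace":localName}. Python's xml.etree.ElementTree uses the closely related Clark notation {namespace}localName for tag names, so listing the root's children with it shows the element order the validator sees. A sketch, using a trimmed copy of the NG example (xsi attributes omitted); child_order is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET
from typing import List


def child_order(xml_text: str) -> List[str]:
    """Return the tags of the root's direct children, in document order.

    ElementTree reports namespaced tags as '{namespace-URI}localName'.
    """
    root = ET.fromstring(xml_text.strip().encode("utf-8"))
    return [child.tag for child in root]


NG_XML = """<?xml version="1.0" ?>
<jpcoar:jpcoar
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:jpcoar="https://github.com/JPCOAR/schema/blob/master/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <dc:title>Validating XML files with the JPCOAR Schema</dc:title>
  <dc:type rdf:resource="http://purl.org/coar/resource_type/c_6501">article</dc:type>
  <jpcoar:subject subjectScheme="Other">Test</jpcoar:subject>
</jpcoar:jpcoar>"""

for tag in child_order(NG_XML):
    print(tag)
# {http://purl.org/dc/elements/1.1/}title
# {http://purl.org/dc/elements/1.1/}type
# {https://github.com/JPCOAR/schema/blob/master/1.0/}subject
```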
validator.assert_valid("/content/ng.xml")
XsdValidationErrorWithInfo: /content/ng.xml: line 9 column 41: cvc-complex-type.2.4.a: Invalid content was found starting with element '{"https://github.com/JPCOAR/schema/blob/master/1.0/":subject}'. One of '{"https://schema.datacite.org/meta/kernel-4/":version, "http://namespace.openaire.eu/schema/oaire/":version, "https://github.com/JPCOAR/schema/blob/master/1.0/":identifier, "https://github.com/JPCOAR/schema/blob/master/1.0/":identifierRegistration, "https://github.com/JPCOAR/schema/blob/master/1.0/":relation, "http://purl.org/dc/terms/":temporal, "https://schema.datacite.org/meta/kernel-4/":geoLocation, "https://github.com/JPCOAR/schema/blob/master/1.0/":fundingReference, "https://github.com/JPCOAR/schema/blob/master/1.0/":sourceIdentifier, "https://github.com/JPCOAR/schema/blob/master/1.0/":sourceTitle, "https://github.com/JPCOAR/schema/blob/master/1.0/":volume, "https://github.com/JPCOAR/schema/blob/master/1.0/":issue, "https://github.com/JPCOAR/schema/blob/master/1.0/":numPages, "https://github.com/JPCOAR/schema/blob/master/1.0/":pageStart, "https://github.com/JPCOAR/schema/blob/master/1.0/":pageEnd, "http://ndl.go.jp/dcndl/terms/":dissertationNumber, "http://ndl.go.jp/dcndl/terms/":degreeName, "http://ndl.go.jp/dcndl/terms/":dateGranted, "https://github.com/JPCOAR/schema/blob/master/1.0/":degreeGrantor, "https://github.com/JPCOAR/schema/blob/master/1.0/":conference, "https://github.com/JPCOAR/schema/blob/master/1.0/":file}' is expected.
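The list of expected elements in this message is long. A small regex-based sketch can pull out the position, the offending element, and the local names of the elements that were acceptable at that point. summarize_error is a hypothetical helper, and its patterns assume exactly the message layout shown above; other error codes or library versions may format messages differently.

```python
import re
from typing import Dict, List, Optional, Union


def summarize_error(message: str) -> Dict[str, Union[Optional[int], Optional[str], List[str]]]:
    """Extract key parts of a cvc-complex-type.2.4.a message.

    Assumes the layout shown above: 'line N column M', the offending element
    as '{"namespace":name}', then "One of '{...}' is expected."
    """
    pos = re.search(r"line (\d+) column (\d+)", message)
    found = re.search(r"starting with element '\{\"([^\"]*)\":(\w+)\}'", message)
    expected = re.search(r"One of '\{(.*)\}' is expected", message)
    return {
        "line": int(pos.group(1)) if pos else None,
        "column": int(pos.group(2)) if pos else None,
        "found": found.group(2) if found else None,
        # Keep only the local names, e.g. "version", "identifier", ..., "file"
        "expected": re.findall(r"\":(\w+)", expected.group(1)) if expected else [],
    }


# Usage (assumes str(exc) carries the text shown above):
# try:
#     validator.assert_valid("/content/ng.xml")
# except Exception as exc:
#     print(summarize_error(str(exc)))
```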
Fix
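The message lists the elements that may follow dc:type at that point (version, identifier, relation, and so on up to file), and jpcoar:subject is not among them, so it presumably has to come earlier in the schema's element order. Try moving jpcoar:subject before dc:type and save the result as /content/fix.xml.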
<?xml version="1.0" ?>
<jpcoar:jpcoar
  xmlns:dc="http://purl.org/dc/elements/1.1/"
  xmlns:jpcoar="https://github.com/JPCOAR/schema/blob/master/1.0/"
  xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="https://github.com/JPCOAR/schema/blob/master/1.0/jpcoar_scm.xsd">
  <dc:title>Validating XML files with the JPCOAR Schema</dc:title>
  <jpcoar:subject subjectScheme="Other">Test</jpcoar:subject>
  <dc:type rdf:resource="http://purl.org/coar/resource_type/c_6501">article</dc:type>
</jpcoar:jpcoar>
validator.assert_valid("/content/fix.xml")
# No errors
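The same fix can also be applied programmatically. The sketch below uses xml.etree.ElementTree to move the first jpcoar:subject child in front of dc:type. It only illustrates this one reordering and is not a general fixer: the correct position of each element comes from the element order defined in jpcoar_scm.xsd. move_before is a hypothetical helper name, and the sample XML is a trimmed copy of the NG example.

```python
import xml.etree.ElementTree as ET

DC = "http://purl.org/dc/elements/1.1/"
JPCOAR = "https://github.com/JPCOAR/schema/blob/master/1.0/"


def move_before(root: ET.Element, tag: str, before_tag: str) -> None:
    """Move the first direct child `tag` so it sits just before `before_tag`.

    Both tags are given in '{namespace}localName' form. Does nothing if either
    element is missing.
    """
    target = root.find(tag)
    anchor = root.find(before_tag)
    if target is None or anchor is None:
        return
    root.remove(target)
    # Look up the anchor's index after the removal, then insert in front of it.
    root.insert(list(root).index(anchor), target)


SAMPLE = (
    f'<jpcoar:jpcoar xmlns:dc="{DC}" xmlns:jpcoar="{JPCOAR}" '
    'xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">'
    "<dc:title>Validating XML files with the JPCOAR Schema</dc:title>"
    '<dc:type rdf:resource="http://purl.org/coar/resource_type/c_6501">article</dc:type>'
    '<jpcoar:subject subjectScheme="Other">Test</jpcoar:subject>'
    "</jpcoar:jpcoar>"
)

if __name__ == "__main__":
    root = ET.fromstring(SAMPLE)
    move_before(root, f"{{{JPCOAR}}}subject", f"{{{DC}}}type")
    print([child.tag.split("}", 1)[1] for child in root])
    # ['title', 'subject', 'type']
```

To write the result back with the original dc:/jpcoar:/rdf: prefixes, the prefixes would need to be registered with ET.register_namespace before serializing; otherwise ElementTree generates its own prefixes.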
Summary
Guided by the validator's error message, I was able to fix the XML file.
The Google Colab notebook also includes validation examples targeting JPCOAR Schema Version 2.0.
This memo may contain inaccuracies, but I hope it serves as a helpful reference for validating XML files.