Overview
ShEx is described on Wikipedia as:
Shape Expressions (ShEx) is a data modeling language for validating and describing Resource Description Framework (RDF) data.
Here are my notes from attempting to create a ShEx file.
Creating a ShEx File
This time, we start with RDF data in data/tmp/merged.ttl. We use shexer to create a ShEx file from the RDF data.
pip install shexer
Get the list of classes in the RDF data.
from rdflib import Graph
input_nt_file = "data/tmp/merged.ttl"
graph = Graph()
graph.parse(input_nt_file, format="turtle")
knows_query = """
SELECT DISTINCT ?cls
WHERE {
?a a ?cls
}"""
qres = graph.query(knows_query)
# Collect each class IRI as a plain string
target_classes = []
for row in qres:
    target_classes.append(f"{row.cls}")
target_classes
Pass the retrieved classes to shexer's Shaper, which infers one shape per class.
from shexer.shaper import Shaper
from shexer.consts import TURTLE
# Infer shapes for the target classes from the Turtle file
shaper = Shaper(target_classes=target_classes,
                input_format=TURTLE,
                graph_file_input=input_nt_file)
output_file = "data/tmp/shapes.shex"
# Only keep constraints that hold for at least 10% of the instances of a class
shaper.shex_graph(output_file=output_file,
                  acceptance_threshold=0.1)
print("Done!")
As a result, the following ShEx file was created.
:教育メタデータ
{
exp:指導要領コード IRI +; # 100.0 %
# 12.307692307692308 % obj: IRI. Cardinality: {7}
rdf:type [data:教育メタデータ] ; # 100.0 %
schema:geo IRI +; # 100.0 %
# 21.53846153846154 % obj: IRI. Cardinality: {1}
# 12.307692307692308 % obj: IRI. Cardinality: {3}
# 12.307692307692308 % obj: IRI. Cardinality: {6}
# 10.76923076923077 % obj: IRI. Cardinality: {2}
exp:学年 @:学年 +; # 100.0 %
# 21.53846153846154 % obj: @:学年. Cardinality: {5}
# 16.923076923076923 % obj: @:学年. Cardinality: {1}
# 10.76923076923077 % obj: @:学年. Cardinality: {6}
# 10.76923076923077 % obj: @:学年. Cardinality: {4}
# 10.76923076923077 % obj: @:学年. Cardinality: {3}
exp:教科 @:教科 +; # 100.0 %
# 18.461538461538463 % obj: @:教科. Cardinality: {8}
# 15.384615384615385 % obj: @:教科. Cardinality: {3}
# 12.307692307692308 % obj: @:教科. Cardinality: {6}
# 12.307692307692308 % obj: @:教科. Cardinality: {4}
# 10.76923076923077 % obj: @:教科. Cardinality: {5}
rdfs:label xsd:string ; # 100.0 %
exp:学習指導案 IRI ; # 100.0 %
exp:時代 @:時代 *;
# 96.92307692307692 % obj: @:時代. Cardinality: +
# 23.076923076923077 % obj: @:時代. Cardinality: {2}
# 15.384615384615385 % obj: @:時代. Cardinality: {1}
# 15.384615384615385 % obj: @:時代. Cardinality: {3}
# 13.846153846153847 % obj: @:時代. Cardinality: {4}
# 12.307692307692308 % obj: @:時代. Cardinality: {6}
...
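The comment lines in this output appear to record, for each property, the share of instances that match a more specific constraint (for example, exactly 3 values). To look at those numbers outside the file, they can be pulled out with a regular expression. This is a minimal sketch using only the standard library. The pattern is based on the comment layout seen in the output above, not on any documented shexer format, and the function name parse_annotations is my own.

```python
import re

# Matches annotation comments such as
#   "# 21.53846153846154 % obj: IRI. Cardinality: {1}"
# The layout is assumed from the generated file shown above.
ANNOTATION = re.compile(
    r"#\s*(?P<percent>\d+(?:\.\d+)?)\s*%\s*"
    r"obj:\s*(?P<obj>.+?)\.\s*Cardinality:\s*(?P<card>\S+)"
)


def parse_annotations(shex_text):
    """Return (percent, object constraint, cardinality) for each annotation comment."""
    rows = []
    for line in shex_text.splitlines():
        match = ANNOTATION.search(line)
        if match:
            rows.append((float(match["percent"]), match["obj"], match["card"]))
    return rows


# A few lines copied from the generated ShEx file
sample = """\
schema:geo IRI +; # 100.0 %
# 21.53846153846154 % obj: IRI. Cardinality: {1}
# 12.307692307692308 % obj: IRI. Cardinality: {3}
exp:時代 @:時代 *;
# 96.92307692307692 % obj: @:時代. Cardinality: +
"""

for percent, obj, card in parse_annotations(sample):
    print(f"{percent:6.2f} %  {obj}  {card}")
```

Lines that only carry a trailing "# 100.0 %" are skipped, because they have no "obj:" part. Every percentage in the output is above 10, which is consistent with acceptance_threshold=0.1.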
Converting ShEx to Turtle Format
Here, we convert the ShEx file created above to Turtle format in two steps: first from ShEx to ShExJ (JSON), then from ShExJ to Turtle.
Converting ShEx to ShExJ
There may be other methods, but here we use the shex package for Node.js to convert the file to ShExJ, the JSON representation of ShEx.
npm install shex
./node_modules/@shexjs/cli/bin/shex-to-json data/tmp/shapes.shex > data/tmp/shapes.shexj
The following JSON file was generated.
{
"type": "Schema",
"shapes": [
{
"id": "http://weso.es/shapes/Class",
"type": "ShapeDecl",
"shapeExpr": {
"type": "Shape",
"expression": {
"type": "EachOf",
"expressions": [
{
"type": "TripleConstraint",
"predicate": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
"valueExpr": {
"type": "NodeConstraint",
"values": [
"http://www.w3.org/2000/01/rdf-schema#Class"
]
}
},
{
"type": "TripleConstraint",
"predicate": "http://www.w3.org/2000/01/rdf-schema#label",
"valueExpr": {
"type": "NodeConstraint",
"datatype": "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
},
"min": 1,
"max": -1
},
...
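To check how the JSON relates to the compact syntax, it helps to list each shape's predicates together with their cardinality. The sketch below uses only the standard library. It assumes the simple structure shown above (a ShapeDecl wrapping a Shape whose expression is an EachOf of TripleConstraints) and skips anything else. In ShExJ, a missing "min" or "max" defaults to 1, and a "max" of -1 means unbounded. The function names cardinality and list_constraints are my own, and the embedded JSON is abridged from the output above.

```python
import json

# Abridged from the ShExJ output above; only the two constraints shown there are kept.
SHEXJ = """
{
  "type": "Schema",
  "shapes": [
    {
      "id": "http://weso.es/shapes/Class",
      "type": "ShapeDecl",
      "shapeExpr": {
        "type": "Shape",
        "expression": {
          "type": "EachOf",
          "expressions": [
            {
              "type": "TripleConstraint",
              "predicate": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
              "valueExpr": {"type": "NodeConstraint",
                            "values": ["http://www.w3.org/2000/01/rdf-schema#Class"]}
            },
            {
              "type": "TripleConstraint",
              "predicate": "http://www.w3.org/2000/01/rdf-schema#label",
              "valueExpr": {"type": "NodeConstraint",
                            "datatype": "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"},
              "min": 1,
              "max": -1
            }
          ]
        }
      }
    }
  ]
}
"""


def cardinality(tc):
    """Render a TripleConstraint's min/max as a ShExC-style cardinality string."""
    lo = tc.get("min", 1)  # default when "min" is omitted
    hi = tc.get("max", 1)  # default when "max" is omitted; -1 means unbounded
    known = {(1, 1): "", (0, 1): "?", (0, -1): "*", (1, -1): "+"}
    if (lo, hi) in known:
        return known[(lo, hi)]
    if lo == hi:
        return "{" + str(lo) + "}"
    upper = "" if hi == -1 else str(hi)
    return "{" + f"{lo},{upper}" + "}"


def list_constraints(schema):
    """Yield (shape id, predicate, cardinality) for simple EachOf shapes."""
    for decl in schema.get("shapes", []):
        # Use the wrapped shape if there is one, otherwise the object itself
        shape = decl.get("shapeExpr", decl)
        if not isinstance(shape, dict):
            continue  # shape reference, not handled in this sketch
        expr = shape.get("expression")
        if not isinstance(expr, dict):
            continue
        # A shape with a single constraint has no "expressions" list
        for tc in expr.get("expressions", [expr]):
            if isinstance(tc, dict) and tc.get("type") == "TripleConstraint":
                yield decl["id"], tc["predicate"], cardinality(tc)


schema = json.loads(SHEXJ)
for shape_id, predicate, card in list_constraints(schema):
    print(shape_id, predicate, card or "(exactly one)")
```

For the fragment above, rdf:type has no explicit min or max and is therefore reported as exactly one, while rdfs:label with min 1 and max -1 corresponds to + in the compact syntax.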
Converting ShExJ to Turtle Format
ShExJ is a JSON-LD format, so rdflib can parse it and serialize it as Turtle.
import json
from rdflib import Graph
INPUT_PATH = "data/tmp/shapes.shexj"
OUTPUT_PATH = "data/tmp/shapes.ttl"
# Load the ShExJ schema from a file and parse it as JSON
with open(INPUT_PATH, "r", encoding="utf-8") as file:
    shex_j = json.load(file)
# Create a new RDF graph
g = Graph()
# Load the ShExJ schema into the graph as JSON-LD
g.parse(data=json.dumps(shex_j), format="json-ld")
# Serialize the graph to Turtle format
turtle_output = g.serialize(format="turtle")
# Save the Turtle output to a file
with open(OUTPUT_PATH, "w", encoding="utf-8") as file:
    file.write(turtle_output)
print("ShEx schema has been successfully converted to Turtle.")
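As a result, the following Turtle file was created. Note that ShapeDecl appears as a local file: IRI rather than in the shex: namespace, which suggests that this term was not mapped by the JSON-LD context used during parsing.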
<http://weso.es/shapes/教育メタデータ> a <file:///Users/nakamura/git/oi/oi/demo/src/ShapeDecl> ;
shex:shapeExpr [ a shex:Shape ;
shex:expression [ a shex:EachOf ;
shex:expressions ( [ a shex:TripleConstraint ;
shex:max -1 ;
shex:min 1 ;
shex:predicate <https://w3id.org/sukilam-educational-metadata/term/property#指導要領コード> ;
shex:valueExpr [ a shex:NodeConstraint ;
shex:nodeKind shex:iri ] ] [ a shex:TripleConstraint ;
shex:predicate rdf:type ;
shex:valueExpr [ a shex:NodeConstraint ;
shex:values ( <https://w3id.org/sukilam-educational-metadata/data/教育メタデータ> ) ] ] [ a shex:TripleConstraint ;
shex:max -1 ;
shex:min 1 ;
shex:predicate <http://schema.org/geo> ;
shex:valueExpr [ a shex:NodeConstraint ;
shex:nodeKind shex:iri ] ] [ a shex:TripleConstraint ;
shex:max -1 ;
...
Summary
Due to my insufficient knowledge of ShEx, I have not been able to verify whether the output is correct, but I was at least able to generate ShEx-related files from my existing RDF data.
I plan to study ShEx further and refine this in the future.