Overview

ShEx is described on Wikipedia as:

Shape Expressions (ShEx) is a data modeling language for validating and describing Resource Description Framework (RDF) data.

Here are my notes from attempting to create a ShEx file.

Creating a ShEx File

The starting point is RDF data stored in data/tmp/merged.ttl. We use the shexer library to generate a ShEx file from that data.

pip install shexer

Get the list of classes in the RDF data.

from rdflib import Graph

# Despite the variable name, the input is a Turtle file.
input_nt_file = "data/tmp/merged.ttl"

graph = Graph()
graph.parse(input_nt_file, format="turtle")

# Collect every class that appears as the object of rdf:type.
class_query = """
SELECT DISTINCT ?cls
WHERE {
    ?a a ?cls
}"""

qres = graph.query(class_query)

# Convert each result to a plain IRI string.
target_classes = [str(row.cls) for row in qres]

target_classes

Pass the retrieved classes to shexer to generate a shape for each one.

from shexer.shaper import Shaper
from shexer.consts import TURTLE

# Build one shape per target class from the Turtle input.
shaper = Shaper(target_classes=target_classes,
                input_format=TURTLE,
                graph_file_input=input_nt_file)

output_file = "data/tmp/shapes.shex"

# Keep only features observed in at least 10% of the instances.
shaper.shex_graph(output_file=output_file,
                  acceptance_threshold=0.1)

print("Done!")

As a result, the following ShEx file was created.

:教育メタデータ
{
   exp:指導要領コード  IRI  +;                                        # 100.0 %
            # 12.307692307692308 % obj: IRI. Cardinality: {7}
   rdf:type  [data:教育メタデータ]  ;                                 # 100.0 %
   schema:geo  IRI  +;                                         # 100.0 %
            # 21.53846153846154 % obj: IRI. Cardinality: {1}
            # 12.307692307692308 % obj: IRI. Cardinality: {3}
            # 12.307692307692308 % obj: IRI. Cardinality: {6}
            # 10.76923076923077 % obj: IRI. Cardinality: {2}
   exp:学年  @:学年  +;                                            # 100.0 %
            # 21.53846153846154 % obj: @:学年. Cardinality: {5}
            # 16.923076923076923 % obj: @:学年. Cardinality: {1}
            # 10.76923076923077 % obj: @:学年. Cardinality: {6}
            # 10.76923076923077 % obj: @:学年. Cardinality: {4}
            # 10.76923076923077 % obj: @:学年. Cardinality: {3}
   exp:教科  @:教科  +;                                            # 100.0 %
            # 18.461538461538463 % obj: @:教科. Cardinality: {8}
            # 15.384615384615385 % obj: @:教科. Cardinality: {3}
            # 12.307692307692308 % obj: @:教科. Cardinality: {6}
            # 12.307692307692308 % obj: @:教科. Cardinality: {4}
            # 10.76923076923077 % obj: @:教科. Cardinality: {5}
   rdfs:label  xsd:string  ;                                   # 100.0 %
   exp:学習指導案  IRI  ;                                           # 100.0 %
   exp:時代  @:時代  *;
            # 96.92307692307692 % obj: @:時代. Cardinality: +
            # 23.076923076923077 % obj: @:時代. Cardinality: {2}
            # 15.384615384615385 % obj: @:時代. Cardinality: {1}
            # 15.384615384615385 % obj: @:時代. Cardinality: {3}
            # 13.846153846153847 % obj: @:時代. Cardinality: {4}
            # 12.307692307692308 % obj: @:時代. Cardinality: {6}
	    ...
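The percentages in the comments can be read as the share of instances that exhibit a given cardinality, and acceptance_threshold=0.1 would explain why no line below 10% appears. The following is a minimal sketch of that arithmetic, not shexer's actual implementation. The total of 65 instances is an assumption inferred from the fractions (for example, 8/65 is about 12.31%), and the counts below are hypothetical values chosen to reproduce the schema:geo lines.

```python
from collections import Counter


def accepted_cardinalities(counts, threshold):
    """Return (cardinality, percentage) pairs whose share of instances meets the threshold."""
    total = sum(counts.values())
    kept = []
    # most_common() orders by count, highest first.
    for cardinality, n in counts.most_common():
        ratio = n / total
        if ratio >= threshold:
            kept.append((cardinality, ratio * 100))
    return kept


# Hypothetical data: how many of the assumed 65 instances have exactly k schema:geo values.
geo_counts = Counter({1: 14, 3: 8, 6: 8, 2: 7, 4: 6, 5: 6, 7: 6, 8: 5, 9: 5})

for cardinality, pct in accepted_cardinalities(geo_counts, threshold=0.1):
    print(f"# {pct:.2f} % obj: IRI. Cardinality: {{{cardinality}}}")
```

Under these assumed counts, the cardinalities 4, 5, 7, 8 and 9 each cover fewer than 10% of the instances, so they are dropped from the comment list while the property itself still applies to every instance.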

Converting ShEx to Turtle Format

Here, we convert the ShEx file created above to Turtle format in two steps: first to ShExJ, the JSON representation of ShEx, and then from ShExJ to Turtle.

Converting ShEx to ShExJ

Other approaches may exist, but here we use the shex.js command-line tools, which run on Node.js, to convert the schema to JSON.

npm install shex
./node_modules/@shexjs/cli/bin/shex-to-json data/tmp/shapes.shex > data/tmp/shapes.shexj

The following JSON file was generated.

{
  "type": "Schema",
  "shapes": [
    {
      "id": "http://weso.es/shapes/Class",
      "type": "ShapeDecl",
      "shapeExpr": {
        "type": "Shape",
        "expression": {
          "type": "EachOf",
          "expressions": [
            {
              "type": "TripleConstraint",
              "predicate": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type",
              "valueExpr": {
                "type": "NodeConstraint",
                "values": [
                  "http://www.w3.org/2000/01/rdf-schema#Class"
                ]
              }
            },
            {
              "type": "TripleConstraint",
              "predicate": "http://www.w3.org/2000/01/rdf-schema#label",
              "valueExpr": {
                "type": "NodeConstraint",
                "datatype": "http://www.w3.org/1999/02/22-rdf-syntax-ns#langString"
              },
              "min": 1,
              "max": -1
            },
	    ...
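In ShExJ, cardinality is expressed with "min" and "max": both default to 1 when omitted, and a "max" of -1 means unbounded, so the rdfs:label constraint above corresponds to + in the compact syntax. The sketch below walks a schema and prints each predicate with its compact-syntax suffix. The helper names cardinality_symbol and triple_constraints are hypothetical, the inline schema is a trimmed copy of the fragment above, and nested expressions are not handled.

```python
def cardinality_symbol(min_count=1, max_count=1):
    """Map ShExJ min/max to the ShExC cardinality suffix (-1 means unbounded)."""
    if (min_count, max_count) == (1, 1):
        return ""
    if (min_count, max_count) == (0, 1):
        return "?"
    if (min_count, max_count) == (0, -1):
        return "*"
    if (min_count, max_count) == (1, -1):
        return "+"
    if min_count == max_count:
        return f"{{{min_count}}}"
    upper = "" if max_count == -1 else str(max_count)
    return f"{{{min_count},{upper}}}"


def triple_constraints(shape_decl):
    """Yield (predicate, suffix) for the top-level triple constraints of one shape."""
    # Newer ShExJ wraps the shape in a ShapeDecl; fall back to the object itself otherwise.
    shape = shape_decl.get("shapeExpr", shape_decl)
    expression = shape.get("expression")
    if not expression:
        return
    # A lone TripleConstraint is not wrapped in EachOf.
    parts = expression.get("expressions", [expression])
    for part in parts:
        if isinstance(part, dict) and part.get("type") == "TripleConstraint":
            yield part["predicate"], cardinality_symbol(part.get("min", 1), part.get("max", 1))


# Trimmed copy of the generated fragment, for illustration only.
schema = {
    "type": "Schema",
    "shapes": [{
        "id": "http://weso.es/shapes/Class",
        "type": "ShapeDecl",
        "shapeExpr": {
            "type": "Shape",
            "expression": {
                "type": "EachOf",
                "expressions": [
                    {"type": "TripleConstraint",
                     "predicate": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
                    {"type": "TripleConstraint",
                     "predicate": "http://www.w3.org/2000/01/rdf-schema#label",
                     "min": 1, "max": -1},
                ],
            },
        },
    }],
}

for decl in schema["shapes"]:
    for predicate, suffix in triple_constraints(decl):
        print(decl["id"], predicate, repr(suffix))
```

To inspect the real file, the same loop could be run on the result of json.load applied to data/tmp/shapes.shexj.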

Converting ShExJ to Turtle Format

Because ShExJ is a JSON-LD syntax, rdflib can parse it as JSON-LD and serialize the resulting graph as Turtle.

from rdflib import Graph

INPUT_PATH = "data/tmp/shapes.shexj"
OUTPUT_PATH = "data/tmp/shapes.ttl"

# Read the ShExJ schema; shape names are non-ASCII, so set the encoding explicitly
with open(INPUT_PATH, "r", encoding="utf-8") as file:
    shex_j_str = file.read()

# Create a new RDF graph and load the ShExJ schema into it as JSON-LD
g = Graph()
g.parse(data=shex_j_str, format="json-ld")

# Serialize the graph to Turtle format (a str in rdflib 6 and later)
turtle_output = g.serialize(format="turtle")

# Save the Turtle output to a file
with open(OUTPUT_PATH, "w", encoding="utf-8") as file:
    file.write(turtle_output)

print("ShEx schema has been successfully converted to Turtle.")

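As a result, the following Turtle file was created. Note that ShapeDecl appears as a file:// IRI rather than in the shex: namespace; this is probably because the JSON-LD context did not map that term, so it was resolved against the local base IRI.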

<http://weso.es/shapes/教育メタデータ> a <file:///Users/nakamura/git/oi/oi/demo/src/ShapeDecl> ;
    shex:shapeExpr [ a shex:Shape ;
            shex:expression [ a shex:EachOf ;
                    shex:expressions ( [ a shex:TripleConstraint ;
                                shex:max -1 ;
                                shex:min 1 ;
                                shex:predicate <https://w3id.org/sukilam-educational-metadata/term/property#指導要領コード> ;
                                shex:valueExpr [ a shex:NodeConstraint ;
                                        shex:nodeKind shex:iri ] ] [ a shex:TripleConstraint ;
                                shex:predicate rdf:type ;
                                shex:valueExpr [ a shex:NodeConstraint ;
                                        shex:values ( <https://w3id.org/sukilam-educational-metadata/data/教育メタデータ> ) ] ] [ a shex:TripleConstraint ;
                                shex:max -1 ;
                                shex:min 1 ;
                                shex:predicate <http://schema.org/geo> ;
                                shex:valueExpr [ a shex:NodeConstraint ;
                                        shex:nodeKind shex:iri ] ] [ a shex:TripleConstraint ;
                                shex:max -1 ;
				...

Summary

Because my knowledge of ShEx is still limited, I have not verified that the output is correct, but I was at least able to generate ShEx-related files from my existing RDF data.

I plan to study ShEx further and refine this workflow.