TLDR: This research paper introduces a new method for validating data using SHACL constraints in conjunction with OWL ontologies. It addresses the semantic conflict between OWL’s open-world assumption and SHACL’s closed-world assumption by defining an “austere canonical model” and developing a rewriting technique. This technique transforms ontology rules and SHACL constraints into a new set of SHACL constraints, allowing validation against the original data using standard SHACL validators. The paper also analyzes the computational complexity, showing that validation takes time polynomial in the size of the data when the ontology and constraints are fixed.
Data on the web is often managed using standards like RDF, which describes information in a graph-like structure. Two crucial standards from the W3C for handling this data are the Web Ontology Language (OWL) and the Shapes Constraint Language (SHACL). While both are powerful, they operate under fundamentally different assumptions, creating a significant challenge when used together.
OWL is designed for inferring new facts from incomplete data. Imagine you have a database of pets, and you know “all pet birds are pets.” If your data only says “Linda has a pet bird,” OWL can infer that “Linda has a pet.” This is called the open-world assumption (OWA) – it assumes that what’s not explicitly stated might still be true. On the other hand, SHACL is used to define and validate constraints on data. For example, a SHACL constraint might say “every pet owner must have at least one pet.” SHACL operates under the closed-world assumption (CWA), meaning it assumes the data it’s given is complete, and it validates constraints based only on what’s explicitly present.
The natural question arises: how do you validate SHACL constraints when an OWL ontology might imply additional facts that aren’t directly in your data? This semantic gap is a major hurdle. For instance, if your data says “Linda has a pet bird” and your OWL ontology says “all pet birds are pets,” you’d want SHACL to validate a constraint like “Linda has a pet” even if “Linda has a pet” isn’t explicitly written down. This is precisely the problem that Anouk Oudshoorn, Magdalena Ortiz, and Mantas Šimkus from TU Wien, Austria, address in their research paper, “SHACL Validation in the Presence of Ontologies: Semantics and Rewriting Techniques.”
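The gap described above can be made concrete with a small sketch. This is plain Python rather than real OWL or SHACL tooling, and every name in it (`saturate`, `owners_without_pet`, the triples) is a hypothetical stand-in chosen for illustration. It shows a closed-world check failing on the raw data and passing once the fact implied by the ontology has been added.

```python
# Triples are (subject, predicate, object) tuples; all names are illustrative.
data = {
    ("Linda", "type", "PetOwner"),
    ("Linda", "hasPet", "Tweety"),
    ("Tweety", "type", "PetBird"),
}

# Toy ontology: every PetBird is a Pet.
subclass_of = {"PetBird": "Pet"}


def saturate(triples, subclass_of):
    """Add every type fact implied by the subclass axioms until nothing changes."""
    result = set(triples)
    changed = True
    while changed:
        changed = False
        for s, p, o in list(result):
            if p == "type" and o in subclass_of:
                implied = (s, "type", subclass_of[o])
                if implied not in result:
                    result.add(implied)
                    changed = True
    return result


def owners_without_pet(triples):
    """Closed-world check: each PetOwner needs a hasPet edge to a node typed Pet."""
    owners = {s for s, p, o in triples if p == "type" and o == "PetOwner"}
    pets = {s for s, p, o in triples if p == "type" and o == "Pet"}
    satisfied = {s for s, p, o in triples if p == "hasPet" and o in pets}
    return owners - satisfied


print(owners_without_pet(data))                         # Linda is reported
print(owners_without_pet(saturate(data, subclass_of)))  # no violations
```

On the raw data the check reports Linda, because "Tweety is a Pet" is never stated. After saturation the same check passes, which is the behavior one would intuitively expect.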
A New Approach to Validation
The researchers propose a novel semantics for SHACL validation in the presence of ontologies, based on what they call “core universal models.” Think of a universal model as a comprehensive version of your data, where all facts implied by the OWL ontology have been made explicit. However, simply adding all possible implied facts can lead to issues, especially with SHACL’s negation features. The paper introduces the “austere canonical model,” a special kind of universal model that is “minimal” – it avoids introducing any redundant structures or unnecessary facts. This minimality is crucial for ensuring that SHACL’s closed-world assumptions work intuitively with OWL’s open-world inferences.
To make this practical, the paper develops a “rewriting technique.” Instead of actually constructing this potentially infinite austere canonical model, the technique transforms the original SHACL constraints and the OWL ontology rules into a new set of SHACL constraints. These new constraints can then be validated directly against the original (or a slightly enriched) data graph. This is a significant breakthrough because it allows developers to reuse existing, standard SHACL validators, avoiding the need for specialized tools that can handle the complex interplay between OWL and SHACL.
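The idea of rewriting the constraints rather than materializing the model can be illustrated with a deliberately simplified sketch. It is not the paper's algorithm: it handles only plain subclass axioms, whereas the paper covers far richer ontologies. The helper names (`rewrite_target_classes`, `violations`) and the data are assumptions made up for this example.

```python
# Toy ontology as (subclass, superclass) pairs; names are illustrative.
axioms = [("PetBird", "Pet"), ("Parrot", "PetBird")]

# Original data graph; it is never modified.
data = {
    ("Linda", "type", "PetOwner"),
    ("Linda", "hasPet", "Tweety"),
    ("Tweety", "type", "Parrot"),
    ("Bob", "type", "PetOwner"),
}


def rewrite_target_classes(cls, axioms):
    """Return cls together with every class the axioms place below it."""
    classes = {cls}
    changed = True
    while changed:
        changed = False
        for sub, sup in axioms:
            if sup in classes and sub not in classes:
                classes.add(sub)
                changed = True
    return classes


def violations(triples, allowed_classes):
    """PetOwners lacking a hasPet edge to a node typed with an allowed class."""
    owners = {s for s, p, o in triples if p == "type" and o == "PetOwner"}
    typed = {s for s, p, o in triples if p == "type" and o in allowed_classes}
    satisfied = {s for s, p, o in triples if p == "hasPet" and o in typed}
    return owners - satisfied


original = {"Pet"}
rewritten = rewrite_target_classes("Pet", axioms)

print(violations(data, original))   # Linda and Bob fail the unrewritten constraint
print(violations(data, rewritten))  # only Bob fails after rewriting
```

The ontology knowledge is compiled into the constraint ("Pet" becomes "Pet, PetBird, or Parrot"), so an ordinary closed-world check on the untouched data gives the answer that reasoning would have produced.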
Handling Complex Constraints and Practicality
The research extends its rewriting technique to “stratified SHACL,” a fragment of recursive SHACL that allows for negation and recursion in a controlled manner. This ensures that the approach can handle more sophisticated validation scenarios. The core idea is to process constraints in “strata” or layers, ensuring that negative conditions are evaluated based on what’s already known from earlier layers.
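A minimal sketch of layered evaluation follows, under strong simplifying assumptions: shapes are modeled as Python predicates, each layer may negate only shapes from earlier layers, and recursion inside a layer is left out entirely. The shape names (`HasPet`, `Petless`) and the `evaluate` function are hypothetical and not taken from the paper.

```python
# Illustrative data graph.
data = {
    ("Linda", "type", "Person"),
    ("Linda", "hasPet", "Tweety"),
    ("Bob", "type", "Person"),
}


def has_pet(node, triples):
    """Positive condition: the node has at least one outgoing hasPet edge."""
    return any(s == node and p == "hasPet" for s, p, o in triples)


def is_person(node, triples):
    """Positive condition: the node is typed as Person."""
    return (node, "type", "Person") in triples


# Each layer holds (shape name, positive test, shapes that must NOT hold).
# A shape may negate only shapes defined in an earlier layer.
strata = [
    [("HasPet", has_pet, [])],
    [("Petless", is_person, ["HasPet"])],
]


def evaluate(triples, strata):
    """Assign nodes to shapes one layer at a time, lowest layer first."""
    nodes = {s for s, p, o in triples} | {o for s, p, o in triples if p != "type"}
    assignment = {}
    for layer in strata:
        for name, test, negated in layer:
            assignment[name] = {
                n
                for n in nodes
                if test(n, triples)
                and all(n not in assignment[m] for m in negated)
            }
    return assignment


result = evaluate(data, strata)
print(result["HasPet"])   # nodes with a pet
print(result["Petless"])  # persons with no pet
```

Because `HasPet` is fully computed before `Petless` is evaluated, the negated condition is checked against a settled set of nodes and never against a partial result.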
The paper also delves into the computational complexity of this combined validation. While reasoning with ontologies alone can be computationally intensive, the researchers show that SHACL validation in the presence of Horn-ALCHIQ TBoxes (a rich type of OWL ontology) is “ExpTime-complete” in combined complexity (considering the size of the ontology, data, and constraints) but remains “PTime-complete” in data complexity (when the ontology and constraints are fixed, and only the data size varies). This means that for practical applications where the ontology and constraints are stable, validation runs in time polynomial in the size of the data, so it remains tractable as the data grows.
In essence, this research provides a robust theoretical foundation and practical techniques for combining the strengths of OWL for knowledge representation and SHACL for data validation. By bridging their semantic gap through innovative modeling and rewriting, it paves the way for more powerful and reliable data management on the web.


