Relations-between-classes based Constraints
and Constraint-based Ontology-Completenesses

Dr Philippe A. MARTIN (www.phmartin.info;  article@phmartin.info)


This article is an extension of the one accepted at Quatic 2018. A preliminary (-> smaller and older) version has also been published in French in the proceedings of EGC 2018, with Jun H. JO as co-author.
In the current version of this article, a text in bold characters within a paragraph is a text modified or added in July/August/September 2018.


Abstract. This article first distinguishes constraints from rules, and descriptive constraints from prescriptive ones. Both kinds can be used for calculating constraint-based completenesses (as opposed to real-world-based completeness), i.e. evaluating how much of a knowledge base is complete with respect to some constraints, e.g. for evaluating how well this base follows given ontology design patterns or best practices. Such evaluations may also guide knowledge elicitation and modeling. This article explores the ways constraints can be represented via relations between classes, hence via any knowledge representation language (KRL) that has an expressiveness at least equal to RDF or RDFS. Compared to the popular practice of both representing and checking constraints via queries, this approach is as simple, offers more possibilities for exploiting both knowledge and constraints, and permits the selection and use of inference engines adapted to the expressiveness of the exploited knowledge instead of the use of restricted or ad hoc constraint-validation tools. This approach is also modular in the sense that it separates content from usage: the represented “content focused constraints” can then be exploited via a few “content independent” queries, one for each usage and kind of constraint.

Keywords: prescriptive and descriptive constraints, ontology completeness, OWL2, SPARQL.

Table of Contents

1. Introduction
2. Constraints: Definitions and Representation Approaches
  2.1. Definitions
    2.1.1. Considered Constraints: Those Not Used For Inferences, Just For Checks
    2.1.2. Descriptive vs. Prescriptive (Hence Not Using All Possible Inferences)
    2.1.3. Descriptive Constraints Restricted To Named Individuals
  2.2. General Approach for Representing And Exploiting Constraints
    2.2.1. Using Constraint Types
    2.2.2. Using “Clones Without Types” For Bypassing Some Inferences When Checking The Conclusions Of Positive Prescriptive Constraints
  2.3. Representing Constraints Via Relations Between Classes
    2.3.1. Approaches
      2.3.1.1. Comparison With OWL-ER
    2.3.2. Proposed Types
    2.3.3. Examples
      2.3.3.1. Individual-based Constraints
      2.3.3.2. SubclassOf-analogous Constraints
      2.3.3.3. SubclassOf-based Constraints
3. Exploiting Relations-between-classes Constraints in SPARQL and Javascript
  3.1. SPARQL Commands for Checking Constraints
    3.1.1. Examples of KB Pre-treatments
      3.1.1.1. Temporarily Removing Individuals For Checking Constraints on Types With Classic Description Logic Inference Engines
      3.1.1.2. Creating “Identifier Relations” for Named Individuals
      3.1.1.3. Creating “Clones Without Types” of Objects For Exploiting These Objects Without Inheritance Mechanism
    3.1.2. Checking Individual-based Constraints
      3.1.2.1. Checking Individual-based Positive Descriptive Constraints
      3.1.2.2. Checking Individual-based Positive Prescriptive Constraints
      3.1.2.3. Checking Individual-based Inconsistency-implying Constraints
    3.1.3. Checking SubclassOf-analogous Constraints
    3.1.4. Checking SubclassOf-based Constraints
    3.1.5. Checking Binary Relations Instead of Individuals
    3.1.6. Evaluating Completenesses of a KB
  3.2. JavaScript Function for Ontology Completeness Evaluation
4. Illustration of Applications (or Use Cases) and Limits of the Approach by Showing How To Represent Some Important Ontology Design Patterns Via Relations-between-classes based Constraints
  4.1. Examples of Useful General Descriptive Constraints
    4.1.1. Checking or Evaluating the Design of a Subtype Hierarchy
      4.1.1.1. Approach and Rationale
      4.1.1.2. Implementation of the Subclass ODP with OWL2
    4.1.2. Generalization To Check the Use of Any Transitive Relation
  4.2. Examples of Useful General Prescriptive Constraints
    4.2.1. Checking that Certain Relation Types Are Systematically Used From Objects of Certain Types
5. Evaluation and Comparisons
6. Conclusion
7. Acknowledgments
8. References




1. Introduction

Knowledge representations (KRs) are formal descriptions enabling automatic logical inferencing, and thus automatic KR comparison, search, merge, etc. KRs are logic formulas, e.g. the binary predicates of 1st-order logic; these predicates are called triples or property instances in RDF and binary relations in Conceptual Graphs (CGs) [Sowa, 1992]. For the purpose of clarity, this article uses the intuitive terminology of CGs: (information) objects are either types or individuals, and types are either relation types or concept types (classes and datatypes in RDF). A formal knowledge base (KB) is a collection of such objects written using a KR language (KRL). An ontology is a KB that is essentially about types, rather than about individuals.

Creating a KB or evaluating its quality – for knowledge sharing or exploitation purposes, or for designing, generating or evaluating software – is difficult. Models and constraints (e.g. design patterns) help these tasks and can be stored in an ontology. E.g., the author of this article is building an ontology representing and organizing ontology design patterns as well as software design patterns; both kinds are organized by the same top-level types. [Zaveri et al., 2016], a survey on quality assessment for Linked Data, provides many dimensions and metrics for evaluating the quality of KBs and hence for helping the selection or design of KBs. One of the quality dimensions is the (degree of) completeness of a KB, with respect to some criteria or, more concisely, “its completeness”. Evaluating this degree is common in various tasks or fields but is performed differently by different tools and sometimes in implicit or ad hoc ways. Examples of such tasks or fields are: i) the automatic/manual extraction of knowledge or the creation of a KB, ii) the exploitation of ontology design patterns, KB design libraries (e.g., the KADS library) or top-level ontologies (e.g., DOLCE), and iii) the evaluation of ontologies or, more generally, datasets. In this third field, as noted in [Zaveri et al., 2016], completeness commonly refers to a degree to which the “information required to satisfy some given criteria or a given query” is present in the considered dataset. To complement this very general definition, this article distinguishes two kinds of completeness: constraint-based completeness, which is evaluated with respect to given constraints, and real-world-based completeness, which is evaluated with respect to the represented world, e.g. via completeness oracles or comparisons with a gold standard dataset.

One way to define or calculate the constraint-based completeness of a KB is to divide “the number of statements satisfying the constraints in that KB” by “the total number of statements in the KB”. When calculating real-world-based completeness, aggregating the results of completeness oracles or making comparisons with a gold standard dataset may not be as simple. As a variant, instead of statements only, one may want to consider objects and thus measure the percentage of objects for which all relations from/to them satisfy the constraints. Other variants may be defined by considering only certain kinds of objects or statements. Defining constraints via KRs in the KB, instead of via queries, permits the definition of “content-independent queries” to exploit these constraints. Otherwise, a different (content-dependent) query has to be created for each variant of constraint-based checking or completeness. Because of this lack of modularity, when stored in an ontology, content-dependent queries are also less easily organized than content-independent ones.
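The two ratios described above can be sketched as follows. This is a minimal illustration, not the evaluation function of this article: the triple encoding, the sample data and the stand-in check `satisfiesConstraints` are assumptions made for the example.

```javascript
// Assumed encoding: a statement is a [source, relationType, destination] triple.
const statements = [
  ["ex:Tom", "rdf:type", "ex:Person"],
  ["ex:Tom", "ex:age", 42],
  ["ex:Joe", "rdf:type", "ex:Person"],
  ["ex:Joe", "ex:age", -3],
];

// Hypothetical stand-in for "this statement satisfies the constraints":
// the only constraint here is that ages are non-negative.
function satisfiesConstraints([, relationType, destination]) {
  return relationType !== "ex:age" || destination >= 0;
}

// Statement-based variant: satisfying statements / all statements.
function statementCompleteness(stmts, satisfies) {
  if (stmts.length === 0) return 1; // assumption: an empty KB violates nothing
  return stmts.filter(satisfies).length / stmts.length;
}

// Object-based variant: objects for which all relations from/to them
// satisfy the constraints / all objects.
function objectCompleteness(stmts, satisfies) {
  const allOk = new Map(); // object -> true while no violating relation is seen
  for (const stmt of stmts) {
    const ok = satisfies(stmt);
    for (const obj of [stmt[0], stmt[2]]) {
      const previous = allOk.has(obj) ? allOk.get(obj) : true;
      allOk.set(obj, previous && ok);
    }
  }
  if (allOk.size === 0) return 1;
  let good = 0;
  for (const ok of allOk.values()) if (ok) good++;
  return good / allOk.size;
}

console.log(statementCompleteness(statements, satisfiesConstraints)); // 0.75
console.log(objectCompleteness(statements, satisfiesConstraints));    // 0.6
```

Both functions take the check as a parameter, so the same code serves any variant of the ratio once the constraints themselves are stored in the KB.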

This article does not address real-world-based completeness, but the techniques it proposes for representing and exploiting constraints may also be used for representing certain domain-specific parts of the rules used for calculating real-world-based completeness. From now on, unless otherwise explicitly specified, “completeness” refers to constraint-based completeness.

Section 2.1 explores the first research question of this article: what does the expression “must and must not be represented in the dataset” entail or, more precisely, given the “descriptive vs. prescriptive” distinction, what kinds of constraints need to be considered for evaluating constraint-based completeness via content-independent queries?

Section 2.2 and Section 2.3 propose a general approach to answer the second research question of this article: how to represent constraints in a KRL-independent way – or, more precisely, in any KRL that has an expressiveness at least equal to RDF or RDFS – even though actually defining the semantics of some of these constraints would require much more expressive logics? The proposed solution relies i) on the representation of constraints via restricted constructs based on relations between classes (or to classes), e.g. rules using the rdfs:subClassOf relation for representing a restricted implication between the condition and conclusion of the rule, ii) on the use of certain special types for specifying that a statement is a constraint of a certain kind, and iii) on the exploitation of these types in content-independent queries. Such constraint representations can then be exploited via most inference engines and KR query languages instead of tools tied to a particular KRL or goal (knowledge acquisition, ontology design pattern application, ontology evaluation, etc.).

Section 3.1 shows commands (queries or update requests) that may be used as an answer to a third research question: how to implement the above cited general approach in SPARQL or slight extensions of it. Section 3.2 shows that using a procedural language instead of a query language permits the implementation of the proposed general approach in a simpler and more flexible way.

Section 4 illustrates applications or use cases for the given content-independent queries. Like Section 3, it also shows how some – but not all – expressiveness limits of subclassOf-based constraints can be overcome by pre-treating the KB, e.g. via SPARQL.

Section 5 evaluates the proposed approach and compares it to other ones.




2. Constraints: Definitions and Representation Approaches

2.1. Definitions

2.1.1. Considered Constraints: Those Not Used For Inferences, Just For Checks

In this article, as in KIF (Knowledge Interchange Format) [Genesereth & Fikes, 1992], a rule is a statement that can be represented – directly or via a lossless translation – in the form “X =>> Y” where “=>>” is a restricted version of the logical implication (“X => Y”): it only supports modus ponens, not modus tollens. An inference supporting statement is a rule or a statement that can be represented in the form “X => Y”.

A rule allowing the derivation of a non-modal statement is a rule that can be represented in the form “X =>> Y” where Y does not include a modality (e.g., must). An example is “if x is a Person then x has a parent”. If this statement and “Tom is a Person” are in a KB, an inference engine can derive the non-modal statement “Tom has a parent”.

[Chein & Mugnier, 2008] defines constraints as positive or negative, respectively expressing statements of the form “if A,  B must be true” and “if A,  B must be false”. Thus, the authors define constraints as rules where the conclusion has a “must” modality. These are the kinds of constraints considered in this article, with the interpretation that in such constraints the “must” entails that the constraints can only be used for checking statements, i.e. that they are not rules allowing the derivation of non-modal statements. More formally, this means that such positive and negative constraints can respectively be translated into the forms “A ∧ ¬B =>> false” and “A ∧ B =>> false” where A and B do not contain a “must” modality and A may be empty. As an example, consider the positive constraint “if x is a Person, x must have a parent”. From this constraint and the fact “Tom is a Person”, an inference engine must not derive “Tom has a parent”. It may derive “Tom must have a parent” but, in practice, such derivation is not made. As a somewhat opposite example, RDFS-aware engines do not exploit relations of type rdfs:domain or rdfs:range as relation signature constraints but as inference supporting statements: when a relation r has a type partially defined by an rdfs:domain (vs. rdfs:range) relation, RDFS-aware engines may infer a type for the source (vs. destination) of r.
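The difference between a rule and a constraint can be sketched as follows. This is a toy illustration under stated assumptions: facts are encoded as "subject|relation" strings, and the function names `applyRule` and `violations` are hypothetical. The rule derives a new fact by modus ponens, whereas the constraint only reports the objects matching “A ∧ ¬B” (positive case) or “A ∧ B” (negative case) and leaves the KB unchanged.

```javascript
// Assumed encoding: each fact is a "subject|relation" string.
const facts = new Set(["Tom|is a Person"]);

// Rule "if x is a Person then x has a parent": modus ponens adds a fact.
function applyRule(factSet) {
  const derived = new Set(factSet);
  for (const fact of factSet) {
    const [x, relation] = fact.split("|");
    if (relation === "is a Person") derived.add(`${x}|has a parent`);
  }
  return derived;
}

// Constraint "if A, B must be true" (kind "positive", i.e. A ∧ ¬B =>> false)
// or "if A, B must be false" (kind "negative", i.e. A ∧ B =>> false).
// Returns the violating objects; nothing is derived or added.
function violations(factSet, conditionRelation, conclusionRelation, kind) {
  const result = [];
  for (const fact of factSet) {
    const [x, relation] = fact.split("|");
    if (relation !== conditionRelation) continue;
    const conclusionHolds = factSet.has(`${x}|${conclusionRelation}`);
    if (kind === "positive" ? !conclusionHolds : conclusionHolds) result.push(x);
  }
  return result;
}

// The positive constraint flags Tom, and the base of facts is not modified.
console.log(violations(facts, "is a Person", "has a parent", "positive")); // [ 'Tom' ]
console.log(facts.size); // 1
```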

In this article, constraints that are directly represented in a form ending by “=>> false” – or, equivalently, “=>> ⊥” – are called constraints in inconsistency-implying form. Not all KRLs allow rules to be represented (instead of – or in addition to – implications); in those that do, representing negative constraints using the inconsistency-implying form is easy but using this form for representing positive constraints may not be possible: the KRL may not permit the representation of the negation in the “¬B” part. This is why in this article i) negative constraints are represented in inconsistency-implying form, and ii) positive constraints are in the form “A =>> B” but have a type that distinguishes them from actual rules. Furthermore, as in most rule-based systems, in the rest of this article the A and B parts share variables. More precisely, these parts are representations of relations from a same object (i.e. from a type or an individual, including a relation or a more complex statement since they are particular kinds of individuals). Thus, checking if a positive constraint is violated by an object in a dataset is checking if the object matches – that is, specializes – the condition of the constraint but not its conclusion. In other words, for the constraint to be satisfied by an object matching its condition, both the condition and conclusion of the constraint should be logically derivable from the object, possibly using various inference supporting statements from the KB.

In the research literature, constraints are generally not represented – or checked – via modal logic based KRLs but rather using queries, e.g. via SPARQL or the nonmonotonic-epistemic-logic query language EQL-Lite [Calvanese et al.]. In (unidirectional) rule based systems, rules with empty conclusions (or “false” as conclusions) are handled like constraints. However, this is a particularity of these systems. It should not be relied upon for general knowledge representation purposes. For such purposes, the special semantics of constraints has to be made explicit via special syntactic sugar or special types. Since KRLs rarely propose syntactic sugar for expressing constraints, a more generic approach for expressing that a statement is a constraint, as opposed to an inference supporting statement, is to state that this statement is an instance of a type expressing a particular kind of constraint, as explained in Section 2.2.1. Then, these constraints can be retrieved and exploited by content-independent queries such as those provided below. These constraints can also be directly interpreted and exploited by inference engines designed to take into account the used constraint types. In any case, either i) constraints are not represented in a way they can be exploited as inference supporting statements, or ii) the results of these inferences must not be detrimental, i.e., must not influence the checking of constraints. Both techniques will be illustrated in Section 2.3.1 and Section 3.


2.1.2. Prescriptive (i.e. Not Using All Possible Inferences) vs. Descriptive

As noted in [Assmann & Wagner, 2006], one common distinction between engineering models is whether they are i) descriptive of some reality, e.g. like most ontologies (e.g., by default, ontologies written in RDF or OWL), or ii) prescriptive of what must be in the considered dataset, as with system specifications, meta-models, XML schemas, database schemas, some SHACL statements, etc. Similarly, this article distinguishes two kinds of constraints. First, like definitions or axioms, descriptive constraints lead inference engines to check the use of certain formal terms, if and only if these terms are used. On the other hand, prescriptive constraints lead inference engines to check that certain formal terms are actually used (or not) and that this use is not due to inferences. This can for example be useful for checking that whenever people or software agents create an instance of a certain type, they explicitly associate certain relations to this instance. Here, “explicitly” specifies that these relations must not exist just because they were automatically deduced, e.g. by inheritance, but only because they were (manually or automatically) set by an agent when creating or modifying the dataset. As an example, assume that a KB includes the non-modal rule “if x is a Person, x has a parent” and that a user enters that “John is a Person” in the base of facts of this KB (this base is the set of relations from/to individuals; for a description-logic based KB, this is its A-box). Even if this KB also includes the descriptive constraint “if x is a Person, x must have a parent in the represented world (descriptive-must)”, an error message should not be given by a KB checking mechanism since this constraint is satisfied (by inferencing) without the user having to represent a parent for John. 
On the other hand, if the KB includes the prescriptive constraint “if x is a Person, x must have a parent in the base of facts (prescriptive-must)”, the adding of a new person without a relation to a parent must now be rejected. In this article, since constraints are distinguished from actual rules via their type, this type is also used for permitting inference engines or special procedures to distinguish prescriptive constraints from descriptive ones.
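The “John” example above can be sketched as follows. This is a minimal illustration, assuming a triple encoding and a hand-written `saturate` function standing in for an inference engine; these names are hypothetical. The condition is always matched against the KB with inferences, while the conclusion is matched against the inferred KB for the descriptive reading and against the base of facts as entered for the prescriptive reading.

```javascript
// Base of facts as entered by the user (assumed triple encoding).
const asserted = [["John", "rdf:type", "Person"]];

// Stand-in for an inference engine applying the non-modal rule
// "if x is a Person, x has a parent": adds an anonymous parent when none exists.
function saturate(triples) {
  const out = triples.slice();
  for (const [s, p, o] of triples) {
    const isPerson = p === "rdf:type" && o === "Person";
    const hasParent = triples.some(([s2, p2]) => s2 === s && p2 === "parent");
    if (isPerson && !hasParent) out.push([s, "parent", `_:parentOf${s}`]);
  }
  return out;
}

// Persons (matched in conditionTriples) lacking a parent relation
// (searched in conclusionTriples).
function violators(conditionTriples, conclusionTriples) {
  return conditionTriples
    .filter(([, p, o]) => p === "rdf:type" && o === "Person")
    .map(([s]) => s)
    .filter((s) => !conclusionTriples.some(([s2, p2]) => s2 === s && p2 === "parent"));
}

const inferred = saturate(asserted);
// Descriptive-must: inferred relations count, so John is accepted.
const descriptiveViolators = violators(inferred, inferred);
// Prescriptive-must: only relations explicitly entered count, so John is rejected.
const prescriptiveViolators = violators(inferred, asserted);

console.log(descriptiveViolators);  // []
console.log(prescriptiveViolators); // [ 'John' ]
```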

With constraints in inconsistency-implying form, the non-existence of certain terms is checked. In the general case, it does not matter (and is difficult to know) whether this non-existence is due to inferences or not. Thus, at least in this article, the distinction between descriptive and prescriptive (as above defined) does not apply to constraints in inconsistency-implying form.

When trying to match an object (in the dataset) to the condition of a constraint, in order to avoid missing a match the expressiveness used for the inference regime must be at least equal to the expressiveness used in the object. This is also true when trying to match an object to the conclusion of a descriptive constraint. However, when trying to match an object O to the conclusion C of a prescriptive constraint, the knowledge K that could automatically add relations to the object (even a type relation) should not be used. If relations have already been added to O using K, e.g. because forward chaining saturation was automatically used, the checking must be done on O0, the version of O as it was before K was used. More formally, if we call KB0 the version of the KB before the use of K, matching O to C is trying to find if O0 permits the derivation of C via inference supporting statements in “KB0 minus K”: “O0 ⊢(KB0\K) C”. In an OWL2 based KB, K includes the definitions based on rdfs:domain or rdfs:range relations and the definitions based on owl:equivalentClass relations to restrictions (there is no problem if only rdfs:subClassOf is used). If some mechanism automatically associates relations to some of the checked objects – e.g., by dynamic lookup for inherited relations during each object matching test, or by forward chaining saturation – this mechanism must be temporarily disabled or bypassed. However, disabling or changing these mechanisms (or, in other words, the used entailment regime) generally cannot be done in the middle of a query. E.g., SPARQL does not permit such a change. Hence, instead, bypassing methods are needed. Section 2.2.2 proposes two methods.

The notion of prescriptive constraints as above defined is original: to our knowledge, in other works on constraints, constraints apply to statements of the KB whether they have been inferred or originally provided. Since these works do not provide ways to distinguish between those two kinds of statements, the constraints they handle are by default descriptive. In other words, since these works do not provide a mechanism for temporarily disabling or bypassing some inferences only when checking the conclusions of the constraints, these constraints can be exploited as prescriptive constraints (as above defined) only by disabling or bypassing some inferences for checking both the conditions and the conclusions, thus in an incomplete way for the checking of the conditions.

A constraint, whether descriptive or prescriptive, is intended to check the existence (or the absence of existence) of certain objects in a KB. To that end, constraint checking needs to exploit some form of closed world assumption. When SPARQL is used for checking constraints, the closed world assumption related parts can be represented via the operators EXISTS and NOT EXISTS. In this article, SPARQL is used for the content-independent queries (see Section 3) and only the operator EXISTS had to be used in these queries. Furthermore, with such queries, there is no need to represent any closed world assumption related part in the constraints themselves. Thus, the inference engines used for matching objects of the KB to the constraints need not use the closed world assumption. Hence, regular inference engines for OWL2 can for example be reused.


2.1.3. Descriptive Constraints Restricted To Named Individuals

When specifying a constraint, people or software agents may want to prevent a part of the constraint from matching an object that is anonymous (i.e. that is a “blank node” in the RDF terminology) or, conversely, that is named (i.e. that has been given an identifier or, in the RDF model, that is an identifier). E.g. they may want to use a descriptive constraint for checking that all instances of Person have been given a parent relation with a named destination, inferred or not. Indeed, if the KB contains an inference supporting statement precise enough to associate a named parent to certain persons, it seems reasonable to exploit this statement and thus spare some explicit knowledge entering, while it also seems reasonable not to reuse an inference supporting statement merely associating an anonymous parent to each person. To sum up, there is a need for descriptive constraints restricted to named individuals and hence a need for distinguishing anonymous individuals from named ones when writing these constraints.

Many KRLs – including JSON-LD – provide a relation type or some syntactic sugar for associating an identifier to an individual. This is not the case in RDF because, in this model, IRIs (Internationalized Resource Identifiers) are not lexical objects, they are interpreted as (or conflated with) named individuals. OWL also does not provide a relation type which could be used for distinguishing which individuals are (or must be) named from those that are not (or need not be). In OWL2-DL, this distinction can be made if each named individual is declared as instance of the class owl:NamedIndividual. However, doing so in OWL2-Full still does not permit this distinction to be made. Fortunately, SPARQL supports the distinction via the operators isIRI and isBlank. Hence, the solution proposed in this article is to provide i) the relation type cstr:id for enabling the authors of a constraint to specify which individual must have an identifier (an example is given in each of the subsections of Section 2.3.3), and ii) a SPARQL update request permitting the adding of cstr:id relations to each named individual in a KB (Section 3.1.1.2). Thus, if some descriptive constraints use cstr:id relations and if this last SPARQL update request is run, these constraints will correctly be checked by content-independent queries provided for descriptive constraints (e.g., see Section 3.1.2.1).
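The effect of such a pre-treatment can be sketched as follows, outside SPARQL. This is only an illustration under stated assumptions: blank nodes are encoded as strings starting with "_:", the destination of each cstr:id relation is taken to be the identifier string itself, and the functions `addIdRelations` and `hasNamedParent` are hypothetical names, not the update request of Section 3.1.1.2.

```javascript
// Assumed encoding: blank nodes are strings starting with "_:".
const isBlank = (node) => typeof node === "string" && node.startsWith("_:");

// Adds a cstr:id relation to each named (non-blank) individual.
// Assumption: destinations of rdf:type relations are classes, not individuals.
function addIdRelations(triples) {
  const named = new Set();
  for (const [source, relation, destination] of triples) {
    if (!isBlank(source)) named.add(source);
    if (relation !== "rdf:type" && !isBlank(destination)) named.add(destination);
  }
  const idRelations = [...named].map((node) => [node, "cstr:id", node]);
  return triples.concat(idRelations);
}

// Descriptive check restricted to named individuals:
// does the person have a parent that itself has a cstr:id relation?
function hasNamedParent(triples, person) {
  return triples.some(
    ([s, p, o]) =>
      s === person &&
      p === "ex:parent" &&
      triples.some(([s2, p2]) => s2 === o && p2 === "cstr:id")
  );
}

const kbWithIds = addIdRelations([
  ["ex:Tom", "rdf:type", "ex:Person"],
  ["ex:Tom", "ex:parent", "ex:Ann"],
  ["ex:Joe", "rdf:type", "ex:Person"],
  ["ex:Joe", "ex:parent", "_:b1"], // anonymous parent, e.g. added by inference
]);

console.log(hasNamedParent(kbWithIds, "ex:Tom")); // true
console.log(hasNamedParent(kbWithIds, "ex:Joe")); // false
```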



2.2. General Approach for Representing And Exploiting Constraints

2.2.1. Using Constraint Types

[Tao et al., 2010] shows that representing and checking certain kinds of integrity constraints that exploit some forms of the Unique Name Assumption or Closed World Assumption can be performed via SPARQL queries. Instead, as explained in the introduction, the goal is here to enable the representation of constraints that i) can be exploited via content-independent queries, ii) can be represented via any KRL that has an expressiveness at least equal to RDF or RDFS, and iii) can be marked as descriptive or prescriptive (this distinction is not made in [Tao et al., 2010]).

To that end, the proposed approach is to introduce a few types for constraints. By setting instanceOf or subtypeOf relations from certain KRs to some of those types, KB authors can state that these KRs are constraints and can indicate which kind of constraints. Thus, these constraints can be exploited by content-independent queries or inference engines that understand the used constraint types. For these engines, the types change the way the statements must be interpreted. This approach is similar to the use of OWL2 types in RDF statements and their exploitation by OWL2-aware inference engines. The name of the proposed ontology of constraint types is CSTR. In this ontology, cstr:Constraint is the supertype of all types of constraints. Similarly, the type cstr:Prescriptive_constraint, a subtype of cstr:Constraint, enables one to state that some rules are actually prescriptive constraints or to retrieve all and only such constraints. The prefix “cstr:” in these identifiers is an abbreviation for the namespace http://www.webkb.org/kb/it/CSTR. CSTR also includes types for constraint conditions and types for relations between the condition and conclusion of a constraint. These types are listed in Section 2.3.2.
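How constraint types permit content-independent retrieval can be sketched as follows. This is an illustration only: the triple encoding, the identifiers ex:c1, ex:c2, ex:r1 and ex:Rule, and the function names are assumptions; only cstr:Constraint and cstr:Prescriptive_constraint come from CSTR as described above. The retrieval function never mentions the content of any constraint, only its type.

```javascript
// Assumed triple encoding; ex:c1, ex:c2, ex:r1 and ex:Rule are hypothetical.
const kb = [
  ["cstr:Prescriptive_constraint", "rdfs:subClassOf", "cstr:Constraint"],
  ["ex:c1", "rdf:type", "cstr:Prescriptive_constraint"],
  ["ex:c2", "rdf:type", "cstr:Constraint"],
  ["ex:r1", "rdf:type", "ex:Rule"],
];

// The given type plus all its direct and indirect subclasses.
function subtypesOf(triples, type) {
  const found = new Set([type]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const [s, p, o] of triples) {
      if (p === "rdfs:subClassOf" && found.has(o) && !found.has(s)) {
        found.add(s);
        grew = true;
      }
    }
  }
  return found;
}

// Content-independent retrieval: all statements declared as instances
// of the given constraint type or of one of its subtypes.
function instancesOf(triples, type) {
  const types = subtypesOf(triples, type);
  return triples
    .filter(([, p, o]) => p === "rdf:type" && types.has(o))
    .map(([s]) => s);
}

console.log(instancesOf(kb, "cstr:Constraint"));              // [ 'ex:c1', 'ex:c2' ]
console.log(instancesOf(kb, "cstr:Prescriptive_constraint")); // [ 'ex:c1' ]
```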


2.2.2. Using “Clones Without Types” For Bypassing Some Inferences When Checking The Conclusions Of Positive Prescriptive Constraints

For adequately checking positive prescriptive constraints, Section 2.1.2 introduced the need for temporarily disabling or bypassing “inference mechanisms that automatically associate relations to objects” when testing the first object of the conclusion of the constraint. This subsection proposes a method to do so. Statically (i.e. via a pre-treatment of the KB like the one given in Section 3.1) or dynamically (i.e. during the checking of such constraints), this method creates a “clone without type” of each object matching the condition of such a constraint and then, when checking its conclusion, does so on this clone instead of the original object. The clone has the same relations as the original object except for instanceOf relations (it has none; furthermore, if it is a named individual, it has an identifier different from the original object). Thus, by using clones without types, “inferences exploiting types to associate relations to an object” are avoided. As an abbreviation, from now on, this is referred to as avoiding inheritance. In the case of RDFS or OWL entailments, “avoiding inheritance” means that, when searching relations associated to an object, the types of this object and their superclasses are not exploited.
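The creation of a clone without type can be sketched as follows. This is a minimal illustration assuming a triple encoding; the function name `cloneWithoutTypes` and the sample identifiers are hypothetical, and this is not the SPARQL pre-treatment of Section 3.1.

```javascript
// Returns the relations of a "clone without type" of `object`:
// every relation from or to the object is copied onto `cloneId`,
// except the instanceOf (rdf:type) relations from the object.
function cloneWithoutTypes(triples, object, cloneId) {
  const cloneTriples = [];
  for (const [s, p, o] of triples) {
    if (s === object && p === "rdf:type") continue; // the clone has no type
    const s2 = s === object ? cloneId : s;
    const o2 = o === object ? cloneId : o;
    if (s2 !== s || o2 !== o) cloneTriples.push([s2, p, o2]);
  }
  return cloneTriples;
}

const personKb = [
  ["ex:John", "rdf:type", "ex:Person"],
  ["ex:John", "ex:name", "John"],
  ["ex:Mary", "ex:knows", "ex:John"],
];

// The clone keeps the name and the incoming relation but loses its type,
// so type-based inferences (inheritance) cannot add relations to it.
const johnClone = cloneWithoutTypes(personKb, "ex:John", "ex:John_clone");
console.log(johnClone);
```

The conclusion of a positive prescriptive constraint is then checked against the clone's relations rather than against those of the original object.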

Creating clones without types is not necessarily easy since there may be information in the KB that leads certain inference engines to regenerate types for some clones. Here are examples.

  • Assuming there is an rdfs:domain relation from the relation type parent to the type Person, if an object of type Person is source of a parent relation and this object has its type removed, an inference engine may set it again. To avoid such a case, instead of using rdfs:domain or rdfs:range relations, one may write inconsistency-implying constraints that are equivalent to these relations except that they are usable only for checking purposes. An example is given at the end of each of the subsections of Section 2.3.3.
  • When SPARQL is used for creating a clone without type, as illustrated in Section 3.1.1.3, another potential problem is that the whole KB is duplicated, not just one object.
  • This method based on clones without types also does not work if there are inferences that do not exploit types (e.g. via duck typing instead of inheritance) or if a forward chaining saturation on the KB is automatically run before the above cited pre-treatment. However, these last two cases are rare.

This method relies on a temporary update of KRs before their checking by an inference engine. Thus, this method does not rely on a particular KRL, inference engine or tool feature. In other words, this solution is KRL independent and tool independent: it can be used with any KRL and any tool. Hence, depending on the domain and application, different inference engines can be reused to check or evaluate ontology completeness. However, with some query languages such as current standard versions of SPARQL, the temporary update cannot be done dynamically: a KB pre-treatment is necessary. This is a limitation since KB servers, e.g. SPARQL endpoints, rarely allow their users to modify a KB for checking it. With an extension of SPARQL such as LDScript [Corby, Faron-Zucker & Gandon, 2017], the temporary update can be done dynamically. As with SPARQL, the whole KB is duplicated but now it is temporary and done every time an object is matched with the conclusion of a prescriptive constraint (Section 3.1.2.2 shows the LDScript code).



2.3. Representing Constraints Via Relations Between Classes

2.3.1. Approaches

One way to represent and exploit (simple) rules in a KRL that has an expressiveness at least equal to RDF or RDFS is to use an rdfs:subClassOf relation for representing the implication between the condition and conclusion of a rule. However then, either this implication must not be used for modus tollens or the results must not be detrimental. The situation is not much more complex when subclassOf rules are used as a way to represent constraints. There are three cases.

  • If the conclusion is (equivalent to) owl:Nothing, i.e. if the inconsistency-implying form is used, the rule is semantically a constraint and, depending on the inference engines, modus tollens may or may not be a danger.
  • Otherwise, if a prescriptive constraint is represented, the “Clones Without Types” based method prevents the results of modus ponens or modus tollens from influencing the checking of constraints (this is where these results could have been detrimental).
  • Otherwise, i.e. if a descriptive constraint is represented, one must use an inference engine that does not exploit rules for modus ponens nor modus tollens when the condition of the rule is subtype of cstr:SubclassOf-based_constraint_condition or instance of cstr:Type_of_subclassOf-based_constraint_condition.

In other words, using subclassOf-based constraints when inferences based on subclassOf relations then have to be ignored is generally not relevant. However, the idea of using classes for representing the conditions and conclusion of a constraint without using variables is interesting. Here are the two simplest ways to do so.

  • The “subclassOf-analogous” way: it consists in relating the condition class and the conclusion class by a relation that is not a subclassOf one. To support this, CSTR proposes the relation types cstr:descriptive_constraint_conclusion and cstr:prescriptive_constraint_conclusion.
  • The “individual-based constraint” way: it consists in creating a constraint individual and, from it, relations to express its type (i.e., descriptive vs. prescriptive) and the classes for its condition and its conclusion. To support this, CSTR proposes the relation types cstr:condition_class and cstr:conclusion_class. Since this way uses an individual to relate two classes, it can be seen as indirectly relation-between-classes based, as opposed to the subclassOf-based and subclassOf-analogous ways which are directly relation-between-classes based. A similar solution would be to use a type instead of an individual for representing the constraint. This solution is slightly closer to the two other ones but a bit artificial. Another reason it is not used in this article is that the individual-based solution is easier to name and distinguish from the other two basic solutions.
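The two ways listed above can be sketched as follows, with both representations read into the same normalized form. This is an illustration under stated assumptions: the triple encoding, the identifiers ex:c1, ex:Person and ex:Thing_with_parent, the function name `readConstraints`, and the type name cstr:Descriptive_constraint are hypothetical; the relation types and cstr:Prescriptive_constraint are the ones named in this article.

```javascript
// Relation types of the "subclassOf-analogous" way, mapped to a constraint kind.
const CONCLUSION_RELATION_KIND = new Map([
  ["cstr:descriptive_constraint_conclusion", "descriptive"],
  ["cstr:prescriptive_constraint_conclusion", "prescriptive"],
]);

// Constraint types of the "individual-based" way.
const CONSTRAINT_TYPE_KIND = new Map([
  ["cstr:Descriptive_constraint", "descriptive"], // assumed type name
  ["cstr:Prescriptive_constraint", "prescriptive"],
]);

// Reads both representations into { kind, condition, conclusion } records.
function readConstraints(triples) {
  const constraints = [];
  const parts = new Map(); // constraint individual -> partial description
  const partsOf = (id) => {
    if (!parts.has(id)) parts.set(id, {});
    return parts.get(id);
  };
  for (const [s, p, o] of triples) {
    if (CONCLUSION_RELATION_KIND.has(p)) {
      // subclassOf-analogous: condition class --relation--> conclusion class
      constraints.push({ kind: CONCLUSION_RELATION_KIND.get(p), condition: s, conclusion: o });
    } else if (p === "rdf:type" && CONSTRAINT_TYPE_KIND.has(o)) {
      partsOf(s).kind = CONSTRAINT_TYPE_KIND.get(o);
    } else if (p === "cstr:condition_class") {
      partsOf(s).condition = o;
    } else if (p === "cstr:conclusion_class") {
      partsOf(s).conclusion = o;
    }
  }
  // individual-based: keep only fully described constraint individuals
  for (const { kind, condition, conclusion } of parts.values()) {
    if (kind && condition && conclusion) constraints.push({ kind, condition, conclusion });
  }
  return constraints;
}

const analogousWay = [
  ["ex:Person", "cstr:prescriptive_constraint_conclusion", "ex:Thing_with_parent"],
];
const individualWay = [
  ["ex:c1", "rdf:type", "cstr:Prescriptive_constraint"],
  ["ex:c1", "cstr:condition_class", "ex:Person"],
  ["ex:c1", "cstr:conclusion_class", "ex:Thing_with_parent"],
];

console.log(readConstraints(analogousWay));
console.log(readConstraints(individualWay));
```

In this sketch the subclassOf-analogous way needs one relation per constraint, while the individual-based way needs three but gives the constraint an identifier of its own.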

A disadvantage of any solution using relations from/to classes when these relations are not subclassOf ones is that the result requires a KRL with an expressivity at least equal to RDF. For OWL-based representations, this means interpreting them with the RDF-Based Semantics, not the OWL2 Direct Semantics. Since each of the three approaches has its advantages and drawbacks, this article proposes types and commands for the three approaches: Section 2.3.2 gives the types, Section 2.3.3 gives examples and Section 3.1 gives the requests.

SubclassOf-based constraints could be generalized using a subtypeOf relation instead of a subclassOf one. Conforming to the terminology given in the first paragraph of the introduction, this would permit the use of relation types and datatypes, hence not just classes, in the condition and conclusion of a constraint. However, checking objects such as relations and instances of datatypes can most often be done via constraints on class instances related to these objects. For example, relations can generally be checked via their sources or destinations, or via the assertions that include these relations, i.e. instances of classes such as owl:NegativePropertyAssertion. Hence, for relating the condition and conclusion of a constraint, this article does not propose the use of subtypeOf relations that are not rdfs:subClassOf relations.

2.3.1.1. Comparison With OWL-ER

[Baget et al., 2015] shows that a sublanguage of OWL2 called OWL2-ER using subclassOf-based rules can represent many kinds of “existential rules” (hence the suffix “-ER”), i.e. rules in which the conclusion (or head) can include anonymous existentially quantified objects. A sublanguage of OWL2-ER is OWL2-RL [OWL 2 profiles] which can be translated into Datalog (definite Horn rules with equality), a purely declarative subset of Prolog. Similarly, OWL2-ER can be translated into – and hence represented by – Datalog+ [Baget et al., 2015] (the extension of Datalog to existential rules) and then into RuleML. Thus, not all OWL2 class expressions are allowed in OWL2-ER: rules are restricted to the logical form “∀X ∀Y (B(X,Y) => ∃Z H(X,Z))”, where “X, Y and Z are sets of variables, B (the body) is a conjunction with variables in X and Y, and H (the head) is a conjunction with variables in X and Z; hence Z denotes the set of variables that occur in H but not in B” [Baget et al., 2015]. Given its restrictions with respect to OWL2, OWL2-ER is roughly the part of Datalog+ that can be expressed using only OWL2, hence with binary relations only and without the syntactic possibility of directly and explicitly using variables for relating objects shared by both the condition and conclusion of a rule.

In OWL2-ER, negative constraints can be represented in two ways: i) by using a class expression equivalent to the type owl:Nothing in a rule conclusion, hence a rule of the form ∀x (ClassExpression(x) => ⊥), ii) by using the type owl:NegativePropertyAssertion to express negated facts of the form ¬ ∃x ClassExpression(x). Neither OWL2-ER nor Datalog+ can directly represent a positive constraint; hence, neither distinguishes between descriptive and prescriptive constraints. However, OWL2-ER statements can be used for expressing and exploiting constraints with the types and approach proposed in this article. Conversely, with the proposed approach, subclassOf-based constraints are not restricted to OWL2-ER or OWL2 class expressions. Here are examples of OWL2-ER rules given in [Baget et al., 2015]; to express the meaning of constraints reusing them, it is sufficient to replace the words “is” or “has” by the words “must be” or “must have” in the informal descriptions (each of them is prefixed by “#”).

  ∀X ∀Y ∀Z (isProject(X,Y,Z) => isMember(Z,X))
      #Every leader of a project is a member of this project
  ∀X ∀Y (Researcher(X) ∧ hasExpertise(X,Y) => ∃Z ∃L (isProject(Z,Y,L) ∧ isMember(X,Z)))
      #Every researcher expert in an area is a member of a project in that area
  ∀X ∀Y ∀Z (isProject(X,Y,Z) ∧ isProject(X,Y,Z′) => Z = Z′)
      #Every project has at most one leader

Since the proposed approach is purposely KRL independent and not restricted to subclassOf-based constraints (and, a fortiori, to OWL2-ER statements), more formal details on OWL2-ER, or on what representing a rule using rdfs:subClassOf entails, would be irrelevant in this article. See [Baget et al., 2015] and [Swan, 2016] for those formal points. The SPARQL code given in Section 3.1 provides the remaining necessary formal details.


2.3.2. Proposed Types

Figure 1 uses a concise version of UML to show subclassOf relations between some classes of CSTR. Figure 2 uses the FL notation (a notation somewhat similar to Turtle but more expressive) to show subtypeOf relations between the main types of CSTR. This figure shows how many of the first-order classes are instances of some second-order classes that are similarly organized and somewhat similarly named. E.g., to state that a class (class expression or named class) is the condition of a subclassOf-based prescriptive constraint, one can mark it as a subtype of cstr:SubclassOf-based_prescriptive_constraint_condition or as an instance of cstr:Type_of_subclassOf-based_prescriptive_constraint_condition. The first case implies the second one but it also specifies that, if the marked class has subclasses, each definition of these subclasses is also a condition for a constraint. The other second-order classes have the same purpose.

In its constraint examples, this article only uses 1st-order classes. Indeed, most inference engines for description logics cannot handle a KB that defines 2nd-order types, i.e. a KB that exploits 2nd-order types not predefined in those description logics. As explained in the second paragraph of Section 2.3.1, the two other proposed representation approaches use other relations to classes and hence cannot be handled by the above-cited inference engines either.

Because of this limitation of these engines, to use them for checking constraints on class definitions, it is necessary to pre-treat the KB by removing the individuals. In such a case, 1st-order classes become individuals and their subclass relations must be removed too. Section 3.1.1 shows the basis of how to do that in SPARQL. Conversely, if such engines must be used to check constraints on individuals, the 2nd-order types must first be removed.

Figure 1. Concise-UML representation of subclassOf relations between some classes in the CSTR ontology
                    cstr:Constraint              cstr:Constraint_condition
                       /\      /\                         /\
                       |       |                          |
cstr:Descriptive_constraint    |     cstr:SubclassOf-based_constraint_condition
         cstr:Prescriptive_constraint                 /\         /\
                                                      |          |
     cstr:SubclassOf-based_descriptive_constraint_condition      |
                   cstr:SubclassOf-based_prescriptive_constraint_condition

Legend. The class names with no prefix are from the CSTR ontology.
Each arrow (“/\” above “|”) points from a subclass to its superclass, i.e. represents a subClassOf relation.
For concision purposes, boxes around classes are not drawn.

Figure 2. FL representation of subtypeOf relation between main types in the CSTR ontology
cstr#binary_relation  rdf#type: rdf#Property,
 > exclusion    //cstr#binary_relation has the following 5 exclusive subtypes
   { cstr#id    //  (this exclusion set is not complete: this is not a partition)
     cstr#descriptive_constraint_conclusion    cstr#condition_class
     cstr#prescriptive_constraint_conclusion   cstr#conclusion_class
   };

cstr#Constraint  rdf#type: cstr#Type_of_constraint,
 > cstr#Descriptive_constraint  cstr#Prescriptive_constraint;

cstr#Constraint_condition
 > (cstr#SubclassOf-based_constraint_condition
      rdf#type: cstr#Type_of_subclassOf-based_constraint_condition,
      > (cstr#SubclassOf-based_descriptive_constraint_condition
           rdf#type: cstr#Type_of_subclassOf-based_descriptive_constraint_condition)
        (cstr#SubclassOf-based_prescriptive_constraint_condition
           rdf#type: cstr#Type_of_subclassOf-based_prescriptive_constraint_condition)
   );

owl#Class
  > exclusion  //owl#Class has the following 3 exclusive subclasses
    { cstr#Type_of_constraint
      (cstr#Type_of_constraint_condition
         > (cstr#Type_of_constraint_condition_that_is_a_type
              > (cstr#Type_of_subclassOf-based_constraint_condition
                   > cstr#Type_of_subclassOf-based_descriptive_constraint_condition
                     cstr#Type_of_subclassOf-based_prescriptive_constraint_condition
                ) ) )
    };

Legend.
“>” means “subtype”.
Namespace prefixes end with “#”, not “:”.
Thus, the syntactic form “X r: Y” can be read “X has for r Y”
(although FL syntactic sugar such as “exclusion”
 slightly changes this reading: see the comments prefixed by “//”).


2.3.3. Examples

In this article, except in Figure 1 and Figure 2, the Turtle notation is used when SPARQL is not used since SPARQL reuses this notation for representing relations. For clarity purposes, the names of relation types have a lowercase initial while other names have an uppercase initial. In SPARQL, Turtle or other graph-based notations, a statement of the form “ SourceConcept relationName1 DestConcept1ForRelName1, DestConcept2ForRelName1; relationName2 DestConcept1ForRelName2, DestConcept2ForRelName2 . ” can for example be read “SourceConcept has for relationName1 DestConcept1ForRelName1 as well as DestConcept2ForRelName1, and has for relationName2 DestConcept1ForRelName2, and has for relationName2 DestConcept2ForRelName2”.

2.3.3.1. Individual-based Constraints

Here is the positive prescriptive constraint “if x is a Person, x must have a parent in the base of facts (prescriptive-must)” represented as an individual-based constraint using the CSTR ontology.

  [] rdf:type cstr:Prescriptive_constraint;
     cstr:condition_class :Person;
     cstr:conclusion_class
       [rdf:type owl:Class;
        owl:equivalentClass [rdf:type owl:Restriction;  owl:onProperty :parent;
                             owl:someValuesFrom :Person] ].

For the general descriptive version of this constraint, it is sufficient to replace “prescriptive” by “descriptive” in the previous representation. Here is a version where parents are restricted to be named individuals.

  [] rdf:type cstr:Descriptive_constraint;
     cstr:condition_class :Person;
     cstr:conclusion_class
       [rdf:type owl:Class;
        owl:equivalentClass [rdf:type owl:Restriction;  owl:onProperty :parent;
                             owl:someValuesFrom :Named_person] ].
  #with:
  :Named_person owl:equivalentClass
     [rdf:type owl:Class;
      owl:intersectionOf ( :Person
                           [rdf:type owl:Restriction;  owl:onProperty cstr:id;
                            owl:someValuesFrom xsd:string]
     )].

The negative constraint “if x is a Person_without_parent, x must not have a parent” may be translated into the inconsistency-implying form “x is a Person_without_parent =>> false”. Here is its representation as an individual-based constraint using the CSTR ontology.

  [] rdf:type cstr:Descriptive_constraint;  #this rdf:type relation is optional
     cstr:condition_class                   #  for a negative constraint
        [rdf:type owl:Class; 
         owl:equivalentClass [rdf:type owl:Restriction;  owl:onProperty :parent;
                              owl:maxCardinality  "0"^^xsd:nonNegativeInteger] ];
     cstr:conclusion_class owl:Nothing.

As referred to in Section 2.2.2 (about “clones without types”), here are examples of how to write inconsistency-implying constraints that are equivalent to the use of rdfs:domain or rdfs:range relations except that they are usable only for checking purposes. These examples respectively represent that “if a parent relation has something that is Not_a_person as source (alias, domain), this relation is incorrect” and “if a parent relation has something that is Not_a_person as destination (alias, range), this relation is incorrect”.

  [] cstr:condition_class    #constraint-based version of an rdfs:domain relation:
       [rdf:type owl:Class;
        owl:intersectionOf ( :Not_a_person
                             [rdf:type owl:Restriction;  owl:onProperty :parent;
                              owl:someValuesFrom owl:Thing ] ) ];
     cstr:conclusion_class owl:Nothing.
  [] cstr:condition_class   #constraint-based version of an rdfs:range relation:
       [rdf:type owl:Restriction;  owl:onProperty :parent;
                                   owl:someValuesFrom :Not_a_person];
     cstr:conclusion_class owl:Nothing.
  #with:
  [] rdf:type owl:AllDisjointClasses;  owl:members (:Person :Not_a_person).

2.3.3.2. SubclassOf-analogous Constraints

For the positive prescriptive constraint “if x is a Person, x must have a parent in the base of facts (prescriptive-must)”, using the CSTR ontology:

  :Person  #class and constraint condition
     cstr:prescriptive_constraint_conclusion
        [rdf:type owl:Restriction;  owl:onProperty :parent;
         owl:someValuesFrom :Person].

For the general descriptive version of this constraint, it is sufficient to replace “prescriptive” by “descriptive” in the previous representation. Here is a version where parents are restricted to be named individuals.

  :Person  #class and constraint condition
     cstr:descriptive_constraint_conclusion
       [rdf:type owl:Restriction;  owl:onProperty :parent;
        owl:someValuesFrom :Named_person]. #as previously defined

Here is the negative constraint “x is a Person_without_parent =>> false” represented as a subclassOf-analogous constraint using the CSTR ontology.

  :Person_without_parent  #class and constraint condition
     owl:equivalentClass  #definition and constraint condition
        [rdf:type owl:Restriction;  owl:onProperty :parent;
         owl:maxCardinality  "0"^^xsd:nonNegativeInteger];
     cstr:prescriptive_constraint_conclusion  owl:Nothing.

Here are the subclassOf-analogous versions of the constraints “a parent relation has something that is Not_a_person as source (alias, domain) =>> false” and “a parent relation has something that is Not_a_person as destination (alias, range) =>> false”.

  :Not-a-person_with_a_parent   #constraint-based version of an rdfs:domain relation:
     owl:equivalentClass  #definition and constraint condition
        [rdf:type owl:Class;
         owl:intersectionOf ( :Not_a_person
                              [rdf:type owl:Restriction;  owl:onProperty :parent;
                               owl:someValuesFrom owl:Thing ] ) ];
     cstr:prescriptive_constraint_conclusion  owl:Nothing.
  :Thing_having_for_parent_a_Not-a-person  #constraint-based version of an rdfs:range relation:
     owl:equivalentClass  #definition and constraint condition
        [rdf:type owl:Restriction;  owl:onProperty :parent;
                                    owl:someValuesFrom :Not_a_person ];
     cstr:prescriptive_constraint_conclusion  owl:Nothing.

2.3.3.3. SubclassOf-based Constraints

For the positive prescriptive constraint “if x is a Person, x must have a parent in the base of facts (prescriptive-must)”, using the CSTR ontology:

  :Person  #class and constraint condition
     rdfs:subClassOf cstr:SubclassOf-based_prescriptive_constraint_condition;
     rdfs:subClassOf  #conclusion (types of relations that must be present):
        [rdf:type owl:Restriction;  owl:onProperty :parent;
         owl:someValuesFrom :Person].

For the general descriptive version of this constraint, it is sufficient to replace “prescriptive” by “descriptive” in the previous representation. However, as explained in Section 2.3.1, such descriptive versions require using an inference engine that takes into account the special meaning of cstr:SubclassOf-based_descriptive_constraint_condition. Here is a particular case where parents are restricted to be named individuals.

  :Person  #class and constraint condition
     rdfs:subClassOf cstr:SubclassOf-based_descriptive_constraint_condition;
     rdfs:subClassOf  #conclusion (types of relations that must be present):
       [rdf:type owl:Restriction;  owl:onProperty :parent;
        owl:someValuesFrom :Named_person]. #as previously defined

Using the CSTR ontology, the subclassOf-based versions of the negative constraints of the previous subsection are identical to their subclassOf-analogous versions except that cstr:prescriptive_constraint_conclusion is replaced by rdfs:subClassOf.




3. Exploiting Relations-between-classes Based Constraints in SPARQL and JavaScript

3.1. SPARQL Commands for Checking Constraints

In some extensions of SPARQL, e.g. LDScript [Corby, Faron-Zucker & Gandon, 2017], the next commands (queries or update requests) can be grouped into scripts or functions. The commands of Section 3.1.1 are KB pre-treatments and hence update requests. The commands of the remaining sections are queries. These queries are organized by kind and size. At first, the difficulty of understanding them is directly related to their size. In SPARQL, variable names begin with “?”.

3.1.1. Examples of KB Pre-treatments

3.1.1.1. Temporarily Removing Individuals For Checking Constraints on Types
          With Classic Description Logic Inference Engines

The next command shows a reversible way to remove individuals (as well as subclassOf relations between their types) to allow classic Description Logic inference engines to handle user-defined relations between types, as explained in the paragraph before Figure 1 in Section 2.3.2. The WHERE clause of this command selects each object ?o that i) has a type which has a superclass, and ii) does not have for type rdfs:Class (and hence is an individual). The DELETE clause removes the rdf:type relations from ?o and, from the destinations of these relations, removes the rdfs:subClassOf relations. As replacements for the removed rdf:type and rdfs:subClassOf relations, the INSERT clause sets cstr:type and cstr:subClassOf relations. These substitute relations later permit the re-assertion of the initial rdf:type and rdfs:subClassOf relations, when needed, via a similar command.

DELETE { ?o rdf:type  ?t .  ?t rdfs:subClassOf ?superClass } 
 INSERT{ ?o cstr:type ?t .  ?t cstr:subClassOf ?superClass }
 WHERE { ?o rdf:type  ?t .  ?t rdfs:subClassOf ?superClass 
         FILTER NOT EXISTS { ?o rdf:type rdfs:Class } 
       } 
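
To make the reversibility of this pre-treatment concrete, here is a minimal JavaScript sketch of the same idea on an in-memory array of [source, relation, destination] triples. It is not a SPARQL engine and only mirrors the selection logic described above; the triple encoding and the two function names are assumptions made for this example.

```javascript
// Sketch only: the KB is assumed to be an array of [source, relation, destination]
// strings. Both function names are hypothetical.
const SUBSTITUTES = new Map([["rdf:type", "cstr:type"],
                             ["rdfs:subClassOf", "cstr:subClassOf"]]);

function hideIndividualTyping(triples) {
  const key = (t) => JSON.stringify(t);
  const isClass = (x) =>
    triples.some((t) => t[0] === x && t[1] === "rdf:type" && t[2] === "rdfs:Class");
  const selected = new Set();
  for (const t of triples) {
    const [o, rel, type] = t;
    if (rel !== "rdf:type" || isClass(o)) continue;      // ?o must be an individual
    const superLinks = triples.filter((s) => s[0] === type && s[1] === "rdfs:subClassOf");
    if (superLinks.length === 0) continue;               // its type needs a superclass
    selected.add(key(t));
    superLinks.forEach((s) => selected.add(key(s)));
  }
  // DELETE + INSERT: each selected relation is replaced by its cstr: substitute
  return triples.map((t) =>
    selected.has(key(t)) ? [t[0], SUBSTITUTES.get(t[1]), t[2]] : t);
}

function restoreIndividualTyping(triples) {
  const originals = new Map([...SUBSTITUTES].map(([k, v]) => [v, k]));
  return triples.map((t) =>
    originals.has(t[1]) ? [t[0], originals.get(t[1]), t[2]] : t);
}
```

Applying restoreIndividualTyping to the result of hideIndividualTyping gives back the initial triples, provided the initial KB contains no cstr:type or cstr:subClassOf relation of its own.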

3.1.1.2. Creating “Identifier Relations” for Named Individuals

From each selected named individual having a type which has a superclass, the next command adds a cstr:id relation with destination the identifier of that individual. Thus, as explained in Section 2.1.3, queries checking descriptive constraints also work on those that include cstr:id relations, i.e. that have restrictions to named individuals. Here, only an individual that has a type with a superclass and that has other relations is selected since in practice only such an individual might violate a constraint. To search for individuals, just looking for each object that is not a relation and that does not have rdfs:Class as type would be an incomplete strategy and many conditions would have to be added for filtering out objects such as i) classes defined via an equivalence to a restriction, and ii) owl:Thing and some other types from OWL, RDFS or XSD.

INSERT { ?o cstr:id ?id } WHERE
{ ?o rdf:type ?t.  FILTER NOT EXISTS { ?o rdf:type rdfs:Class }  #?o is a typed individual
              ?t rdfs:subClassOf ?superClass.  FILTER isIRI(?o)  #?o is named with an IRI
  FILTER NOT EXISTS { ?o cstr:id ?id } #?o must not already have a cstr:id relation
  {?o ?r ?o2 FILTER(?r!=rdf:type)} UNION {?o1 ?r ?o} #?o has relations other than rdf:type
  BIND( str(?o) as ?id ) #?id is now the IRI identifying ?o
}
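
The selection conditions of this command can be sketched in JavaScript as follows, again on an in-memory array of [source, relation, destination] triples. This is only an illustration under assumptions: blank nodes are assumed to be written “_:name”, the identifier is simply the name of the individual, and addIdRelations is a hypothetical function name.

```javascript
// Sketch only: the KB is assumed to be an array of [source, relation, destination]
// strings, with blank nodes written "_:name". addIdRelations is a hypothetical name.
function addIdRelations(triples) {
  const isClass = (x) =>
    triples.some((t) => t[0] === x && t[1] === "rdf:type" && t[2] === "rdfs:Class");
  const hasSuperclass = (c) =>
    triples.some((t) => t[0] === c && t[1] === "rdfs:subClassOf");
  const isNamed = (x) => !x.startsWith("_:");            // stands in for isIRI(?o)
  const hasOtherRelation = (x) =>
    triples.some((t) => (t[0] === x && t[1] !== "rdf:type") || t[2] === x);

  const added = [];
  const candidates = new Set(
    triples.filter((t) => t[1] === "rdf:type" && hasSuperclass(t[2])).map((t) => t[0]));
  for (const o of candidates) {
    if (isClass(o) || !isNamed(o) || !hasOtherRelation(o)) continue;
    const alreadyThere = triples.some(
      (t) => t[0] === o && t[1] === "cstr:id" && t[2] === o);
    if (!alreadyThere) added.push([o, "cstr:id", o]);    // the identifier is the name itself
  }
  return triples.concat(added);
}
```

Calling the function a second time on its own result adds nothing, which mirrors the “must not already have a cstr:id relation” filter.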

3.1.1.3. Creating “Clones Without Types” of Objects For Exploiting These Objects Without Inheritance Mechanism

Section 2.2.2 introduced a method for handling positive prescriptive constraints, i.e. for bypassing or avoiding the relation lookup mechanism above abbreviated as “inheritance”. The next command implements the KB pre-treatment supporting the “clones without types” based method when, as is the case with SPARQL, i) a particular entailment regime cannot be changed within a query, and ii) clones cannot be temporarily created within a query. For the sake of clarity, this command assumes that the KB does not include user-defined 2nd-order types. For every object ?o in the KB, if this object is an individual, this command creates ?o2, a partial copy of ?o that has the same relations except for rdf:type relations. This partial copy has for identifier the one of ?o but with the suffix “_cloneWithoutType”. This command also relates ?o to ?o2 by a relation of type cstr:cloneWithoutType.

INSERT {?o cstr:cloneWithoutType ?o2 . ?o2 ?r ?dest . ?o2 cstr:cloneWithoutTypeOf ?o} WHERE
{ ?o rdf:type ?t.  FILTER NOT EXISTS { ?o rdf:type rdfs:Class }  #?o is a typed individual
              ?t rdfs:subClassOf ?superClass.
  FILTER NOT EXISTS { ?o  cstr:cloneWithoutType ?c1 } #?o must not already have a clone
  FILTER NOT EXISTS { ?c2 cstr:cloneWithoutType ?o }  #?o must not be a clone
  { #Case 1: cloning each individual having at least 1 relation different from
    #        rdf:type and owl:sameAs; "?o2 ?r ?dest" inserted
    ?o ?r ?dest. FILTER(?r!=rdf:type) FILTER(?r!=owl:sameAs)
  }
  UNION #Case 2: cloning each individual not having at least 1 relation different
        #        from rdf:type and owl:sameAs;  "?o2 ?r ?dest" not inserted
  { ?o ?r1 ?dest  #?o has at least one relation from it 
    FILTER NOT EXISTS {?o ?r ?dest2. FILTER(?r!=rdf:type) FILTER(?r!=owl:sameAs) }
  }
  BIND( uri(concat(str(?o),"_cloneWithoutType")) as ?o2 )
}
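
For readers more at ease with procedural code, the cloning step can be sketched in JavaScript on an in-memory array of [source, relation, destination] triples. This is a simplified illustration, not a translation validated against a SPARQL engine; the triple encoding and the function name addClonesWithoutTypes are assumptions.

```javascript
// Sketch only: the KB is assumed to be an array of [source, relation, destination]
// strings. addClonesWithoutTypes is a hypothetical name.
function addClonesWithoutTypes(triples) {
  const isClass = (x) =>
    triples.some((t) => t[0] === x && t[1] === "rdf:type" && t[2] === "rdfs:Class");
  const hasSuperclass = (c) =>
    triples.some((t) => t[0] === c && t[1] === "rdfs:subClassOf");
  // typed individuals whose type has a superclass
  const individuals = new Set(triples
    .filter((t) => t[1] === "rdf:type" && !isClass(t[0]) && hasSuperclass(t[2]))
    .map((t) => t[0]));

  const result = triples.slice();
  for (const o of individuals) {
    // skip objects that already have a clone or that are clones themselves
    const cloned = triples.some(
      (t) => t[1] === "cstr:cloneWithoutType" && (t[0] === o || t[2] === o));
    if (cloned) continue;
    const clone = o + "_cloneWithoutType";
    result.push([o, "cstr:cloneWithoutType", clone],
                [clone, "cstr:cloneWithoutTypeOf", o]);
    for (const [source, relation, destination] of triples) {
      // copy every relation from ?o except rdf:type and owl:sameAs
      if (source === o && relation !== "rdf:type" && relation !== "owl:sameAs") {
        result.push([clone, relation, destination]);
      }
    }
  }
  return result;
}
```

Since a clone has no rdf:type relation, an engine cannot apply type-based deductions to it, which is the point of the method.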


3.1.2. Checking Individual-based Constraints

3.1.2.1. Checking Individual-based Positive Descriptive Constraints

The next query lists every object violating at least one of the individual-based positive descriptive constraints, including those restricted to named individuals if cstr:id relations have been added to named individuals (Section 3.1.1.2). As shown by the code, such an object satisfies two conditions. First, this object matches (and hence has for type) the condition of a constraint ?posConstr that is of type cstr:Descriptive_constraint. Second, this object does not match (and hence does not have for type) the conclusion of the constraint. Thus, this query requires a SPARQL engine with an entailment regime enabling the matching (alias, categorization) of an individual to a class expression and thence the deduction of an rdf:type relation between them. In the code of the commands in Section 3, such deduced rdf:type relations are highlighted in bold. For inferencing completeness purposes, such deductions require an entailment regime able to handle the expressiveness used in the constraints and the rest of the KB.

SELECT ?objectNotMatchingPosConstr ?posConstr_condition ?posConstr_conclusion WHERE
{ ?posConstr rdf:type cstr:Descriptive_constraint;
             cstr:condition_class  ?posConstr_condition;
             cstr:conclusion_class ?posConstr_conclusion. 
  FILTER (?posConstr_conclusion != owl:Nothing)
  ?objectNotMatchingPosConstr rdf:type ?posConstr_condition. #matches condition
  FILTER NOT EXISTS  #objects satisfying the conclusion must NOT be listed
  { ?objectNotMatchingPosConstr rdf:type ?posConstr_conclusion }
}
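
The two conditions checked by this query can be illustrated with a small JavaScript sketch. It replaces the entailment regime by a deliberately naive matcher that only handles named classes and existential restrictions, and it ignores subclass reasoning. The encoding of class expressions and the function names are assumptions made for this example.

```javascript
// Sketch only. Hypothetical encoding: a class expression is either a class name
// (string) or { onProperty, someValuesFrom } for an existential restriction.
// The KB is an array of [source, relation, destination] strings.
function hasType(triples, object, cls) {
  if (typeof cls === "string") {
    return cls === "owl:Thing" ||
      triples.some((t) => t[0] === object && t[1] === "rdf:type" && t[2] === cls);
  }
  return triples.some((t) => t[0] === object && t[1] === cls.onProperty &&
                             hasType(triples, t[2], cls.someValuesFrom));
}

function descriptiveViolations(triples, constraints) {
  const objects = [...new Set(triples.map((t) => t[0]))];
  const violations = [];
  for (const c of constraints) {
    // only positive descriptive constraints are handled here
    if (c.kind !== "descriptive" || c.conclusion === "owl:Nothing") continue;
    for (const object of objects) {
      if (hasType(triples, object, c.condition) &&      // matches the condition
          !hasType(triples, object, c.conclusion)) {    // but not the conclusion
        violations.push({ object, condition: c.condition, conclusion: c.conclusion });
      }
    }
  }
  return violations;
}
```

A real entailment regime would deduce many more rdf:type relations than this matcher; the sketch only shows the shape of the check.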

3.1.2.2. Checking Individual-based Positive Prescriptive Constraints

The next query uses the “clones without types” based method for handling positive prescriptive constraints (Section 2.2.2). This query assumes that these clones have been statically created via a KB pre-treatment as seen in Section 3.1.1.3. In the rest of this Section 3, when a command has some code that has not been used in a previous command, this code is in italics.

SELECT ?objectNotMatchingPosConstr ?posConstr_condition ?posConstr_conclusion WHERE
{ ?posConstr rdf:type cstr:Prescriptive_constraint;
             cstr:condition_class  ?posConstr_condition;
             cstr:conclusion_class ?posConstr_conclusion.
  FILTER (?posConstr_conclusion != owl:Nothing)
  ?objectNotMatchingPosConstr rdf:type ?posConstr_condition. #matches condition
  ?objectNotMatchingPosConstr cstr:cloneWithoutType ?cloneWithoutType 
  FILTER NOT EXISTS  #objects with clones satisfying the conclusion must not be listed
  { ?cloneWithoutType rdf:type ?posConstr_conclusion }
}
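
The following JavaScript sketch illustrates why checking the conclusion on a clone without types changes the result. It uses a toy KB and a deliberately tiny “inference engine”, both hypothetical: the KB is assumed to also contain an axiom stating that every Person has some parent, so that an engine can deduce the conclusion for any typed Person.

```javascript
// Sketch only, with a deliberately tiny "inference engine". Hypothetical data:
// the axiom plays the role of ":Person rdfs:subClassOf (:parent some ...)".
const toyKb = {
  axioms: [{ subClass: "Person", impliesSomeValueFor: "parent" }],
  individuals: {
    Tom: { types: ["Person"], relations: { parent: ["Ann"] } },
    Ann: { types: ["Person"], relations: {} },
  },
};

// True if the object has an asserted value for the property, or if one of its
// types lets the engine deduce (modus ponens) that such a value exists.
function matchesSomeValue(kb, object, property) {
  const asserted = (object.relations[property] || []).length > 0;
  const deduced = object.types.some((type) => kb.axioms.some(
    (a) => a.subClass === type && a.impliesSomeValueFor === property));
  return asserted || deduced;
}

// Same relations, no types: type-based deductions cannot apply to the clone.
function cloneWithoutTypes(object) {
  return { types: [], relations: object.relations };
}

function violations(kb, conditionClass, property, prescriptive) {
  return Object.keys(kb.individuals).filter((name) => {
    const object = kb.individuals[name];
    if (!object.types.includes(conditionClass)) return false;
    const checked = prescriptive ? cloneWithoutTypes(object) : object;
    return !matchesSomeValue(kb, checked, property);
  });
}
```

In this toy setting, the prescriptive check reports Ann (no parent is asserted for her) whereas the descriptive check reports nothing, since the existence of a parent is deduced from her type.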

Here is the same query in LDScript, with an embedded query that temporarily creates the above cited partial copies “on the fly”, thus removing the necessity to modify the KB.

SELECT ?objectNotMatchingPosConstr ?posConstr_condition ?posConstr_conclusion WHERE
{ ?posConstr rdf:type cstr:Prescriptive_constraint;
             cstr:condition_class  ?posConstr_condition;
             cstr:conclusion_class ?posConstr_conclusion.
  FILTER (?posConstr_conclusion != owl:Nothing)
  ?objectNotMatchingPosConstr rdf:type ?posConstr_condition. #matches condition
  #the two functions called in the next two lines are defined below
  BIND( cstr:getCloneWithoutType(?objectNotMatchingPosConstr) as ?cloneWithoutType )
  BIND( cstr:copyOfKbIntoTemporaryGraphWithTheAdditionOfTheCloneWithoutType
            (?objectNotMatchingPosConstr,?cloneWithoutType) as ?g )
  FILTER NOT EXISTS { GRAPH ?g { ?cloneWithoutType rdf:type ?posConstr_conclusion } }
}
FUNCTION cstr:getCloneWithoutType (?object)
{ uri(concat(str(?object),"_cloneWithoutType")) 
}
FUNCTION cstr:copyOfKbIntoTemporaryGraphWithTheAdditionOfTheCloneWithoutType
             (?objectNotMatchingPosConstr, ?cloneWithoutType)
{ LET (?g = CONSTRUCT { ?cloneWithoutType ?r ?dest .  ?x ?r2 ?y } WHERE
            { VALUES ?cloneWithoutType { UNDEF }
              ?objectNotMatchingPosConstr ?r ?dest.  FILTER (?r != rdf:type)
              ?x ?r2 ?y.  FILTER (?x != ?objectNotMatchingPosConstr)
            }) 
  { xt:entailment(?g) }  #triggers inferences on ?g
}

The same “on the fly” cloning technique can be used for adding cstr:id relations to named individuals. Thus, this technique also permits the checking of constraints restricted to named individuals without having to modify the KB (Section 3.1.1.2).

3.1.2.3. Checking Individual-based Inconsistency-implying Constraints

The next query lists every object violating an individual-based inconsistency-implying constraint.

SELECT ?objectMatchingNegConstr ?negConstr_condition WHERE
{ ?negConstr cstr:condition_class  ?negConstr_condition;
             cstr:conclusion_class owl:Nothing. 
  ?objectMatchingNegConstr rdf:type ?negConstr_condition.
}


3.1.3. Checking SubclassOf-analogous Constraints

The usable content-independent queries here are identical to their counterparts in Section 3.1.2 except for the initialization of ?posConstr_condition and ?posConstr_conclusion since now they are related by a cstr:descriptive_constraint_conclusion relation or a cstr:prescriptive_constraint_conclusion relation. E.g., here is a query for checking subclassOf-analogous positive descriptive constraints. See the line in italics for the new initialization.

SELECT ?objectNotMatchingPosConstr ?posConstr_condition ?posConstr_conclusion WHERE
{ ?posConstr_condition cstr:descriptive_constraint_conclusion ?posConstr_conclusion.
  FILTER (?posConstr_conclusion != owl:Nothing)
  ?objectNotMatchingPosConstr rdf:type ?posConstr_condition. #matches condition
  FILTER NOT EXISTS  #objects satisfying the conclusion must NOT be listed
  { ?objectNotMatchingPosConstr rdf:type ?posConstr_conclusion }
}

There are other ways to write the queries. For example:

  • “FILTER (?posConstr_conclusion != owl:Nothing)” could be replaced by “FILTER NOT EXISTS { ?posConstr_conclusion cstr:prescriptive_constraint_conclusion owl:Nothing }”. The first way has the advantage of not being dependent on the chosen representation for constraints and hence this way minimizes the difference between the queries. On the other hand, with this way, owl:Nothing cannot be replaced by equivalent class expressions (in SPARQL).
  • In Section 3.1.2.2, the line “?objectNotMatchingPosConstr cstr:cloneWithoutType ?cloneWithoutType” before “FILTER NOT EXISTS” could be replaced by the line “BIND( uri( concat( str(?objectNotMatchingPosConstr), "_cloneWithoutType" ) ) as ?cloneWithoutType )” within the “FILTER NOT EXISTS” block.


3.1.4. Checking SubclassOf-based Constraints

The previous queries do not rely on inference engines to take into account the special meaning of CSTR classes. Hence, as explained in Section 2.3.1, these queries cannot be adapted for checking subclassOf-based constraints representing positive descriptive constraints. For prescriptive constraints, the queries are the same as their counterparts in Section 3.1.2.2 except for the initialization of ?posConstr_condition and ?posConstr_conclusion. E.g., for a positive prescriptive constraint, this initialization now is:

  ?posConstr_condition rdfs:subClassOf cstr:SubclassOf-based_prescriptive_constraint_condition,
                                       ?posConstr_conclusion.

The next query lists every object violating a subclassOf-based inconsistency-implying constraint. Such an object matches (and hence has for type) a class ?negConstr_condition which i) has owl:Nothing as conclusion, and optionally ii) is subclass of the type cstr:SubclassOf-based_constraint_condition. If this last type is not used in the negative constraints of the KB, this second condition must actually not be used in the query.

SELECT ?objectMatchingNegConstr ?negConstr_condition WHERE
{ ?negConstr_condition rdfs:subClassOf owl:Nothing;
                       rdfs:subClassOf cstr:SubclassOf-based_constraint_condition.  #optional line
  ?objectMatchingNegConstr rdf:type ?negConstr_condition.
}

Except as a module for calculating the completeness degree of a KB, this last query is useless if, when building the KB, its consistency is already checked by an inference engine that delivers an error message when detecting that an object is instance of a subclass of owl:Nothing. By default, some Description Logic inference engines such as Corese [Corby, Faron-Zucker & Gandon, 2017] do not deliver error messages or warning messages when detecting such objects. Having to make inferences on instances of a subclass of owl:Nothing also makes Corese behave abnormally. E.g., when an individual violates a subclassOf-based inconsistency-implying constraint (and hence is instance of owl:Nothing) as well as a positive constraint, this object does not appear in the results of the above described queries searching objects violating positive constraints. Thus, as noted in Section 2.3.1, using subclassOf-based constraints is not relevant for general-purpose knowledge sharing.


3.1.5. Checking Binary Relations Instead of Individuals

To list binary relations violating constraints, instead of individuals that have some relations violating constraints, it is sufficient to replace rdf:type by the “logical implication relation between statements” in the previous content-independent queries that check positive constraints. For referring to such relations, Tim Berners-Lee uses the type name log:implies [Berners-Lee et al., 2008] in his Notation3 KRL. However, for this replacement to work, the used SPARQL engine must exploit an inference engine that can deduce the existence of such a relation when it exists between the matched statements. Description Logic inference engines generally do not do so.

Like queries on individuals, queries on relations can use additional filters. E.g., for the last query (Section 3.1.4) to operate only on negative facts, one may add at the end of its body:

?objectMatchingNegConstr rdf:type owl:NegativePropertyAssertion.


3.1.6. Evaluating Completenesses of a KB

A simple way to define or calculate a completeness degree for a KB is to divide “the number of relations (in the KB) that do not violate prescriptive constraints” by “the total number of relations”. Another completeness degree may be obtained by dividing “the number of individuals that do not violate prescriptive constraints” by “the total number of individuals”. The next query implements a variant of this last definition: instead of individuals, this query exploits “objects that are source of at least one relation to another object”. Furthermore, this query assumes that the constraints are represented as individual-based constraints. This query can be adapted to implement the above first definition via the method given in Section 3.1.5. Similarly, descriptive constraints could also be taken into account.

SELECT ( ((?nbObjs - ?nbAgainstPosCs - ?nbMatchingNegCs) / ?nbObjs)
         AS ?completeness )
{ {SELECT (COUNT(DISTINCT ?o) AS ?nbObjs)
    WHERE { ?o ?r ?o2   #any object source of a relation to another object
            # For considering only objects that have a type:
            # { {?o rdf:type ?t1} UNION {?o cstr:type ?t2} }
          }
  }
  {SELECT(COUNT(DISTINCT ?objectNotMatchingPosConstr) AS ?nbAgainstPosCs)
    WHERE { ... #the body of a query checking an individual-based positive
                #  prescriptive constraint (see Section 3.1.2.2) must be copied here

            #if ?objectNotMatchingPosConstr also violates a negative constraint
            #  it must not be counted here (otherwise it would be counted twice),
            FILTER NOT EXISTS            # hence this code here
            { ?negConstr cstr:condition_class ?negConstr_condition;
                         cstr:conclusion_class owl:Nothing.
              ?objectNotMatchingPosConstr rdf:type ?negConstr_condition
            }
          }
  }
  {SELECT (COUNT(DISTINCT ?objectMatchingNegConstr) AS ?nbMatchingNegCs)
    WHERE { ?negConstr cstr:condition_class ?negConstr_condition ;
                       cstr:conclusion_class owl:Nothing.
            ?objectMatchingNegConstr rdf:type ?negConstr_condition
          }
  }
}
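
The arithmetic performed by this query can be illustrated with a minimal JavaScript sketch. It assumes that the results of the three sub-queries are already available as in-memory sets; the function name completenessDegree and the sample objects are hypothetical and only serve the illustration. Taking the union of the two sets of violating objects has the same effect as the FILTER NOT EXISTS clause above: an object violating both kinds of constraints is counted only once.

```javascript
// Hypothetical stand-ins for the results of the three sub-queries.
function completenessDegree(objects, violatingPositive, matchingNegative) {
  if (objects.size === 0) return NaN;  // no object: the degree is undefined
  // An object violating both kinds of constraints is counted only once.
  const violating = new Set([...violatingPositive, ...matchingNegative]);
  let nbViolating = 0;
  for (const o of violating) if (objects.has(o)) nbViolating++;
  return (objects.size - nbViolating) / objects.size;
}

const objects = new Set(["Tom", "Ann", "Bob", "Eve"]);
const againstPos = new Set(["Tom", "Bob"]);  // e.g., no parent is stated
const matchingNeg = new Set(["Bob"]);        // e.g., matches an inconsistency-implying constraint
const degree = completenessDegree(objects, againstPos, matchingNeg);
console.log(degree);  // 0.5: two of the four objects violate no constraint
```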



3.2. JavaScript Function for Ontology Completeness Evaluation

Like the previous SPARQL commands, the next function provides a way to list each constraint violation – and/or calculate the completeness of the ontology if the constraints are prescriptive – but there are a few differences. First, the completeness is now given with respect to each constraint in a particular set, not to all of them at once, and for the instances of a particular set of classes, not all individuals. Second, error handling is included. Third, the method specializes is called to perform the matching between relations associated with an object and relations in a constraint condition or conclusion, i.e., to test if each relation in the second set of relations has a specialization in the first set (details on such a specialization test can be found in [Chein & Mugnier, 2008] and [Swan, 2016]). This method can ignore the instanceOf relations from the handled object, and it needs to do so if the checked constraint is a prescriptive one, i.e., if its parameter is an instance of cstr:Prescriptive_constraint. Hence, the code below need not include a temporary removal of instanceOf relations from the objects.

JavaScript is used instead of pseudo-code since its syntactic sugar is rather well-known or intuitive and is object-oriented. This last feature is required here to have code that is short, clear and generic with respect to error handling. The objects in the ontology are assumed to be accessible in the object-oriented way used in the code.

The complexity of the next function is the complexity of the object matching multiplied by the number of given constraints, multiplied by the number of given classes, multiplied by the average number of instances per class.

function checkThatInstancesOfTheseClassesSatisFyTheseConstraints
         (setOfClasses,     //e.g.: [Man, Woman]   (arrays are assumed here)
          setOfConstraints, //e.g.: ["if x is a Person, x has a parent"]
          errorHandling) //contains a class for generic error handling
{ if (setOfConstraints.length == 0)
    return errorHandling.returnCodeIfNoPropertyToCheck;
  if (setOfClasses.length == 0)
    return errorHandling.returnCodeIfNoClassToCheck;
  for (const co of setOfConstraints) //co may or may not be inconsistency-implying
  { var numberOfObjsSatisfyingCo= 0, numberOfObjsNotSatisfyingCo= 0;
    for (const cl of setOfClasses)
      for (const obj of cl.instances())
        if (obj.specializes(co.condition))
        { if (co.isPositiveConstr() && obj.specializes(co.conclusion))
            numberOfObjsSatisfyingCo++;
          else { numberOfObjsNotSatisfyingCo++;
                 if (errorHandling.alertAtEachError)
                   alert("Relations from '" + obj.toString() +
                         "' do not satisfy:\n  " + co.toString());
                  //e.g.: Relations from 'Tom' do not satisfy:
                } //        if x is a Person, x has a parent
        }
    var numberOfCheckedObjs= numberOfObjsSatisfyingCo + numberOfObjsNotSatisfyingCo;
    if (numberOfCheckedObjs == 0) continue; //no object matches the condition of co
    var completenessForThatConstraint= numberOfObjsSatisfyingCo / numberOfCheckedObjs;
    if (completenessForThatConstraint < 1)
    { if (errorHandling.alertAtEachIncompleteness)
        alert("Only " + (100 * completenessForThatConstraint) + "% of instances"
              + " of the given classes satisfy the constraint:\n  "
              + co.toString());
      if (errorHandling.returnCodeIfIncompletenessForSomeConstraint)
        return errorHandling.returnCodeIfIncompletenessForSomeConstraint;
    }
  }
  return errorHandling.returnCodeIfNoError;
}
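
The function above relies on a specializes method which this article does not list. As an illustration only, here is a minimal sketch of what such a test could look like under strong simplifying assumptions: each relation is reduced to a pair {type, destType}, the type hierarchy is a small acyclic in-memory table, and no graph matching is performed beyond this pairwise comparison. The table supertypes and the helper names isSubtypeOrEqual and relationSpecializes are hypothetical. A real implementation would rather rely on the graph specialization tests described in the references cited above.

```javascript
// Hypothetical subtype table: each type is mapped to its direct supertypes.
const supertypes = { mother: ["parent"], parent: [], Woman: ["Person"], Person: [] };

function isSubtypeOrEqual(t, ancestor) {
  if (t === ancestor) return true;
  return (supertypes[t] || []).some(s => isSubtypeOrEqual(s, ancestor));
}

// True if the relation rel is at least as specific as the relation pattern.
function relationSpecializes(rel, pattern) {
  return isSubtypeOrEqual(rel.type, pattern.type) &&
         isSubtypeOrEqual(rel.destType, pattern.destType);
}

// True if each relation of the pattern has a specialization in objRelations.
// For prescriptive constraints, instanceOf relations can be ignored.
function specializes(objRelations, patternRelations, ignoreInstanceOf = false) {
  const rels = ignoreInstanceOf
    ? objRelations.filter(r => r.type !== "instanceOf")
    : objRelations;
  return patternRelations.every(p => rels.some(r => relationSpecializes(r, p)));
}

// "if x is a Person, x has a parent"
const condition = [{ type: "instanceOf", destType: "Person" }];
const conclusion = [{ type: "parent", destType: "Person" }];
const tom = [{ type: "instanceOf", destType: "Person" },
             { type: "mother", destType: "Woman" }];
const ann = [{ type: "instanceOf", destType: "Person" }];
console.log(specializes(tom, condition), specializes(tom, conclusion, true));
console.log(specializes(ann, condition), specializes(ann, conclusion, true));
```

In this sketch, Tom satisfies the constraint since his mother relation to a Woman specializes a parent relation to a Person, whereas Ann matches the condition but not the conclusion.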




4. Illustration of Applications (or Use Cases) and Limits of the Approach by Showing How To Represent Some Important Ontology Design Patterns Via Relations-between-classes based Constraints

Ontology design patterns or best practices (ODPs), e.g. those recommended by the W3C or those of the “ODP catalog”, are i) informal descriptions about how certain things should be represented, and/or ii) collections of types that should be reused whenever possible, and/or iii) lexical or syntactic rules to follow when importing or exporting formal or informal knowledge. As shown in Section 2, descriptive or prescriptive constraints are ways to represent which types “must be reused whenever possible” and hence ways to formalize and implement ODPs related to the second point. The next subsections show how this can be done with the proposed approach. To seek the limits of this approach and to be as generic as possible, the example ODPs used here are not tied to a particular ontology. If not directly reused for a particular ontology, these examples can serve as templates for writing simpler constraints for particular ontologies.

4.1. Examples of Useful General Descriptive Constraints

4.1.1. Checking or Evaluating the Design of a Subtype Hierarchy

4.1.1.1. Approach and Rationale

Several research works in knowledge acquisition, model-driven engineering or ontology engineering, e.g. [Marino, Rechenmann & Uvietta, 1990] [Bachimont, Isaac & Troncy, 2002] [Dromey, 2006] [Rector et al., 2012], have advocated the use of tree structures when designing a subtype hierarchy, hence the use of i) single inheritance only, and ii) multiple views or viewpoints, one for each tree. Indeed, they argue that objects have a unique place in such trees and hence that such trees can be used as decision trees or ways to avoid redundancies, normalize KRs and ease KR search/handling. This is true but, for the same reasons and as further detailed in the next paragraph, the same advantages can be obtained by subtyping solely via either complete sets of exclusive (direct) subtypes – i.e. subtype partitions, e.g. via the use of disjoint unions of types – or incomplete sets of exclusive subtypes. Furthermore, this last method has the additional advantages of i) still allowing multi-inheritance, ii) not requiring a special inference engine to handle views (e.g. [Marino, Rechenmann & Uvietta, 1990] [Djakhdjakha, Hemam & Boufaïda, 2014]), iii) being less cumbersome for knowledge providers since with the last method they do not have to specify or use views each time they create subtypes, and iv) supporting interesting inferences for checking or reasoning purposes whenever a set of (direct) subtypes can be specified as complete. The fact that OWL-EL, OWL-RL or OWL-QL do not include disjoint unions of types is not a reason for not using them in ontologies designed for knowledge sharing purposes: this fact only means that, before using such ontologies with inference engines dedicated to OWL-EL, OWL-RL or OWL-QL, some expressiveness has to be lost, e.g. by automatically converting each combination of disjoint union and class equivalence into a combination of class disjointness and subtyping.

Indeed, an aside contribution of this article is to note that in order to keep the above cited advantages of the tree structures, it is sufficient (and necessary) that whenever two types are exclusive – i.e., whenever they must not share common subtypes or instances – the exclusion relation that links these two types must be specified, explicitly or implicitly. With tree structures, such exclusion relations are implicit but still specified. By systematically using sets of exclusive subtypes, all exclusion relations can be easily and explicitly specified, e.g. via the use of binary partitions such as “Spatial_entity / Non-spatial_entity”. Exclusion relations are a simple and efficient way to support ontology checking and knowledge retrieval. For example, when searching for particular “achievements” in a KB, people may not know which exact type to use and then which query to use: whether a query for specializing objects, for generalizing ones, for doing both with a maximal number of specializing/generalizing relations, etc. In this case, it may be easier for them to search for instances of all types not exclusive with some type they have in mind. This query may well collect all and only the objects they are interested in, if the ontology is systematically structured using sets of exclusive subtypes.
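
The retrieval strategy described in the previous paragraph can be sketched as follows. This is only an illustration over a hypothetical in-memory ontology: the tables directSupertype, exclusions and instances, as well as their content, are assumptions. For brevity the sketch uses single inheritance, although the approach itself does not require it. Two types are considered exclusive when some declared exclusion relation links one of their respective supertypes (or themselves).

```javascript
// Hypothetical ontology: direct supertype of each type (single inheritance
// is used here only to keep the sketch short).
const directSupertype = {
  "Spatial_entity": "Thing", "Non-spatial_entity": "Thing",
  "Building": "Spatial_entity", "Achievement": "Non-spatial_entity",
  "Award": "Achievement" };
const exclusions = [["Spatial_entity", "Non-spatial_entity"]]; // binary partition
const instances = { "Building": ["EiffelTower"], "Award": ["NobelPrize1903"],
                    "Achievement": ["MoonLanding"] };

function ancestorsOrSelf(t) {
  const result = [];
  for (let cur = t; cur !== undefined; cur = directSupertype[cur]) result.push(cur);
  return result;
}

// Exclusion relations are inherited by subtypes.
function areExclusive(t1, t2) {
  const a1 = ancestorsOrSelf(t1), a2 = ancestorsOrSelf(t2);
  return exclusions.some(([x, y]) =>
    (a1.includes(x) && a2.includes(y)) || (a1.includes(y) && a2.includes(x)));
}

// Instances of all the types that are not exclusive with the given type.
function instancesOfTypesNotExclusiveWith(t) {
  const allTypes = new Set(["Thing", ...Object.keys(directSupertype)]);
  const found = new Set();
  for (const other of allTypes)
    if (!areExclusive(t, other))
      for (const i of instances[other] || []) found.add(i);
  return [...found].sort();
}

const result = instancesOfTypesNotExclusiveWith("Achievement");
console.log(result);  // the building is excluded, both achievements are kept
```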

However, there are some kinds of types for which using only subtype partitions – or sometimes even just sets of exclusive subtypes – is cumbersome. An example for these kinds is the one of non-natural types [Guizzardi, 2005] [Mizoguchi, Kozaki & Kitamura, 2012] such as role types and phase types. Examples of general role types are Agent, Experiencer, Recipient, Cause and Instrument. Examples of more specific ones are Driver, Employee and King. Examples of phase types are Beginning and Child. Examples of role type partitions are “Cause / Non-cause” and “King / Non-king”. Systematically specifying such partitions is often cumbersome and useless since role types such as Non-cause and Non-king will probably not be instantiated. Always using sets of exclusive subtypes for non-natural types is also sometimes cumbersome, e.g. when only one subtype is known or relevant to represent. The situation is similar for some other kinds of types – such as those for attributes, information objects or relations – for which the distinction between “natural type” and “non-natural type” is hard to make or not relevant.

Thus, even though systematically using subtype partitions or sets of exclusive subtypes is an interesting ontology design pattern (ODP), for practical purposes this ODP should be generalized to include options allowing ontology designers not to use exclusive subtypes in certain circumstances, as long as such choices are explicitly represented. To that end, first considering only subclasses (not all subtypes), the use of certain subtypes of sub:subclass (inverse of rdfs:subClassOf) seems the easiest way compatible with OWL2 and the proposed approach. This is why this article introduces the relation types sub:nonNaturalSubclass and sub:loneSubclass. The prefix “sub:” is an abbreviation for the namespace http://www.webkb.org/kb/it/SUB. The next sub-section concludes with the representation of a constraint that enforces the above cited generalized ODP (restricted to subclasses). This ODP, which from now on is called the “Subclass ODP”, works by checking that each used subclassOf relation has for type a certain subtype of rdfs:subClassOf. The relation types sub:nonNaturalSubclass and sub:loneSubclass can be used for (exceptionally) not using exclusive subclasses. The first type also implies that the source of the relation is a non-natural class.

When – and only when – an ontology using such subtypes of rdfs:subClassOf has to be exploited by an inference engine that cannot handle such subtypes, a pre-treatment that expands the definition of these subtypes – and thus removes these subtypes – must first be applied on the ontology.

The next subsection is only about the Subclass ODP. This article does not propose a similar constraint for a “Subproperty ODP” to support the checking of relation subtypes. Indeed, OWL2 provides rdfs:subPropertyOf and owl:propertyDisjointWith but does not allow the use of owl:disjointUnionOf or owl:unionOf on properties.


4.1.1.2. Implementation of the Subclass ODP with OWL2

Defining sub:subclassInExclusionSetOrAlone, the supertype of the subtyping relation types allowed by the Subclass ODP. For the reasons given in Section 2.3.3, the Turtle notation is used.

sub:subclass    owl:inverseOf rdfs:subClassOf.
sub:subProperty owl:inverseOf rdfs:subPropertyOf.
sub:subclassInExclusionSetOrAlone 
  rdfs:subPropertyOf sub:subclass; 
  sub:subProperty sub:subclassInDisjointUnion,  #for a subclass in a subtype partition
                  sub:subclassInSetOfExclusiveSubclasses,
                  sub:loneSubclass, #for a subclass not in a set of exclusive subtypes
                  sub:nonNaturalSubclass.  #idem but for a non natural type

If the last three lines are removed from this definition, the constraint for the Subclass ODP will check that all subclasses are defined only via subtype partitions. In other words, the last three lines provide three more subtyping options.

Defining sub:nonNaturalSubclass. With the following full definition of this relation type, it does not have to be used explicitly for sub:subclassInExclusionSetOrAlone relations to be inferred: rdfs:subClassOf relations can be used instead, as long as all non-natural classes are categorized as subtypes of sub:NonNaturalClass.

sub:nonNaturalSubclass rdfs:subPropertyOf sub:subclassInExclusionSetOrAlone;
                       rdfs:domain owl:Class;   rdfs:range sub:NonNaturalClass.

Defining sub:subclassInDisjointUnion. If the following full definition of this relation type sub:subclassInDisjointUnion was correct, it would not have to be used explicitly for sub:subclassInExclusionSetOrAlone relations to be inferred: owl:disjointUnionOf relations or equivalent representations could be used instead.

sub:subclassInDisjointUnion
  owl:propertyChainAxiom (owl:equivalentClass  owl:disjointUnionOf  rdfs:member).

The problem with this definition is that an OWL list of disjoint classes is not the source of rdfs:member relations to these classes: these classes are supposed to be accessed sequentially from the first (via an rdf:first relation) to the last (via a sequence of rdf:rest relations followed by an rdf:first relation). Since OWL2 does not allow regular expressions or recursive definitions, it does not allow a full definition of membership based on rdf:rest and rdf:first relations. Thus, this article proposes the following three solutions to support the checking of sub:subclassInExclusionSetOrAlone relations and equivalent representations, via the proposed content-independent queries.

  • One solution is to define membership in a way that is partial but sufficient for correctly handling the content of the KB, as illustrated here:

    sub:listMember
      sub:subProperty sub:listMember1, sub:listMember2, sub:listMember3. #and so on if needed
    
    sub:listMember1 owl:equivalentProperty rdf:first.
    sub:listMember2 owl:propertyChainAxiom (rdf:rest rdf:first).
    sub:listMember3 owl:propertyChainAxiom (rdf:rest rdf:rest rdf:first). #and so on if needed
    
    Then, the following definition of sub:subclassInDisjointUnion is sufficient. However, these definitions are not within OWL2-RL, OWL2-EL and OWL2-QL since owl:disjointUnionOf is not within them.
    sub:subclassInDisjointUnion
      owl:propertyChainAxiom (owl:equivalentClass  owl:disjointUnionOf  sub:listMember).
    
  • Another solution is to use a KRL that permits a full definition of sub:listMember. Representations that use this type can still be translated into OWL2 by using the above kind of “partial but sufficient” representations.

  • A third solution is to pre-treat the KB for adding sub:subclassInDisjointUnion relations based on owl:disjointUnionOf relations in the KB. Here is a SPARQL1.1 update request supporting this.

    INSERT { ?t sub:subclassInDisjointUnion ?st1.   #from the class to each of its
             ?t sub:subclassInDisjointUnion ?st2 } WHERE   # subclasses, like sub:subclass
    { ?t owl:equivalentClass [rdf:type rdfs:Class;
                              owl:disjointUnionOf/rdf:rest*/rdf:first ?st1;
                              owl:disjointUnionOf/rdf:rest*/rdf:first ?st2]
      FILTER (?st1 != ?st2)
    }
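
The sequential access to list members that underlies these three solutions can also be sketched procedurally. The following JavaScript sketch assumes a hypothetical in-memory encoding of an OWL list as rdf:first/rdf:rest cells (the tables cells and disjointUnionOf and the sample classes are assumptions). It generates relations oriented as in the above property chain definitions, i.e., from the class to each member of its disjoint union.

```javascript
const RDF_NIL = "rdf:nil";
// Hypothetical KB fragment: the list (Man Woman) as rdf:first/rdf:rest cells.
const cells = {
  l1: { first: "Man",   rest: "l2" },
  l2: { first: "Woman", rest: RDF_NIL },
};
const disjointUnionOf = { Person: "l1" };  // Person = disjoint union of list l1

// Follows rdf:rest relations and collects the destination of each rdf:first.
function listMembers(node) {
  const members = [];
  const seen = new Set();  // guards against malformed cyclic lists
  while (node !== RDF_NIL && cells[node] && !seen.has(node)) {
    seen.add(node);
    members.push(cells[node].first);
    node = cells[node].rest;
  }
  return members;
}

// One generated relation from the class to each member of its disjoint union.
function subclassInDisjointUnionTriples() {
  const triples = [];
  for (const [cls, list] of Object.entries(disjointUnionOf))
    for (const member of listMembers(list))
      triples.push([cls, "sub:subclassInDisjointUnion", member]);
  return triples;
}

console.log(subclassInDisjointUnionTriples());
```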

Defining sub:subclassInSetOfExclusiveSubclasses. If the following full definition of this relation type was correct, it would not have to be used explicitly for sub:subclassInExclusionSetOrAlone relations to be inferred: owl:disjointUnionOf relations or equivalent representations could be used instead.

sub:subclassInSetOfExclusiveSubclasses  owl:propertyChainAxiom 
    (sub:subclass  owl:equivalentClass  owl:disjointUnionOf  rdfs:member).

The previous paragraph explained why the use of rdfs:member relations is incorrect and how there are three solutions to fix it. In the current case, the partial but sufficient definition is:

sub:subclassInSetOfExclusiveSubclasses  owl:propertyChainAxiom 
    (sub:subclass  owl:equivalentClass  owl:disjointUnionOf  sub:listMember).

Defining the constraint for the Subclass ODP. A positive descriptive constraint can now be defined to ensure that “if there is a subclass relation between two classes, this relation must be of type sub:subclassInExclusionSetOrAlone”. An equivalent constraint, albeit class-focused instead of relation-focused, is: “if a class C1 has a subclass relation to a class C2, this relation must be of type sub:subclassInExclusionSetOrAlone”. However, OWL2 does not allow the use of variables to refer to the relation or to C2. The OWL2 compatible version of this constraint is then “if a class C1 has a subclass relation, all subclass relations from C1 must be of type sub:subclassInExclusionSetOrAlone”. If “some” was used instead of “all”, not all the subclass relations from C1 would be checked. Using OWL2 and the CSTR types, here is a way to write this constraint without having to define a second-order class. The type owl:AllDisjointClasses is allowed in OWL2-EL, OWL2-QL and OWL2-RL. However, although an OWL2-RL inference engine can check if an expression using this type is respected, this engine cannot use this type for inferring whether or not an object is instance of the conclusion class below.

[] rdf:type cstr:Descriptive_constraint;
   cstr:condition_class      # if C is a class that has a subclass ...
      [rdf:type owl:Class; 
       owl:equivalentClass [rdf:type owl:Restriction;  #"any class that has a subclass"
                            owl:onProperty sub:subclass;  owl:someValuesFrom rdfs:Class] ];
   cstr:conclusion_class  #... then C has no subclass relation that is not
                          #            of type sub:subclassInExclusionSetOrAlone
      sub:ClassWithNoRelationOfType_subclassButNot-subclassInExclusionSetOrAlone.
           #(=> all subclass relations of type sub:subclassInExclusionSetOrAlone)
#with:
[] rdf:type owl:AllDisjointClasses;
   owl:members  #this relation type is usable even though rdfs:member is not
                #  usable for accessing each member of the destination list!
     (sub:ClassWithNoRelationOfType_subclassButNot-subclassInExclusionSetOrAlone
      sub:ClassWithSomeRelationOfType_subclassButNot-subclassInExclusionSetOrAlone). 
sub:ClassWithSomeRelationOfType_subclassButNot-subclassInExclusionSetOrAlone
   owl:equivalentClass
      [rdf:type owl:Restriction;
       owl:onProperty sub:subclassButNot-subclassInExclusionSetOrAlone;
       owl:someValuesFrom owl:Class].
sub:subclassButNot-subclassInExclusionSetOrAlone
   rdfs:subPropertyOf sub:subclass;
   owl:propertyDisjointWith sub:subclassInExclusionSetOrAlone.

This constraint can also be translated into a negative constraint and represented in inconsistency-implying form. That way, an OWL2-RL inference engine can be used for checking this constraint.

[] rdf:type cstr:Descriptive_constraint;
   cstr:condition_class  # if C is a class that has a subclass relation that is not
                         #               of type sub:subclassInExclusionSetOrAlone
      sub:ClassWithSomeRelationOfType_subclassButNot-subclassInExclusionSetOrAlone;
   cstr:conclusion_class owl:Nothing.  #... then there is a problem
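
As an informal illustration of what this constraint checks, the following JavaScript sketch lists the classes that are the source of at least one subclass relation whose type is not one of the four types allowed by the Subclass ODP. The encoding of relations as [superclass, kind, subclass] triples and the sample classes are hypothetical; no inference is performed, unlike with the OWL2 definitions above.

```javascript
// The four relation types allowed by the Subclass ODP.
const allowedKinds = new Set(["subclassInDisjointUnion",
                              "subclassInSetOfExclusiveSubclasses",
                              "loneSubclass", "nonNaturalSubclass"]);

// Hypothetical subclass relations: [superclass, kind, subclass],
// the kind "subclass" standing for a relation without a more precise type.
const links = [
  ["Person",  "subclassInDisjointUnion", "Man"],
  ["Person",  "subclassInDisjointUnion", "Woman"],
  ["Person",  "nonNaturalSubclass",      "Employee"],
  ["Vehicle", "subclass",                "Car"],
];

// Classes with some subclass relation that is not of an allowed type.
function classesViolatingSubclassODP(subclassLinks) {
  const violating = new Set();
  for (const [superclass, kind] of subclassLinks)
    if (!allowedKinds.has(kind)) violating.add(superclass);
  return [...violating];
}

console.log(classesViolatingSubclassODP(links));
```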

The above constraint seems worth being generalized and applied to all transitive relations used in an ontology, e.g. part relations. Indeed, the reasons why the destinations of subtype relations are worth being organized into subtype partitions are also reasons why the destinations of part relations are worth being organized into part partitions. The definition of a part partition for an object mirrors the one for a subtype partition: it is a complete set of disjoint parts of the object; “disjoint” meaning that the parts cannot share common subparts, and “complete” meaning that the union of the parts is equal to the object.
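
The definition of a part partition can be made concrete with a small JavaScript sketch. It assumes, only for the illustration, that an object and its parts can each be described by the set of their atomic subparts; the function name isPartPartition and the sample data are hypothetical.

```javascript
// whole: Set of atomic subparts; parts: array of Sets of atomic subparts.
function isPartPartition(whole, parts) {
  const seen = new Set();
  for (const part of parts)
    for (const atom of part) {
      // "disjoint": no subpart shared by two parts; each part is within the whole
      if (seen.has(atom) || !whole.has(atom)) return false;
      seen.add(atom);
    }
  return seen.size === whole.size;  // "complete": the union equals the whole
}

const car = new Set(["engine", "wheels", "body"]);
console.log(isPartPartition(car, [new Set(["engine"]), new Set(["wheels", "body"])]));
console.log(isPartPartition(car, [new Set(["engine"]), new Set(["engine", "wheels", "body"])]));
console.log(isPartPartition(car, [new Set(["engine"])]));
```

Only the first set of parts is a part partition: the second one is not disjoint and the third one is not complete.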

Since OWL2 does not allow the use of variables referring to relation types, OWL2 does not enable the specification of a unique constraint for all transitive relations used in an ontology. One solution then is, for each transitive relation type that is not a subclass relation and that needs to be checked, to copy and adapt the above subclass related definitions and positive constraint. The creation of such positive constraints can be automated using the next SPARQL command. It takes as a parameter each instance of owl:TransitiveProperty specified in the ontology, i.e. each transitive relation type ?trRelType, e.g. sub:subclass and sub:part which is the inverse of the general partOf relation type. From each ?trRelType, via a sub:aloneOrInPartitionTrRelType relation, the command accesses the supertype of the relation types recommended for representing ?trRelType relations. For example, for sub:subclass relations, the previous section showed that this recommended supertype can be sub:subclassInExclusionSetOrAlone. If subtyping should rather be done only via subtype partitions, the recommended supertype would rather be sub:subclassInPartition (same definition as the previous type but without the last three lines). For sub:part relations, the recommended supertype can be sub:partInPartition.

INSERT 
{ [] rdf:type cstr:Descriptive_constraint;
     cstr:condition_class  #class for the source of a transitive relation
        [?TrRelTypeDomain  owl:equivalentClass 
            [rdf:type owl:Restriction;   #class source of a transitive relation
             owl:onProperty ?trRelType;  #e.g., a sub:part relation
             owl:someValuesFrom owl:Thing] ]; #any thing (type or individual)
     cstr:conclusion_class
        [?conclusion_class owl:equivalentClass  
            [rdf:type owl:Restriction;
             owl:onProperty ?aloneOrInPartitionTrRelType; #e.g., sub:partInPartition
             owl:someValuesFrom owl:Thing] ]
}
WHERE{ ?trRelType rdf:type owl:TransitiveProperty;
                  sub:aloneOrInPartitionTrRelType ?aloneOrInPartitionTrRelType;
                  rdfs:domain ?TrRelTypeDomain  #e.g., owl:Thing (see below)
     }

Thus, based on the next partial definitions of sub:subclass and sub:part, the above command generates i) a descriptive constraint for checking sub:part relations, and ii) a descriptive constraint for checking sub:subclass relations. This last generated constraint is different from the one given in the previous sub-section, for two main reasons.

  • These generated constraints only support the checking of direct uses of recommended transitive relations, not the checking of alternative but equivalent ways to represent these recommended relations in OWL2. The reason is that OWL2 permits a full definition of sub:subclassInExclusionSetOrAlone (or, more exactly, a partial but sufficient definition of this type) but OWL2 does not permit a similar definition for sub:partInPartition nor for any other transitive type. However, if needed, sub:partInPartition relations and other transitive relations can be generated via SPARQL update requests similarly to the way relations of type sub:subclassInDisjointUnion can be generated, as seen in the previous sub-section.
  • Unlike all the other examples of constraints in this article, these generated constraints have to be checked using the log:implies based approach described in Section 3.1.5. Otherwise, these constraints are ambiguous, as the OWL2 compatible version in the previous sub-section would be if “some” was used instead of “all”. However, as noted, Description Logic inference engines generally do not deduce log:implies relations between statements (triples, ...) and hence cannot be reused with this solution.

sub:subclass rdf:type owl:TransitiveProperty;   rdfs:domain rdfs:Class;
             sub:aloneOrInPartitionTrRelType sub:subclassInExclusionSetOrAlone.
sub:part rdf:type owl:TransitiveProperty;   sub:trRelDomain owl:Thing;
         sub:aloneOrInPartitionTrRelType sub:partInPartition.

The use of owl:Thing in this last example may be a problem for some classic Description Logic inference engines. For example, owl:Thing cannot be used as source or destination of an rdfs:subClassOf relation in OWL-RL [OWL 2 profiles]. It is therefore preferable to use a class other than owl:Thing as the domain of a transitive relation type. It is also often easy and relevant, as in using sub:Spatial_entity for the domain of sub:spatial_part.



4.2. Examples of Useful General Prescriptive Constraints

4.2.1. Checking that Certain Relation Types Are Systematically Used
          From or To Objects of Certain Types

The last examples of generated constraints were descriptive: they stated that if transitive relations of certain general types are used, these relations must also be of certain specific types. A prescriptive constraint is needed to check that if certain kinds of relation types are declared in an ontology, every object instance of the domain (resp. range) of such relation types must be the source (resp. destination) of such relations. The 2nd-order type sub:MandatoryOutRelationType (resp. sub:MandatoryInRelationType) can be used for specifying or accessing such relation types. Then, from such types, relations of type sub:leafObjectType can be used to indicate the type that can be used to specify or access the objects (classes or individuals) that, as an exception, must not be checked. For example, the following representation states that every dividable object – i.e. every instance of sub:DividableThing – must be the source of a sub:part relation except for each object instance of sub:PartDestLeaf. By stating such an exception, ontology creators can make explicit their choice of not using a certain object as source of a sub:part relation even when it is possible.

sub:part rdf:type sub:MandatoryOutRelationType;  rdfs:domain sub:DividableThing;
         sub:leafObjectType sub:PartDestLeaf.

Similarly, for specifying that every class not marked as a subclass leaf must have a subclass, one can write:

sub:subclass rdf:type sub:MandatoryOutRelationType;   rdfs:domain owl:Class;
             sub:leafObjectType  sub:SubclassDestLeaf.

Below is a SPARQL command that generates a prescriptive constraint for each instance of sub:MandatoryOutRelationType. In this command, each occurrence of “out” must be replaced by “in” if the same is to be done for each instance of sub:MandatoryInRelationType. When the prescriptive constraint generated by this command is about relations which are not between classes, e.g. sub:part relations, a classic Description Logic inference engine can be reused to check this constraint, although this may require a temporary removal of relations from types such as sub:leafObjectType relations.

INSERT
{ [] rdf:type cstr:Prescriptive_constraint;
     cstr:condition_class  #source class of mandatory "out relation"
        [?MandatoryOutRelTypeDomain  owl:equivalentClass 
            [rdf:type owl:Class; #?MandatoryOutRelTypeDomain minus ?OutRelLeafDestType
             owl:intersectionOf ( ?MandatoryOutRelTypeDomain
                                  [rdf:type owl:Class;
                                   owl:complementOf ?OutRelLeafDestType] ) ] ];
     cstr:conclusion_class
        [?conclusion_class owl:equivalentClass  
            [rdf:type owl:Restriction;  #there must be a relation of type ?outRelType
             owl:onProperty ?outRelType;   owl:someValuesFrom owl:Thing] ]
}
WHERE { ?outRelType rdf:type sub:MandatoryOutRelationType;
        sub:leafObjectType ?OutRelLeafDestType;
        rdfs:domain ?MandatoryOutRelTypeDomain
      }
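
The check performed by such a generated prescriptive constraint can be sketched in JavaScript over a hypothetical in-memory KB. The function name objectsMissingMandatoryOutRelation, the KB encoding and its content are assumptions made for the illustration. Since the constraint is prescriptive, the sketch only looks at explicitly stated types and relations: no subtype or subproperty inference is performed.

```javascript
// Objects of the domain type, not marked as leaves, that are not the source
// of any relation of the mandatory type.
function objectsMissingMandatoryOutRelation(kb, relType, domainType, leafType) {
  return kb.objects
    .filter(o => o.types.includes(domainType) && !o.types.includes(leafType))
    .filter(o => !kb.relations.some(r => r.source === o.id && r.type === relType))
    .map(o => o.id);
}

// Hypothetical KB: only the wheel is declared as a leaf for sub:part relations.
const kb = {
  objects: [
    { id: "car",    types: ["sub:DividableThing"] },
    { id: "wheel",  types: ["sub:DividableThing", "sub:PartDestLeaf"] },
    { id: "engine", types: ["sub:DividableThing"] },
  ],
  relations: [
    { source: "car", type: "sub:part", dest: "wheel" },
    { source: "car", type: "sub:part", dest: "engine" },
  ],
};

const missing = objectsMissingMandatoryOutRelation(
  kb, "sub:part", "sub:DividableThing", "sub:PartDestLeaf");
console.log(missing);  // the engine has no part and is not declared as a leaf
```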

This generation-based approach can also be used to check that instances of a certain class – or, equivalently, instances of certain classes since a class may have subclasses – are systematically connected to at least one instance of a certain class via relations of a certain type, e.g. a transitive relation type or one defined via owl:propertyChainAxiom. For example, to generate a constraint checking that every instance of sub:Component is related via sub:partOf relations to an instance of sub:DividableThing, one can write:

sub:partOf rdf:type sub:MandatoryOutRelationType; 
           rdfs:domain sub:Component;  rdfs:range sub:DividableThing.

The specifications given in Section 4.1 do not enforce the use of subtype partitions or part partitions. However, if the specifications given above in this subsection are also used, these partitions become mandatory, except for non-natural types in the case of subtype partitions. Thus, if there are no non-natural types, doing so specifies that any two classes in the KB are connected by either a subclass relation or an exclusion relation. This could not be specified using only one prescriptive constraint since RDF+OWL does not allow the explicit use of variables for relating objects shared by both the condition and conclusion of a constraint.




5. Evaluation and Comparisons

The originality of the approach proposed in this article is that it enables i) the representation of constraints independently of their exploitation (this one is represented within content-independent queries), ii) the representation of both descriptive and prescriptive constraints with any KRL the expressiveness of which is at least equal to RDF or RDFS, and hence iii) the exploitation of most inference engines, especially via SPARQL queries.

Since the proposed approach relies on other methods and tools chosen by each user of the approach, it inherits from their theoretical or practical improvements. It would thus not be relevant to focus on theoretical aspects of a particular method or tool in this article. For a general comparison, [Baget et al., 2015] and [Swan, 2016] list theoretical points relevant to the proposed approach. Regarding the use of SPARQL to check constraints, [Tao et al., 2010] shows that SPARQL can be used for both expressing and validating integrity constraints based on some partial forms of the Unique Name Assumption and Closed World Assumption. It also shows that this validation is sound and complete when the expressiveness used for the constraints and the rest of the KB are respectively only “SROIQ and SRI” or only “SROI and SROIQ”. In the proposed approach, queries are used only for validating constraints, not expressing them, but this is only a generalization of the approach of [Tao et al., 2010] which does not change the associated theoretical results. In [Tao et al., 2010], the used partial forms of the Unique Name Assumption and Closed World Assumption are specified in SPARQL via its operator NOT EXISTS plus the use of relations of type owl:sameAs or owl:differentFrom. These forms can similarly be expressed via the commands seen in Section 3 and the use of relations of type owl:sameAs or owl:differentFrom in the constraints.

The proposed approach was validated experimentally by testing the degree to which a few constraints – including all those presented in this article – were followed in i) the “family relationship” focused sample ontology given in the OWL2 Primer W3C document and ii) a few ontologies from LOV, a Linked Data repository. The validation came from finding the right constraint violations and the right degrees with the proposed queries and, when necessary, preliminary update requests.

Besides testing these constraints, queries and requests, another goal of this validation phase was to represent ontology design patterns or best practices (ODPs) as constraints. As shown by the previous sections, the proposed kinds of descriptive or prescriptive constraints are ways to represent ODPs that are collections of types that should be reused whenever possible. However, during the above cited validation phase, no ODP satisfying the two following criteria was found: i) the ODP could be implemented via a constraint, and ii) the ODP was likely not to lead to a completeness degree close to 0% for a randomly chosen ontology. More generally, no widely followed ODP was found.

Querying a KB for detecting anti-patterns in it is analogous to querying it for detecting violations of ODPs in it. However, like the SPARQL based works of [Roussey & Zamazal, 2013], many works on anti-pattern detection use queries essentially as a way not to use an expressive inference engine for detecting certain problems. Instead, the proposed approach exploits inference engines. With a sufficiently powerful KRL, any anti-pattern can be expressed as a negative constraint in inconsistency-implying form.

The introduction of this article summarized the strong distinction that exists between constraint-based completeness and represented-world-based completeness, and hence the reason why it would not be relevant to further compare the proposed approach with those of tools such as SWIQA and Sieve.

Since the proposed approach is based on a particular use of RDFS, it should be compared to SHACL and SPIN.

Some transformation languages or systems exploit KRs. [Zamazal & Svátek, 2015] and [Corby & Faron-Zucker, 2015] present such systems. Few of them explicitly have a function that detects KR patterns without also transforming the matched KRs (PatOMat [Zamazal & Svátek, 2015] is an exception), but these languages or systems could easily be adapted to have such a function and hence be used for handling prescriptive constraints. However, all such systems appear to use rule-based languages with more expressiveness than what relations-between-classes based constraints allow. Typically, these languages allow the direct and explicit use of variables for relating objects shared by both the condition and conclusion of a rule. Indeed, using such languages can simplify the writing of prescriptive constraints. However, regarding what can be expressed and checked via constraints, this article shows that i) much can be achieved simply using relations-between-classes based constraints and SPARQL 1.1, and ii) the power of the proposed approach then relies on the power of the inference engine used for object matching, rather than on the language used.

Some transformation systems, like PatOMat [Zamazal & Svátek, 2015], issue SPARQL queries for detecting patterns, based on non-SPARQL specifications for patterns and their transformations. Some other transformation systems directly propose an extension of SPARQL such as STTL [Corby & Faron-Zucker, 2015] to write specifications for patterns and their transformations. For instance, as shown in [Corby, Faron-Zucker & Gazzotti, 2016], STTL can be combined with LDScript [Corby, Faron-Zucker & Gandon, 2017] to specify STTL queries (compiled into SPARQL queries) for detecting patterns and then transforming the results. However, [Corby, Faron-Zucker & Gazzotti, 2016] does not discuss the exploitation of object matching capabilities of inference engines and it does not distinguish between prescriptive constraints and non-prescriptive ones. To sum up, the proposed SPARQL commands, including those generating prescriptive constraints, could also be reused in these transformation systems, although in an adapted form. Alternatively, using scripts may be more flexible, as shown with JavaScript in Section 3.2. This work will continue in both directions, respectively using STTL+LDScript and JavaScript, especially for prescriptive constraints related to the use of given KRL models and KRL notations. To date, bases for this future work are mostly unrelated and discussed in [Corby, Faron-Zucker & Gazzotti, 2016] and [Martin & Bénard, 2017a, 2017b].




6. Conclusion

This article focuses on the original goal of supporting the checking of constraint-based completeness via a few predefined content-independent queries or procedures, as opposed to creating special-purpose queries for each constraint or ad hoc procedures not reusing inference engines. An example of application is checking that a KB follows ontology design patterns (ODPs), KB design libraries (e.g., the KADS library [Breuker & van de Velde, 1994]) or top-level ontologies (e.g., DOLCE) in order to validate the quality of this KB or, during its design, help elicit missing knowledge from experts.

Section 2 answers the first two research questions: what kinds of constraints need to be considered for evaluating constraint-based completeness via content-independent queries, and how to represent constraints in any KRL that has an expressiveness at least equal to RDF or RDFS? Section 2 does so via complementary means.

Section 3 answers the third research question: how to implement the general approach with query languages such as SPARQL or slight extensions of it as well as in procedural languages? For query languages, Section 3.1 shows that only a few (kinds of) content-independent queries are necessary and that pre-treatments on the KB are a way to counterbalance some lack of expressiveness of some query languages as well as to implement inference bypassing methods. For procedural languages, Section 3.2 shows that more control and flexibility are possible at lower costs.

Section 4 provides an ODP for the use of subClassOf relations and generalizes it into a multi-inheritance based ODP that i) works on any transitive relation, ii) keeps the advantages of the single inheritance based ODP, and iii) is more flexible and not dependent on special tools. More importantly, Section 4 shows how these ODPs can be represented as descriptive constraints exploitable by content-independent queries. Section 4 also shows how prescriptive constraints can be generated for checking the systematic use of all the subtypes of a certain type.

Section 5 evaluates the proposed techniques and compares the approach to other ones. A complement to this work will be to i) represent ODPs in several research areas (knowledge sharing, cooperation, security, etc.), using only relations between classes whenever possible, ii) organize them by relations of specialization or other kinds, and iii) test these ODPs via STTL+LDScript and JavaScript. The resulting analyses will be added to this article.




7. Acknowledgments

Many thanks to Dr Olivier Corby (member of the Wimmics and SPARKS teams of, respectively, the INRIA and I3S CNRS laboratories at the University Côte d'Azur, France) for his questions and remarks on the approaches presented in this article and his help during the implementation of these approaches with SPARQL and LDScript via the Corese tool.

The interesting feedback provided by reviewers of published parts of this article was also much appreciated.




8. References

  1. Assmann U., Wagner G. (2006). Ontologies, metamodels and model-driven paradigm. In: Ontologies for Software Engineering and Technology, Springer-Verlag, Berlin, chapter 9.
  2. Bachimont B., Isaac A., Troncy R. (2002). Semantic Commitment for Designing Ontologies: A Proposal. In: EKAW 2002, Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web, LNCS, vol 2473, pp. 114–121, Springer Berlin, Siguenza, Spain.
  3. Baget J., Gutierrez A., Leclère M., Mugnier M., Rocher S., Sipieter C. (2015). Datalog+, RuleML and OWL 2: Formats and Translations for Existential Rules. In: Challenge+DC@RuleML 2015, 9th International Web Rule Symposium (RuleML), Berlin, Germany.
  4. Berners-Lee T., Connolly D., Kagal L., Scharf Y., Hendler J. (2008). N3logic: A logical framework for the world wide web. Journal Theory and Practice of Logic Programming, 8(3), pp. 249–269, Cambridge University Press New York, NY, USA.
  5. Breuker J., van de Velde W. (1994). CommonKADS Library for Expertise Modelling: Reusable Problem Solving Components. IOS Press, Amsterdam.
  6. Calvanese D., De Giacomo G., Lembo D., Lenzerini M., Rosati R. (2007). EQL-lite: effective first-order query processing in description logics. In: IJCAI'07, pp. 274–279, Hyderabad, India.
  7. Chein M., Mugnier M. (2008). The BG Family: Facts, Rules and Constraints. Graph-based Knowledge Representation - Computational Foundations of Conceptual Graphs. Chapter 11 (pp. 311–334), Springer-Verlag London, 428p.
  8. Corby O., Faron-Zucker C. (2015). STTL: A SPARQL-based Transformation Language for RDF. In: WEBIST 2015, 11th International Conference on Web Information Systems and Technologies, Lisbon, Portugal.
  9. Corby O., Faron-Zucker C., Gazzotti R. (2016). Validating Ontologies against OWL 2 Profiles with the SPARQL Template Transformation Language. In: RR 2016, 10th International Conference on Web Reasoning and Rule Systems, LNCS 9898, pp. 39–45, Springer, Aberdeen, UK.
  10. Corby O., Faron-Zucker C., Gandon F. (2017). LDScript: a Linked Data Script Language. In: ISWC 2017, Vienna, Austria.
  11. Djakhdjakha L., Mounir H., Boufaïda Z. (2014). Towards a representation for multi-viewpoints ontology alignments. In: IJMSO, International Journal of Metadata, Semantics and Ontologies, 9(2), pp. 91–102, Inderscience Publishers, Geneva.
  12. Dromey R.G. (2006). Scaleable Formalization of Imperfect Knowledge. In: AWCVS 2006, 1st Asian Working Conference on Verified Software, pp. 29–31, Macao SAR, China.
  13. Galárraga L., Hose K., Razniewski S. (2017). Enabling completeness-aware querying in SPARQL. In: WebDB 2017, pp. 19–22, Chicago, IL, USA.
  14. Genesereth M.R., Fikes R.E. (1992). Knowledge Interchange Format, Version 3.0, Reference Manual. Report Logic 92-1, Logic Group, Stanford University, California.
    ftp://www.ksl.stanford.edu/pub/KSL_Reports/KSL-92-86.ps.gz
    https://pdfs.semanticscholar.org/590c/e95343370578dd1d357319c84d1dc7ff8fac.pdf
  15. Guizzardi G. (2005). Ontological foundations for structural conceptual models. PhD thesis (441p), Telematica Instituut / CTIT.
  16. Marino O., Rechenmann F., Uvietta P. (1990). Multiple Perspectives and Classification Mechanism in Object-Oriented Representation. In: ECAI 1990, pp. 425–430, Pitman Publishing London, Stockholm, Sweden.
  17. Martin Ph., Bénard J. (2017a). Categorizing or Generating Relation Types and Organizing Ontology Design Patterns. In: KAM'17, 23rd IEEE conference on Knowledge Acquisition and Management, Prague, Czech Republic.
  18. Martin Ph., Bénard J. (2017b). Creating and Using various Knowledge Representation Model and Notations. In: ECKM 2017, 18th European Conference on Knowledge Management, Barcelona, Spain.
  19. Mizoguchi R., Kozaki K., Kitamura Y. (2012). Ontological Analyses of Roles. In: IEEE FedCSIS 2012, pp. 489–496, Wroclaw, Poland.
  20. Rector A., Brandt S., Drummond N., Horridge M., Puleston C., Stevens R. (2012). Engineering use cases for modular development of ontologies in OWL. Applied Ontology, 7(2), pp. 113–132, IOS Press.
  21. Roussey C., Zamazal O. (2013). Antipattern detection: How to debug an ontology without a reasoner. In: WoDOOM 2013, 2nd International Workshop on Debugging Ontologies and Ontology Mappings, pp. 45–56, Montpellier, France.
  22. Sowa J.F. (1992). Conceptual Graphs Summary. Conceptual Structures: Current Research and Practice, Ellis Horwood, pp. 3–51.
  23. Swan R. (2016). Querying Existential Rule Knowledge Bases: Decidability and Complexity. PhD thesis (159p), University of Montpellier.
  24. Tao J., Sirin E., Bao J., McGuinness D. (2010). Integrity Constraints in OWL. In: AAAI'10, pp. 1443–1448, Atlanta, USA.
  25. Zamazal O., Svátek V. (2015). PatOMat – Versatile Framework for Pattern-Based Ontology Transformation. Computing and Informatics, 34(2), pp. 305–336.
  26. Zaveri A., Rula A., Maurino A., Pietrobon R., Lehmann J., Auer S. (2016). Quality assessment for linked data: A survey. Semantic Web, 7(1), pp. 63–93.