Relations-between-classes based Constraints and Constraint-based Ontology Completeness

Dr Philippe A. MARTIN (www.phmartin.info;  article@phmartin.info),   Jun H. JO


Abstract. This article defines prescriptive constraints as rules enabling inference engines to check that certain formal objects are used -- not just inferred -- or not, in certain conditions. It shows why these constraints require not exploiting inheritance mechanisms (or other mechanisms automatically adding relations to objects) during the tests of rule conclusions. It gives a general method to do this, then SPARQL commands (queries or update requests) and a Javascript function to implement this method when the rules are represented via a subclassOf relation from the condition to the conclusion, as in OWL2-ER. The article illustrates these commands to check an ontology design pattern about subclassOf relations and, more generally, about transitive relations. More generally, the approach can be used to check the completeness of an ontology, or to represent in an ontology (rather than via requests or ad hoc procedures) constraints to calculate a degree of ontology completeness. The approach can thus help elicitation, modeling or validation of knowledge.

Keywords:  prescriptive constraint, ontology completeness, OWL2, SPARQL.

Table of Contents

1. Introduction
2. Constraints: Definitions and Implementation Approaches
  2.1. Definitions
    2.1.1. Considered Constraints: Those Not Used For Inferences, Just For Checks
    2.1.2. Descriptive vs. Prescriptive (Hence Not Using All Possible Inferences)
    2.1.3. Descriptive Constraints Restricted To Named Individuals
  2.2. General Approach
    2.2.1. Using Constraint Types
    2.2.2. Bypassing Some Inferences When Checking The Conclusions Of Prescriptive Constraints That Are Not in Inconsistency-implying Forms
      2.2.2.1. Using “Clones Without Types”
      2.2.2.2. Using A Copy Of The Base of Facts
  2.3. Relations-between-classes Based Constraints
    2.3.1. Approaches
      2.3.1.1. Comparison With OWL-ER
    2.3.2. Proposed Types
    2.3.3. Examples
      2.3.3.1. SubclassOf-based Constraints
      2.3.3.2. SubclassOf-analog Constraints
      2.3.3.3. Individual-based Constraints
3. Exploiting Relations-between-classes Constraints in SPARQL and Javascript
  3.1. SPARQL Commands for Checking SubclassOf-based Constraints
    3.1.1. Examples of KB Pre-treatments
      3.1.1.1. Temporarily Removing Individuals For Checking Constraints on Types With Classic Description Logic Inference engines
      3.1.1.2. Creating “Clones Without Types” of Objects For Exploiting These Objects Without Inheritance Mechanism
      3.1.1.3. Creating “Identifier relations” for Named Individuals
    3.1.2. Checking Individual-based Positive Constraints
      3.1.3.1. Checking Individual-based Positive Descriptive Constraints
      3.1.3.2. Checking Individual-based Positive Prescriptive Constraints
    3.1.3. Checking SubclassOf-analog Positive Constraints
      3.1.3.1. Checking SubclassOf-analog Positive Descriptive Constraints
      3.1.3.2. Checking SubclassOf-analog Positive Prescriptive Constraints
    3.1.4. Checking SubclassOf-based Positive Constraints
      3.1.4.1. Checking SubclassOf-based Positive Descriptive Constraints
      3.1.4.2. Checking SubclassOf-based Positive Prescriptive Constraints
    3.1.5. Checking Inconsistency-implying Constraints
    3.1.6. Checking Binary Relations Instead of Individuals
    3.1.7. Evaluating the Completeness of a KB
  3.2. JavaScript Function for Ontology Completeness Evaluation
4. Examples of Applications and Limits
  4.1. Application to Checking the Design of a Subtype Hierarchy
  4.2. Generalization To Check the Use of Any Transitive Relation
  4.3. Checking that Certain Relation Types Are Systematically Used
5. Evaluation, Comparisons and Conclusion
6. Acknowledgments
7. References




1. Introduction

Knowledge representations (KRs) are formal descriptions enabling automatic logical inferencing, and thus automatic KR comparison, search, merge, etc. KRs are logic formulas, e.g. binary predicates of 1st-order logic (FOL) called triples or property instances in RDF and binary relations in Conceptual Graphs (CGs) [Sowa, 1992]. In this article, for the purpose of clarity, we use the intuitive terminology of CGs: (information) objects are either types or individuals, and types are either relation types or concept types (classes and datatypes in RDF). A formal knowledge base (KB) is a collection of such objects written using a KR language (KRL). An ontology is a KB that is essentially about types, rather than about individuals.

Creating or evaluating a KB is difficult. To ease it, an often performed subtask is to evaluate some (degree of) completeness of this KB, with respect to some criteria, i.e., more concisely, “its completeness”. Such an evaluation is performed within various tasks or fields but differently depending on the used tools and sometimes in implicit and ad hoc ways. Examples of such tasks or fields are: i) the automatic/manual extraction of knowledge or the creation of a KB, ii) the exploitation of ontology design patterns, and iii) the evaluation of ontologies or, more generally, datasets. In this last field, as noted in [Zaveri et al., 2016], completeness commonly refers to a degree to which the “information required to satisfy some given criteria or a given query” are present in the considered dataset. We complement this very general definition by distinguishing two kinds of completeness:

One way to define and calculate the constraint-based completeness of a KB is to divide “the number of statements satisfying the constraints in that KB” by “the total number of statements in the KB”. When calculating content-based completeness, aggregating the results of completeness oracles may not be as simple. As a variant, instead of statements, one may want to consider objects and hence measure the percentage of objects for which all relations from/to them satisfy the constraints. Other variants may be defined by considering only certain kinds of objects or statements. Defining constraints as KRs in the KB permits the definition of queries that exploit these constraints, independently of their content. We therefore call them “content-independent queries”. When constraints are not defined as KRs but via (necessarily content-dependent) queries, a different query must be created for each variant of constraint checking or constraint-based completeness. Because of this lack of modularity, even if content-dependent queries are also stored in an ontology, they are less easily organised than content-independent queries.
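
As a minimal sketch of the first calculation above, the following JavaScript function divides the number of statements satisfying a set of constraints by the total number of statements. The statement shape, the predicate-based encoding of constraints and the convention for an empty KB are assumptions made for this illustration only; they are not part of CSTR or of the commands given in Section 3.

```javascript
// Hypothetical sketch: constraint-based completeness as a ratio of statements.
// `statements` is an array of arbitrary statement objects; `constraints` is an
// array of predicates returning true when a statement satisfies the constraint.
function constraintBasedCompleteness(statements, constraints) {
  if (statements.length === 0) return 1; // convention chosen here: an empty KB is complete
  const satisfying = statements.filter(
    (s) => constraints.every((satisfies) => satisfies(s))
  );
  return satisfying.length / statements.length;
}

// Toy KB: each statement records a subject, its type and an optional parent.
const toyStatements = [
  { subject: "Tom", type: "Person", parent: "Ann" },
  { subject: "John", type: "Person", parent: null },
  { subject: "Rex", type: "Dog", parent: null },
  { subject: "Ann", type: "Person", parent: "Eve" },
];

// "If x is a Person, x must have a parent."
const personMustHaveParent = (s) => s.type !== "Person" || s.parent !== null;

const completenessRatio = constraintBasedCompleteness(toyStatements, [personMustHaveParent]);
console.log(completenessRatio);
```

The object-based variant mentioned above could presumably be obtained by passing objects (each with all its relations) instead of statements, without changing the function.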

This article does not address content-based completeness, but the techniques we propose for representing and exploiting constraints may also be used for implementing the simplest of the rules used by completeness oracles. From now on, unless explicitly specified otherwise, “completeness” refers to constraint-based completeness.

Section 2.1 explores our first research question: what does “should and should not be represented in the dataset” mean or, more precisely, given the “descriptive vs. prescriptive” distinction, what kinds of constraints need to be considered for evaluating constraint-based completeness via content-independent queries?

Section 2.2 and Section 2.3 present the general approach we propose to answer the second research question of this article: how to represent constraints in a KRL-independent way – or, more precisely, in any KRL that has an expressiveness at least equal to RDF or RDFS – even though actually defining the semantics of some of these constraints would require much more expressive logics? Our solution relies i) on the representation of constraints via restricted constructs based on relations between classes (or to classes), e.g. rules using the rdfs:subClassOf relation for representing a restricted implication between the condition and conclusion of the rule, ii) on the use of certain special types for specifying that a statement is a constraint of a certain kind, and iii) on the exploitation of these types in content-independent queries. Such constraint representations can then be exploited via most inference engines and KR query languages instead of tools tied to a particular KRL or goal (knowledge acquisition, ontology design patterns, ontology evaluation, etc.).

Section 3.1 shows commands (queries or update requests) that may be used as an answer to our third research question: how to implement the general approach cited above in SPARQL or slight extensions of it? Section 3.2 then shows how our general approach can be implemented in a simpler and more flexible way by exploiting an inference engine via a procedural language instead of via a query language such as SPARQL.

Section 4 illustrates applications of the given content-independent queries. To do so, it first generalizes some ontology design patterns (ODPs) or best practices (BPs) related to the use of subtypeOf relations and shows how the SPARQL commands of Section 3 provide a simple way to specify and use such ODPs. Section 4 then generalizes these results to any transitive relation. It also shows how to prescribe the systematic use of certain relation types. Like Section 3, it shows how some – but not all – expressiveness limits of subclassOf-based constraints can be overcome.

Section 5 relates our approach with other ones, evaluates it and concludes.




2. Constraints: Definitions and Implementation Approaches

2.1. Definitions

2.1.1. Considered Constraints: Those Not Used For Inferences, Just For Checks

In this article, as in KIF (Knowledge Interchange Format) [Genesereth & Fikes, 1992], a rule is a statement that can be represented in the form “X =>> Y” where “=>>” is a restricted version of the logical implication (“=>”): “=>>” only supports modus ponens, not modus tollens.

A rule that allows the derivation of a non-modal statement is one that can be represented in the form “X =>> Y” where Y does not include a modality (e.g., must). An example is “if x is a Person then x has a parent”. If this statement and “Tom is a Person” are in a KB, an inference engine can derive the non-modal statement “Tom has a parent”.

[Chein & Mugnier, 2008] defines constraints as positive or negative, respectively expressing statements of the form “if A, B must be true” and “if A, B must be false”. In other words, the authors define constraints as rules where the conclusion has a “must” modality. These are the kinds of constraints we consider, with the interpretation that in such constraints the “must” entails that the constraints can only be used for checking statements, i.e. that they are not rules allowing the derivation of non-modal statements. E.g., if the positive constraint “if x is a Person, x must have a parent” and the statement “Tom is a Person” are in a KB, an inference engine should not derive “Tom has a parent”. It may derive “Tom must have a parent” but, in practice, such a derivation is not made. As a somewhat opposite example, RDFS-aware engines do not exploit relations of type rdfs:domain or rdfs:range as relation signature constraints but as inference supporting statements, since these engines may infer the types of the source or destination of relations having types defined via rdfs:domain or rdfs:range relations.
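
The difference between a rule and a constraint on the Person example can be sketched in JavaScript as follows. The string encoding of facts and the two function names are hypothetical and chosen only for this illustration: the rule adds a derived statement, whereas the constraint adds nothing and only reports violations.

```javascript
// Hypothetical sketch: a rule derives statements, a constraint only checks them.
// Facts are plain strings; both the rule and the constraint are about Persons.
const facts = new Set(["Person(Tom)"]);

// Rule "Person(x) =>> hasParent(x)": modus ponens adds a derived fact.
function applyRule(factSet) {
  const derived = new Set(factSet);
  for (const f of factSet) {
    const m = /^Person\((\w+)\)$/.exec(f);
    if (m) derived.add(`hasParent(${m[1]})`);
  }
  return derived;
}

// Constraint "Person(x) =>> must hasParent(x)": nothing is added,
// the function only reports the individuals that violate the constraint.
function checkConstraint(factSet) {
  const violations = [];
  for (const f of factSet) {
    const m = /^Person\((\w+)\)$/.exec(f);
    if (m && !factSet.has(`hasParent(${m[1]})`)) violations.push(m[1]);
  }
  return violations;
}

const afterRule = applyRule(facts);
const violations = checkConstraint(facts);
console.log([...afterRule], violations);
```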

More formally, the positive and negative constraints (that we consider) can respectively be translated into the forms “A ∧ ¬B =>> false” and “A ∧ B =>> false” where A and B do not contain a “must” modality and A may be empty. In this article, the constraints that are already represented in a form ending by “=>> false” – or, equivalently, “=>> ⊥” – are called constraints in inconsistency-implying forms. Not all KRLs allow rules (instead of implications) to be represented. In those that do, using inconsistency-implying forms for negative constraints is easy. For positive constraints, representing the “¬B” part may not be possible.
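
The two translations above can be sketched in JavaScript as follows. This is only an illustration under simplifying assumptions: A and B are encoded as hypothetical predicates over toy objects, and “¬B” is evaluated as plain absence in the data, which sidesteps the representation problem mentioned for positive constraints.

```javascript
// Hypothetical sketch: both kinds of constraints reduced to "... =>> false".
// A and B are predicates over an object; the returned function is true exactly
// when the object makes the body of the inconsistency-implying form true.
function toInconsistencyImplyingForm(kind, A, B) {
  if (kind === "positive") return (x) => A(x) && !B(x); // A ∧ ¬B =>> false
  if (kind === "negative") return (x) => A(x) && B(x);  // A ∧ B =>> false
  throw new Error(`unknown constraint kind: ${kind}`);
}

const isPerson = (x) => x.types.includes("Person");
const hasParent = (x) => x.parents.length > 0;
const isOwnParent = (x) => x.parents.includes(x.id);

// "If x is a Person, x must have a parent."
const violatesPositive = toInconsistencyImplyingForm("positive", isPerson, hasParent);
// "If x is a Person, x must not be its own parent."
const violatesNegative = toInconsistencyImplyingForm("negative", isPerson, isOwnParent);

const tom = { id: "Tom", types: ["Person"], parents: ["Ann"] };
const john = { id: "John", types: ["Person"], parents: [] };
const loop = { id: "Loop", types: ["Person"], parents: ["Loop"] };
console.log(violatesPositive(tom), violatesPositive(john), violatesNegative(loop));
```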

In practice, constraints are rarely represented using modal logic based KRLs. Constraints are more commonly represented – or checked – using queries, e.g. via SPARQL or the nonmonotonic epistemic FOL query language EQL-Lite [Calvanese et al.]. In (unidirectional) rule based systems, rules with empty conclusions (or “false” as conclusions) are handled like constraints. However, this is a particularity of these systems. It should not be relied upon for general knowledge representation purposes. For such purposes, the special semantics of constraints should be made explicit via special syntactic sugar or special types. Since KRLs rarely propose syntactic sugar for expressing constraints, a more generic approach for expressing that a statement is a constraint, not an inference supporting statement, is to state that the statement is an instance of a type expressing a particular kind of constraint, as explained in Section 2.2.1. Thus, these constraints can be retrieved and exploited by content-independent queries such as those we provide. These constraints may also be directly interpreted and exploited by inference engines designed to take into account the used constraint types. In any case, either i) constraints should not be represented in a way that lets them be exploited as inference supporting statements, or ii) the results of these inferences should not be detrimental, i.e., should not influence the checking of constraints. Both approaches will be illustrated in Section 2.3.1 and Section 3.


2.1.2. Descriptive vs. Prescriptive (Hence Not Using All Possible Inferences)

As noted in [Assmann & Wagner, 2006], one common distinction between engineering models is whether they are i) descriptive of some reality, e.g. like most ontologies (and, by default, all those written in RDF+OWL), or ii) prescriptive of what should be in the considered dataset, as with system specifications, meta-models, XML schemas, database schemas, SHACL constraints. Like definitions or axioms, descriptive constraints enable inference engines to check the use of certain formal terms, if and only if these terms are used (an example is given by the next paragraph). On the other hand, prescriptive constraints enable inference engines to check that certain formal terms are actually used (not just inferred) or not used, in certain conditions. E.g., prescriptive constraints can be used for checking that if the instances of a type are defined as (necessarily) having certain relations, these relations are explicitly given by users whenever they create an instance of such a type. Here, “explicitly” emphasizes that these relations should not exist just because they were automatically deduced, e.g. by inheritance, but only because they were set by a user (manually or automatically). More formally, if in a KB named Kb we call Kb_facts its base of facts (for a description-logic based KB, this would be its A-box) and Kb_PCs its set of prescriptive constraints on Kb_facts, for each prescriptive constraint of the form “X =>> Y” in Kb, what should be checked when “Kb => X” (i.e., when a specialization X´ of X can be found or inferred in Kb) is that “Kb_facts => Y” (i.e., that a specialization Y´ of Y can be found in Kb_facts without using the rest of Kb, in other words, a non-inferred Y´). For constraints in inconsistency-implying forms, there is no distinction between descriptive and prescriptive since they enable the detection of an incorrect KR whether it has been added automatically or not.

As an example, assume that a KB includes the rule “if X is a Person, X has a parent” and that a user enters the fact “John is a Person”. Even if this KB also includes the descriptive constraint “if X is a Person, X must have a parent in the represented world (descriptive-must)”, no error message should be given by a checking mechanism since this constraint is satisfied (by inferencing) without the user having to represent a parent for John. On the other hand, if the KB includes the prescriptive constraint “if X is a Person, X must have a parent in the base of facts (prescriptive-must)”, the adding of a new Person without a relation to a parent must now be rejected. In other words, if some mechanism automatically associates relations to some of the checked objects – e.g., by dynamic lookup for inherited relations during each object matching or via one forward chaining saturation – this mechanism must be temporarily disabled or bypassed for checking positive prescriptive constraints.
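
The John example can be sketched in JavaScript as follows. This is a toy illustration, not the SPARQL commands of Section 3: the triple encoding, the one-rule `saturate` function and the blank-node naming are all assumptions. The condition is matched against the saturated KB in both cases; only the source used for matching the conclusion changes.

```javascript
// Hypothetical sketch: kbFacts holds asserted triples only (Kb_facts);
// inference (here a single rule "every Person has a parent") adds triples.
const kbFacts = [
  ["John", "rdf:type", "Person"],
  ["Tom", "rdf:type", "Person"],
  ["Tom", "parent", "Ann"],
];

// Forward chaining for the rule "if X is a Person, X has a parent":
// an anonymous parent (blank node) is added for each Person.
function saturate(triples) {
  const inferred = triples
    .filter(([, p, o]) => p === "rdf:type" && o === "Person")
    .map(([s]) => [s, "parent", `_:parentOf${s}`]);
  return triples.concat(inferred);
}

// Constraint "if X is a Person, X must have a parent".
// The conclusion is matched against the saturated KB (descriptive reading)
// or against the asserted facts only (prescriptive reading).
function violators(facts, prescriptive) {
  const kb = saturate(facts);
  const conclusionSource = prescriptive ? facts : kb;
  return kb
    .filter(([, p, o]) => p === "rdf:type" && o === "Person")
    .map(([s]) => s)
    .filter((s) => !conclusionSource.some(([s2, p]) => s2 === s && p === "parent"));
}

const descriptiveViolators = violators(kbFacts, false);
const prescriptiveViolators = violators(kbFacts, true);
console.log(descriptiveViolators, prescriptiveViolators);
```

With these toy facts, the descriptive reading reports no violation because the inferred anonymous parent suffices, whereas the prescriptive reading reports John.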

Thus, prescriptive constraints enable checks that descriptive constraints cannot, and they are not equivalent to using the closed world assumption. The techniques presented by this article for defining and checking prescriptive constraints can be performed under the open-world assumption. Classic logical expressions are only descriptive. E.g., as explained above, simply stating that “any Person (necessarily) has a parent” is only descriptive. Rules (marked as) representing positive prescriptive constraints require a special interpretation, e.g. via a special command or procedure.


2.1.3. Descriptive Constraints Restricted To Named Individuals

When using a constraint to check if certain objects in a KB satisfy a methodology or an ontology design pattern, one may want to take into account automatically deduced relations but only if they are from or to named individuals (“IRIs” in RDF terminology), not if they are between anonymous individuals (“blank nodes” in RDF terminology). Such a “partly descriptive - partly prescriptive” constraint may be termed “descriptive but restricted to named individuals”. It requires the author of the constraint to state within the constraint which individuals should be named.

Although RDFS provides the type rdfs:label for relations between an individual and its names, it does not provide a type for relating an individual to its identifiers (even though identifiers are unique names) since IRIs (Internationalized Resource Identifiers) are directly interpreted as named individuals in RDF. OWL also does not provide such a relation type, which could be used for distinguishing the individuals that are (or should be) named from those that are not (or need not be). In OWL2 DL, this distinction can be made if each named individual is declared as instance of the class owl:NamedIndividual. However, doing so in OWL2 Full still does not permit the distinction to be made. SPARQL permits the distinction via the operators isIRI and isBlank. Hence, our solution is to provide i) the relation type cstr:id for permitting the authors of a constraint to specify which individual should have an identifier, ii) a SPARQL update request permitting the adding of cstr:id relations to each named individual type in a KB (an example is given in Section 2.3.3.1), and iii) a constraint checking query tailored to take those cstr:id relations into account.
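
The effect of adding cstr:id relations can be sketched in JavaScript on a toy list of triples. This is not the SPARQL update request referred to above: the `_:` prefix test merely mimics isBlank on this encoding, the toy KB contains no literal, and using the identifier itself as the value of cstr:id is an assumption made for the illustration.

```javascript
// Hypothetical sketch: add a cstr:id relation to every named individual.
// Terms starting with "_:" are treated as blank nodes and all other terms as
// IRIs, mimicking the SPARQL operators isBlank and isIRI on this toy encoding.
const isBlank = (term) => term.startsWith("_:");

function addIdRelations(triples) {
  const named = new Set();
  for (const [s, p, o] of triples) {
    if (!isBlank(s)) named.add(s);
    // In this toy KB, only objects of non-type relations are individuals.
    if (p !== "rdf:type" && !isBlank(o)) named.add(o);
  }
  // Assumption: the value of cstr:id is the identifier itself.
  const idTriples = [...named].map((n) => [n, "cstr:id", n]);
  return triples.concat(idTriples);
}

const withIds = addIdRelations([
  ["ex:Tom", "rdf:type", "ex:Person"],
  ["ex:Tom", "ex:parent", "_:b1"],
  ["_:b1", "ex:parent", "ex:Eve"],
]);
const idSubjects = withIds.filter(([, p]) => p === "cstr:id").map(([s]) => s);
console.log(idSubjects);
```

A constraint can then require a cstr:id relation on the individuals that must be named, so that an inferred anonymous individual does not satisfy it.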

Section 3.1 considers four kinds of content-independent queries: i) those for inconsistency-implying forms, ii) those for prescriptive positive constraints, iii) those for descriptive constraints restricted to named individuals, and iv) those for unrestricted descriptive constraints. The reason why no content-independent query is provided for negative constraints is that, as we have seen, when rules can be written, negative constraints can easily be represented in inconsistency-implying forms.



2.2. General Approach

2.2.1. Using Constraint Types

[Tao et al., 2010] shows that representing and checking certain kinds of integrity constraints that exploit some forms of the Unique Name Assumption or Closed World Assumption can be done via SPARQL content-dependent queries. As explained in the introduction, our goal is rather to enable the representation of constraints that i) can be exploited via content-independent queries, ii) can be represented via any KRL that has an expressiveness at least equal to RDFS, and iii) can be explicitly descriptive or prescriptive.

To that end, our approach is to introduce a few types for constraints. By setting instanceOf or subtypeOf relations from certain KRs to some of those types, KB authors can state that these KRs are constraints and can indicate their nature. Thus, these constraints can be exploited by content-independent queries or inference engines that understand the used constraint types. For these engines, the types change the way the statements should be interpreted. This approach is similar to the use of OWL2 types [OWL 2 profiles] in RDF statements when OWL2-aware inference engines are used. The name of our ontology of constraint types is CSTR. In this ontology, cstr:Constraint is the supertype of all types of constraints. Similarly, the type cstr:Prescriptive_constraint, a subtype of cstr:Constraint, enables the retrieval of all and only the constraint types marked as prescriptive constraints. The prefix “cstr:” in these identifiers is an abbreviation for the namespace http://www.webkb.org/kb/it/CSTR. CSTR also includes types for marking a class as a constraint condition, and relation types for associating the condition and conclusion of a constraint. These types will be listed in Section 2.3.2.
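
The retrieval of constraints via their types, independently of their content, can be sketched in JavaScript as follows. The three type names come from CSTR; the `markedAs` field, the toy KRs and the helper functions are hypothetical and stand in for instanceOf relations and for a content-independent query.

```javascript
// Hypothetical sketch: retrieve constraints by their CSTR type, whatever
// their content. Subtype relations among constraint types are followed
// upward so that cstr:Constraint also retrieves instances of its subtypes.
const subtypeOf = new Map([
  ["cstr:Prescriptive_constraint", "cstr:Constraint"],
  ["cstr:Descriptive_constraint", "cstr:Constraint"],
]);

function isSubtypeOrEqual(type, ancestor) {
  for (let t = type; t !== undefined; t = subtypeOf.get(t)) {
    if (t === ancestor) return true;
  }
  return false;
}

// Each KR is marked with the constraint type it is an instance of
// (null for ordinary, inference supporting statements).
const krs = [
  { id: "k1", text: "if x is a Person, x has a parent", markedAs: null },
  { id: "k2", text: "if x is a Person, x must have a parent", markedAs: "cstr:Prescriptive_constraint" },
  { id: "k3", text: "if x is a Dog, x must have an owner", markedAs: "cstr:Descriptive_constraint" },
];

const constraintsOfType = (type) =>
  krs
    .filter((k) => k.markedAs !== null && isSubtypeOrEqual(k.markedAs, type))
    .map((k) => k.id);

console.log(constraintsOfType("cstr:Constraint"), constraintsOfType("cstr:Prescriptive_constraint"));
```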


2.2.2. Bypassing Some Inferences When Checking The Conclusions Of Prescriptive Constraints That Are Not in Inconsistency-implying Forms

For adequately checking prescriptive constraints that are not in inconsistency-implying forms, Section 2.1.2 introduced the need for temporarily disabling or bypassing inference mechanisms that automatically associate relations to objects. Yet, for inferencing completeness purposes, these mechanisms are useful to exploit when checking the condition of a positive prescriptive constraint. Thus, they should be disabled or bypassed only for testing the main (alias, first) object of the conclusion of the constraint, i.e. the object whose relations are mandatory for all objects matching the condition of the constraint. Each of the next two subsections proposes a bypassing method.

2.2.2.1. Using “Clones Without Types”

This method consists in statically (i.e., via a pre-treatment of the KB) or dynamically (i.e., during the checking of such constraints) creating a “clone without type” of each object matching the condition of such a constraint and then, when checking its conclusion, doing it on this clone. Such a clone has the same relations as the original object except for instanceOf relations (it has none; furthermore, if it is a named individual, it must have an identifier different from the original object). Thus, with clones without type, “inferences exploiting types to associate relations to an object” are avoided. As an abbreviation, from now on, we shall write that this method avoids inheritance (in the case of RDFS or OWL entailments, “avoiding inheritance” means that, when searching relations on an object, the types of this object and their superclasses are not exploited). Creating clones without types is not necessarily easy since there may be information in the KB that leads some inference engine to regenerate some types for some clones. E.g., assuming that the relation type parent is the source of an rdfs:domain relation with destination the type Person, if an object of type Person has a parent relation and if its type is removed, an inference engine may set it again. To avoid this particular case, instead of directly asserting relations such as rdfs:domain or rdfs:range, using inconsistency-implying constraints with such relations in the condition parts might be a solution. When SPARQL is used for creating a clone without type, as illustrated in Section 3.1.1.2, another problem is that the whole KB is duplicated, not just one object. Finally, this method based on clones without types does not work if there are inferences that do not exploit types (e.g., via duck typing instead of inheritance), or if a forward chaining saturation on the KB is automatically run before the pre-treatment cited above. However, these last two cases are rare.
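
The creation of one clone without type can be sketched in JavaScript on a toy list of triples. This is not the SPARQL pre-treatment of Section 3.1.1.2: the function name, the identifier suffix and the choice of copying the relations to the object as well as those from it are assumptions made for the illustration.

```javascript
// Hypothetical sketch: build a "clone without type" of one object.
// The clone copies every relation from or to the object except its rdf:type
// relations, and gets a fresh identifier (the suffix is an arbitrary choice).
function cloneWithoutType(triples, objectId) {
  const cloneId = `${objectId}__clone`;
  const rename = (term) => (term === objectId ? cloneId : term);
  const cloneTriples = triples
    .filter(([s, , o]) => s === objectId || o === objectId)
    .filter(([s, p]) => !(s === objectId && p === "rdf:type"))
    .map(([s, p, o]) => [rename(s), p, rename(o)]);
  return { cloneId, cloneTriples };
}

const kb = [
  ["ex:John", "rdf:type", "ex:Person"],
  ["ex:John", "ex:name", "John"],
  ["ex:Ann", "ex:child", "ex:John"],
];
const { cloneId, cloneTriples } = cloneWithoutType(kb, "ex:John");
console.log(cloneId, cloneTriples);
```

Checking the conclusion of a prescriptive constraint on such a clone means that no relation can be associated to it via the type ex:Person, unless the engine regenerates that type as discussed above.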

This method relies on a temporary update of KRs before their checking by an inference engine. Thus, this method does not rely on a particular KRL, inference engine or tool feature. In other words, this solution is KRL independent and tool independent: it can be used with any KRL and any tool. Hence, depending on the domain and application, different inference engines can be reused to check or evaluate ontology completeness. However, with some query languages such as current standard versions of SPARQL, the temporary update cannot be done dynamically: a pre-treatment of the KB is necessary, for example with SPARQL update, as illustrated in Section 3.1.2. This is a limitation since few KB servers, e.g. few SPARQL endpoints, allow (most of) their users to modify the KB for checking it. With an extension of SPARQL such as LDScript [Corby, Faron-Zucker & Gandon, 2017], the temporary update can be done dynamically. As with SPARQL, the whole KB is duplicated but now it is temporary and done every time an object is matched with the conclusion of a prescriptive constraint (Section ??? shows the LDScript code).

2.2.2.2. Using A Copy Of The Base of Facts

Given a KB, this method first consists in statically creating a copy of its base of facts in a separate KB (e.g., using another file) and then, when checking the conclusion of a constraint, doing the matching with objects coming from this other KB (e.g., this other file). Such an exploitation of several KBs is KRL independent and less brittle than the previous method. However, it currently cannot be used in SPARQL, even though the FROM part of a SELECT query may be used to indicate a particular KB. On the other hand, copying the base of facts can be done via SPARQL, using a minimal entailment regime: “simple” or “RDF”. This method can be used in LDScript via the loading of the base of facts into a named graph by the query checking prescriptive constraints (Section ??? shows the LDScript code). To avoid hardcoding the name or IRI of the base of facts in that query, information on the KBs and their relationship must be stored and exploited. As with the previous method, this one is not directly usable with SPARQL endpoints since none stores information about a copy of the base of facts of the KB it gives access to.



2.3. Relations-between-classes Based Constraints

2.3.1. Approaches

One way to represent and exploit (simple) rules in any KRL that has an expressiveness at least equal to RDFS is to i) use the rdfs:subClassOf relation for representing an implication between the condition and conclusion of the rule, and ii) either prevent this particular implication from being used for modus tollens or prevent its results from being detrimental. The situation is not much more complex when subclassOf rules are used as a way to represent constraints. There are three cases.

In other words, using subclassOf-based constraints when inferences based on subclassOf relations then have to be ignored is generally not relevant. However, the idea of using classes for representing the conditions and conclusion of a constraint without using variables is interesting. Here are the two simplest ways to do so.

A disadvantage of any solution using relations from/to classes but which are not subclassOf relations is that the result requires a KRL with an expressivity at least equal to RDF, not RDFS. With OWL, this means OWL-Full not just OWL-DL. Since each of the three approaches has its advantages and drawbacks, this article shows the types and commands we propose for the three approaches: Section 2.3.2 gives the types, Section 2.3.3 gives examples and Section 3.1 gives the commands.

SubclassOf-based constraints could be generalized using a subtypeOf relation instead of a subclassOf one. Conforming to the terminology given in the first paragraph of the introduction, this would permit the use of relation types and datatypes, hence not just classes, in the condition and conclusion of a constraint. However, checking objects such as relations and instances of datatypes can most often be done via constraints on class instances related to these objects. For example, relations can generally be checked via their sources or destinations, or via the assertions that include these relations, i.e. instances of classes such as owl:NegativeObjectPropertyAssertion. Hence, in this article, we do not propose the use of subtypeOf relations.

2.3.1.1. Comparison With OWL-ER

[Baget et al., 2015] shows that a sublanguage of OWL2 called OWL2-ER using subclassOf-based rules can represent many kinds of “existential rules” (hence the suffix “-ER”), i.e. rules in which the conclusion (or head) can include anonymous existentially quantified objects. A sublanguage of OWL2-ER is OWL2-RL [OWL 2 profiles] which can be translated into Datalog (definite Horn rules with equality), a purely declarative subset of Prolog. Similarly, OWL2-ER can be translated into – and hence represented by – Datalog+ [Baget et al., 2015] (the extension of Datalog to existential rules) and then into RuleML. Thus, not all OWL2 class expressions are allowed in OWL2-ER: rules are restricted to the logical form “∀X ∀Y (B(X,Y) => ∃Z H(X,Z))”, where “X, Y and Z are sets of variables, B (the body) is a conjunction with variables in X and Y, and H (the head) is a conjunction with variables in X and Z; hence Z denotes the set of variables that occur in H but not in B” [Baget et al., 2015]. Given its restrictions with respect to OWL2, OWL2-ER is roughly the part of Datalog+ that can be expressed using only OWL2, hence with binary relations only and without the syntactic possibility of directly and explicitly using variables for relating objects shared by both the condition and conclusion of a rule.

In OWL2-ER, negative constraints can be represented in two ways: i) by using a class expression equivalent to the type owl:Nothing in a rule conclusion, hence a rule of the form ∀x (ClassExpression(x) => ⊥), ii) by using the type owl:NegativeObjectPropertyAssertion to express negated facts of the form ¬ ∃x ClassExpression(x). Neither OWL2-ER nor Datalog+ can directly represent a positive constraint and hence neither distinguishes between descriptive and prescriptive constraints. However, nothing prevents the types and approach we propose from being used with OWL2-ER statements to express and exploit such constraints. Conversely, with our approach, subclassOf-based constraints are not restricted to OWL2-ER or OWL2 class expressions. Here are examples of OWL2-ER rules given in [Baget et al., 2015]; to express the meaning of constraints reusing them, it is sufficient to replace the words “is” or “has” by the words “must be” or “must have” in the informal descriptions (each of them is prefixed by “#”).

  ∀X ∀Y ∀Z (isProject(X,Y,Z) => isMember(Z,X))
      #Every leader of a project is a member of this project
  ∀X ∀Y (Researcher(X) ∧ hasExpertise(X,Y) => ∃Z ∃L (isProject(Z,Y,L) ∧ isMember(X,Z)))
      #Every researcher expert in an area is a member of a project in that area
  ∀X ∀Y ∀Z ∀Z´ (isProject(X,Y,Z) ∧ isProject(X,Y,Z´) => Z = Z´)
      #Every project has at most one leader
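
The first and third rules above can be sketched in JavaScript on toy facts: the first is used for inference, the third is read as a check (“must have at most one leader”). The fact encoding and the names are hypothetical; reading the third rule as a check additionally assumes that distinct names denote distinct individuals, whereas as an inference rule it would instead derive that the two leaders are equal.

```javascript
// Hypothetical sketch: the first and third rules above applied to toy facts.
// isProject(project, area, leader) facts are objects; isMember facts are pairs.
const isProject = [
  { project: "p1", area: "KR", leader: "alice" },
  { project: "p2", area: "DB", leader: "bob" },
  { project: "p2", area: "DB", leader: "carol" },
];

// Rule 1 used for inference: every leader of a project is a member of it.
const derivedMembers = isProject.map(({ project, leader }) => [leader, project]);

// Rule 3 read as a check: group the leaders by (project, area) and report
// the groups that have more than one distinct leader.
function projectsWithSeveralLeaders(projectFacts) {
  const leaders = new Map();
  for (const { project, area, leader } of projectFacts) {
    const key = `${project}|${area}`;
    if (!leaders.has(key)) leaders.set(key, new Set());
    leaders.get(key).add(leader);
  }
  return [...leaders].filter(([, set]) => set.size > 1).map(([key]) => key);
}

const offending = projectsWithSeveralLeaders(isProject);
console.log(derivedMembers, offending);
```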

Since our approach is purposely KRL independent and not restricted to subclassOf-based constraints (and, a fortiori, to OWL2-ER statements), more formal details on OWL2-ER – or on what representing a rule using rdfs:subClassOf entails – would be irrelevant in this article. See [Baget et al., 2015] and [Swan, 2016] for those formal points. The SPARQL code given in Section 3.1 provides the remaining necessary formal details.


2.3.2. Proposed Types

Figure 1 uses a concise version of UML to show subclassOf relations between some classes of CSTR. Figure 2 uses the FL notation – a notation somewhat similar to Turtle but more expressive – to show subtypeOf relations between main types of CSTR. This figure shows how many of the first-order classes are instances of some second-order classes that are similarly organised and somewhat similarly named. E.g., to state that a class (class expression or named class) is the condition of a subclassOf-based constraint, one can mark it as subtype of cstr:SubclassOf-based_prescriptive_constraint_condition or instance of cstr:Type_of_subclassOf-based_prescriptive_constraint_condition. The first case implies the second one but it also specifies that, if the marked class has subclasses, each definition of these subclasses is also a condition for a constraint. The other second-order classes have the same purpose.

For subclassOf-based constraints, this article only uses 1st-order classes. Indeed, most inference engines for description logics cannot handle a KB that defines 2nd-order types, i.e. a KB that exploits 2nd-order types not predefined in those description logics. As explained in the second paragraph of Section 2.3.1, the two other representation approaches that we propose use other relations to classes and hence cannot be handled by the above-cited inference engines.

Because of this limitation of these engines, to use them for checking constraints on class definitions, it is necessary to pre-treat the KB by removing the individuals. In such a case, 1st-order classes become individuals and their subclass relations must be removed too. Section 3.1.1 shows the basis of how to do that in SPARQL. Conversely, if such engines must be used to check constraints on individuals, the 2nd-order types must first be removed.

Figure 1. Concise-UML representation of subclassOf relations between some classes in the CSTR ontology
                    cstr:Constraint              cstr:Constraint_condition
                       /\      /\                         /\
                       |      |                       |
cstr:Descriptive_constraint   |      cstr:SubclassOf-based_constraint_condition
         cstr:Prescriptive_constraint                 /\         /\
                                                      |         |
     cstr:SubclassOf-based_descriptive_constraint_condition     |
                   cstr:SubclassOf-based_prescriptive_constraint_condition

Legend. Each arrow (drawn as “/\” above “|”) represents a subClassOf relation from the lower class to the upper class.
For concision purposes, boxes around classes are not drawn.

Figure 2. FL representation of subtypeOf relation between main types in the CSTR ontology
cstr#binary_relation  rdf#type: rdf#Property,
 > exclusion    //cstr#binary_relation has the following 5 exclusive subtypes
   { cstr#id    //  (this exclusion set is not complete: this is not a partition)
     cstr#descriptive_constraint_conclusion    cstr#condition_class
     cstr#prescriptive_constraint_conclusion   cstr#conclusion_class
   };

cstr#Constraint  rdf#type: cstr#Type_of_constraint,
 > cstr#Descriptive_constraint  cstr#Prescriptive_constraint;

cstr#Constraint_condition
 > (cstr#SubclassOf-based_constraint_condition
      rdf#type: cstr#Type_of_subclassOf-based_constraint_condition,
      > (cstr#SubclassOf-based_descriptive_constraint_condition
           rdf#type: cstr#Type_of_subclassOf-based_descriptive_constraint_condition)
        (cstr#SubclassOf-based_prescriptive_constraint_condition
           rdf#type: cstr#Type_of_subclassOf-based_prescriptive_constraint_condition)
   );

owl#Class
  > exclusion  //owl#Class has the following 3 exclusive subclasses
    { cstr#Type_of_constraint
      (cstr#Type_of_constraint_condition
         > (cstr#Type_of_constraint_condition_that_is_a_type
              > (cstr#Type_of_subclassOf-based_constraint_condition
                   > cstr#Type_of_subclassOf-based_descriptive_constraint_condition
                     cstr#Type_of_subclassOf-based_prescriptive_constraint_condition
                ) ) )
    };

Legend.
“>” means “subtype”.
Namespace prefixes end by “#”, not “:”.
Thus, the syntactic form “X r: Y” can be read “X has for r Y”
(although FL syntactic sugar such as “exclusion”
 slightly changes this reading: see the comments prefixed by “//”).


2.3.3. Examples

In this article, except in Figure 1 and Figure 2, the Turtle notation is used when SPARQL is not used since SPARQL reuses this notation for representing relations. For clarity purposes, the names of relation types have a lowercase initial while other names have an uppercase initial. In SPARQL, Turtle or other graph-based notations, a statement of the form “ SourceConcept relationName1 DestConcept1ForRelName1 , DestConcept2ForRelName1 ; relationName2 DestConcept1ForRelName2 , DestConcept2ForRelName2 . ” can be read “SourceConcept has for relationName1 DestConcept1ForRelName1 as well as DestConcept2ForRelName1, and has for relationName2 DestConcept1ForRelName2, and has for relationName2 DestConcept2ForRelName2”.

2.3.3.1. SubclassOf-based Constraints

Here is the positive prescriptive constraint “if X is a Person, X must have a parent in the base of facts (prescriptive-must)” represented as a subclassOf-based constraint using the CSTR ontology.

    :Person  #class and constraint condition
       rdfs:subClassOf cstr:SubclassOf-based_prescriptive_constraint_condition ;
       rdfs:subClassOf  #conclusion (types of relations that must be present):
          [rdf:type owl:Restriction;  owl:onProperty :parent;
           owl:someValuesFrom :Person].

For the general descriptive version of this constraint, it is sufficient to replace “prescriptive” by “descriptive” in the previous representation. Here is a descriptive version where parents are restricted to be named individuals.

    :Person  #class and constraint condition
       rdfs:subClassOf cstr:SubclassOf-based_descriptive_constraint_condition ;
       rdfs:subClassOf  #conclusion (types of relations that must be present):
         [rdf:type owl:Restriction;  owl:onProperty :parent;
          owl:someValuesFrom :Named_person].
    :Named_person owl:equivalentClass
       [rdf:type owl:Class ;
        owl:intersectionOf ( :Person
                             [rdf:type owl:Restriction;  owl:onProperty cstr:id;
                              owl:someValuesFrom xsd:string]
                           )].
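
The difference between the descriptive and the prescriptive reading of “a Person must have a parent” can be sketched as follows in JavaScript. This is only an illustration under strong assumptions: the KB is a hypothetical array of triples, each one flagged as asserted or as merely inferred (e.g., inherited from a class definition), and the function name violators is not part of CSTR.

```javascript
// Hypothetical KB: each triple records whether it was asserted or only inferred.
const kb = [
  { s: "Tom", p: "type",   o: "Person",     inferred: false },
  { s: "Tom", p: "parent", o: "Ann",        inferred: false },
  { s: "Ann", p: "type",   o: "Person",     inferred: false },
  // e.g. inherited from an axiom stating that every Person has some parent:
  { s: "Ann", p: "parent", o: "_:someone",  inferred: true  },
];

// Returns the persons that do not satisfy "must have a parent".
// Descriptive reading: any parent relation counts, even an inferred one.
// Prescriptive reading: only asserted parent relations count.
function violators(triples, prescriptive) {
  const persons = triples
    .filter(t => t.p === "type" && t.o === "Person")
    .map(t => t.s);
  return persons.filter(x =>
    !triples.some(t => t.s === x && t.p === "parent" &&
                       !(prescriptive && t.inferred)));
}

console.log(violators(kb, false)); // descriptive:  []
console.log(violators(kb, true));  // prescriptive: [ 'Ann' ]
```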

The negative constraint “if X is a Person_without_a_parent, X must not have a parent” may be translated into the inconsistency-implying form “X is a Person_without_a_parent => false”. Here is its representation as a subclassOf-based constraint using the CSTR ontology.

    :Person_without_a_parent  #class and constraint condition
       rdfs:subClassOf cstr:SubclassOf-based_constraint_condition ; #optional line
       owl:equivalentClass  #definition and constraint condition
          [rdf:type owl:Restriction;  owl:onProperty :parent;
           owl:maxCardinality  "0"^^xsd:nonNegativeInteger] ;
       rdfs:subClassOf owl:Nothing. #conclusion

2.3.3.2. SubclassOf-analog Constraints

For the positive prescriptive constraint “if X is a Person, X must have a parent in the base of facts (prescriptive-must)”, using the CSTR ontology:

    :Person  #class and constraint condition
       cstr:prescriptive_constraint_conclusion
          [rdf:type owl:Restriction;  owl:onProperty :parent;
           owl:someValuesFrom :Person].

For the general descriptive version of this constraint, it is sufficient to replace “prescriptive” by “descriptive” in the previous representation. Here is a descriptive version where parents are restricted to be named individuals.

    :Person  #class and constraint condition
       cstr:descriptive_constraint_conclusion
         [rdf:type owl:Restriction;  owl:onProperty :parent;
          owl:someValuesFrom :Named_person]. #as previously defined.

Since there is no downside in using rdfs:subClassOf for negative constraints, CSTR does not propose a replacement for it in this section and the next.

2.3.3.3. Individual-based Constraints

For the positive prescriptive constraint “if X is a Person, X must have a parent in the base of facts (prescriptive-must)”, using the CSTR ontology:

    [rdf:type cstr:Prescriptive_constraint;
     cstr:condition_class :Person;
     cstr:conclusion_class [rdf:type owl:Restriction;  owl:onProperty :parent;
                            owl:someValuesFrom :Person]
    ].

For the general descriptive version of this constraint, it is sufficient to replace “prescriptive” by “descriptive” in the previous representation. Here is a descriptive version where parents are restricted to be named individuals.

    [rdf:type cstr:Descriptive_constraint;
     cstr:condition_class :Person;
     cstr:conclusion_class [rdf:type owl:Restriction;  owl:onProperty :parent;
                            owl:someValuesFrom :Named_person]
    ].




3. Exploiting Relations-between-classes Based Constraints in SPARQL and Javascript

3.1. SPARQL Commands for Checking SubclassOf-based Constraints

In some extensions of SPARQL, e.g. LDScript [Corby, Faron-Zucker & Gandon, 2017], the next commands (queries or update requests) can be grouped into scripts or functions. The commands of Section 3.1.1 are KB pre-treatments and hence update requests. The commands of the remaining sections are queries. These queries are organised by kind and size. At first, the difficulty of understanding them is directly related to their size. In SPARQL, variable names begin with “?”.

3.1.1. Examples of KB Pre-treatments

3.1.1.1. Temporarily Removing Individuals For Checking Constraints on Types With Classic Description Logic Inference engines

The next command illustrates the pre-treatment referred to in the paragraph before Figure 1 in Section 2.3.2. The WHERE clause of this command selects each object ?o that has neither rdfs:Class nor rdf:Property for type, hence each individual. The DELETE clause removes the rdf:type relations from ?o and, from the destinations of these relations, removes the rdfs:subClassOf relations when they exist. As replacements for the removed relations, the INSERT clause sets cstr:type and cstr:subClassOf relations. These substitute relations later permit the re-assertion of the initial rdf:type and rdfs:subClassOf relations, when needed, via a similar command.

DELETE { ?o rdf:type  ?t .  ?t rdfs:subClassOf ?superClass . } 
 INSERT{ ?o cstr:type ?t .  ?t cstr:subClassOf ?superClass . }
 WHERE { ?o rdf:type  ?t .
         #the rdfs:subClassOf relations are handled only when they exist:
         OPTIONAL { ?t rdfs:subClassOf ?superClass }
         #only when ?o is an individual:
         FILTER NOT EXISTS { ?o rdf:type rdfs:Class }
         FILTER NOT EXISTS { ?o rdf:type rdf:Property }
       }

3.1.1.2. Creating “Clones Without Types” of Objects For Exploiting These Objects Without Inheritance Mechanism

In Section 2.2.2.1, we described an approach to avoid the relation lookup mechanism we abbreviated as “inheritance”. SPARQL does not provide a way to suppress inheritance within a query, hence, for our purposes, within the part of a query that checks the conclusion of a rule-based positive constraint. E.g., it does not allow the selection of a particular entailment regime within a query. However, the inheritance bypassing method given in Section 2.2.2 can be implemented in SPARQL. The command below (here, an INSERT request) shows how, assuming for the sake of clarity that there are no 2nd-order types and no anonymous individuals in the KB. For every object ?o in the KB, if this object is an individual, this command creates ?o2, a partial copy of ?o that has the same relations except for rdf:type relations. The identifier of this partial copy is the one of ?o with the suffix “_cloneWithoutType”. This command also relates ?o to ?o2 by a relation of type cstr:cloneWithoutType. With a SPARQL1.1 extension such as LDScript [Corby, Faron-Zucker & Gandon, 2017], such a pre-treatment of the KB is not needed: as illustrated below, a SPARQL CONSTRUCT query can be embedded into a SELECT query and thus permits the creation of the clones “on the fly”.

INSERT { ?o cstr:cloneWithoutType ?o2 . ?o2 ?r ?dest } WHERE
{ ?o ?r ?dest .   FILTER (?r != rdf:type) 
  FILTER NOT EXISTS { ?o rdf:type rdfs:Class }
  FILTER NOT EXISTS { ?o rdf:type rdf:Property }
  BIND (uri(concat(str(?o), "_cloneWithoutType")) as ?o2)
}
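
For readers more familiar with procedural code, the effect of the above INSERT request can be sketched in JavaScript over an in-memory list of triples. This is a hedged illustration, not the implementation used in this article: the array-of-triples representation and the function name addClonesWithoutType are assumptions.

```javascript
const RDF_TYPE = "rdf:type";

// Returns the given triples plus, for each individual that is the source of
// at least one non-rdf:type relation, a "clone without type" of it and a
// cstr:cloneWithoutType relation from the individual to its clone.
function addClonesWithoutType(triples) {
  // An object is skipped if it is declared as a class or as a property.
  const isSchemaObject = subject => triples.some(([s, p, o]) =>
    s === subject && p === RDF_TYPE &&
    (o === "rdfs:Class" || o === "rdf:Property"));
  const added = [];
  const alreadyLinked = new Set();
  for (const [s, p, o] of triples) {
    if (p === RDF_TYPE || isSchemaObject(s)) continue;
    const clone = s + "_cloneWithoutType";
    if (!alreadyLinked.has(s)) {
      added.push([s, "cstr:cloneWithoutType", clone]);
      alreadyLinked.add(s);
    }
    added.push([clone, p, o]); // same relation, but from the clone
  }
  return triples.concat(added);
}

const exampleKb = [
  [":Tom", RDF_TYPE, ":Person"],
  [":Tom", ":parent", ":Ann"],
  [":Person", RDF_TYPE, "rdfs:Class"],
  [":Person", "rdfs:label", "Person"],
];
console.log(addClonesWithoutType(exampleKb));
```

As with the SPARQL command, an individual that only has rdf:type relations gets no clone triple, and the clones never receive rdf:type relations.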

3.1.1.3. Creating “Identifier relations” for Named Individuals

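The next command creates an identifier relation, of type cstr:id, for each named individual, i.e. each object that is identified by a URI, that is neither a class nor a property, and that does not belong to the OWL, RDFS, RDF or XSD namespaces. The destination of this relation is the string form of the URI. Such relations permit the definition of classes restricted to named individuals, e.g. :Named_person in Section 2.3.3.1.

INSERT { ?uri cstr:id ?str } 
WHERE
{   { ?uri ?p ?o } union { ?s ?p ?uri }
    minus { ?uri a rdf:Property }
    minus { ?uri a rdfs:Class }
    minus { ?uri a owl:Class }
    
    FILTER isURI(?uri)
    
    #strstarts() requires strings, hence the calls to str()
    filter (! strstarts(str(?uri), str(owl:)))
    filter (! strstarts(str(?uri), str(rdfs:)))
    filter (! strstarts(str(?uri), str(rdf:)))
    filter (! strstarts(str(?uri), str(xsd:)))
    
    bind (str(?uri) as ?str)
}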


3.1.2. Checking Individual-based Positive Constraints

3.1.2.1. Checking Individual-based Positive Descriptive Constraints

The next query lists every object violating an individual-based positive descriptive constraint, including one restricted to named individuals. As shown by the code, such an object satisfies two conditions. First, this object matches – and hence has for type – the condition of a constraint ?posConstr that is of type cstr:Descriptive_constraint. Second, this object does not match – and hence has not for type – the conclusion of the constraint. Thus, this query requires a SPARQL engine that has an entailment regime enabling the matching (alias, categorization) of an individual with respect to a class expression and thence the deduction of an rdf:type relation between them. In the code of the commands below, such deduced rdf:type relations are highlighted in bold. If, for example, all individuals, constraint conditions and constraint conclusions are described in OWL2-QL, then an OWL2-QL entailment regime is required and sufficient. In that case, an inference engine that can handle the expressiveness of OWL2-QL is required.

SELECT ?objectNotMatchingPosConstr ?posConstr WHERE
{ ?posConstr rdf:type cstr:Descriptive_constraint ;
             cstr:condition_class ?posConstr_condition ;
             cstr:conclusion_class ?posConstr_conclusion .
  ?objectNotMatchingPosConstr rdf:type ?posConstr_condition.
  FILTER NOT EXISTS  #objects satisfying the conclusion must NOT be listed
  { ?objectNotMatchingPosConstr rdf:type ?posConstr_conclusion }
}

3.1.2.2. Checking Individual-based Positive Prescriptive Constraints
SELECT ?objectNotMatchingPosConstr ?cloneWithoutType ?posConstr WHERE
{ ?posConstr rdf:type cstr:Prescriptive_constraint ;
             cstr:condition_class ?posConstr_condition ;
             cstr:conclusion_class ?posConstr_conclusion .
  FILTER NOT EXISTS { ?posConstr rdfs:subClassOf owl:Nothing }   
  ?objectNotMatchingPosConstr rdf:type ?posConstr_condition.
  #the two functions called within the next two lines are defined below
  BIND( us:getCloneWithoutType(?objectNotMatchingPosConstr) as ?cloneWithoutType )
  BIND( us:copyKbButWithCloneWithoutType(?objectNotMatchingPosConstr, 
                                         ?cloneWithoutType) as ?g )
  FILTER NOT EXISTS { GRAPH ?g { ?cloneWithoutType rdf:type ?posConstr_conclusion } }
}


3.1.3. Checking SubclassOf-analog Positive Constraints

3.1.3.1. Checking SubclassOf-analog Positive Descriptive Constraints
SELECT ?objectNotMatchingPosConstr ?posConstr WHERE
{ ?posConstr cstr:descriptive_constraint_conclusion ?posConstr_conclusion .
  FILTER NOT EXISTS { ?posConstr rdfs:subClassOf owl:Nothing }   
  ?objectNotMatchingPosConstr rdf:type ?posConstr. #matches condition
  FILTER NOT EXISTS  #objects satisfying the conclusion must NOT be listed
  { ?objectNotMatchingPosConstr rdf:type ?posConstr_conclusion }
}

3.1.3.2. Checking SubclassOf-analog Positive Prescriptive Constraints
SELECT ?objectNotMatchingPosConstr  ?posConstr WHERE
{ ?posConstr cstr:prescriptive_constraint_conclusion ?posConstr_conclusion .
  FILTER NOT EXISTS { ?posConstr rdfs:subClassOf owl:Nothing }   
  ?objectNotMatchingPosConstr rdf:type ?posConstr. #matches condition
  BIND ("/user/corby/home/AATest/test/phm3/kb0-5b.owl" as ?kb)
  BIND( us:getKB(?kb) as ?g )
  FILTER NOT EXISTS { GRAPH ?g { ?objectNotMatchingPosConstr rdf:type ?posConstr_conclusion } }
}

FUNCTION us:getKB (?kb)
{ LET (?g = xt:load(?kb)) { xt:entailment(?g) }  #triggers inferences on ?g
}


3.1.4. Checking SubclassOf-based Positive Constraints

3.1.4.1. Checking SubclassOf-based Positive Descriptive Constraints



3.1.4.2. Checking SubclassOf-based Positive Prescriptive Constraints

The next query lists every object violating a positive prescriptive constraint. As shown by the code, such an object satisfies two conditions. First, this object matches – and hence has for type – the condition of a constraint ?posConstr that is a subclass of cstr:SubclassOf-based_prescriptive_constraint_condition and that does not have owl:Nothing in its conclusion. Second, the “clone without type” of such an object does not match – and hence has not for type – the conclusion of the constraint, i.e. its superclass. Thus, this query requires a SPARQL engine that has an entailment regime enabling the matching (alias, categorization) of an individual with respect to a class expression and thence the deduction of an rdf:type relation between them. In the code of the commands below, such deduced rdf:type relations are highlighted in bold. If, for example, all individuals, constraint conditions and constraint conclusions are described in OWL2-QL, then an OWL2-QL entailment regime is required and sufficient. In that case, an inference engine that can handle the expressiveness of OWL2-QL is required.

SELECT ?objectNotMatchingPosConstr ?posConstr  WHERE
{ #this command body (between this first '{' and the last '}') is referred to in Section 3.1.7
  ?posConstr rdfs:subClassOf cstr:SubclassOf-based_prescriptive_constraint_condition,
                             ?posConstr_conclusion .  #initializes this variable
  FILTER NOT EXISTS { ?posConstr rdfs:subClassOf owl:Nothing }
  ?objectNotMatchingPosConstr rdf:type ?posConstr. #matches condition
  FILTER NOT EXISTS  #objects satisfying the conclusion must not be listed
  { BIND(uri(concat(str(?objectNotMatchingPosConstr),"_cloneWithoutType"))
         as ?cloneWithoutType) 
    ?cloneWithoutType rdf:type ?posConstr_conclusion
  }
}

Here is the same query in LDScript, with an embedded query that temporarily creates the above cited partial copies “on the fly”, thus removing the necessity to modify the KB.

SELECT ?objectNotMatchingPosConstr ?posConstr WHERE
{ ?posConstr rdfs:subClassOf cstr:SubclassOf-based_prescriptive_constraint_condition ,
                             ?posConstr_conclusion .  #initializes this variable
  FILTER NOT EXISTS { ?posConstr rdfs:subClassOf owl:Nothing }   
  ?objectNotMatchingPosConstr rdf:type ?posConstr. #matches condition
  #the two functions called within the next two lines are defined below
  BIND( us:getCloneWithoutType(?objectNotMatchingPosConstr) as ?cloneWithoutType )
  BIND( us:copyKbButWithCloneWithoutType(?objectNotMatchingPosConstr, 
                                         ?cloneWithoutType) as ?g )
  FILTER NOT EXISTS { GRAPH ?g { ?cloneWithoutType rdf:type ?posConstr_conclusion } }
}

FUNCTION us:getCloneWithoutType (?object)
{ uri(concat(str(?object),"_cloneWithoutType")) }

FUNCTION us:copyKbButWithCloneWithoutType
         (?objectNotMatchingPosConstr, ?cloneWithoutType)
{ LET (?g = CONSTRUCT { ?cloneWithoutType ?r ?dest .  ?x ?r2 ?y } WHERE
            { values ?cloneWithoutType { UNDEF }
              ?objectNotMatchingPosConstr ?r ?dest .  FILTER (?r != rdf:type)
              ?x ?r2 ?y .  FILTER (?x != ?objectNotMatchingPosConstr)
            }) 
  { xt:entailment(?g) }  #triggers inferences on ?g
}
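
The principle behind us:copyKbButWithCloneWithoutType can also be sketched in JavaScript: copy the KB, replace the checked object by its clone without type, run the inferences on the copy, then test the conclusion on the clone. The sketch below is illustrative only. It assumes a KB given as an array of triples, a deliberately tiny inference engine (two RDFS-like rules), and a hypothetical conclusion class :ParentedThing that is both a superclass of :Person and the domain of :parent.

```javascript
// Tiny forward-chaining closure with two RDFS-like rules (an assumption of
// this sketch, far weaker than an OWL2 entailment regime):
//   x rdf:type C,  C rdfs:subClassOf D  =>  x rdf:type D   ("inheritance")
//   x p y,         p rdfs:domain C      =>  x rdf:type C
function entail(triples) {
  const key = t => t.join(" ");
  const known = new Set(triples.map(key));
  const out = triples.slice();
  let changed = true;
  while (changed) {
    changed = false;
    const derived = [];
    for (const [s, p, o] of out)
      for (const [s2, p2, o2] of out) {
        if (p === "rdf:type" && p2 === "rdfs:subClassOf" && s2 === o)
          derived.push([s, "rdf:type", o2]);
        if (p2 === "rdfs:domain" && s2 === p)
          derived.push([s, "rdf:type", o2]);
      }
    for (const t of derived)
      if (!known.has(key(t))) { known.add(key(t)); out.push(t); changed = true; }
  }
  return out;
}

// True if, once obj is replaced by its clone without type in a copy of the
// KB, the clone cannot be categorized as an instance of the conclusion.
function violatesPrescriptive(triples, obj, conclusion) {
  const clone = obj + "_cloneWithoutType";
  const copy = triples.filter(([s]) => s !== obj).concat(
    triples.filter(([s, p]) => s === obj && p !== "rdf:type")
           .map(([, p, o]) => [clone, p, o]));
  return !entail(copy).some(([s, p, o]) =>
    s === clone && p === "rdf:type" && o === conclusion);
}

const smallKb = [
  [":Person", "rdfs:subClassOf", ":ParentedThing"],
  [":parent", "rdfs:domain", ":ParentedThing"],
  [":Tom", "rdf:type", ":Person"], [":Tom", ":parent", ":Ann"],
  [":Ann", "rdf:type", ":Person"],
];
console.log(violatesPrescriptive(smallKb, ":Tom", ":ParentedThing")); // false
console.log(violatesPrescriptive(smallKb, ":Ann", ":ParentedThing")); // true
```

Without the clone, :Ann would be typed as :ParentedThing through the subclassOf relation alone, and the missing parent relation would go unnoticed.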


3.1.5. Checking Inconsistency-implying Constraints

The next command lists every object violating an inconsistency-implying constraint, i.e. every object that matches – and hence has for type – a type ?negConstr subclass of the type cstr:SubclassOf-based_constraint_condition and that has owl:Nothing as conclusion. Since for inconsistency-implying constraints, there is no distinction between descriptive and prescriptive, this query is much simpler than the previous one and even useless if the inference engine used for matching objects is used directly to check all the KB.

SELECT ?objectMatchingNegConstr ?negConstr 
 WHERE { ?negConstr rdfs:subClassOf cstr:SubclassOf-based_constraint_condition ,
                                    owl:Nothing .
         ?objectMatchingNegConstr rdf:type ?negConstr .
       }


3.1.6. Checking Binary Relations Instead of Individuals

To list binary relations violating prescriptive constraints instead of individuals (that have some relations violating constraints), it is sufficient to replace rdf:type by the “logical implication relation between statements” in the previous two queries. For referring to such a relation, Tim Berners-Lee uses the name log:implies [Berners-Lee et al., 2008] in his Notation3 KRL. For this to work, the used SPARQL engine must exploit an inference engine that can deduce the existence of such a relation when it exists between the matched statements. Like queries on individuals, queries on relations can use additional filters. For example, for the previous command to operate only on negative statements, one may add at the end of its body:

?objectMatchingNegConstr rdf:type owl:NegativeObjectPropertyAssertion .


3.1.7. Evaluating the Completeness of a KB

A simple way to define or calculate the completeness degree of a KB is to divide “the number of relations (in the KB) that do not violate prescriptive constraints” by “the total number of objects (linked to at least one other object)”. Instead of relations, the next command searches for individuals but can be adapted as explained in Section 3.1.6 to implement the above definition of completeness.

SELECT ( ((?nbObjs - ?nbAgainstPosCs - ?nbMatchingNegCs) / ?nbObjs)
         AS ?completeness)
{ {SELECT (COUNT(DISTINCT ?o) AS ?nbObjs)
    WHERE { ?o ?r ?o2 } } #any object related to another
        # For considering only objects that have a type:
        # { {?o rdf:type ?t1} UNION {?o cstr:type ?t2} }
  {SELECT(COUNT(DISTINCT ?objectNotMatchingPosConstr) AS ?nbAgainstPosCs)
    WHERE { ... #the body of the first query of Section 3.1.4.2 should be copied here
          } }
  {SELECT (COUNT(DISTINCT ?objectMatchingNegConstr) AS ?nbMatchingNegCs)
    WHERE { ... #the body of the query of Section 3.1.5 should be copied here
          } }
}
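
The arithmetic of this query can be sketched in JavaScript as follows, assuming that the two lists of violating objects have already been obtained by the checking queries. The function name completenessDegree and the array-of-triples input are assumptions made for this illustration.

```javascript
// Mirrors the SELECT above: (nbObjs - nbAgainstPosCs - nbMatchingNegCs) / nbObjs
// where nbObjs is the number of distinct objects that are the source of at
// least one relation. Returns null when the KB contains no such object.
function completenessDegree(triples, posViolators, negViolators) {
  const nbObjs = new Set(triples.map(([subject]) => subject)).size;
  if (nbObjs === 0) return null;
  const nbAgainstPosCs = new Set(posViolators).size;  // COUNT(DISTINCT ...)
  const nbMatchingNegCs = new Set(negViolators).size; // COUNT(DISTINCT ...)
  return (nbObjs - nbAgainstPosCs - nbMatchingNegCs) / nbObjs;
}

const someTriples = [
  [":Tom", ":parent", ":Ann"], [":Ann", ":age", "40"],
  [":Bob", ":age", "12"],      [":Eve", ":parent", ":Tom"],
];
console.log(completenessDegree(someTriples, [":Ann"], [":Bob"])); // 0.5
```

As in the SPARQL version, an object that violates both a positive and an inconsistency-implying constraint is subtracted twice.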



3.2. JavaScript Function for Ontology Completeness Evaluation

Like the previous SPARQL commands, the next function provides a way to list each constraint violation – and/or calculate the completeness of the ontology if the constraints are prescriptive – but there are a few differences. First, the completeness is now given with respect to each constraint in a particular set, not to all of them at once, and for the instances of a particular set of classes, not all individuals. Second, error handling is included. Third, the method specializes is called to perform the matching between relations associated to an object and relations in a constraint condition or conclusion, i.e., to test if each relation in the second set of relations has a specialization in the first set (details on such a specialization test can be found in [Zaveri et al., 2016], [Chein & Mugnier, 2008] and [Swan, 2016]). This method can ignore the instanceOf relations from the handled object, and it needs to do so if the checked constraint is a prescriptive one, i.e., if its parameter is an instance of cstr:Prescriptive_constraint. Hence, the code below need not include a temporary removal of instanceOf relations from the objects.

JavaScript is used instead of pseudo-code since its syntactic sugar is rather well-known or intuitive and is object-oriented. This last feature is required here to have a code that is short, clear and generic with respect to error handling. The objects in the ontology are assumed to be accessible in the object-oriented way used in the code.

The complexity of the next function is the complexity of the object matching multiplied by the number of given constraints, multiplied by the number of given classes, multiplied by the average number of instances by class.

function checkThatInstancesOfTheseClassesSatisFyTheseConstraints
         (setOfClasses,     //array, e.g.: [Man, Woman]
          setOfConstraints, //array, e.g.: ["if X is a Person, X has a parent"]
          errorHandling) //contains a class for generic error handling
{ if (setOfConstraints.length == 0)
    return errorHandling.returnCodeIfNoPropertyToCheck;
  if (setOfClasses.length == 0)
    return errorHandling.returnCodeIfNoClassToCheck;
  for (const co of setOfConstraints) //co may or may not be inconsistency-implying
  { let numberOfObjsSatisfyingCo= 0, numberOfObjsNotSatisfyingCo= 0;
    for (const cl of setOfClasses)
      for (const obj of cl.instances())
        if (obj.specializes(co.condition))
        { if (co.isPositiveConstr() && obj.specializes(co.conclusion))
            numberOfObjsSatisfyingCo++;
          else { numberOfObjsNotSatisfyingCo++;
                 if (errorHandling.alertAtEachError) 
                   alert("Relations from '" + obj.toString() + 
                         "' do not satisfy:\n  " + co.toString());   
                  //e.g.: Relations from 'Tom' do not satisfy:
                }  //        if X is a Person, X has a parent
        }
    const numberOfCheckedObjs= numberOfObjsSatisfyingCo + numberOfObjsNotSatisfyingCo;
    if (numberOfCheckedObjs == 0) continue; //no instance matches the condition
    const completenessForThatConstraint=
               numberOfObjsSatisfyingCo / numberOfCheckedObjs; //between 0 and 1
    if (completenessForThatConstraint < 1)
    { if (errorHandling.alertAtEachIncompleteness)
        alert("Only " + (100 * completenessForThatConstraint) + "% of instances"
              + " of the given classes satisfy the constraint:\n  "
              + co.toString());
      if (errorHandling.returnCodeIfIncompletenessForSomeConstraint)
        return errorHandling.returnCodeIfIncompletenessForSomeConstraint;
    }
  }
  return errorHandling.returnCodeIfNoError;
}




4. Examples of Applications and Limits

4.1. Application to Checking the Design of a Subtype Hierarchy

Several research works in knowledge acquisition, model-driven engineering or ontology engineering, e.g. [Marino, Rechenmann & Uvietta, 1990] [Bachimont, Isaac & Troncy, 2002] [Dromey, 2006] [Rector et al., 2012], have advocated the use of tree structures when designing a subtype hierarchy, hence the use of i) single inheritance only, and ii) multiple views or viewpoints, one for each tree. Indeed, they argue that objects have a unique place in such trees and hence that such trees can be used as decision trees or ways to avoid redundancies, normalize KRs and ease KR search/handling. This is true but subtyping exclusively via subtype partitions, i.e. disjoint unions of types, also has the same advantages (for the same reasons) while i) still allowing multi-inheritance, ii) not requiring a special inference engine to handle views (e.g. [Marino, Rechenmann & Uvietta, 1990] [Djakhdjakha, Hemam & Boufaïda, 2014]), and iii) being less cumbersome for knowledge providers since with such partitions they do not have to specify or use views each time they create subtypes.

We have realized that, in order to keep the above cited advantages, it is necessary and sufficient that whenever two types are exclusive – i.e., whenever they should not share common subtypes or instances – the exclusion relation that links these two types should be specified, explicitly or implicitly. With tree structures, such exclusion relations are implicit but still specified. By systematically using subtype partitions, all exclusion relations can be easily and explicitly specified, e.g. via the use of binary partitions such as “Spatial_entity / Non-spatial_entity”. Exclusion relations are a simple and efficient way to support ontology checking and knowledge retrieval. For example, when searching for particular “achievements” in a KB, people may not know which exact type to use and then which query to use: whether a query for specializing objects, for generalizing ones, for doing both with a maximal number of specializing/generalizing relations, etc. In this case, it may be easier for them to search for instances of all types not exclusive with some type they have in mind. This query may well collect all and only the objects they are interested in, if the ontology is systematically structured using subtype partitions.
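
The retrieval strategy described above (collecting the instances of all types that are not exclusive with a given type) can be sketched in JavaScript. The toy ontology below is hypothetical; for brevity, each type has at most one direct supertype, which the approach itself does not require.

```javascript
// Hypothetical toy ontology (assumption: one direct supertype per type).
const supertypeOf = {
  Spatial_entity: "Entity", Non_spatial_entity: "Entity",
  Building: "Spatial_entity", Process: "Non_spatial_entity",
  Achievement: "Process",
};
const exclusions = [["Spatial_entity", "Non_spatial_entity"]];
const instanceTypes = {
  tower1: "Building", thesis1: "Achievement", run1: "Process", thing1: "Entity",
};

// The type itself plus all its direct and indirect supertypes.
function withSupertypes(type) {
  const result = new Set();
  for (let t = type; t !== undefined; t = supertypeOf[t]) result.add(t);
  return result;
}

// Two types are exclusive if they specialize (or are) two types linked by
// an exclusion relation.
function areExclusive(a, b) {
  const ua = withSupertypes(a), ub = withSupertypes(b);
  return exclusions.some(([x, y]) =>
    (ua.has(x) && ub.has(y)) || (ua.has(y) && ub.has(x)));
}

function instancesNotExclusiveWith(type) {
  return Object.keys(instanceTypes)
    .filter(i => !areExclusive(instanceTypes[i], type));
}

console.log(instancesNotExclusiveWith("Achievement"));
// [ 'thesis1', 'run1', 'thing1' ]
```

The query keeps the instances of the specializations and generalizations of Achievement and discards tower1, whose type is exclusive with Achievement.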

However, unlike for natural types (or natural kinds), using subtype partitions is a bit cumbersome for non-natural types [Guizzardi, 2005] [Mizoguchi, Kozaki & Kitamura, 2012] such as role types and phase types. Examples of general role types are Agent, Experiencer, Recipient, Cause and Instrument. Examples of more specific ones are Driver, Employee and King. Examples of phase types are Beginning and Child. Examples of role type partitions are “Cause / Non-cause” and “King / Non-king”. Systematically specifying such partitions is often cumbersome and useless since role types such as Non-cause and Non-king will probably not be instantiated. The manual or automatic instantiation of non-natural types such as Child, King or Taxi-driver may be useful but, for reusability purposes, should be done in a way that contextualizes that instantiation to some region of time or space. Such contextualization is not needed for natural types. More generally, for ontology checking or search purposes, it is interesting to distinguish non-natural types from natural ones. Using subtype partitions for non-natural types would not help and the classic solution of connecting each non-natural type to a second-order type via an instanceOf relation is quite cumbersome too. Instead, to create a non-natural subtype, we propose the use of the relation type cstr:nonNaturalSubclass or subtypes of it. This relation type is the inverse of the relation type cstr:nonNaturalSubclassOf which is a subtype of rdfs:subClassOf. It has a range restricted to cstr:NonNaturalClass which is a subtype of rdfs:Class since non-natural types are concept types which are not datatypes. When – and only when – an ontology using cstr:nonNaturalSubclass relation types has to be exploited by an inference engine that does not handle subtypes of rdfs:subClassOf, a pre-treatment that expands the definition of these subtypes – and thus removes these subtypes – must first be applied on the ontology.

To sum up, for reusability purposes, we propose that subtyping should only be done via subtype partitions or via cstr:nonNaturalSubclass (or subtypes of it). In the OWL2 model, “via subtype partitions” means stating that the subtyped class is equivalent to – or is a subclass of – a subtype partition. The relation types cstr:equivDisjointUnion and cstr:subclassOfDisjointUnion can be defined to refer to these two ways. The next assertion shows how this last type can be defined using the OWL2 model and the Turtle notation. The definition of cstr:equivDisjointUnion is identical except that rdfs:subClassOf is replaced by owl:equivalentClass.

cstr:subclassOfDisjointUnion #from a class to a set of classes
  rdfs:domain owl:Class ;  #the source class
  rdfs:range [rdf:type owl:AllDisjointClasses] ;  #the set of classes
  owl:propertyChainAxiom ( rdfs:subClassOf owl:unionOf ).

Then, to refer to the relation between a class and any of its direct subtypes specified via the use of a subtype partition or cstr:nonNaturalSubclass, we can define cstr:nonNaturalOrPartitionSubclass. Using Turtle and the OWL2 model:

cstr:subclass owl:inverseOf rdfs:subClassOf .

cstr:nonNaturalSubclass 
  rdfs:subPropertyOf cstr:nonNaturalOrPartitionSubclass ;
  rdfs:domain owl:Class ;    rdfs:range cstr:NonNaturalClass.

cstr:subclassOfDisjointUnionThatIncludes
  rdfs:subPropertyOf cstr:nonNaturalOrPartitionSubclass ; 
  owl:propertyChainAxiom ( cstr:subclassOfDisjointUnion rdfs:member ).

cstr:equivDisjointUnionThatIncludes
  rdfs:subPropertyOf cstr:nonNaturalOrPartitionSubclass ; 
  owl:propertyChainAxiom ( cstr:equivDisjointUnion rdfs:member ).

Now, a positive prescriptive constraint can be defined to ensure that “if there is a subclass relation between two classes, this relation must be of type cstr:nonNaturalOrPartitionSubclass”. An equivalent constraint is: “if a class C1 has a subclass relation to a class C2, this relation must be of type cstr:nonNaturalOrPartitionSubclass”. Since OWL2 does not allow the use of variables to refer to the relation or to C2, a subclassOf-based version is: “if a class C1 has a subclass relation, C1 must have a relation of type cstr:nonNaturalOrPartitionSubclass”. A subclassOf-based definition of that constraint is given below, using Turtle. To deal with the ambiguity of the last version and perform the intended checking, the objects of the KB must not be checked directly (with the SPARQL command of Section 3.1.4.2). Instead, the SPARQL1.1 command (specified in Section 3.1.6) that checks each relation, one by one, must be used, i.e. the command that uses log:implies.

cstr:Subclass  #class - and constraint (condition) - referring to
               # i) a class that has a subclass, and hence also
               # ii) a subclassOf relation from a class
  rdfs:subClassOf cstr:SubclassOf-based_prescriptive_constraint_condition ;
  owl:equivalentClass             #this defines the class/condition:
    [rdf:type owl:Restriction ;   #  "any class that has a subclass"
     owl:onProperty cstr:subclass ;   owl:someValuesFrom rdfs:Class ];
  rdfs:subClassOf #conclusion (types of relations that must be present):
    [rdf:type owl:Restriction ;   #any class satisfying the next 2 lines
     owl:onProperty cstr:nonNaturalOrPartitionSubclass;
     owl:someValuesFrom rdfs:Class ].
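
The relation-by-relation check described above can also be illustrated outside SPARQL. The following Python sketch is only an illustration under simplifying assumptions, not the command of Section 3.1.5: the KB is reduced to three hand-written structures holding asserted statements only, no inference is performed, and the function name `violations` as well as all `ex:` classes are hypothetical. It reports each asserted subclass relation that is stated neither via a subtype partition nor as a non-natural subclass relation.

```python
# Illustrative only: a tiny hand-written KB. All "ex:" names are hypothetical.

# Asserted subclass relations, as (class, subclass) pairs (cstr:subclass direction).
subclass_relations = {
    ("ex:Animal", "ex:Vertebrate"),
    ("ex:Animal", "ex:Invertebrate"),
    ("ex:Animal", "ex:Pet"),
    ("ex:Animal", "ex:Big_animal"),
}

# Subtype partitions asserted for each class (sets of disjoint subclasses).
partitions = {"ex:Animal": [frozenset({"ex:Vertebrate", "ex:Invertebrate"})]}

# Asserted cstr:nonNaturalSubclass relations.
non_natural = {("ex:Animal", "ex:Pet")}


def violations(subclass_relations, partitions, non_natural):
    """Return the subclass relations stated neither via a partition nor as non-natural."""
    bad = []
    for sup, sub in sorted(subclass_relations):
        in_partition = any(sub in p for p in partitions.get(sup, []))
        if not in_partition and (sup, sub) not in non_natural:
            bad.append((sup, sub))
    return bad


print(violations(subclass_relations, partitions, non_natural))
```

With this data, only the relation from ex:Animal to ex:Big_animal is reported, since it is the only one that is not of a kind covered by cstr:nonNaturalOrPartitionSubclass.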



We believe that the above positive constraint is worth generalizing and applying to all transitive relations used in an ontology, e.g. part relations. The reasons why the destinations of subtype relations are worth being organized into subtype partitions are also the reasons why the destinations of part relations are worth being organized into part partitions. The definition of a part partition for an object mirrors the one for a subtype partition: it is a complete set of disjoint parts of the object; “disjoint” meaning that the parts cannot share common subparts, and “complete” meaning that the union of the parts is equal to the object. The notion of non-natural subtype does not seem to have an intuitive equivalent for part relations but this is not a problem.
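
The two conditions in the definition of a part partition can be made concrete with a small Python sketch. It assumes, purely for illustration, that an object and each of its parts can be modeled as a set of atomic sub-parts; the function name `is_part_partition` and the example data are hypothetical.

```python
def is_part_partition(whole, parts):
    """True if the parts are pairwise disjoint and their union equals the whole.

    `whole` and each element of `parts` are sets of atomic sub-parts
    (a modeling assumption made only for this sketch).
    """
    seen = set()
    for part in parts:
        if seen & part:      # a shared sub-part: the parts are not disjoint
            return False
        seen |= part
    return seen == whole     # completeness: the union of the parts is the whole


car = {"engine", "chassis", "wheel_1", "wheel_2"}

# Disjoint and complete.
print(is_part_partition(car, [{"engine"}, {"chassis"}, {"wheel_1", "wheel_2"}]))
# Disjoint but incomplete: the wheels are missing.
print(is_part_partition(car, [{"engine"}, {"chassis"}]))
# Complete but not disjoint: "chassis" is shared.
print(is_part_partition(car, [{"engine", "chassis"}, {"chassis", "wheel_1", "wheel_2"}]))
```

Only the first call returns True; the other two fail the completeness and disjointness conditions respectively.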

Since OWL2-ER or subclassOf-based constraints do not enable the use of variables for relation types, they do not enable the specification of a unique prescriptive constraint for all transitive relations used in an ontology. One solution then is, for each transitive relation type that is not a subclass relation and that needs to be checked, to copy and adapt the above subclass related definitions and positive constraint. The creation of such positive constraints can be automated using the next SPARQL1.1 command. It takes as a parameter each instance of owl:TransitiveProperty specified in the ontology, i.e. each transitive relation type ?trRelType, e.g. cstr:subclass and cstr:part which is the inverse of the general partOf relation type. From each one, via a cstr:nonNaturalOrPartitionTrRelType relation, it accesses the recommended kind of relation types to use for such transitive relations. For example, for cstr:subclass relations, we have seen that this recommended supertype can be cstr:nonNaturalOrPartitionSubclass. For cstr:part relations, this supertype can be cstr:partitionPart. OWL2 does not support a full definition of this last type but still enables its declaration, along with its domain and range, and hence permits its use and some checking of its uses.

INSERT 
{ ?TrRelTypeDomain  #class for (the source of) a transitive relation
    rdfs:subClassOf cstr:SubclassOf-based_prescriptive_constraint_condition ;
    owl:equivalentClass             #this defines the condition:
      [rdf:type owl:Restriction ;   #  "source of a transitive relation"
       owl:onProperty ?trRelType; #e.g., cstr:part
       owl:someValuesFrom owl:Thing]; #any thing, class or individual
    rdfs:subClassOf  #constraint conclusion:
      [rdf:type owl:Restriction ; #any class satisfying the next 2 lines
       owl:onProperty ?nonNaturalOrPartitionTrRelType; #e.g.,
       owl:someValuesFrom owl:Thing]                 #cstr:partitionPart
}
WHERE { ?trRelType 
          rdf:type owl:TransitiveProperty ;
          cstr:nonNaturalOrPartitionTrRelType 
              ?nonNaturalOrPartitionTrRelType ;
          rdfs:domain ?TrRelTypeDomain }     #e.g., owl:Thing (see below)

Thus, based on the next two statements, here in Turtle, the above command generates a prescriptive constraint for checking cstr:subclass relations as well as the corresponding prescriptive constraint for checking cstr:part relations. The generated prescriptive constraint for checking cstr:subclass relations is not identical to the one given in the previous sub-section since owl:Thing is used instead of rdfs:Class as destination of an owl:someValuesFrom relation. However, this loss of precision is not a problem for checking cstr:subclass relations. On the other hand, as illustrated here with cstr:part, the above command may make owl:Thing the condition of a constraint. If so, owl:Thing is restricted via owl:equivalentClass and rdfs:subClassOf relations. This may be a problem for some classic Description Logic inference engines. For example, owl:Thing cannot be used as source or destination of an rdfs:subClassOf relation in OWL-RL [OWL 2 profiles]. However, it is often easy and relevant to use a class other than owl:Thing as the domain of a transitive relation type, e.g. cstr:Spatial_entity as the domain of cstr:spatial_part.

cstr:subclass rdf:type owl:TransitiveProperty; rdfs:domain rdfs:Class ;
              cstr:nonNaturalOrPartitionTrRelType 
                                   cstr:nonNaturalOrPartitionSubclass .
cstr:part rdf:type owl:TransitiveProperty;   rdfs:domain owl:Thing;
          cstr:nonNaturalOrPartitionTrRelType cstr:partitionPart .
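
The pattern matching performed by the WHERE clause of the above command can be mimicked with a short Python sketch. This is an illustration only: the KB is a plain set of triples, both relation types are assumed to have their domain given via rdfs:domain (which is what the WHERE clause matches), the ex:knows statement is a hypothetical addition, and each generated constraint is summarized as a (condition class, relation type, required relation type) tuple rather than as OWL2 restrictions.

```python
RDF_TYPE, DOMAIN = "rdf:type", "rdfs:domain"
TRANSITIVE = "owl:TransitiveProperty"
RECOMMENDED = "cstr:nonNaturalOrPartitionTrRelType"

kb = {
    ("cstr:subclass", RDF_TYPE, TRANSITIVE),
    ("cstr:subclass", DOMAIN, "rdfs:Class"),
    ("cstr:subclass", RECOMMENDED, "cstr:nonNaturalOrPartitionSubclass"),
    ("cstr:part", RDF_TYPE, TRANSITIVE),
    ("cstr:part", DOMAIN, "owl:Thing"),
    ("cstr:part", RECOMMENDED, "cstr:partitionPart"),
    ("ex:knows", DOMAIN, "ex:Person"),  # hypothetical, not transitive: ignored
}


def objects(kb, subject, predicate):
    """Sorted destinations of the `predicate` relations from `subject`."""
    return sorted(o for (s, p, o) in kb if s == subject and p == predicate)


def generate_constraints(kb):
    """One (condition class, relation type, required relation type) per match."""
    constraints = []
    transitive_types = sorted(
        s for (s, p, o) in kb if p == RDF_TYPE and o == TRANSITIVE
    )
    for rel_type in transitive_types:
        for recommended in objects(kb, rel_type, RECOMMENDED):
            for domain in objects(kb, rel_type, DOMAIN):
                constraints.append((domain, rel_type, recommended))
    return constraints


for constraint in generate_constraints(kb):
    print(constraint)
```

A relation type lacking any of the three matched statements, like ex:knows here, yields no constraint, just as it would not match the WHERE clause.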



4.3. Checking that Certain Relation Types Are Systematically Used

The above kinds of prescriptive constraints check only that if a transitive relation type is used, it is used via certain subtypes of it. A complementary kind of prescriptive constraint is needed to check that if certain kinds of relation types are declared in an ontology, every object instance of the domain (resp. range) of such relation types should be the source (resp. destination) of such relations. The 2nd-order type cstr:MandatoryOutRelationType (resp. cstr:MandatoryInRelationType) can be used to specify or access such relation types. Then, from such types, relations of type cstr:leafObjectType can be used to indicate the type that can be used to specify or access the objects (classes or individuals) that, as an exception, should not be checked. For example, the following statement is sufficient to state that every dividable object – i.e. every instance of cstr:DividableThing – should be the source of a cstr:part relation except for each object instance of cstr:PartDestLeaf. This exception makes explicit the choice of an ontology creator not to use an object as source of a cstr:part relation even when it is possible.

cstr:part rdf:type cstr:MandatoryOutRelationType ; 
          cstr:leafObjectType cstr:PartDestLeaf ;
          rdfs:domain cstr:DividableThing .

Below is a SPARQL1.1 command that generates such a subclassOf-based prescriptive constraint for each instance of cstr:MandatoryOutRelationType. In this command, each occurrence of “out” should be replaced by “in” to do the same for each instance of cstr:MandatoryInRelationType. If SPARQL1.1 is used for checking these generated prescriptive constraints, the command (specified in Section 3.1) that checks all relations from an object should be used, not the command that checks each relation one by one. (This is the reverse when checking the use of certain relations, as noted in Section 4.1.) When the prescriptive constraint generated by this command is about relations which are not between classes, e.g. cstr:part relations, a classic Description Logic inference engine can be reused to check this constraint, although this may require a temporary removal of relations from types such as cstr:leafObjectType.

INSERT
{ ?MandatoryOutRelTypeDomain  #source class of mandatory "out relation"
    rdfs:subClassOf cstr:SubclassOf-based_prescriptive_constraint_condition ;
    owl:equivalentClass   #condition: ?MandatoryOutRelTypeDomain minus
      [rdf:type owl:Class ;                  # ?OutRelLeafDestType
       owl:intersectionOf ( ?MandatoryOutRelTypeDomain
                            [rdf:type owl:Class ;
                             owl:complementOf ?OutRelLeafDestType] ) ];
    rdfs:subClassOf  #constraint conclusion: there must be a relation of
      [rdf:type owl:Restriction;                     #  type ?outRelType
       owl:onProperty ?outRelType;   owl:someValuesFrom owl:Thing ]
}
WHERE { ?outRelType rdf:type cstr:MandatoryOutRelationType ;
        cstr:leafObjectType ?OutRelLeafDestType ;
        rdfs:domain ?MandatoryOutRelTypeDomain }
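
The meaning of a constraint generated by the above command can be illustrated by a direct check written in Python. The sketch below is not the SPARQL checking command of Section 3.1: it assumes that the KB is a plain set of asserted triples, performs no inference (in particular, no inheritance of types), and uses a hypothetical function name `missing_mandatory_out` and hypothetical `ex:` individuals.

```python
def missing_mandatory_out(kb, rel_type, domain, leaf_type):
    """Objects asserted as instances of `domain`, not asserted as instances of
    `leaf_type`, and not the source of any asserted `rel_type` relation."""
    instances = {s for (s, p, o) in kb if p == "rdf:type" and o == domain}
    leaves = {s for (s, p, o) in kb if p == "rdf:type" and o == leaf_type}
    sources = {s for (s, p, o) in kb if p == rel_type}
    return sorted(instances - leaves - sources)


kb = {
    ("ex:car_1", "rdf:type", "cstr:DividableThing"),
    ("ex:car_1", "cstr:part", "ex:engine_1"),
    ("ex:engine_1", "rdf:type", "cstr:DividableThing"),
    ("ex:engine_1", "rdf:type", "cstr:PartDestLeaf"),  # explicit exception
    ("ex:bike_1", "rdf:type", "cstr:DividableThing"),  # no part, no exception
}

print(missing_mandatory_out(kb, "cstr:part",
                            "cstr:DividableThing", "cstr:PartDestLeaf"))
```

Here ex:bike_1 is the only object reported: ex:car_1 has a cstr:part relation and ex:engine_1 is explicitly declared as a leaf.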




5. Evaluation, Comparisons and Conclusion

Our approach enables i) the representation of prescriptive constraints with any KRL the expressiveness of which is at least equal to RDFS, and ii) the exploitation of any inference engine, including via SPARQL queries. This is its originality. The first contribution of this article is to have shown how, to what extent (i.e., with which kinds of KB pre-treatment to compensate for the limits of SPARQL or OWL2), and why it is possible i) to represent prescriptive constraints via KRs instead of via queries or ad hoc procedures, and hence also ii) to reuse these constraints and inference engines for various tasks related to checking or evaluating the completeness of a KB. E.g., to build a KB, a knowledge engineer can represent some ODPs or generic task models [Breuker & van de Velde, 1994] as prescriptive constraints, use an advanced inference engine to check the completeness of the ontology and, guided by the results, elicit the missing knowledge from experts.

A second contribution of this article is to generalize a single inheritance based ODP for subtypeOf relations into a multi-inheritance based ODP that i) works on any transitive relation, ii) keeps the advantages of the single inheritance based ODP, and iii) is more flexible and not dependent on special tools.

Our approach is difficult to evaluate theoretically since it relies on other methods and tools. It inherits from their theoretical or practical improvements. [Baget et al., 2015] and [Swan, 2016] list theoretical points relevant for our approach. We have only validated our approach experimentally, by using it for checking ontologies or calculating their degree of completeness. This degree is low for constraints implementing ontology design patterns that are not well known, such as those in Section 4. We shall continue these experiments and add their analyses to this article. Regarding the use of SPARQL to check constraints, [Tao et al., 2010] shows that SPARQL can be used for both expressing and validating integrity constraints based on some partial forms of the Unique Name Assumption and Closed World Assumption. It also shows that this validation is sound and complete when the expressiveness used for the KB and for the constraints is respectively “SRI and SROIQ” or “SROIQ and SROI”. In our approach, queries are used only for validating constraints, not expressing them, but this is only a generalization of the approach of [Tao et al., 2010] that does not change the associated theoretical results. In [Tao et al., 2010], the used partial forms of the Unique Name Assumption and Closed World Assumption are specified in SPARQL via its operators EXISTS and NOT EXISTS plus the use of relations of type owl:sameAs or owl:differentFrom. These forms can be similarly expressed via the commands seen in Section 3 and the use of relations of type owl:sameAs or owl:differentFrom in the constraints.

Since our approach is based on a particular use of RDFS, it should be compared to SHACL and SPIN.

Some transformation languages or systems exploit KRs. [Zamazal & Svátek, 2015] and [Corby & Faron-Zucker, 2015] present such systems. Although few of them explicitly have a function to detect KR patterns without also transforming the matched KRs (e.g., PatOMat [Zamazal & Svátek, 2015] is an exception), these languages or systems could easily be adapted to have such a function and hence be used for handling prescriptive constraints. However, to our knowledge, all such systems use rules more expressive than those usable in subclassOf-based constraints. E.g., these systems generally use rule-based languages allowing the direct and explicit use of variables for relating objects shared by both the condition and conclusion of a rule. Using such languages can simplify the writing of prescriptive constraints. However, regarding what can be expressed and checked via prescriptive constraints, this article shows that i) much can be achieved simply using subclassOf-based constraints and SPARQL1.1, and ii) the power of our approach then relies on the power of the inference engine used for object matching (rather than on the used language).

Some transformation systems, like PatOMat [Zamazal & Svátek, 2015], issue SPARQL queries for detecting patterns, based on specifications for patterns and their transformations in another language. Some other transformation systems directly propose an extension of SPARQL such as STTL [Corby & Faron-Zucker, 2015] to write specifications for patterns and their transformations. For instance, as shown in [Corby, Faron-Zucker & Gazzotti, 2016], STTL can be combined with LDScript [Corby, Faron-Zucker & Gandon, 2017] to specify STTL queries (compiled into SPARQL queries) detecting patterns and then transforming the results. However, [Corby, Faron-Zucker & Gazzotti, 2016] does not discuss the exploitation of object matching capabilities of inference engines and it does not distinguish between prescriptive constraints and non-prescriptive ones. Our SPARQL commands, including those generating prescriptive constraints, could thus also be reused in these transformation systems, in an adapted form. Alternatively, using scripts may be more flexible, as shown with JavaScript in Section 3.2. We will continue to work in both directions, respectively using STTL+LDScript and JavaScript, especially for prescriptive constraints related to the use of given KRL models and KRL notations. To date, bases for this future work are mostly unrelated and discussed in [Corby, Faron-Zucker & Gazzotti, 2016] and [Martin & Bénard, 2017a, 2017b].




6. Acknowledgments

We warmly thank Dr Olivier Corby (Wimmics and SPARKS teams of, respectively, the INRIA and I3S CNRS laboratories at the University Côte d'Azur, France) for his questions and remarks on the approaches presented in this article and his help in implementing and testing these approaches with SPARQL and LDScript via the Corese tool.

We are also grateful for the interesting feedback provided by reviewers of our article for EGC 2018. This EGC article was a preliminary and much smaller French version of the current article.




7. References

  1. Assmann U., Wagner G. (2006). Ontologies, metamodels and model-driven paradigm. In: Ontologies for Software Engineering and Technology, Springer-Verlag, Berlin, chapter 9.
  2. Bachimont B., Isaac A., Troncy R. (2002). Semantic Commitment for Designing Ontologies: A Proposal. In: EKAW 2002, Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web, LNCS, vol 2473, pp. 114–121, Springer Berlin, Siguenza, Spain.
  3. Baget J., Gutierrez A., Leclère M., Mugnier M., Rocher S., Sipieter C. (2015). Datalog+, RuleML and OWL 2: Formats and Translations for Existential Rules. In: Challenge+DC@RuleML 2015, 9th International Web Rule Symposium (RuleML), Berlin, Germany.
  4. Berners-Lee T., Connolly D., Kagal L., Scharf Y., Hendler J. (2008). N3logic: A logical framework for the world wide web. Journal Theory and Practice of Logic Programming, 8(3), pp. 249–269, Cambridge University Press New York, NY, USA.
  5. Breuker J., van de Velde W. (1994). CommonKADS Library for Expertise Modelling: Reusable Problem Solving Components. IOS Press, Amsterdam.
  6. Calvanese D., De Giacomo G., Lembo D., Lenzerini M., Rosati R. (2007). EQL-lite: effective first-order query processing in description logics. In: IJCAI'07, pp. 274-279, Hyderabad, India.
  7. Chein M., Mugnier M. (2008). The BG Family: Facts, Rules and Constraints. Graph-based Knowledge Representation - Computational Foundations of Conceptual Graphs. Chapter 11 (pp. 311-334), Springer-Verlag London, 428p.
  8. Corby O., Faron-Zucker C. (2015). STTL: A SPARQL-based Transformation Language for RDF. In: WEBIST 2015, 11th International Conference on Web Information Systems and Technologies, Lisbon, Portugal.
  9. Corby O., Faron-Zucker C., Gazzotti R. (2016). Validating Ontologies against OWL 2 Profiles with the SPARQL Template Transformation Language. In: RR 2016, 10th International Conference on Web Reasoning and Rule Systems, LNCS, vol 9898, pp. 39–45, Springer, Aberdeen, UK.
  10. Corby O., Faron-Zucker C., Gandon F. (2017). LDScript: a Linked Data Script Language. In: ISWC 2017, Vienna, Austria.
  11. Djakhdjakha L., Mounir H., Boufaïda Z. (2014). Towards a representation for multi-viewpoints ontology alignments. In: IJMSO, International Journal of Metadata, Semantics and Ontologies, 9(2), pp. 91–102, Inderscience Publishers, Geneva.
  12. Dromey R.G. (2006). Scaleable Formalization of Imperfect Knowledge. In: AWCVS 2006, 1st Asian Working Conference on Verified Software, pp. 29–31, Macao SAR, China.
  13. Galárraga L., Hose K., Razniewski S. (2017). Enabling completeness-aware querying in SPARQL. In: WebDB 2017, pp. 19–22, Chicago, IL, USA.
  14. Genesereth M.R., Fikes R.E. (1992). Knowledge Interchange Format, Version 3.0, Reference Manual. Report Logic 92-1, Logic Group, Stanford University, California.
    ftp://www.ksl.stanford.edu/pub/KSL_Reports/KSL-92-86.ps.gz
    https://pdfs.semanticscholar.org/590c/e95343370578dd1d357319c84d1dc7ff8fac.pdf
  15. Guizzardi G. (2005). Ontological foundations for structural conceptual models. PhD thesis (441p), Telematica Instituut / CTIT.
  16. Marino O., Rechenmann F., Uvietta P. (1990). Multiple Perspectives and Classification Mechanism in Object-Oriented Representation. In: ECAI 1990, pp. 425–430, Pitman Publishing London, Stockholm, Sweden.
  17. Martin Ph., Bénard J. (2017a). Categorizing or Generating Relation Types and Organizing Ontology Design Patterns. In: KAM'17, 23rd IEEE conference on Knowledge Acquisition and Management, Prague, Czech Republic.
  18. Martin Ph., Bénard J. (2017b). Creating and Using various Knowledge Representation Model and Notation. In: ECKM 2017, 18th European Conference on Knowledge Management, Barcelona, Spain.
  19. Mizoguchi R., Kozaki K., Kitamura Y. (2012). Ontological Analyses of Roles. In: IEEE FedCSIS 2012, pp. 489–496, Wroclaw, Poland.
  20. Rector A., Brandt S., Drummond N., Horridge M., Puleston C., Stevens R. (2012). Engineering use cases for modular development of ontologies in OWL. Applied Ontology, 7(2), pp. 113–132, IOS Press.
  21. Sowa J.F. (1992). Conceptual Graphs Summary. Conceptual Structures: Current Research and Practice, Ellis Horwood, pp. 3-51.
  22. Swan R. (2016). Querying Existential Rule Knowledge Bases: Decidability and Complexity. PhD thesis (159p), University of Montpellier.
  23. Tao J., Sirin E., Bao J., McGuinness D. (2010). Integrity Constraints in OWL. In: AAAI'10, pp. 1443–1448, Atlanta, USA.
  24. Zamazal O., Svátek V. (2015). PatOMat – Versatile Framework for Pattern-Based Ontology Transformation. Computing and Informatics, 34(2), pp. 305–336.
  25. Zaveri A., Rula A., Maurino A., Pietrobon R., Lehmann J., Auer S. (2016). Quality assessment for linked data: A survey. Semantic Web, 7(1), pp. 63–93.