d_constraints.html

Relations-between-classes based Constraints and Constraint-based Ontology-Completenesses _____________________________________________________________________________________

Dr Philippe A. MARTIN (www.phmartin.info; article .REMOVE_THIS_TEXT. @ .REMOVE_THIS_TEXT. phmartin.info)

This article is a very stong extension of the one accepted at Quatic 2018 which, for example, did not include the (content of the) subsections 2.2.2.1, 2.2.2.3, 2.3.1.1, 2.3.2, 3.2.3, 3.2.5, 3.7, 4.1 and 4.2. A preliminary (-> smaller and older) version of the one for Quatic 2018 has also been published in French in the proceedings of EGC 2018, with Jun H. JO as co-author.
In the current version of this article, the text is dispayed i) in dark green if it has been added or modified in July 2018, ii) in violet if it has been added or modified in September 2018, and iii) in red if it has been added or modified after September 2018.

Abstract. This article first distinguishes constraints from rules, and descriptive constraints from prescriptive ones. Both kinds can be used for calculating constraint-based completenesses (as opposed to real-world-based completeness), i.e. evaluating how much of a knowledge base is complete with respect to some constraints, e.g. for evaluating how well this base follows given ontology design patterns or best practices. Such evaluations may also guide knowledge elicitation and modelisation. This article explores the ways constraints can be represented via relations between classes, hence via any knowledge representation language (KRL) that has an expressiveness at least equal to RDF or RDFS. This article also focuses on the checking of such constraints using SPARQL or slight extensions of it. Compared to the popular practice of both representing and checking constraints via queries, this approach is as simple, offers more possibilities for exploiting both knowledge and constraints, and permits the selection and use of inference engines adapted to the expressiveness of the exploited knowledge instead of the use of restricted or ad hoc constraint-validation tools. This approach is also modular in the sense it separates content from usage: the represented “content focused constraints” can then be exploited via few “content independent” queries, one for each usage and kind of constraint.

Keywords: prescriptive and descriptive constraints, ontology completeness, OWL, SPARQL.

1. Introduction

Knowledge representations (KRs) are formal descriptions enabling automatic logical inferencing, and thus automatic KR comparison, search, merge, etc. KRs are logic formulas, e.g. the binary predicates of 1st-order logic; these predicates are called triples or property instances in RDF and binary relations in Conceptual Graphs (CGs) [Sowa, 1992]. For the purpose of clarity, this article uses the intuitive terminology of CGs: (information) objects are either types or individuals, and types are either relation types or concept types (classes and datatypes in RDF). A formal knowledge base (KB) is a collection of such objects written using a KR language (KRL). An ontology is a KB that is essentially about types, rather than about individuals.

Creating a KB or evaluating its quality – for knowledge sharing or exploitation purposes, or for designing or generating software, or evaluating their qualities – are difficult. Models and constraints (e.g. design patterns) help these tasks and can be stored into an ontology. E.g., the author of this article is building an ontology representing and organizing ontology design patterns as well as software design patterns; both kinds are organized by the same top-level types. [Zaveri et al., 2016], a survey on quality assessment for Linked Data, provides many dimensions and metrics for evaluating the quality of KBs and hence for helping the selection or design of KBs. One of the quality dimensions is the (degree of) completeness of a KB, with respect to some criteria or, more concisely, “its completeness”. Evaluating such a degree is common in various tasks or fields but is performed differently by different tools and sometimes in implicit or ad hoc ways. Examples of such tasks or fields are: i) the automatic/manual extraction of knowledge or the creation of a KB, ii) the exploitation of ontology design patterns, KB design libraries (e.g., the KADS library) or top-level ontologies (e.g., DOLCE), and iii) the evaluation of ontologies or, more generally, datasets. In this third field, as noted in [Zaveri et al., 2016], completeness commonly refers to a degree to which the “information required to satisfy some given criteria or a given query” are present in the considered dataset. To complement this very general definition, this article distinguishes two kinds of completeness:

Constraint-based completeness measures the percentage of elements in a dataset that satisfy explicit representations of what – or how – information must be represented in the dataset. These representations are constraints such as integrity constraints or, more generally, those expressed by ontology design patterns and schemas of databases or of structured documents. E.g.: the constraint that, in a particular dataset, at least one movie must be associated to each movie actor, or that all relations must be binary.
Real-world-based completeness measures the degree to which certain kinds of real-world information are represented in the dataset. E.g., regarding movies associated to an actor, calculating this completeness may consist in dividing “the number of movies associated to this actor in the dataset” by “the number of movies he actually played in”. Either the missing information are found in a gold standard dataset or the degree is estimated via completeness oracles [Galárraga & Razniewski, 2017], i.e. rules or queries estimating what is missing in the dataset to answer a given query correctly. The four kinds of completeness collected by [Zaveri et al., 2016] – schema/property/population/interlinking completeness – “assume a closed-world-assumption where a gold standard dataset is available”. Hence, they are all subkinds of real-world based completeness.

One way to define or calculate a constraint-based completeness of a KB is to divide “the number of statements satisfying the constraints in that KB” by “the total number of statements in the KB”. When calculating real-world-based-based completeness, aggregating the results of completeness oracles or making comparisons with the gold standard dataset may not be as simple. As a variant, one may want to consider objects instead of just statements, and thus measure the percentage of objects for which all relations from/to them satisfy the constraints. Other variants may be defined by considering only certain kinds of objects or statements. Defining constraints via KRs in the KB, instead of via queries, permits the definition of “content-independent queries” to exploit these constraints. Otherwise, a different (content-dependent) query has to be created for each variant of constraint-based checking or completeness. Because of this lack of modularity, when stored in an ontology, content-dependent queries are also less easily organized than content-independent ones.

This article does not address real-world based completeness but the techniques this article proposes for representing and exploiting constraints may also be used for representing certain domain-specific parts of the rules used for calculating real-world based completeness. From now on, unless otherwise explicitly specified, “completeness” refers to constraint-based completeness.

Section 2.1 explores the first research question of this article: what does the expression “must and must not be represented in the dataset” entail or, more precisely, given the “descriptive vs. prescriptive” distinction, what kinds of constraints need to be considered for evaluating constraint-based completeness via content-independent queries?

Section 2.3 proposes a general approach to answer the second research question of this article: how to represent constraints in a KRL independent way – or, more precisely, in any KRL that has an expressiveness at least equal to RDF or RDFS – even though actually defining the semantics of some of these constraints would require much more expressive logics? The proposed solution relies i) on the representation of constraints via restricted constructs based on relations between classes (or to classes), e.g. rules using the rdfs:subClassOf relation for representing a restricted implication between the condition and conclusion of the rule, ii) on the use of certain special types for specifying that a statement is a constraint of a certain kind, and iii) on the exploitation of these types in content-independent queries. Such constraint representations can then be exploited via most inference engines and KR query languages instead of tools tied to a particular KRL or goal (knowledge acquisition, ontology design pattern application, ontology evaluation, etc.).

The sections 3.1 to 3.6 show commands (queries or update requests) that may be used as an answer to a third research question: how to implement the above cited general approach in a SPARQL-like language: SPARQL or slight extensions of it. A restriction associated to such a query language is that the relations must be binary. Section 3.7 shows that using a procedural language instead of a query language permits the implementation of the proposed general approach in a simpler and more flexible way.

Section 4 illustrates applications or use cases for the given content-independent queries. Like Section 3, it also shows how some – but not all – expressiveness limits of subclassOf-based constraints can be overcome by pre-treating the KB, e.g. via SPARQL.

Section 5 evaluates the proposed approach and compares it to other ones.

2. Constraints: Definitions and Representation Approaches

2.1. Definitions and General Approaches

2.1.1. Considered Constraints: Those Not Used For Inferences, Just For Checks

In this article, as in KIF (Knowledge Interchange Format) [Genesereth & Fikes, 1992], a rule is a statement that can be represented – directy or via a loss-less translation – in the form “X =>> Y” where “=>>” is a restricted version of the logical implication ( “X => Y”): it only supports modus ponens, not modus tollens. An inference supporting statement is either a rule or a statement that can be represented in the form “X => Y”.

A rule allowing the derivation of a non-modal statement is a rule that can be represented in the form “X =>> Y” where Y does not include a modality (e.g., must). An example is “if x is a Person then x has a parent”. If this statement and “Tom is a Person” are in a KB, an inference engine can derive the non-modal statement “Tom has a parent”.

[Chein & Mugnier, 2008] defines constraints as positive or negative, respectively expressing statements of the form “if A, B must be true” and “if A, B must be false”. Thus, the authors define constraints as rules where the conclusion has a “must” modality. These are the kinds of constraints considered in this article, with the interpretation that in such constraints the “must” entails that the constraints can only be used for checking statements, not deriving statements. Thus, constraints are not rules allowing the derivation of non-modal statements. More formally, this means that such positive or negative constraints can respectively be translated into the forms “A ∧ ¬B =>> false” and “A ∧ B =>> false” where A and B do not contain a “must” modality and A may be empty. As an example, consider the positive constraint “if x is a Person, x must have a parent”. From this constraint and the fact “Tom is a Person”, an inference engine must not derive “Tom has a parent”. It may derive “Tom must have a parent” but, in practice, such derivation is not performed. As a somewhat opposite example, RDFS-aware engines do not exploit relations of type rdfs:domain or rdfs:range as relation signature constraints but as inference supporting statements: when a relation r has a type partially defined with an rdfs:domain (vs. rdfs:range) relation, RDFS-aware engines infer a type for the source (vs. destination) of r.

In this article, constraints that are directly represented in a form ending by “=>> false” – or, equivalently, “=>> ⊥” – are called constraints in inconsistency-implying form. Not all KRLs allow to represent rules (instead of – or in addition to – implications); in those that do, representing negative constraints using the inconsistency-implying form is easy but using this form for representing positive constraints may not possible: the KRL may not permit the representation of the negation in the “¬B” part. This in why in this article i) negative constraints are represented in inconsistency-implying form, and ii) positive constraints are in the form “A =>> B” but have a type that distinguishes them from actual rules. Furthermore, as in most rule-based systems, in the rest of this article the A and B parts share variables. More precisely, these parts are representations of relations from a same object (i.e. from a type or an individual, including a relation or a more complex statement since they are particular kinds of individuals). Thus, checking if a positive constraint is violated by an object in a dataset is checking if the object matches – that is, specializes – the condition of the constraint but not its conclusion. In other words, both the condition and conclusion of a positive constraint should be logically derivable from the object, possibly using various inference supporting statements from the KB.

In the research literature on constraints, these ones are generally not represented – or checked – via modal logic based KRLs but rather using queries, e.g. via SPARQL or the nonmonotonic-epistemic-logic query language EQL-Lite [Calvanese et al.]. In (unidirectional) rule based systems, rules with empty conclusions (or “false” as conclusions) are handled like constraints. However, this is a particularity of these systems. It should not be relied upon for general knowledge representation purpose. For such a purpose, the special semantics of constraints has to be made explicit via special syntactic sugar or special types. Since KRLs rarely propose syntactic sugar for expressing constraints, a more generic approach for expressing that a statement is a constraint, as opposed to an inference supporting statement, is to state that this statement is an instance of a type expressing a particular kind of constraint, as explained in Section 2.1.4. Then, these constraints can be retrieved and exploited by content-independent queries such as those provided below. These constraints can also be directly interpreted and exploited by inference engines designed to take into account the used constraint types. In any case, either i) constraints are not represented in a way they can be exploited as inference supporting statements, or ii) the results of these inferences must not be detrimental, i.e., must not influence the checking of constraints. Both techniques will be illustrated in Section 2.3.1 and Section 3.

2.1.2. Prescriptive (i.e. Not Using All Possible Inferences) vs. Descriptive

As noted in [Assmann & Wagner, 2006], one common distinction between engineering models is whether they are i) descriptive of some reality, like most ontologies are, or ii) prescriptive of what must be in the considered dataset, as with system specifications, meta-models, XML schemas, database schemas, SHACL, etc. Similarly, this article distinguishes two kinds of constraints: descriptive constraints and prescriptive ones. As detailed at the end of this subsection, the notion of prescriptive constraints as defined in this section is original. Before seeing precise definitions, let us first see different purposes of these two kinds of contraints, as well as general (purpose-derived) definitions and examples for these constraints.

Different purposes. Like a definition or an axiom, a descriptive constraint is a declarative way to ensure that, if certain terms are used (i.e., if a certain condition involving these terms is met), these terms are used in a certain correct way. On the other hand, using a prescriptive constraint is meant to ensure that if a certain condition is met, certain formal terms are i) actually used, and ii) that this use is not due to inferences. However, given that the conclusion of descriptive constraints can often (or, with a sufficiently expressive KRL, always) also be exploited for checking that certain terms are actually used, supporting this goal cannot be used as a distinguishing feature for prescriptive constraints: their main interest comes from this checking that certains uses of certain terms are not due to inferences.
General definitions and examples. To sum up the previous paragraph and the previous subsection, descriptive and prescriptive constraints are rules only usable for checking statements and, unlike descriptive constraints, prescriptive ones ensure that certain uses of certain terms are not due to inferences. A prescriptive constraint is only about the content of a KB; e.g.: “if x is a Person, x must have a parent in the base of facts (prescriptive-must)”. A descriptive constraint is still a partial representation of a world; e.g.: “if x is a Person, x must have a parent in the represented world (descriptive-must)”. Rules which are not constraints are also representations of a world; e.g.: “according to most laws, if x is a Person, x must not kill another Person (deontic-must)” and “if x is a Person, x has/had a parent (according to observations)”. A prescriptive constraint is meant to ensure that whenever people or software agents create an instance of a certain type, they explicitly associate certain relations to this instance. Here, “explicitly” specifies that these relations must not exist just because they were automatically deduced – e.g. by exploiting type definitions or, in other words, by type definition expansion – but only because these relations were (manually or automatically) set by an agent when creating or modifying the dataset. As an example, assume that a KB includes the non-modal rule “if x is a Person, x has a parent” and that a user enters that “John is a Person” in the base of facts of this KB (this base is the set of relations from/to individuals; for a description-logic based KB, this is its A-box). Even if this KB also includes the descriptive constraint “if x is a Person, x must have a parent in the represented world (descriptive-must)”, an error message should not be given by a KB checking mechanism since this constraint is satisfied (by inferencing) without the user having to represent a parent for John. On the other hand, if the KB includes the prescriptive constraint “if x is a Person, x must have a parent in the base of facts (prescriptive-must)”, the adding of a new person without a relation to a parent must now be rejected.

In the approach proposed by this article, constraints have types or use particular relation types. These types permit the distinction between the different kinds of constraints and their distinction from non-modal rules: the “must” modalities are represented via these types. These distinctions can thus be exploited by content-independent queries, inference engines or special procedures.

A constraint in inconsistency-implying form (e.g., “A ∧ B =>> false”) is meant to ensure that certain uses of terms are not used in a KB. In this case, “prescriptive” means that the non-existence of these uses is not due to inferences, and hence “descriptive” means that it does not matter whether or not the non-existence of these uses is due to inferences. However, what would be the interest of such prescriptive constraints? Indeed, inferences leading to the existence of unwanted relations need to be caught too. The situation is different from the one for wanted relations where, as above illustrated, it may be interesting to specify whether or not these relations can be inferred. Thus, at least in this article, all constraints in inconsistency-implying form are considered to be “descriptive”. If the used KRL supports negations, a constraint in inconsistency-implying form can specify that the non-existence of a certain use of a term is an inconsistency and thus that this certain use must exist in the KB – e.g. “A ∧ ¬B =>> false” may be used to ensure that B is true when A is true. Such a constraint is descriptive since it allows the existence of this certain use to be due to any inference.

When trying to match an object (in the dataset) to the condition of a constraint, in order to avoid missing a match i) the expressiveness used for the inference regime must be at least equal to the language expressiveness that was necessary for describing the object, and thus ii) there is no reason to restrict the exploitation of certain inference supporting statements. This is also true when trying to match an object to the conclusion of a descriptive constraint. However, when trying to match an object O to the conclusion C of a prescriptive constraint, a certain set of inference supporting statements should not be used since a prescriptive constraint is meant to ensure that certain uses of certain formal terms – i.e. the use of these terms in certain relations – is not due to inferences. In this article, this set is called R. It is the set of all inference supporting statements that could automatically associate relations to an object – and thus, in the above generic case, to the object O – except for type relations (and subtype relations if the object is a type). There are several reasons for that exception. In the following arguments, the expression “type relations” also refers to subtype relations and thus “non-type relations” also exclude subtype relations. One pragmatic reason for the exception is that, for matching O to C, it is necessary to exploit type relations from O as well as from all the objects that are destinations of non-type relations from O. Fortunately, this is not the sole reason: there is also no point for the conclusion of a prescriptive constraint to check if O is of a certain type. Indeed, this does not help ensure that non-inferred genuine knowledge is associated to O since i) without their definitions involving other relations than type relations, types would not constrain any use for their instances and, in that sense, would not have any genuine meaning, and ii) exploiting these definitions for checking whether O has a certain relation would be checking if O has a certain inferred relation, something which a prescriptive constraint is not meant to support. This last point is also a reason of its own for the above cited exception. For a definition of what prescriptive constraints mean with respect to how they are checked, first note that inferencing via forward chaining saturation statically adds relations to objects. On the other hand, with backward chaining, inferred relations are temporarily or virtually added or associated to objects. Second, consider KB0 the version of the KB before the exploitation of R by forward chaining saturation adds the inferred relations. Then, whether or not inferencing involves forward chaining saturation, testing if O matches C is trying to find if O0 permits the derivation of C via inference supporting statements in “KB0 minus R” (KB0\R), i.e. testing if “O0 (KB0\R) ⊢ C”.

In OWL, only a class or an instance of a class can have an automatically associated non-type relation – with “non-type” still also excluding subtype. E.g., a relation type cannot have such an associated relation; associating a property chain to a relation type is no exception since this association is made via a subtype relation. Thus, in OWL, the automatic association of a non-type relation to an object is only supported by class definitions via property restrictions, i.e. instances of owl:Restriction. In other words, this association is only supported by class definitions not restricted to specifying that the class is i) subclass, superclass, intersection or union of other classes, or ii) type or instance of other objects. Thus, in an OWL based KB, R is the set of restrictions, and KB0\R is KB0 where the body or content of each restriction has been removed. In other kinds of KBs, R is different since they offer other ways to express universal quantification and thus other inference-based means to associate non-type relations to objects. In any case, removing inferred non-type relations or not taking them into account can be done automatically, in different ways depending on the features on the used system: general purpose inference engine, ad hoc procedural implementation, etc. The following summary of approaches not only refers to R but also to Tbox, i.e. R plus the types, their subtype relations and their type relations. In a Description Logic based KB, Tbox refers to its T-box while KB\Tbox refers to its A-box, i.e. the individuals, the relations between them, and the type relations from individuals to types.

When the system can exploit two KBs at the same time. In this case, one KB (KB0') can be initialized with KB0 and the other (KB0\R') with KB0\R, although in both KBs each existentially quantified anonymous object must be given a generated unique identifier – so that from each object in KB0\R', its possibly extended counterpart in KB0' can be retrieved. Then, for each prescriptive constraint C and object O in KB0, C can be checked if the system can i) test if O in KB0' matches the condition of C (in KB0'), and then, if needed, ii) test if O in KB0\R' matches the conclusion of C (in KB0\R'). This method works whether inferences involves forward chaining saturation or just backward chaining. If the constraint checking system is implemented in an ad hoc procedural way, via calls to a function testing if an object matches another one in a given KB, specifying and connecting the two above cited tests is not difficult. In the current standard versions of SPARQL, using “inline data” (and hence the VALUES keyword) or “named graphs” (and hence the FROM keyword), it is possible to perform queries on different KBs and use a same object identifier in the queries to relate them. However, it is not yet possible to use a same variable to relate the results of queries on different KBs. With an extension of SPARQL such as LDScript [Corby, Faron-Zucker & Gandon, 2017], based on the results of one query, it would be possible to generate another query with the results of the first query hardcoded in it, and thus solve the problem. The names of the KBs would also have to be hard-coded. This approach is not explored in this article.
When the system can exploit only one KB but, within this KB, at least KB0\Tbox can be duplicated in a way that does not trigger R (or at least not in a problematic way). The idea here is to distinguish between inferred statements and those in KB0. To do so, a first way is that each inferred object or statement is automatically marked as such, e.g. via a special type or relation associated to it or to at least one of its parts. A second way is that, when initializing the KB with KB0, each object in KB0\Tbox – or the whole KB0 for simplicity purposes – is duplicated into a “saving form” that later permits to regenerate the object as it was in KB0. One such form is a reification of the object, with a special type or special relations for i) distinguishing this “reification for duplication purposes” from other reifications, ii) distinguishing its initial relations from those that might be added to it via inferences, and iii) automatically discarding this saving form from the results of a query when it appears in such results. With both ways, for each prescriptive constraint C and object O that is not a duplication, C can be checked in two steps. The first step is to test if O matches the condition of C. The second step is to test the conclusion, if still needed. With the first way, this second step consists in performing this test without involving the marked inferred objects or statements. Since in the general case a general purpose inference engine cannot be parameterized to support this first way, it is not explored in this article. With the second way, i.e. the saving form based way, the conclusion matching test is performed not with the current O but with a complete or partial copy of O as it was in KB0. In this article, this copy is called a “clone without inferred relations” or simply a clone. This clone is not necessarily a complete copy of the initial O. It is only a partial copy when i) this is not detrimental for the conclusion matching test, and ii) this permits the clone to be directly the “saving form” instead of being regenerated from this form. In practice, “initializing the KB” is simply opening or importing the file containing KB. Creating the saving form can be performed by pre-treating the KB, i.e. modifying the KB as soon as it is initialized and before any potential forward chaining saturation process is triggered. KB treatments or pre-treatments may for example performed via SPARQL update requests. Regenerating clones from saving forms may be performed statically, i.e. via a treatment of the KB before the checking of prescriptive constraints. It may also be performed dynamically, i.e. by temporarily generating certain clones just before checking the conclusion of certain constraints.
When the system can exploit only one KB but all inferences are performed via backward chaining and either i) R can be temporarily disabled or removed, or ii) KB0\R can be temporarily duplicated. With backward chaining, inferred relations are only temporarily or virtually added or associated to objects. Thus, if all inferences are performed via backward chaining, KB0 is not statically modified and, more importantly, testing whether or not an object O matches the conclusion of a prescriptive constraint C can be performed by i) temporarily removing R or disabling the use of R, just for the time of the test, or ii) testing the match in a temporary copy of the KB where R is absent. Since in the general case a general purpose inference engine does not support specifications for temporary disabling the use of certain inference supporting statements, this method is not explored in this article. The two other cited ways first involve duplicating R into saving forms – before each conclusion matching test or, once and for all, when the KB is initialized. Then, just before testing if an object O matches the conclusion, either i) R is removed before the test and restaured from the saving forms after the test, or ii) a temporary copy of (a relevant part of) the KB except for R is created and the test is performed within this copy. In this last described approach, “relevant” means “all the objects that can be reached from O via a sequence of relations” instead of all the KB.

Section 2.2.2 proposes three methods based on clones and/or saving forms: two methods based on the last described approach in the above second point and one method based the last described approach in the third point. These methods exploit a general purpose inference engine and a SPARQL-like language. Each method has different limitations.

The notion of prescriptive constraints as above defined is original: to our knowledge, in other works on constraints, these ones apply on statements of the KB whether they have been inferred or originally provided. Since these works do not provide ways to make the differences between those two kinds of statements, the constraints they handle are by default descriptive. Yet in other words, since these works do not provide a mechanism for temporary disabling – or ignoring the results of – some inferences only when checking the conclusions of the constraints, the only way to handle these constraints as prescriptive constraints (in its above defined sense) is to disable – or ignore the results of – some inferences not only during the checking of their conclusions but also during the checking of their conditions. Thus, some inferences may be missing for a complete checking of the checking of the conditions.

A constraint, whether descriptive or prescriptive, is intended to check the existence (or absence of existence) of certain objects in a KB. To that end, constraint checking needs to exploit some form of closed world assumption. When SPARQL is used for checking constraints, the closed world assumption related parts can be represented via the operators EXISTS and NOT EXISTS. In this article, SPARQL is used for the content-independent queries (cf. Section 3) and only the operator EXISTS had to be used in these queries. Furthermore, with such queries, there is no need to represent any closed world assumption related part in the constraints themselves. Thus, the inference engines used for matching objects of the KB to the constraints need not use the closed world assumption. Hence, regular inference engines for OWL can for example be reused.

2.1.3. Restriction of Some Constraint Parts to Named or Anonymous Individuals

It is sometimes interesting to create constraints allowing certain relations from individuals to be inferred only if these individuals are named, i.e. only if they are not anonymous individuals (hence, in the RDF terminology, not “blank nodes”). E.g., assume that a user wants to ensure that each instance of Person has a parent relation, inferred or not as long as its destination is named. To do so, using a prescriptive constraint is not possible since, by definition, it would not accept inferred named parents. A solution is to write two constraints: i) an inconsistency implying one stating that an instance of Person without a(n inferred or not) parent relation is an error, and ii) a descriptive constraint specifying that the destination of each of these parent relations must be named.

As long as named individuals are represented in the constraints the same way they are in the rest of the KB, any way can be used and no special content-independent query has to be introduced. One way is to type named individuals with owl:NamedIndividual – this is the way used in one of the examples of Section 2.3.3. Since doing so by hand may be tedious, Section 3.1.2 proposes a SPARQL update request that automatically adds such a type to named individuals.

2.2. General Approach for Representing And Exploiting Constraints

2.2.1. Using Constraint Types

[Tao et al., 2010] shows that representing and checking certain kinds of integrity constraints that exploit some forms of the Unique Name Assumption or Closed World Assumption can be performed via SPARQL queries. Instead, as explained in the introduction, the goal is here to enable the representation of constraints that i) can be exploited via content-independent queries, ii) can be represented via any KRL that has an expressiveness at least equal to RDF or RDFS, and iii) can be marked as descriptive or prescriptive (this distinction is not made in [Tao et al., 2010]).

To that end, the proposed approach is to introduce a few types for constraints. By setting instanceOf or subtypeOf relations from certain KRs to some of those types, KB authors can state that these KRs are constraints and can indicate which kind of constraints. Thus, these constraints can be exploited by content-independent queries or inference engines that understand the used constraint types. This means that these engines know that these types represent particular “must” modalities and know how to handle such modalities. This approach is similar to the use of OWL types in RDF statements and their exploitation by OWL-aware inference engines. The name of the proposed ontology of constraint types is CSTR. In this ontology, cstr:Constraint is the supertype of all types of constraints. Similarly, the type cstr:Prescriptive_constraint, a subtype of cstr:Constraint, enables one to state that some rules are actually prescriptive constraints or to retrieve all and only such constraints. The prefix “cstr:” in these identifiers is an abbreviation for the namespace http://www.webkb.org/kb/it/CSTR. CSTR also includes types for constraint conditions and types for relations between the condition and condition of a constraint. These types are listed in Section 2.3.2.

2.2.2. Using “Clones Without Inferred Relations” For Bypassing Some Inferences When Checking The Conclusions Of Prescriptive Constraints

For the reasons given in Section 2.1.2, this section proposes three clone based methods supporting the checking of prescriptive constraints for when the system can exploit only one KB at any given time. The methods of the next three subsections exploits a general purpose inference engine and a SPARQL-like language. They are each dependent on different features of the used system but they do not rely on a particular logic, KRL, inference engine or tool. In other words, these methods are KRL independent and tool independent in the sense that they can be used with any KRL and any tool satisfying the listed features. Thus, depending on the domain and application, different inference engines can be reused to check or evaluate ontology completeness.

2.2.2.1. Checking Prescriptive Constraints Via the Cloning of At Least KB0\Tbox When R is Exploited Via Forward Chaining Saturation

This method has two or three steps. First, just after the KB has been initialized with KB0, i.e. when the file containing KB0 has been imported, each object of at least KB0\Tbox is duplicated into a saving form, e.g. by reification. For simplicity purposes, the whole KB may be duplicated this way. The clones may also be created at that time (Section 3.2.3.3 gives a SPARQL command to perform such a step). If so, the second step is to exploit the saving forms for removing inferred relations from the clones (cf. Section 3.2.3.4) and the third step is to perform the checking of the prescriptive constraints (for each object O matching the condition of a constraint C, testing the matching of the conclusion should be done with the clone of O, not O; cf. Section 3.2.3.1). If the clones are not created along with the saving forms, the second step can be the generation of these clones from these forms, and the third step is the constraint checking. A third option is to merge the second and third steps, i.e. to dynamically generate each needed clone (based on the saving forms) just before testing the match of a constraint conclusion (cf. Section 3.2.3.2).

This method relies on the following conditions.

Testing if O matches the conclusion of C only requires backward chaining, and inferences associating relations to objects are performed via forward chaining. In an OWL-based KB, this means that this testing ignores property restrictions but that associating relations to objects exploit them (via forward chaining).
The saving forms for KB0\Tbox are automaticaly generated before forward chaining saturation is triggered, i.e. via a KB pre-treatment.
Furthermore, forward chaining saturation is not triggered when the original version of objects are regenerated, e.g. because such an inferencing can be temporarily disabled.
These conditions and the one of the previous point are for example verified in Corese [Corby, Faron-Zucker & Gandon, 2017], an OWL2-RL inference engine and a SPARQL engine.
If SPARQL is used for the above cited pre-treatment, e.g. via the update request given in Section 3.2.3.3, the whole KB is duplicated, not just one object. Hence, given a KB of a particular size, the used KB management system must be able handle at least twice its size.
If a dynamic generation of the clones is performed and a query language is used for it, this language must allow the specification of a sub-request generating a clone before it is used in the conclusion matching test. Such a sub-request is not possible in SPARQL but, as illustrated in Section 3.2.3.2, possible in LDScript, the SPARQL extension used by Corese. However, only CONSTRUCT sub-requests are allowed by LDScript, not INSERT or DELETE sub-requests. For reasons explained in Section 3.2.3.2, using CONSTRUCT for that sub-request requires the temporary duplication of nearly the whole KB before each conclusion matching test. Thus, the used KB management system (KBMS) must be able to handle this duplication in terms of memory and required speed. For a big enough KB, no such KBMS may exist.

2.2.2.2. Checking Prescriptive Constraints Via the Cloning of At Least KB0\Tbox Using “Clones Without Types”

This method has two differences with the previous one. First, each saving form is directly a clone. It has the same form as the original object but does not have its type relations (and, if it is a named individual, it has an identifier different from the original object). In this article, each of such clones is called a clone without type. Indeed, this method assumes that R is such that all inferences which automatically associate relations to an object do so based on its type. Given this first difference, the second one is that this method can be used whether or not the used inference engine performs forward chaining saturation. If the engine only uses backward chaining, i.e. if no inferred relation has been statically added to objects in the KB via forward chaining saturation, it is possible to dynamically create a clone without type for each object before testing if this clone matches the conclusion of a prescriptive constraint. However, the alternative of using a KB pre-treatment for cloning all the objects once and for all, when the KB is initialized, is more efficient and works regardless of whether or not forward chaining saturation is involved. This is why no LDScript query for the dynamic generation of clones is proposed in Section 3.2.4. Section 3.2.4.1 gives a SPARQL INSERT request to perform the above cited KB pre-treatment. The SPARQL query for checking prescriptive constraints given in Section 3.2.3.1 not only works for the method described in the previous subsection, it also works for the method described in the present subsection. Thus, this query is not repeated in Section 3.2.4 which therefore has only one subsection.

This method relies on the following conditions. The last three are the counterparts of the last three conditions given for the previous method.

R, the set of inference supporting statements that could automatically associate non-type relations to an object, must now be restricted to inference supporting statements that exploit the type of this object. This makes no difference for an OWL based KB but, for an object oriented KB, this excludes the use of inference supporting statements supporting duck typing instead of just inheritance.
Inference supporting statements that lead to the association of a type to an object should now not be used in the KB. In an OWL based KB, this only excludes the use of i) definitions based on rdfs:domain or rdfs:range relations, and ii) definitions based on owl:equivalentClass relations to restrictions (there is no problem if only rdfs:subClassOf is used). Here is an example of how the use of rdfs:domain relation can lead certain inference engines to regenerate types for some clones. Assuming there is an rdfs:domain relation from the relation type parent to the type Person, if an object of type Person Person is source of a parent relation and this object has its type removed, an inference engine may set it again. To avoid such a case, instead of using rdfs:domain or rdfs:range relations, one may write inconsistency-implying constraints that are equivalent to these relations except that they are usable only for checking purposes. An example is given at the end of each of the subsections of Section 2.3.3.
Prescriptive constraints should now not be used for checking that objects with certain relations have certain non-inferred types. However, constraints in inconsistency-implying form can be used for checking that objects with certain relations have certain types (inferred or not), and the condition of a prescriptive constraint can be used for testing the types of an object.
If forward chaining saturation is run on the KB, there must be a way for this to happen after the clones are created.
If SPARQL is used for creating clones without types, as illustrated in Section 3.2.4.1, the whole KB is duplicated. The used KBMS must be able to handle this.
If a dynamic generation of the clones is performed and a query language is used for it, the conditions cited for this case in the previous subsection also apply for this method.

2.2.2.3. Checking Prescriptive Constraints Via the Creation of Saving Forms for R and the Creation of a Temporary Copy of (a Relevant Part of) the KB without R When All Inferences are Performed Via Backward Chaining

Section 2.1.2 summarized the two steps of this method. Here are more details. The first step is to duplicate R into saving forms. This may be done once and for all, via a KB pre-treatment. A less efficient way is to save R just before testing if an object matches the conclusion of a prescriptive constraint. The saving can for example be performed by reification, as in the first method. Section 3.2.5.1 provides a SPARQL query implementing the KB pre-treatment for an OWL based KB. The second step is to check the prescriptive constraints, in the following way: just before each testing if an object matches a conclusion, create a temporary copy of the relations that can be directly or indirectly reached from the object, except for relation in R, and perform the test within this copy. Section 3.2.5.2 provides an LDScript query implementing this second step.

This method relies on the following conditions. They are the counterparts of the last three conditions given for the previous method.

All inferences are performed via backward chaining.
In terms of memory, the used KB management system must be able to handle the temporary duplication of the KB (minus R).
If a dynamic generation of the clones is performed and a query language is used for it, the conditions cited for this case in Section 2.2.2.1 also apply for this method.

2.3. Representing Constraints Via Relations Between Classes

2.3.1. Approaches

One way to represent and exploit (simple) rules in a KRL that has an expressiveness at least equal to RDF or RDFS is to use an rdfs:subClassOf relation for representing the implication between the condition and conclusion of a rule. However then, either this implication must not be used for modus tollens or the results must not be detrimental. The situation is not much more complex when subclassOf rules are used as a way to represent constraints. There are three cases.

If the conclusion is (equivalent to) owl:Nothing, i.e. if the inconsistency-implying form is used, the rule is semantically a constraint and, depending on the inference engines, modus tollens may or may not be a danger.
Otherwise, if a prescriptive constraint is represented, the “Clones Without Inferred Relations” based method prevents the results of modus ponens or modus tollens to influence the checking of constraints (this is where these results could have been detrimental).
Otherwise, i.e. if a descriptive constraint is represented, one must use an inference engine that does not exploit rules for modus ponens nor modus tollens when the condition of the rule is subtype of cstr:SubclassOf-based_constraint_condition or instance of cstr:Type_of_subclassOf-based_constraint_condition.

In other words, using subclassOf-based constraints when inferences based on subclassOf relations then have to be ignored is generally not relevant. However, the idea of using classes for representing the conditions and conclusion of a constraint without using variables is interesting. Here are the two simplest ways to do so.

The “subclassOf-analogous” way: it consists in relating the condition class and the conclusion class by a relation that is not a subclassOf one. To support this, CSTR proposes the relation types cstr:descriptive_constraint_conclusion and cstr:prescriptive_constraint_conclusion.
The “individual-based constraint” way: it consists in creating a constraint individual and, from it, relations to express its type (i.e., descriptive vs. prescriptive) and the classes for its condition and its conclusion. To support this, CSTR proposes the relation types cstr:condition_class and cstr:conclusion_class. Since this way uses an individual to relate two classes, it can be seen as indirectly relation-between-classes based, as opposed to the subclassOf-based and subclassOf-analogous ways which are directly relation-between-classes based. A similar solution would be to use a type instead of an individual for representing the constraint. This solution is slightly closer to the two other ones but a bit artificial. Another reason it is not used in this article is that the individual-based solution is easier to name and distinguish from the other two basic solutions.

A disadvantage of any solution using relations from/to classes when these relations are not subclassOf ones is that the result requires a KRL with an expressivity at least equal to RDF. For OWL-based representations, this means interpreting them with the RDF-Based Semantics, not the OWL Direct Semantics. Since each of the three approaches has its advantages and drawbacks, this article proposes types and commands for the three approaches: Section 2.3.2 gives the types, Section 2.3.3 gives examples and Section 3.1 gives the requests.

SubclassOf-based constraints could be generalized using a subtypeOf relation instead of a subclassOf one. Conforming to the terminology given in the first paragraph of the introduction, this would permit the use of relation types and datatypes, hence not just classes, in the condition and conclusion of a constraint. However, checking objects such as relations and instances of datatypes can most often be done via constraints on class instances related to these objects. For example, relations can generally be checked via their sources or destinations, or via the assertions that include these relations, i.e. instances of classes such as owl:NegativeObjectPropertyAssertion. Hence, for relating the condition and conclusion of a constraint, this article does not propose the use of subtypeOf relations that are not rdfs:subClassOf relations.

2.3.1.1. Comparison With OWL-ER

[Baget et al., 2015] shows that a sublanguage of OWL2 called OWL2-ER using subclassOf-based rules can represent many kinds of “existential rules” (hence the suffix “-ER”), i.e. rules in which the conclusion (or head) can include anonymous existentially quantified objects. A sublanguage of OWL2-ER is OWL2-RL [OWL 2 profiles] which can be translated into Datalog (definite Horn rules with equality), a purely declarative subset of Prolog. Similarly, OWL2-ER can be translated into – and hence represented by – Datalog+ [Baget et al., 2015] (the extension of Datalog to existential rules) and then into RuleML. Thus, not all OWL2 class expressions are allowed in OWL2-ER: rules are restricted to the logical form “∀X ∀Y (B(X,Y) => ∃Z H(X,Z))”, where “X, Y and Z are sets of variables, B (the body) is a conjunction with variables in X and Y, and H (the head) is a conjunction with variables in X and Z; hence Z denotes the set of variables that occur in H but not in B” [Baget et al., 2015]. Given its restrictions with respect to OWL2, OWL2-ER is roughly the part of Datalog+ that can be expressed using only OWL2, hence with binary relations only and without the syntactic possibility of directly and explicitly using variables for relating objects shared by both the condition and conclusion of a rule.

In OWL2-ER, negative constraints can be represented in two ways: i) by using a class expression equivalent to the type owl:Nothing in a rule conclusion, hence a rule of the form ∀x (ClassExpression(x) => ⊥), ii) by using the type owl:NegativeObjectPropertyAssertion to express negated facts of the form ¬ ∃x ClassExpression(x). Neither OWL2-ER nor Datalog+ can directly represent a positive constraint and hence also do not distinguish between descriptive and prescriptive constraints. However, OWL2-ER statements can be used for expressing and exploiting constraints with the types and approach proposed in this article. Conversely, with the proposed approach, subclassOf-based constraints are not restricted to OWL2-ER or OWL2 class expressions. Here are examples of OWL2-ER rules given in [Baget et al., 2015]; to express the meaning of constraints reusing them, it is sufficient to replace the words “is” or “has” by the words “must be” or “must have” in the informal descriptions (each of them is prefixed by “#”).

   ∀X ∀Y ∀Z (isProject(X, Y, Z) => isMember(Z,X))
      #Every leader of a project is a member of this project
  ∀X ∀Y (Researcher(X) ∧ hasExpertise(X,Y) => ∃Z ∃L (isProject(Z,Y,L) ∧ isMember(X,Z)))
      #Every researcher expert in an area is a member of a project in that area
  ∀X ∀Y ∀Z (isProject(X,Y,Z) ∧ isProject(X,Y,Z´) => Z = Z´)
      #Every project has at most one leader

Since the proposed approach is purposely KRL independent and not restricted to subclassOf-based constraints (and, a fortiori, to OWL2-ER statements), more formal details on OWL2-ER – or on what representing a rule using rdfs:subClassOf entails – would be irrelevant in this article. See [Baget et al., 2015] and [Swan, 2016] for those formal points. The SPARQL code given in Section 3 provides the remaining necessary formal details.

2.3.2. Proposed Types

Figure 1 uses a concise version of UML to show subclassOf relations between some classes of CSTR. Figure 2 uses the FL notation – a notation somewhat similar to Turtle but more expressive and – to show subtypeOf relations between main types of CSTR. This figure shows how many of the first-order classes are instances of some second-order classes that are similarly organized and somewhat similarly named. E.g., to state that a class (class expression or named class) is the condition of a subclassOf-based constraint one can mark it as subtype of cstr:SubclassOf-based_prescriptive_constraint_condition or instance of cstr:Type_of_subclassOf-based_prescriptive_constraint_condition. The first case implies the second one but it also specifies that, if the marked class has subclasses, each definition of these subclasses is also a condition for a constraint. The other second-order classes have the same purpose.

In its constraint examples, this article only uses 1st-order classes. Indeed, most inference engines for description logics cannot handle a KB that defines 2nd-order types, i.e. a KB that exploits 2nd-order types not predefined in those description logics. As explained in the second paragraph of Section 2.3.1, the two other proposed representation approaches use other relations to classes and hence cannot be handled by these above cited inference engines.

Because of this limitation of these engines, to use them for checking constraints on class definitions, it is necessary to pre-treat the KB by removing the individuals. In such a case, 1st order classes become individuals and their subclass relations must be removed too. Section 3.1 shows the basis of how to do that in SPARQL. Conversely, if such engines must be used to check constraints on individuals the 2nd-order types must first be removed.

Figure 1. Concise-UML representation of subclassOf relations between some classes in the CSTR ontology
cstr:Constraint cstr:Constraint_condition /\ /\ /\ \| \| \| cstr:Descriptive_constraint \| cstr:SubclassOf-based_constraint_condition cstr:Prescriptive_constraint /\ /\ \| \| cstr:SubclassOf-based_descriptive_constraint_condition \| cstr:SubclassOf-based_prescriptive_constraint_condition Legend. The class names with no prefix are from the CSTR ontology. Each arrow “->” represents a subClassOf relation. For concision purposes, boxes around classes are not drawn.

Figure 1. Concise-UML representation of subclassOf relations between some classes in the CSTR ontology

                    cstr:Constraint              cstr:Constraint_condition
                       /\      /\                         /\
                       |      |                       |
cstr:Descriptive_constraint   |      cstr:SubclassOf-based_constraint_condition
         cstr:Prescriptive_constraint                 /\         /\
                                                      |         |
     cstr:SubclassOf-based_descriptive_constraint_condition     |
                   cstr:SubclassOf-based_prescriptive_constraint_condition

Legend. The class names with no prefix are from the CSTR ontology.
Each arrow “->” represents a subClassOf relation.
For concision purposes, boxes around classes are not drawn.

Figure 2. FL representation of subtypeOf relation between main types in the CSTR ontology
cstr#binary_relation rdf#type: rdf#Property, > exclusion //cstr#binary_relation has the following 5 exclusive subtypes { cstr#id // (this exclusion set is not complete: this is not a partition) cstr#descriptive_constraint_conclusion cstr#condition_class cstr#prescriptive_constraint_conclusion cstr#conclusion_class (cstr#NonInferredRelation < rdf#Statement) (cstr#RelationFromPropertyRestriction < rdf#Statement) }; cstr#Constraint rdf#type: cstr#Type_of_constraint, > cstr#Descriptive_constraint cstr#Prescriptive_constraint; cstr#Constraint_condition > (cstr#SubclassOf-based_constraint_condition rdf#type: cstr#Type_of_subclassOf-based_constraint_condition, > (cstr#SubclassOf-based_descriptive_constraint_condition rdf#type: cstr#Type_of_subclassOf-based_descriptive_constraint_condition) (cstr#SubclassOf-based_prescriptive_constraint_condition rdf#type: cstr#Type_of_subclassOf-based_prescriptive_constraint_condition) ); owl#Class > exclusion //owl#Class has the following 3 exclusive subclasses { cstr#Type_of_constraint (cstr#Type_of_constraint_condition > (cstr#Type_of_constraint_condition_that_is_a_type > (cstr#Type_of_subclassOf-based_constraint_condition > cstr#Type_of_subclassOf-based_descriptive_constraint_condition cstr#Type_of_subclassOf-based_prescriptive_constraint_condition ) ) ) }; Legend. “>” means “(has for) subtype”. “<” means “(has for) supertype”. Namespace prefixes end by “#”, not “:”. Thus, the syntactic form “X r: Y” can be read “X has for r Y” (although FL syntactic sugar such as “exclusion” slightly changes this reading: see the comments prefixed by “//”).

Figure 2. FL representation of subtypeOf relation between main types in the CSTR ontology

cstr#binary_relation  rdf#type: rdf#Property,
 > exclusion    //cstr#binary_relation has the following 5 exclusive subtypes
   { cstr#id    //  (this exclusion set is not complete: this is not a partition)
     cstr#descriptive_constraint_conclusion    cstr#condition_class
     cstr#prescriptive_constraint_conclusion   cstr#conclusion_class
     (cstr#NonInferredRelation < rdf#Statement)
     (cstr#RelationFromPropertyRestriction < rdf#Statement)
   };

cstr#Constraint  rdf#type: cstr#Type_of_constraint,
 > cstr#Descriptive_constraint  cstr#Prescriptive_constraint;

cstr#Constraint_condition
 > (cstr#SubclassOf-based_constraint_condition
      rdf#type: cstr#Type_of_subclassOf-based_constraint_condition,
      > (cstr#SubclassOf-based_descriptive_constraint_condition
           rdf#type: cstr#Type_of_subclassOf-based_descriptive_constraint_condition)
        (cstr#SubclassOf-based_prescriptive_constraint_condition
           rdf#type: cstr#Type_of_subclassOf-based_prescriptive_constraint_condition)
   );

owl#Class
  > exclusion  //owl#Class has the following 3 exclusive subclasses
    { cstr#Type_of_constraint 
      (cstr#Type_of_constraint_condition
         > (cstr#Type_of_constraint_condition_that_is_a_type
              > (cstr#Type_of_subclassOf-based_constraint_condition
                   > cstr#Type_of_subclassOf-based_descriptive_constraint_condition
                     cstr#Type_of_subclassOf-based_prescriptive_constraint_condition
                ) ) )
    };

Legend.
“>” means “(has for) subtype”. “<” means “(has for) supertype”.
Namespace prefixes end by “#”, not “:”.
Thus, the syntactic form “X r: Y” can be read “X has for r Y”
(although FL syntactic sugar such as “exclusion”
slightly changes this reading: see the comments prefixed by “//”).

2.3.3. Examples

In this article, except in Figure 1 and Figure 2, the Turtle notation is used when SPARQL is not used since SPARQL reuses this notation for representing relations. For clarity purposes, the names of relation types have a lowercase initial while other names have an uppercase initial. In SPARQL, Turtle or other graph-based notations, a statement of the form “ SourceConcept relationName1 DestConcept1ForRelName1, DestConcept2ForRelName1; relationName2 DestConcept1ForRelName2, DestConcept2ForRelName2 . ” can for example be read “SourceConcept has for relationName1 DestConcept1ForRelName1 as well as DestConcept2ForRelName1, and has for relationName2 DestConcept1ForRelName2, and has for relationName2 DestConcept2ForRelName2”.

2.3.3.1. Individual-based Constraints

Here is the positive prescriptive constraint “if x is a Person, x must have a parent in the base of facts (prescriptive-must)” represented as an individual-based constraint using the CSTR ontology.

  [] rdf:type cstr:Prescriptive_constraint;
     cstr:condition_class :Person;
     cstr:conclusion_class
       [rdf:type owl:Class;
        owl:equivalentClass [rdf:type owl:Restriction;  owl:onProperty :parent;
                             owl:someValuesFrom :Person] ].

For the general descriptive version of this constraint, it is sufficient to replace “prescriptive” by “descriptive” in the previous representation. Here is a version where parents are restricted to be named individuals.

  [] rdf:type cstr:Descriptive_constraint;
     cstr:condition_class :Person;
     cstr:conclusion_class
       [rdf:type owl:Class;
        owl:equivalentClass [rdf:type owl:Restriction;  owl:onProperty :parent;
                             owl:someValuesFrom :Named_person] ].
  #with:
  :Named_person owl:equivalentClass 
     [rdf:type owl:Class;
      owl:intersectionOf ( :Person  owl:NamedIndividual )].

The negative constraint “if x is a Person_without_parent, x must not have a parent” may be translated into the inconsistency-implying form “x is a Person_without_parent =>> false”. Here is its representation as an individual-based constraint using the CSTR ontology.

  [] rdf:type cstr:Descriptive_constraint;  #this rdf:type relation is optional
     cstr:condition_class                   #  for a negative constraint
        [rdf:type owl:Class; 
         owl:equivalentClass [rdf:type owl:Restriction;  owl:onProperty :parent;
                              owl:maxCardinality  "0"^^xsd:nonNegativeInteger] ];
     cstr:conclusion_class owl:Nothing.

As referred to in the description of the “clones without types” based method in Section 2.1.2, here are examples of how to write inconsistency-implying constraints that are equivalent to the use of rdfs:domain or rdfs:range relations except that they are usable only for checking purposes. These examples respectively represent that “if a parent relation has something that is Not_a_person as source (alias, domain), this relation is incorrect” and “if a parent relation has something that is Not_a_person as destination (alias, range), this relation is incorrect”.

  [] cstr:condition_class    #constraint-based version of an rdfs:domain relation:
       [rdf:type owl:Class;
        owl:intersectionOf ( :Not_a_person
                             [rdf:type owl:Restriction;  owl:onProperty :parent;
                              owl:someValuesFrom owl:Thing ] ) ];
     cstr:conclusion_class owl:Nothing.

  [] cstr:condition_class   #constraint-based version of an rdfs:range relation:
       [rdf:type owl:Restriction;  owl:onProperty :parent;
                                   owl:someValuesFrom :Not_a_person];
     cstr:conclusion_class owl:Nothing.
  #with:
  [] rdf:type owl:AllDisjointClasses;  owl:members (:Person :Not_a_person).

2.3.3.2. SubclassOf-analogous Constraints

For the positive prescriptive constraint “if x is a Person, x must have a parent in the base of facts (prescriptive-must)”, using the CSTR ontology:

  :Person  #class and constraint condition
     cstr:prescriptive_constraint_conclusion
        [rdf:type owl:Restriction;  owl:onProperty :parent;
         owl:someValuesFrom :Person].

For the general descriptive version of this constraint, it is sufficient to replace “prescriptive” by “descriptive” in the previous representation. Here is a version where parents are restricted to be named individuals.

  :Person  #class and constraint condition
     cstr:descriptive_constraint_conclusion
       [rdf:type owl:Restriction;  owl:onProperty :parent;
        owl:someValuesFrom :Named_person]. #as previously defined

Here is the negative constraint “x is a Person_without_parent =>> false” represented as a subclassOf-analogous constraint using the CSTR ontology.

  :Person_without_a_parent  #class and constraint condition 
     owl:equivalentClass  #definition and constraint condition
        [rdf:type owl:Restriction;  owl:onProperty :parent;
         owl:maxCardinality  "0"^^xsd:nonNegativeInteger];
     cstr:prescriptive_constraint_conclusion  owl:Nothing.

Here are the subclassOf-analogous version for the constraints “a parent relation has something that is Not_a_person as source (alias, domain) =>> false” and “a parent relation has something that is Not_a_person as destination (alias, range) =>> false”.

  :Not-a-person_with_a_parent   #constraint-based version of an rdfs:domain relation:
     owl:equivalentClass  #constraint-based version of an rdfs:domain relation:
        [rdf:type owl:Class; 
         owl:intersectionOf ( :Not_a_person
                              [rdf:type owl:Restriction;  owl:onProperty :parent;
                               owl:someValuesFrom owl:Thing ] ) ];
     cstr:conclusion_class owl:Nothing.

  :Thing_having_for_parent_a_Not-a-parent  #constraint for an rdfs:range relation:
     owl:equivalentClass 
        [rdf:type owl:Restriction;  owl:onProperty :parent;
                                    owl:someValuesFrom :Not_a_person ];
     cstr:prescriptive_constraint_conclusion  owl:Nothing.

2.3.3.3. SubclassOf-based Constraints

For the positive prescriptive constraint “if x is a Person, x must have a parent in the base of facts (prescriptive-must)”, using the CSTR ontology:

  :Person  #class and constraint condition
     rdfs:subClassOf cstr:SubclassOf-based_prescriptive_constraint_condition;
     rdfs:subClassOf  #conclusion (types of relations that must be present):
        [rdf:type owl:Restriction;  owl:onProperty :parent;
         owl:someValuesFrom :Person].

For the general descriptive version of this constraint, it is sufficient to replace “prescriptive” by “descriptive” in the previous representation. However, as explained in Section 2.3.1, such descriptive versions require using an inference engine that takes into account the special meaning of cstr:SubclassOf-based_descriptive_constraint_condition. Here is a particular case where parents are restricted to be named individuals.

  :Person  #class and constraint condition
     rdfs:subClassOf cstr:SubclassOf-based_descriptive_constraint_condition;
     rdfs:subClassOf  #conclusion (types of relations that must be present):
       [rdf:type owl:Restriction;  owl:onProperty :parent;
        owl:someValuesFrom :Named_person]. #as previously defined

Using the CSTR ontology, the negative constraints of the previous subsection are identical to their subclassOf-analogous versions except for the fact that cstr:prescriptive_constraint_conclusion is replaced by rdfs:subClassOf.

3. Exploiting Relations-between-classes Based Constraints With a SPARQL-like language or With Javascript

Only Section 3.7 uses Javascript. All the other subsections use a SPARQL-like language, specifically SPARQL or LDScript. In some extensions of SPARQL, e.g. LDScript, the proposed next SPARQL-like commands (queries or update requests) can be grouped into scripts or functions. The commands of Section 3.1 are KB pre-treatments and hence update requests. The commands of the remaining subsections – except 3.7 – are queries. These queries are organized by kind and size. At first, the difficulty to understand them is directly related to their size. In SPARQL and LDScript, variable names begin by “?”.

3.1. Examples of KB Pre-treatments With SPARQL

3.1.1. Finding Named Individuals and Marking Them As Such

In its WHERE clause, the next command uses an heuristic to find each individual in an RDF compatible KB. The heuristic is that each “object ?o which i) has a type having a superclass, and ii) is not typed as a class” is considered an individual. The WHERE clause also checks that ?o is named, via the SPARQL function isIRI. The INSERT clause then types the found named individual ?o with owl:NamedIndividual. The used heuristic proved more reliable than simply looking for each object that is not a relation and that does not have rdfs:Class as type. With this other method, many conditions would then have to be added for filtering out objects such as i) classes defined via an equivalence to a restriction, and ii) owl:Thing and some other types from OWL, RDFS and XSD. This is why this heuritic is also used in every command presented below if it requires detecting that an object is an individual.

INSERT { ?o rdf:type owl:NamedIndividual } WHERE  
{ ?o rdf:type ?t.  #?o has a type ?t
  ?t rdfs:subClassOf ?superClass.  #?t has a superclass
  FILTER NOT EXISTS { ?o rdf:type rdfs:Class }  #?o is not a class
  FILTER isIRI(?o)  #?o is named with an IRI
  FILTER NOT EXISTS { ?o rdf:type owl:NamedIndividual } #?o not yet typed as owl:NamedIndividual
}

3.1.2. Temporarily Removing Individuals For Checking Constraints on Types With Classic Description Logic Inference Engines

The next command shows a reversible way to remove individuals – as well as subclassOf relations between their types – to allow classic Description Logic inference engines to handle user-defined relations between types, as explained in the paragraph before Figure 1 in Section 2.3. To find individuals the WHERE clause of this command uses the same heuristic as in the previous subsection. It does not rely on the type owl:NamedIndividual since this would tie this content-independent query to a particular OWL2 way to state that an object is a named individual. The DELETE clause removes the rdf:type relations from each found individual ?o and, from the destinations of these relations, removes the rdfs:subClassOf relations when they exist. As replacements for these rdfs:subClassOf relations, the INSERT clause sets cstr:type and cstr:subClassOf relations. These substitute relations later permit the re-assertion of the initial rdf:type and rdfs:subClassOf relations, when needed, via a similar command.

DELETE { ?o rdf:type  ?t .  ?t rdfs:subClassOf ?superClass } 
 INSERT{ ?o cstr:type ?t .  ?t cstr:subClassOf ?superClass }
 WHERE { ?o rdf:type  ?t .  ?t rdfs:subClassOf ?superClass 
         FILTER NOT EXISTS { ?o rdf:type rdfs:Class } 
       }

3.2. Checking Individual-based Constraints With a SPARQL-like language

3.2.1. Checking Individual-based Positive Descriptive Constraints

The next query lists every object violating at least one of the individual-based positive descriptive constraints. As shown by the code, such an object satisfies two conditions. First, this object matches – and hence has for type – the condition of a constraint ?posConstr that is of type cstr:Descriptive_constraint. Second, this object does not match – and hence has not for type – the conclusion of the constraint. Thus, this query requires a SPARQL engine with an entailment regime enabling the matching (alias, categorization) of an individual to a class expression and thence the deduction of an rdf:type relation between them. In the code of the commands in this Section 3, such deduced rdf:type relations are highlighted in bold. For inferencing completeness purposes, such deductions require an entailment regime able to handle the expressiveness used in the constraints and the rest of the KB.

SELECT ?objectNotMatchingPosConstr ?posConstr_condition ?posConstr_conclusion 
WHERE                  #"posConstr" abbreviates "positive constraint"
{ ?posConstr rdf:type cstr:Descriptive_constraint;
             cstr:condition_class  ?posConstr_condition;
             cstr:conclusion_class ?posConstr_conclusion. 
  FILTER (?posConstr_conclusion != owl:Nothing)
  ?objectNotMatchingPosConstr rdf:type ?posConstr_condition.  #matches condition
  FILTER NOT EXISTS  #objects satisfying the conclusion must NOT be listed
  { ?objectNotMatchingPosConstr rdf:type ?posConstr_conclusion }
}

In the rest of this Section 3, when a query that checks constraints has some code that has not been used in a previous command, this code is in italics.

3.2.2. Checking Individual-based Inconsistency-implying Constraints

The next query lists every object violating an individual-based inconsistency-implying constraint.

SELECT ?objectMatchingNegativeConstr ?negativeConstr_condition WHERE
{ ?negativeConstr cstr:condition_class  ?negativeConstr_condition;
                  cstr:conclusion_class owl:Nothing. 
  ?objectMatchingNegativeConstr rdf:type ?negativeConstr_condition.
}

3.2.3. Checking Individual-based Positive Prescriptive Constraints Via the Cloning (By Reification) of At Least KB0\Tbox When R is Exploited Via Forward Chaining Saturation

The next four subsections give a SPARQL-like implementation of the method described in Section 2.2.2.1. Two of the discussed variants are here supported.

3.2.3.1. Checking the Prescriptive Constraints With Already Created Clones

The next query assumes that i) the clones without inferred relation have already been created, e.g. via the SPARQL requests of Section 3.2.3.3 and Section 3.2.3.4 below, and ii) these clones are accessible from regular objects via relations of type cstr:cloneWithoutInferredRelation. In this query, only two lines differ from the query for descriptive constraints given in Section 3.2.1. The first is the line for accessing the clone. The second is the line testing if the clone – instead of the regular object – matches the constraint conclusion.

SELECT ?objectNotMatchingPosConstr ?posConstr_condition ?posConstr_conclusion WHERE
{ ?posConstr rdf:type cstr:Prescriptive_constraint;
             cstr:condition_class  ?posConstr_condition;
             cstr:conclusion_class ?posConstr_conclusion.
  FILTER (?posConstr_conclusion != owl:Nothing)
  ?objectNotMatchingPosConstr rdf:type ?posConstr_condition.  #matches condition
  ?objectNotMatchingPosConstr cstr:cloneWithoutInferredRelation ?clone 
  FILTER NOT EXISTS  #objects with clones satisfying the conclusion must not be listed
  { ?clone rdf:type ?posConstr_conclusion }
}

3.2.3.2. Checking the Constraints Via a Dynamic Creation of the Clones

Here is an LDScript query which, for each object satisfying the condition of a constraint, calls a function before testing if the clone of this object matches the conclusion of this constraint. This function generates the clone in a temporary graph ?g, via a CONSTRUCT command. Like SPARQL, LDScript does not permit the use of a same variable to relate the results of queries on different KBs or graphs – indeed, as explained in Section 2.1.2, a same object identifier can be used in two queries as a way to insert in the second query the result of the first, but a variable cannot be used for the same purpose. Thus, the conclusion matching test has to be performed within the generated graph. Hence, this graph cannot solely contains the clone of the initially tested object: it must also duplicate – and thus permits the access to – all the relations directly or indirectly accessible from this initial object, except for those which have been inferred, i.e. those that have no saving forms. Here, the saving forms are assumed to have been created via the KB pre-treatment specified in the next subsection although without also creating the clone (to that end, in this KB pre-treatment the line to remove is indicated via a comment). Since CONSTRUCT is used, not INSERT, running this command does not require the priviledge to modify the KB.

SELECT ?objectNotMatchingPosConstr ?posConstr_condition ?posConstr_conclusion WHERE
{ ?posConstr rdf:type cstr:Prescriptive_constraint;
             cstr:condition_class  ?posConstr_condition;
             cstr:conclusion_class ?posConstr_conclusion.
  FILTER (?posConstr_conclusion != owl:Nothing)
  ?objectNotMatchingPosConstr rdf:type ?posConstr_condition.  #matches condition
  #the two functions called in the next two lines are defined below
  BIND( cstr:createCloneOf(?objectNotMatchingPosConstr) as ?clone )
  BIND( cstr:createTemporaryGraphContainingTheGivenCloneAndTheWholeKBminusTheGivenObject
             (?clone,?objectNotMatchingPosConstr) as ?g )
  FILTER NOT EXISTS { GRAPH ?g { ?clone rdf:type ?posConstr_conclusion } }
}

FUNCTION cstr:createCloneOf (?object)  #the returned clone does not yet have relations
{ uri(concat(str(?object),"_cloneWithoutInferredRelation")) 
}

FUNCTION cstr:createTemporaryGraphContainingTheGivenCloneAndTheWholeKBminusTheGivenObject
              (?clone, ?objectNotMatchingPosConstr)
{ LET (?g = CONSTRUCT { ?clone ?r ?dest .  ?x ?r2 ?y } WHERE
            { VALUES ?clone { UNDEF }  #the clone is not predefined to be only certain objects
              ?objectNotMatchingPosConstr ?r ?y.  #anything reachable from the object
              ?x ?r2 ?y.  FILTER (?x != ?objectNotMatchingPosConstr) #only indirectly reachable
              ?m rdf:subject ?clone.  ?m rdf:predicate ?r.  ?m rdf:object ?dest.
              ?m rdf:type cstr:NonInferredRelation
            }) 
  { xt:entailment(?g) }  #triggers forward chaining saturation on ?g
}

This technique of generating a temporary graph and then querying within it could also be used for dynamically typing named individuals with owl:NamedIndividual. Thus, this technique could permit the checking of constraints restricted to named individuals without using the KB pre-treatement given in Section 3.1.1, hence without modifying the KB.

3.2.3.3. KB Pre-treatment Creating the Saving Forms and the Clones

For every object ?o in the whole KB, if ?o is an individual, the next command i) if needed, creates ?o2, a clone of ?o that has the same relations, and ii) reifies each of these relations to store the fact it is not an inferred relation. The object ?o2 has for identifier the one of ?o but with the suffix “_cloneWithoutInferredRelation”. For the sake of clarity, two assumptions were made when writing this command. One is that the KB does not include user-defined 2nd-order types. Another assumption is that KB0\Tbox is the set of individuals of the KB along with non-type relations associated to them. This is the case if the KB is RDFS based or OWL based.

INSERT  #the next line must be removed if the clones must not also be created 
{ ?o cstr:cloneWithoutInferredRelation ?o2.  ?o2 ?r ?dest. 
  ?m rdf:type cstr:NonInferredRelation. #this last type is subtype of rdf:Statement
  ?m rdf:subject ?o2.  ?m rdf:predicate ?r.  ?m rdf:object ?dest   #reification of ?r by ?m
} 
WHERE
{ ?o rdf:type ?t . FILTER NOT EXISTS { ?o rdf:type rdfs:Class }
              ?t rdfs:subClassOf ?superClass .                    #?o is a typed individual
  FILTER NOT EXISTS { ?o  cstr:cloneWithoutInferredRelation ?c1 } #?o must not have a clone
  FILTER NOT EXISTS { ?c2 cstr:cloneWithoutInferredRelation ?o }  #?o must not be a clone

  { #Case 1: supporting the cloning of each individual ?o having at least 1 relation of type
    #        ?r different from owl:sameAs (the INSERT clause adds the relations "?o2 ?r ?dest"
    #        from the clone and also reifies these relations)
    ?o ?r ?dest.  FILTER(?r!=owl:sameAs)
  }
  UNION #Case 2: preventing the useless cloning of an individual having only 1 relation when 
        #        this relation is of type owl:sameAs
  { ?o ?r ?dest  #?o has at least one relation from it
    FILTER NOT EXISTS {?o ?r2 ?dest2. FILTER(?r2!=owl:sameAs) }
  }

  BIND( uri(concat(str(?o),"_cloneWithoutInferredRelation")) as ?o2 )
  #the identifier of ?m incudes those of ?o2, ?r and ?dest:
  BIND( uri(concat(str(?o2),concat("_",
            concat(str(?r),concat("_",
            concat(str(?dest),"_reification")))))) as ?m )
}

3.2.3.4. Removing Inferred Relations From Clones

This DELETE command is supposed to be run after forward chaining saturation is performed and before running a query for checking prescriptive constraints. This may be done manually or, at least with LDScript, by using a script that runs this command just before running the query. This command removes the inferred relations associated to the clones, i.e. the relations which have no reification stating they are not inferred.

DELETE { ?o2 ?r ?dest } WHERE
{ ?o  cstr:cloneWithoutInferredRelation ?o2 .  ?o2 ?r ?dest. 
  FILTER NOT EXISTS { ?m rdf:subject ?o2.  ?m rdf:predicate ?r.  ?m rdf:object ?dest.
                      ?m rdf:type cstr:NonInferredRelation }
}

3.2.4. Checking Individual-based Positive Prescriptive Constraints Via the Cloning of At Least KB0\Tbox Using “Clones Without Types”

This section is about a SPARQL-like implementation of the method described in Section 2.2.2.2. However, as explained in that subsection, this one has only one subsection. One reason is that the SPARQL query given in Section 3.2.3.1 also works for this method.

3.2.4.1. KB Pre-treatment Creating “Clones Without Types”

The next command is similar to the one in Section 3.2.3.3. The differences are that, when treating an object, the next command does not also reify it and does not copy its type into the clone. This last difference is highlighted via the use of italic characters.

INSERT { ?o cstr:cloneWithoutInferredRelation ?o2 . ?o2 ?r ?dest } WHERE
{ ?o rdf:type ?t.  FILTER NOT EXISTS { ?o rdf:type rdfs:Class }
              ?t rdfs:subClassOf ?superClass.                     #?o is a typed individual
  FILTER NOT EXISTS { ?o  cstr:cloneWithoutInferredRelation ?c1 } #?o must not have a clone
  FILTER NOT EXISTS { ?c2 cstr:cloneWithoutInferredRelation ?o }  #?o must not be a clone

  { #Case 1: supporting the cloning of each individual ?o having at least 1 relation of type
    #        ?r different from owl:sameAs and rdf:type (the INSERT clause adds the relations
    #        "?o2 ?r ?dest" from the clone)
    ?o ?r ?dest.  FILTER(?r!=owl:sameAs)  FILTER(?r!=rdf:type)
  }
  UNION #Case 2: preventing the useless cloning of an individual having only 1 relation when 
        #        this relation is of type owl:sameAs or rdf:type
  { ?o ?r ?dest  #there is at least one relation from ?o 
    FILTER NOT EXISTS {?o ?r2 ?dest2.  FILTER(?r2!=owl:sameAs)  FILTER(?r2!=rdf:type)  }
  }

  BIND( uri(concat(str(?o),"_cloneWithoutInferredRelation")) as ?o2 )
}

3.2.5. Checking Individual-based Positive Prescriptive Constraints Via the Creation of Saving Forms for R and the Creation of a Temporary Copy of the KB without R When All Inferences are Performed Via Backward Chaining

This section is about a SPARQL-like implementation of the method described in Section 2.2.2.3.

3.2.5.1. KB Pre-treatment Creating Saving Forms For R In an OWL based KB

In a OWL based KB, R is the set of restrictions and KB0\R is KB0 where the body of each restriction is empty. The next command reifies the relations of such bodies.

INSERT{ ?m rdf:subject ?restr.  ?m rdf:predicate ?r.  ?m rdf:object ?dest.  #reification of ?r
        ?m rdf:type cstr:RelationFromPropertyRestriction. #subtype of rdf:Statement
      }
WHERE { ?restr rdf:type owl:Restriction.  ?restr ?r ?dest.  FILTER(?r != owl:equivalentClass)
        #the identifier of ?m incudes those of ?o2, ?r and ?dest:
        BIND( uri(concat(str(?restr),concat("_",
                  concat(str(?r),concat("_",
                  concat(str(?dest),"_reification")))))) as ?m )
      }

3.2.5.2. Checking the Prescriptive Constraints Via the Checking of Conclusions Within a Temporary Copy of (a Relevant Part of) the KB without R

Here is an LDScript query which, to test if an object matches a constraint, i) tests if the object matches the condition of the constraint, and if so, ii) via a CONSTRUCT sub-command, temporarily creates a copy of all the relations that can be directly or indirectly reached from the object, except for the relations reified via the command in the previous subsection, and iii) tests if the object matches the conclusion of the constraint within the temporary graph.

SELECT ?objectNotMatchingPosConstr ?posConstr_condition ?posConstr_conclusion WHERE
{ ?posConstr rdf:type cstr:Prescriptive_constraint;
             cstr:condition_class  ?posConstr_condition;
             cstr:conclusion_class ?posConstr_conclusion.
  FILTER (?posConstr_conclusion != owl:Nothing)
  ?objectNotMatchingPosConstr rdf:type ?posConstr_condition.  #matches condition
  #the function called in the next line is defined below
  BIND( cstr:createTemporaryGraphContainingTheKBminusR(?objectNotMatchingPosConstr) as ?g )
  FILTER NOT EXISTS { GRAPH ?g { ?objectNotMatchingPosConstr rdf:type ?posConstr_conclusion } }
}

FUNCTION cstr:createTemporaryGraphContainingTheKBminusR (?objectNotMatchingPosConstr)
{ LET (?g = CONSTRUCT { ?objectNotMatchingPosConstr ?r ?y } WHERE
            { ?objectNotMatchingPosConstr ?r ?y.  #anything reachable from the object
              FILTER NOT EXISTS { ?m rdf:subject ?x.  ?m rdf:predicate ?r.  ?m rdf:object ?y.
                                  ?m rdf:type cstr:RelationFromPropertyRestriction }
            }) 
}

3.3. Checking SubclassOf-analogous Constraints With a SPARQL-like language

The usable content-independent queries here are identical to their counterparts in Section 3.2 except for the initialization of ?posConstr_condition and ?posConstr_conclusion since now they are related by a cstr:descriptive_constraint_conclusion relation or a cstr:prescriptive_constraint_conclusion relation. E.g., here is a query for checking subclassof-analogous positive descriptive constraints. See the line in italics for the new initialization.

SELECT ?objectNotMatchingPosConstr ?posConstr_condition ?posConstr_conclusion WHERE
{ ?posConstr_condition cstr:descriptive_constraint_conclusion ?posConstr_conclusion.
  FILTER (?posConstr_conclusion != owl:Nothing)
  ?objectNotMatchingPosConstr rdf:type ?posConstr_condition.  #matches condition
  FILTER NOT EXISTS  #objects satisfying the conclusion must NOT be listed
  { ?objectNotMatchingPosConstr rdf:type ?posConstr_conclusion }
}

There are other ways to write the queries. For example:

“FILTER (?posConstr_conclusion != owl:Nothing)” could be replaced by “FILTER NOT EXISTS { ?posConstr_conclusion cstr:prescriptiveConclusion owl:Nothing }”. The first way has the advantage of not being dependent on the chosen representation for constraint and hence this way minimizes the difference between the queries. On the other hand, with this way, owl:Nothing cannot be replaced by equivalent class expressions (in SPARQL).
In Section 3.2.3.1 and Section 3.2.3.4, the line “?objectNotMatchingPosConstr cstr:cloneWithoutInferredRelation ?cloneWithoutInferredRelation” before “FILTER NOT EXISTS” could be replaced by the line “BIND( uri( concat( str(?objectNotMatchingPosConstr), "_cloneWithoutInferredRelation" ) ) as ?cloneWithoutInferredRelation )” within the “FILTER NOT EXISTS” block.

3.4. Checking SubclassOf-based Constraints With a SPARQL-like language

The previous queries do not rely on inference engines to take into account the special meaning of CSTR classes. Hence, as explained in Section 2.3.1, these queries cannot be adapted for checking subclassOf-based constraints representing positive descriptive constraints. For prescriptive constraints, the queries are the same as their counterparts in Section 3.2 except for the initialization of ?posConstr_condition and ?posConstr_conclusion. E.g., for a positive prescriptive constraint, this initialization now is:

  ?posConstr_condition rdfs:subClassOf cstr:SubclassOf-based_prescriptive_constraint_condition,
                                       ?posConstr_conclusion.

The next query lists every object violating an individual-based inconsistency-implying constraint. Such an object matches – and hence has for type – a class ?negConstr_condition which i) has owl:Nothing as conclusion, and optionally ii) is subclass of the type cstr:SubclassOf-based_constraint_condition. If this previous type is not used in the negative constraints of the KB, this second condition must actually not be used in the query.

SELECT ?objectMatchingNegConstr ?negConstr_condition WHERE
{ ?negConstr_condition rdfs:subClassOf owl:Nothing;
                       rdfs:subClassOf cstr:SubclassOf-based_constraint_condition.  #optional line
  ?objectMatchingNegConstr rdf:type ?negConstr_condition.
}

Except as a module for calculating the completeness degree of a KB, this previous query is useless if, when building the KB, its consistency is already checked by an inference engine that delivers an error message when detecting that an object is instance of a subclass of owl:Nothing. By default, some Description Logic inference engines such as Corese [Corby, Faron-Zucker & Gandon, 2017] do not deliver error messages or warning messages when detecting such objects. Having to make inferences on instances of a subclass of owl:Nothing also makes Corese behaves abnormally. E.g., when an individual violates a subclassOf-based inconsistency-implying constraint – and hence is instance of owl:Nothing – as well as a positive constraint, this object does not appear in the results of the above described queries searching objects violating positive constraints. Thus, as noted in Section 2.3.1, using subclassOf-based constraints is not relevant for general-purpose knowledge sharing.

3.5. Checking Binary Relations Instead of Individuals With a SPARQL-like language

To list binary relations violating constraints – instead of individuals that have some relations violating constraints – it is sufficient to replace rdf:type by the “logical implication relation between statements” in the previous content-independent queries that check positive constraints. For referring to such relations, Tim Berners-Lee uses the type name log:implies [Berners-lee et al., 2008] in his Notation3 KRL. However, for this replacement to work, the used SPARQL engine must exploit an inference engine that can deduce the existence of such a relation when it exists between the matched statements. Description Logic inference engines generally do not do so.

Like queries on individuals, queries on relations can use additional filters. E.g., for the last query (Section 3.4) to operate only on negative facts, one may add at the end of its body:

?objectMatchingNegativeConstr rdf:type owl:NegativePropertyAssertion.

3.6. Evaluating a Constraint-based Completeness With a SPARQL-like language

A simple way to define or calculate a completeness degree for a KB is to divide “the number of relations (in the KB) that do not violate constraints” by “the total number of relations”. Another completeness degree may be obtained by considering only prescriptive constraints. Yet another one may be obtained by dividing “the number of individuals that do not violate prescriptive constraints” by “the total number of individuals”. The next query implements a variant of this last definition: instead of individuals, this query exploits “objects that are source of at least one relation to another object”. Furthermore, this query assumes that the constraints are represented as individual-based constraints. It also considers constraints in inconsistency implying form as (negative) prescriptive constraints. Via the method given in Section 3.5, this query can be adapted to exploit binary relations violating constraints instead of individuals violating constraints.

SELECT ( ((?nbObjs - ?nbAgainstPosCs - ?nbMatchingNegCs) / ?nbObjs)
         AS ?completeness )
{ {SELECT (COUNT(DISTINCT ?o) AS ?nbObjs)
    WHERE { ?o ?r ?o2 } #any object source of a relation to another object
        # For considering only objects that have a type:
        # { {?o rdf:type ?t1} UNION {?o cstr:type ?t2} }
  }
  {SELECT(COUNT(DISTINCT ?objectNotMatchingPosConstr) AS ?nbAgainstPosCs)
    WHERE { ... #the body of a query checking an individual-based positive
                #  prescriptive constraint (see Section 3.2.3.1) must be copied here

            #if ?objectNotMatchingPosConstr also violates a negative constraint
            #  it must not be counted here (otherwise it would be counted twice),
            FILTER NOT EXISTS            # hence this code here
            { ?negativeConstr cstr:condition_class ?negativeConstr_condition;
                              cstr:conclusion_class owl:Nothing.
              ?objectNotMatchingPosConstr rdf:type ?negativeConstr_condition
            }
          }
  }
  {SELECT (COUNT(DISTINCT ?objectMatchingNegativeConstr) AS ?nbMatchingNegCs)
    WHERE { ?negativeConstr cstr:condition_class ?negativeConstr_condition ;
                       cstr:conclusion_class owl:Nothing.
            ?objectMatchingNegativeConstr rdf:type ?negativeConstr_condition
          }
  }
}

3.7. JavaScript Function for Ontology Completeness Evaluation

The next function shows a way to list each constraint violation and calculate the completeness of the ontology wrt. these violations. There are however a few differences with the command given in the previous subsection.

The completeness is now given i) with respect to each constraint in a particular set, not with respect to all prescriptive constraints of the KB, and ii) for the instances of a particular set of classes, not all individuals.
Error handling is included.
The method specializes is called to perform the matching between relations associated to an object and relations in a constraint condition or conclusion, i.e., to test if each relation in the second set of relations has a specialization in the first set (details on such a specialization test can be found in [Chein & Mugnier, 2008] and [Swan, 2016]). This method can ignore the instanceOf relations from the handled object. It does so when the checked constraint is a prescriptive one, i.e., if its parameter is instance of cstr:Prescriptive_constraint. In other words, for such constraints this method is based on the same idea as the one based on “clones without types” but does not need to generate such clones. With such a method, the code below needs not include a temporary removal of instanceOf relations from the objects.

JavaScript is here used instead of pseudo-code since its syntactic sugar is rather well-known or intuitive and is object-oriented. This previous feature is required here to have a code that is short, clear and generic with respect to error handling. The objects in the ontology are assumed to be accessible in the object-oriented way used in the code.

The complexity of the next function is the complexity of the object matching multiplied by the number of given constraints, multiplied by the number of given classes, multiplied by the average number of instances by class.

function checkThatInstancesOfTheseClassesSatisFyTheseConstraints
         (setOfClasses,     //e.g.: {Man, Woman}
          setOfConstraints, //e.g.: {"if x is a Person, x has a parent"}
          errorHandling) //contains a class for generic error handling
{ if (setOfConstraints.length() == 0)
    return errorHandling.returnCodeIfNoPropertyToCheck;
  if (setOfClasses.length() == 0)
    return errorHandling.returnCodeIfNoClassToCheck;
  for co in setOfConstraints  //co may or may not be inconsistency-implying
  { var numberOfObjsSatisfyingCo= 0, numberOfObjsNotSatisfyingCo= 0;
    for cl in setOfClasses
      for obj in cl.instances()
        if (obj.specializes(co.condition))
          if ((co.isPositiveConstr() && obj.specializes(co.conclusion))
            numberOfObjsSatisfyingCo++;
          else { numberOfObjsNotSatisfyingCo++;
                 if (errorHandling.alertAtEachError) 
                   alert("Relations from '" + obj.toString() + 
                         "' do not satisfy:\n  " + co.toString());   
                  //e.g.: Relations from 'Tom' do not satisfy:
                } //        if x is a Person, x has a parent
    var completenessForThatConstraint= numberOfObjsSatisfyingCo /
               (numberOfObjsSatisfyingCo + numberOfObjsNotSatisfyingCo);
    if (completenessForThatConstraint < 1)
    { if (errorHandling.alertAtEachIncompleteness)
        alert("Only " + completenessForThatConstraint + "% of instances"
              + " of the given classes satisfy the constraint:\n  " + 
              + co.toString());
      if (errorHandling.returnCodeIfIncompletenessForSomeConstraint)
        return errorHandling.returnCodeIfIncompletenessForSomeConstraint;
    }
  }
  return errorHandling.returnCodeIfNoError;
}

4. Illustration of Applications (or Use Cases) and Limits of the Approach By Showing How To Represent Some Important Ontology Design Patterns Via Relations-between-classes based Constraints

Ontology design patterns or best practices (ODPs), e.g. those recommended by the W3C or those of the “ODP catalog”, are i) informal descriptions about how certain things should be represented, and/or ii) collections of types that should be reused whenever possible, or iii) lexical or syntactic rules to follow when importing or exporting formal or informal knowledge. As shown in Section 2, descriptive or prescriptive constraints are ways to represent two meanings of “must be reused whenever possible” and hence ways to formalize and implement ODPs related to the second point. The next subsections show how this can be done with the proposed approach. To seek the limits of this approach and to be as generic as possible, the used example ODPs are not tied to a particular ontology. If not directly re-used for a particular ontology, these examples can be used as templates for writing simpler constraints for particular ontologies.

4.1. Examples of Useful General Descriptive Constraints

4.1.1. Checking or Evaluating the Design of a Subtype Hierarchy

4.1.1.1. Approach and Rationale

Several research works in knowledge acquisition, model-driven engineering or ontology engineering, e.g. [Marino, Rechenmann & Uvietta, 1990] [Bachimont, Isaac & Troncy, 2002] [Dromey, 2006] [Rector et al., 2012], have advocated the use of tree structures when designing a subtype hierarchy, hence the use of i) single inheritance only, and ii) multiple views or viewpoints, one for each tree. Indeed, they argue that objects have a unique place in such trees and hence that such trees can be used as decision trees or ways to avoid redundancies, normalize KRs and ease KR search/handling. This is true but, for the same reasons and as further detailled in the next paragraph, the same advantages can be obtained by subtyping solely via either complete sets of exclusive (direct) subtypes – i.e. subtype partitions, e.g. via the use of disjoint unions of types – or incomplete sets of exclusive subtypes. Furthermore, this last method has the additional advantages of i) still allowing multi-inheritance, ii) not requiring a special inference engine to handle views (e.g. [Marino, Rechenmann & Uvietta, 1990] [Djakhdjakha, Hemam & Boufaïda, 2014]), iii) being less cumbersome for knowledge providers since with the last method they do not have to specify or use views each time they create subtypes, and iv) when this can be done, specifying that a set of (direct) subtypes is complete supports interesting inferences for checking or reasoning purposes. The fact that OWL-EL, OWL-RL or OWL-QL do not include disjoint unions of types is not a reason for not using them in ontologies designed for knowledge sharing purposes: this fact only means that, before using such ontologies with inference engines dedicated to OWL-EL, OWL-RL or OWL-QL, some expressiveness has to be lost, e.g. by automatically converting each combination of disjoint union and class equivalence into a combination of class disjointness and subtyping.

Indeed, an aside contribution of this article is to note that in order to keep the above cited advantages of the tree structures, it is sufficient (and necessary) that whenever two types are exclusive – i.e., whenever they must not share common subtypes or instances – the exclusion relation that links these two types must be specified, explicitly or implicitly. With tree structures, such exclusion relations are implicit but still specified. By systematically using sets of exclusive subtypes, all exclusion relations can be easily and explicitly specified, e.g. via the use of binary partitions such as “Spatial_entity / Non-spatial_entity”. Exclusion relations are a simple and efficient way to support ontology checking and knowledge retrieval. For example, when searching for particular “achievements” in a KB, people may not know which exact type to use and then which query to use: whether a query for specializing objects, for generalizing ones, for doing both with a maximal number of specializing/generalizing relations, etc. In this case, it may be easier for them to search for instances of all types not exclusive with some type they have in mind. This query may well collect all and only the objects they are interested in, if the ontology is systematically structured using sets of exclusive subtypes.

However, there are some kinds of types for which using only subtype partitions – or sometimes even just sets of exclusive subtypes – is cumbersome. An example for these kinds is the one of non-natural types [Guizzardi, 2005] [Mizoguchi, Kozaki & Kitamura, 2012] such as role types and phase types. Examples of general role types are Agent, Experiencer, Recipient, Cause and Instrument. Examples of more specific ones are Driver, Employee and King. Examples of phase types are Beginning and Child. Examples of role type partitions are “Cause / Non-cause” and “King / Non-king”. Systematically specifying such partitions is often cumbersome and useless since role types such as Non-cause and Non-king will probably not be instantiated. Always using sets of exclusive subtypes for non-natural types is also sometimes cumbersome, e.g. when only one subtype is known or relevant to represent. The situation is similar for some other kinds of types – such as those for attributes, information objects or relations – for which the distinction between “natural type” and “non-natural type” is hard to make or not relevant.

Thus, even though systematically using subtype partitions or sets of exclusive subtypes is an interesting ontology design pattern (ODP), for practical purposes this ODP should be generalized to include options allowing ontology designers not to use exclusive subtypes in certain circumstances, as long as such choices are explicitly represented. To that end, first considering only subclasses (not all subtypes), the use of certain subtypes of sub:subclass (inverse of rdfs:subClassOf) seems the easiest way compatible with OWL and the proposed approach. This is why this article introduces the relation types sub:nonNaturalSubclass and sub:loneSubclass. The prefix “sub:” is an abbreviation for the namespace http://www.webkb.org/kb/it/SUB. SUB is the ontology declaring and organizing the types proposed in this article. The next subsection concludes with the representation of a constraint that enforces the above cited generalized ODP (restricted to subclasses). This ODP, which from now on is called the “Subclass ODP”, works by checking that each used subclassOf relation has for type a certain subtype of rdfs:subClassOf. The relation types sub:nonNaturalSubclass and sub:loneSubclass can be used for (exceptionally) not using exclusive subclasses. The first type also implies that the source of the relation is a non-natural class.

When – and only when – an ontology using such subtypes of rdfs:subClassOf has to be exploited by an inference engine that cannot handle such subtypes, a pre-treatment that expands the definition of these subtypes – and thus removes these subtypes – must first be applied on the ontology.

The next subsection is only about the Subclass ODP. This article does not propose a similar constraint for a “Subproperty ODP" to support the checking of relation subtypes. Indeed, RDFS+OWL provides the rdfs:subPropertyOf and owl:propertyDisjointWith but does not allow the use of owl:disjointUnionOf or owl:unionOf on properties.

4.1.1.2. Implementation of the Subclass ODP with OWL2

Defining sub:subclassInExclusionSetOrAlone, the supertype of the subtyping relation types allowed by the Subclass ODP. For the reasons given in Section 2.3.3, the Turtle notation is used.

sub:subclass    owl:inverseOf rdfs:subClassOf.
sub:subProperty owl:inverseOf rdfs:subPropertyOf.
sub:subclassInExclusionSetOrAlone 
  rdfs:subPropertyOf sub:subclass; 
  sub:subProperty sub:subclassInDisjointUnion  #for a subclass in a subtype partition
                  sub:subclassInSetOfExclusiveSubclasses
                  sub:loneSubclass #for a subclass not in a set of exclusive subtypes
                  sub:nonNaturalSubclass.  #idem but for a non natural type

If the last three lines are removed from this definition, the constraint for the Subclass ODP will check that all subclasses are defined only via subtype partitions. In other words, the last three lines provide three more subtyping options.

Defining sub:nonNaturalSubclass. With the following full definition of this relation type, it does not have to be used explicitly for sub:subclassInExclusionSetOrAlone relations to be inferred: rdfs:subClassOf relation can also be used instead as long as all non-natural classes are categorized as subtypes of sub:NonNaturalClass.

sub:nonNaturalSubclass rdfs:subPropertyOf sub:subclassInExclusionSetOrAlone;
                       rdfs:domain owl:Class;   rdfs:range sub:NonNaturalClass.

Defining sub:subclassInDisjointUnion. If the following full definition of this relation type sub:subclassInDisjointUnion was correct, it would not have to be used explicitly for sub:subclassInExclusionSetOrAlone relations to be inferred: owl:disjointUnionOf relations or equivalent representations could be used instead.

sub:subclassInDisjointUnion
  owl:propertyChainAxiom (owl:equivalentClass  owl:disjointUnionOf  rdfs:member).

The problem with this definition is that an OWL list of disjoint classes is not source of rdfs:member relations to these classes: these classes are supposed to be accessed sequentially from the first (via an rdf:first relation) to the last (via a sequence of rdf:rest relations followed by a rdf:first relation). Since OWL does not allow regular expressions or recursive definitions, it does not allow a full definition of membership based on rdf:rest and rdf:first relations. Thus, this article proposes the following three solutions to support the checking of sub:subclassInExclusionSetOrAlone relations and equivalent representations, via the proposed content-independent queries.

One solution is to define membership in a way that is partial but sufficient for correctly handling the content of the KB, as illustrated here:

sub:listMember
  sub:subclass sub:listMember1, sub:listMember2, sub:listMember3. #and so on if needed

sub:listMember1 owl:equivalentProperty rdf:first.
sub:listMember2 owl:propertyChainAxiom (rdf:rest rdf:first).
sub:listMember3 owl:propertyChainAxiom (rdf:rest rdf:rest rdf:first). #and so on if needed

Then, the following definition of sub:subclassInDisjointUnion is sufficient. However, these definitions are not within OWL2-RL, OWL2-EL and OWL2-QL since owl:disjointUnionOf is not within them.

sub:subclassInDisjointUnion
  owl:propertyChainAxiom (owl:equivalentClass  owl:disjointUnionOf  sub:listMember).

Another solution is to use a KRL that permits a full definition of sub:listMember. Representations that use this type can still be translated into OWL2 by using the above kind of “partial but sufficient” representations.

A third solution is to pre-treat the KB for adding sub:subclassInDisjointUnion relations based on owl:disjointUnionOf relations in the KB. Here is a SPARQL1.1 update request supporting this.

INSERT { ?st1 sub:subclassInDisjointUnion ?t.
         ?st2 sub:subclassInDisjointUnion ?t } WHERE
{ ?t owl:equivalentClass [rdf:type rdfs:Class;
                          owl:disjointUnionOf/rdf:rest*/rdf:first ?st1;
                          owl:disjointUnionOf/rdf:rest*/rdf:first ?st2]
  FILTER (?st1 != ?st2)
}

Defining sub:subclassInSetOfExclusiveSubclasses. If the following full definition of this relation type was correct, it would not have to be used explicitly for sub:subclassInExclusionSetOrAlone relations to be inferred: owl:disjointUnionOf relations or equivalent representations could be used instead.

sub:subclassInSetOfExclusiveSubclasses  owl:propertyChainAxiom 
    (sub:subclass  owl:equivalentClass  owl:disjointUnionOf  rdfs:member).

The previous paragraph explained why the use of rdfs:member relations is incorrect and how there are three solutions to fix it. In the current case, the partial but sufficient definition is:

sub:subclassInSetOfExclusiveSubclasses  owl:propertyChainAxiom 
    (sub:subclass  owl:equivalentClass  owl:disjointUnionOf  sub:listMember).

Defining the constraint for the Subclass ODP. A positive descriptive constraint can now be defined to ensure that “if there is a subclass relation between two classes, this relation must be of type sub:subclassInExclusionSetOrAlone”. An equivalent constraint, albeit class-focused instead of relation-focused, is: “if a class C1 has a subclass relation to a class C2, this relation must be of type sub:subclassInExclusionSetOrAlone”. However, OWL does not allow the use of variables to refer to the relation or to C2. The OWL compatible version of this constraint is then “if a class C1 has a subclass relation, all subclass relations from C1 must be of type sub:subclassInExclusionSetOrAlone”. If “some” was used instead of “all”, not all the subclass relations from C1 would be checked. Using OWL2 and the CSTR types, here is a way to write this constraint without having to define a second-order class. The type owl:AllDisjointClasses is allowed in OWL2-EL, OWL2-QL and OWL2-RL. However, although an OWL2-RL inference engine can check if an expression using this type is respected, this engine cannot use this type for inferring whether or not an object is instance of the conclusion class below.

[] rdf:type cstr:Descriptive_constraint;
   cstr:condition_class      # if C is a class that has a subclass ...
      [rdf:type owl:Class; 
       owl:equivalentClass [rdf:type owl:Restriction;  #"any class that has a subclass"
                            owl:onProperty sub:subclass;  owl:someValuesFrom rdfs:Class] ];
   cstr:conclusion_class  #... then C has no subclass relation that is not
                          #            of type sub:subclassInExclusionSetOrAlone
      sub:ClassWithNoRelationOfType_subclassButNot-subclassInExclusionSetOrAlone.
           #(=> all subclass relations of type sub:subclassInExclusionSetOrAlone)

#with:
[] rdf:type owl:AllDisjointClasses;
   owl:members  #this relation type is usable even though rdfs:member is not
                #  usable for accessing each member of the destination list!
     (sub:ClassWithNoRelationOfType_subclassButNot-subclassInExclusionSetOrAlone
      sub:ClassWithSomeRelationOfType_subclassButNot-subclassInExclusionSetOrAlone).

sub:ClassWithSomeRelationOfType_subclassButNot-subclassInExclusionSetOrAlone
   owl:equivalentClass
      [rdf:type owl:Restriction;
       owl:onProperty sub:subclassButNot-subclassInExclusionSetOrAlone;
       owl:someValuesFrom owl:Class].

sub:subclassButNot-subclassInExclusionSetOrAlone
   rdfs:subPropertyOf sub:subclass;
   owl:propertyDisjointWith sub:subclassInExclusionSetOrAlone.

This constraint can also be translated into a negative constraint and represented in inconsistency-implying form. That way, an OWL2-RL inference engine can be used for checking this constraint.

[] rdf:type cstr:Descriptive_constraint;
   cstr:condition_class  # if C is a class that has a subclass relation that is not
                         #               of type sub:subclassInExclusionSetOrAlone
      sub:ClassWithSomeRelationOfType_subclassButNot-subclassInExclusionSetOrAlone];
   cstr:conclusion_class owl:Nothing.  #... then there is a problem

4.1.2. Generalization To Check the Use of Any Transitive Relation

The above constraint seems worth being generalized and applied to all transitive relations used in an ontology, e.g. part relations. Indeed, the reasons why the destinations of subtype relations are worth being organized into subtype partitions are also reasons why the destinations of part relations are worth being organized into part partitions. The definition of a part partition for an object mirrors the one for a subtype partition: it is a complete set of disjoint parts of the object; “disjoint” meaning that the parts cannot share common subparts, and “complete” meaning that the union of the parts is equal to the object.

Since OWL does not allow the use of variables referring to relation types, OWL does not enable the specification of a unique constraint for all transitive relations used in an ontology. Hence, for each transitive relation type that needs to be checked, a specific constraint must be created. However, this creation can be automated using the next SPARQL command. It takes as a parameter each instance of owl:TransitiveProperty specified in the ontology, i.e. each transitive relation type ?trRelType, e.g. sub:subclass and sub:part which is the inverse of the general partOf relation type. From each ?trRelType, via a sub:aloneOrInPartitionTrRelType relation the command accesses the supertype of recommended relation types to use when representing ?trRelType relations. For example, for sub:subclass relations, the previous section showed that this recommended supertype can be sub:subclassInExclusionSetOrAlone. If subtyping should rather be done only via subtype partitions, the recommended supertype would rather be sub:subclassInPartition (same definition as the previous type but without the last three lines). For sub:part relations, the recommended supertype can be sub:partInPartition.

INSERT 
{ [] rdf:type cstr:Descriptive_constraint;
     cstr:condition_class  #class for the source of a transitive relation
        [?TrRelTypeDomain  owl:equivalentClass 
            [rdf:type owl:Restriction;   #class source of a transitive relation
             owl:onProperty ?trRelType;  #e.g., a sub:part relation
             owl:someValuesFrom owl:Thing] ]; #any thing (type or individual)
     cstr:conclusion_class
        [?conclusion_class owl:equivalentClass  
            [rdf:type owl:Restriction;
             owl:onProperty ?aloneOrInPartitionTrRelType; #e.g., sub:partInPartition
             owl:someValuesFrom owl:Thing] ]
}
WHERE{ ?trRelType rdf:type owl:TransitiveProperty;
                  sub:aloneOrInPartitionTrRelType ?aloneOrInPartitionTrRelType;
                  rdfs:domain ?TrRelTypeDomain  #e.g., owl:Thing (see below)
     }

Thus, based on the next partial definitions of sub:subclass and sub:part, the above command generates i) a descriptive constraint for checking sub:part relations, and ii) a descriptive constraint for checking sub:subclass relations. This last generated constraint is different from the one given in the previous subsection, for two main reasons.

These generated constraints only support the checking of direct uses of recommended transitive relations, not the checking of alternative but equivalent ways to represent these recommended relations in OWL. The reason is that OWL permits a full definition of sub:subclassInExclusionSetOrAlone (or, more exactly, a partial but sufficient definition of this type) but OWL does not permit a similar definition for sub:partInPartition nor for any other transitive type. However, if needed, sub:partInPartition relations and other transitive relations can be generated via SPARQL update requests similarly to the way relations of type sub:subclassInDisjointUnion can be generated, as seen in the previous subsection.
Unlike all the other examples of constraints in this article, these generated constraints have to be checked using the log:implies based approach described in Section 3.5. Otherwise, these constraints are ambiguous, like the OWL compatible version in the previous if “some” was used instead of “all”. However, as noted, Description Logic inference engines generally do not deduce log:implies relations between statements (triples, ...) and hence cannot be reused with this solution.

sub:subclass rdf:type owl:TransitiveProperty;   rdfs:domain rdfs:Class;
             sub:aloneOrInPartitionTrRelType sub:subclassInExclusionSetOrAlone.

sub:part rdf:type owl:TransitiveProperty;   sub:trRelDomain owl:Thing;
         sub:aloneOrInPartitionTrRelType sub:partInPartition.

The use of owl:Thing in this last example may be a problem for some classic Description Logic inference engines. For example, owl:Thing cannot be used as source or destination of an rdfs:subClassOf relation in OWL-RL [OWL 2 profiles]. It is therefore preferable to use a class other than owl:Thing as the domain of a transitive relation type. It is also often easy and relevant, as in using sub:Spatial_entity for the domain of sub:spatial_part.

4.2. Examples of Useful General Prescriptive Constraints

4.2.1. Checking that Certain Relation Types Are Systematically Used From or To Objects of Certain Types

The previous examples of generated constraints were descriptive: they stated that if transitive relations of certain general types are used, these relations must also be of certain specific types. A prescriptive constraint is needed for checking that if certain kinds of relation types are declared in an ontology, every object instance of the domain (resp. range) of such relation types must be the source (resp. destination) of such relations. To support the generation of such constraints, one must first indicate which these relation types are, e.g. by typing them with the 2nd-order type sub:MandatoryOutRelationType (or sub:MandatoryInRelationType). From such relation types, relations of type sub:leafObjectType can also be used for indicating the type of objects (classes or individuals) that, as an exception, must not be checked. As an example, the following representation states that every dividable object – i.e. every instance of sub:DividableThing – must be the source of a sub:part relation except for each object instance of sub:PartDestLeaf. By stating such an exception, ontology creators can make explicit their choice of not using a certain object as source of a sub:part relation even when it is possible.

sub:part rdf:type sub:MandatoryOutRelationType;  rdfs:domain sub:DividableThing;
         sub:leafObjectType sub:PartDestLeaf;

Similarly, for specifying that every class not marked as a subclass leaf must have a subclass, one can write:

sub:subclass rdf:type sub:MandatoryOutRelationType;   rdfs:domain owl:Class.
             sub:leafObjectType  sub:SubclassDestLeaf;

Below is a SPARQL command that generates a prescriptive constraint for each instance of sub:MandatoryOutRelationType. In this command, each occurrence of “out” must be replaced by “in” if the same is to be done for each instance of sub:MandatoryInRelationType. When the prescriptive constraint generated by this command is about relations which are not between classes, e.g. sub:part relations, a classic Description Logic inference engine can be reused to check this constraint, although this may require a temporary removal of relations from types such as sub:leafObjectType relations.

INSERT
{ [] rdf:type sub:Prescriptive_constraint;
     cstr:condition_class  #source class of mandatory "out relation"
        [?MandatoryOutRelTypeDomain  owl:equivalentClass 
            [rdf:type owl:Class; #?MandatoryOutRelTypeDomain minus ?OutRelLeafDestType
             owl:intersectionOf ( ?MandatoryOutRelTypeDomain
                                  [rdf:type owl:Class;
                                   owl:complementOf ?OutRelLeafDestType] ) ] ];
     cstr:conclusion_class
        [?conclusion_class owl:equivalentClass  
            [rdf:type owl:Restriction;  #there must be a relation of type ?outRelType
             owl:onProperty ?outRelType;   all:someValuesFrom owl:Thing] ]
}
WHERE { ?outRelType rdf:type sub:MandatoryOutRelationType;
        sub:leafObjectType ?OutRelLeafDestType;
        rdfs:domain ?MandatoryOutRelTypeDomain
      }

This generation-based approach can also be used to check that instances of a certain class – or, equivalently, instances of certain classes since a class may have subclasses – are systematically connected to at least one instance of a certain class via relations of a certain type, e.g. a transitive relation type or one defined via owl:propertyChainAxiom. For example, to generate a constraint checking that every instance of sub:Component is related via sub:part relations to an instance of sub:Component, one can write:

sub:partOf rdf:type sub:MandatoryOutRelationType; 
           rdfs:domain sub:Component;  rdfs:range sub:DividableThing.

The specifications given in Section 4.1 do not enforce the use of subtype partitions or part partitions. However, if the specifications above given in this subsection are also used, these partitions become mandatory except for non-natural types in the case of subtype partitions. Thus, if there are no non-natural types, doing so specifies that any two classes in the KB are either connected by a subclass relation or an exclusion relation. This could not be specified using only one prescriptive constraint since RDF+OWL does not allow the explicit use of variables for relating objects shared by both the condition and conclusion of a constraint.

5. Evaluation and Comparisons

The originality of the approach proposed in this article is that it enables i) the representation of constraints independently of their exploitation (this one is represented within content-independent queries), ii) the representation of both descriptive and prescriptive constraints with any KRL the expressiveness of which is at least equal to RDF or RDFS, and hence iii) the exploitation of most inference engines, especially via SPARQL queries.

Since the proposed approach relies on other methods and tools chosen by each user of the approach, it inherits from their theoretical or practical improvements. It would thus not be relevant to focus on theoretical aspects of a particular method or tool in this article. For a general comparison, [Baget et al., 2015] and [Swan, 2016] list theoretical points relevant to the proposed approach. Regarding the use of SPARQL to check constraints, [Tao et al., 2010] shows that SPARQL can be used for both expressing and validating integrity constraints based on some partial forms of the Unique Name Assumption and Closed World Assumption. It also shows that this validation is sound and complete when the expressiveness used for the constraints and the rest of the KB are respectively only “SROIQ and SRI” or only “SROI and SROIQ”. In the proposed approach, queries are used only for validating constraints, not expressing them, but this is only a generalization of the approach of [Tao et al., 2010] which does not change the associated theoretical results. In [Tao et al., 2010], the used partial forms of the Unique Name Assumption and Closed World Assumption are specified in SPARQL via its operator NOT EXISTS and relations of type owl:sameAs or owl:differentFrom. These forms can similarly be expressed via the commands seen in Section 3 and the use of relations of type owl:sameAs or owl:differentFrom in the constraints.

The proposed approach was validated experimentally by testing the degree to which a few constraints – including all those presented in this article – were followed in i) the “family relationship” focused sample ontology given in the OWL2 Primer W3C document and ii) a few ontologies from LOV, a Linked Data repository. The validation came from finding the right constraint violations and the right degrees with the proposed queries and, when necessary, preliminary update requests.

Besides testing these constraints, queries and requests, another goal of this validation phase was to represent ontology design patterns or best practices (ODPs) as constraints. As shown by the previous sections, the proposed kinds of descriptive or prescriptive constraints are ways to represent ODPs that are collections of types that should be reused whenever possible. However, during the above cited validation phase, no ODP satisfying the two following criteria was found: i) the ODP could be implemented via a constraint, and ii) the ODP was likely not to lead to a completeness degree close to 0% for a randomly chosen ontology. More generally, no widely followed ODP was found.

Querying a KB for detecting anti-patterns in it is analogous to querying it for detecting violations of ODPs in it. However, like the SPARQL based works of [Roussey & Zamazal, 2013, many works on anti-pattern detection use queries essentially as a way not to use an expressive inference engine for detecting certain problems]. Instead, the proposed approach exploits inference engines. With a sufficiently powerful KRL, any anti-pattern can be expressed as a negative constraint in inconsistency-implying form.

The introduction of this article summarized the strong distinction that exists between constraint-based completeness and represented-world-based completeness, and hence the reason why it would not be relevant to further compare the proposed approach with those of tools such as SWIQA and Sieve.

Since the proposed approach is based on a particular use of RDFS it should be compared to SHACL and SPIN.

SHACL (SHApes Constraint Language) is a language ontology (such as OWL) proposed by the W3C to enable the definition of constraints in RDF. SHACL does not reuse OWL to define constraints: it introduces new terms. It therefore does not support the reuse – for checking constraints – of inference engines that take into account the special meaning of OWL terms. Thus, inference engines dedicated to SHACL have to be used and a new KRL (SHACL) has to be learned. In addition, SHACL does not distinguish between descriptive constraints and prescriptive ones, and thus handles prescriptive constraints only very partially. E.g., handling the condition and conclusion of a prescriptive constraint generally require different entailment regime (as explained in Section 2.1.2) but, with SHACL, only one regime can be specified for both the condition and conclusion. Furthermore, neither LDScript-like extensions nor SPARQL update requests can be used in SHACL. Thus, KB pre-treatments – including the one proposed in Section 3.2.3.3 for prescriptive constraints – have to be specified via a KRL other than SHACL.
SPIN (SParql Inferencing Notation) is a W3C language ontology that enables the storage of SPARQL queries in RDF and, via special relations such as spin:rule and spin:constraint, the (possibly recursive) calls of SPARQL queries or Javascript functions for adding nodes or values to the KB. Thus, SPIN enables procedural attachments in a KB and thereby also supports the extension of SPARQL. However, the use of SPIN requires a SPIN aware engine. The approach proposed in this article is KRL independent (and hence not based on procedural attachments). SPIN could be used for storing the SPARQL commands (queries and update requests) proposed in Section 3 and Section 4, thus not only procedurally defining the types proposed for constraints but also but also providing a way to trigger such commands automatically. SPIN can also be used for checking constraints in other ways that are less modular (i.e., not using content-independent queries) or less logic-based (i.e., more procedural), hence in ways that offer less possibilities for knowledge comparison, translation, inferencing, reuse or, more generally, exploitation. The widespread use of such other ways may be a reason why SHACL has been designed. This article provides a less restricted alternative. The author also works on a knowledge translation tool exploiting ontology based specifications of conversions, including for constraints.

Some transformation languages or systems exploit KRs. [Zamazal & Svátek, 2015] and [Corby & Faron-Zucker, 2015] present such systems. Although few of them explicitly have a function that detects KR patterns without also transforming the matched KRs (e.g., PatOMat [Zamazal & Svátek, 2015] is an exception), these languages or systems could easily be adapted to have such a function and hence be used for handling prescriptive constraints. However, all such systems appear to use rule-based languages with more expressiveness than what relations-between-classes based constraints allow. Typically, these languages allow the direct and explicit use of variables for relating objects shared by both the condition and conclusion of a rule. Indeed, using such languages can simplify the writing of prescriptive constraints. However, regarding what can be expressed and checked via constraints, this article shows that i) much can be achieved simply using relations-between-classes based constraints and SPARQL1.1, and ii) the power of the proposed approach then relies on the power of the inference engine used for object matching, rather than on the used language.

Some transformation systems, like PatOMat [Zamazal & Svátek, 2015], issue SPARQL queries for detecting patterns, based on non-SPARQL specifications for patterns and their transformations. Some other transformation systems directly propose an extension of SPARQL such as STTL [Corby & Faron-Zucker, 2015] to write specifications for patterns and their transformations. For instance, as shown in [Corby, Faron-Zucker & Gazzotti, 2016], STTL can be combined with LDScript [Corby, Faron-Zucker & Gandon, 2017] to specify STTL queries (compiled into SPARQL queries) for detecting patterns and then transforming the results. However, [Corby, Faron-Zucker & Gazzotti, 2016] does not discuss the exploitation of object matching capabilities of inference engines and it does not distinguish between prescriptive constraints and non-prescriptive ones. To sum up, the proposed SPARQL commands, including those generating prescriptive constraints, could also be reused in these transformation systems, although in an adapted form. Alternatively, using scripts may be more flexible, as shown with JavaScript in Section 3.7. This work will continue in both directions, respectively using STTL+LDScript and JavaScript, especially for prescriptive constraints related to the use of given KRL models and KRL notations. To date, bases for this future work are mostly unrelated and discussed in [Corby, Faron-Zucker & Gazzotti, 2016] and [Martin & Bénard, 2017a, 2017b].

6. Conclusion

This article focuses on the original goal of supporting the checking of constraint-based completeness via a few predefined content-independent queries or procedures as opposed to creating special-purpose queries for each constraint or ad hoc procedures not reusing inference engines. An example of application is the checking of the following of ontology design patterns (ODPs), KB design libraries (e.g., the KADS library [Breuker & van de Velde, 1994]) or top-level ontologies (e.g., DOLCE) in order to validate the quality of a KB or, during it design, help elicit missing knowledge from experts.

Section 2 answers the first two research questions: what kinds of constraints need to be considered for evaluating constraint-based completeness via content-independent queries, and how to represent constraints in any KRL that has an expressiveness at least equal to RDF or RDFS? Section 2 does so via complementary means.

First, by defining the original notion of “prescriptive constraint” for checking that some objects are explicitly given instead of possibly inferred as in descriptive constraints (the two constraint kinds thus form a partition).
Second, by providing i) a general method to check prescriptive constraints, ii) types for distinguishing different kinds of constraints, and iii) three alternative structures for representing them via class expressions. The use of such expressions is both a way to permit the reuse of most KRLs and a way to reuse inference engines by exploiting calculated instanceOf relations.
Third, by showing that both descriptive constraints and prescriptive constraints are i) necessary for evaluating constraint-based completeness via content-independent queries, and ii) in a sense, sufficient too for two reasons. First, descriptive constraints and prescriptive constraints form a (complete) partition. Second, more specialized distinctions, if needed, can still be expressed by specializing the given types and using further methods to take into account these more specialized types.

Section 3 answers the third research question: how to implement the general approach with query languages such as SPARQL or slight extensions of it as well as in procedural languages? For query languages, Section 3 shows that only few (kinds of) content-independent queries are necessary and that pre-treatments on the KB are a way to counter-balance some lack of expressiveness of some query languages as well as implementing inference bypassing methods. For procedural languages, Section 3.7 shows that more control and flexibility are possible at lower costs.

Section 4 provides an ODP for the use of subClassOf relations and generalizes it into a multi-inheritance based ODP that i) works on any transitive relation, ii) keeps the advantages of the single inheritance based ODP, and iii) is more flexible and not dependent on special tools. More importantly, Section 4 shows how these ODPs can be represented as descriptive constraints exploitable by content-independent queries. Section 4 also shows how prescriptive constraints can be generated for checking the systematic use of all the subtypes of a certain type.

Section 5 evaluates the proposed techniques and compares the approach to other ones. A complement to this work will be to i) represent ODPs in several research areas (knowledge sharing, cooperation, security, etc.), using only relations between classes whenever possible, ii) organize them by relations of specialization or other kinds, and iii) test these ODPs via STTL+LDScript and JavaScript. The resulting analyzes will be added to this article.

7. Acknowledgments

Many thanks to Dr Olivier Corby (member of the Wimmics and SPARKS teams of, respectively, the INRIA and I3S CNRS laboratories at the University Côte d'Azur, France) for his questions and remarks on the approaches presented in this article and his help during the implementation of these approaches with SPARQL and LDScript via the Corese tool.

The interesting feedback provided by reviewers of published parts of this article was also well appreciated.

8. References

Assmann U., Wagner G. (2006). Ontologies, metamodels and model-driven paradigm. In: Ontologies for Software Engineering and Technology, Springer-Verlag, Berlin, chapter 9.
Bachimont B., Isaac A., Troncy R. (2002). Semantic Commitment for Designing Ontologies: A Proposal. In: EKAW 2002, Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web, LNCS, vol 2473, pp. 114–121, Springer Berlin, Siguenza, Spain.
Baget J., Gutierrez A., Leclère M., Mugnier M., Rocher S., Sipieter C. (2015). Datalog+, RuleML and OWL 2: Formats and Translations for Existential Rules. In: Challenge+DC@RuleML 2015, 9th International Web Rule Symposium (RuleML), Berlin, Germany.
Berners-lee T., Connolly D., Kagal L., Scharf Y., Hendler J. (2008). N3logic: A logical framework for the world wide web. Journal Theory and Practice of Logic Programming, 8(3), pp. 249–269, Cambridge University Press New York, NY, USA.
Breuker J, van de Velde W. (1994). CommonKADS Library for Expertise Modelling: Reusable Problem Solving Components. IOS Press, Amsterdam.
Calvanese D., De Giacomo G., Lembo D., Lenzerini M., Rosati R. (2007). EQL-lite: effective first-order query processing in description logics. In: IJCAI'07, pp. 274–279, Hyderabad, India.
Chein M., Mugnier M. (2008). The BG Family: Facts, Rules and Constraints. Graph-based Knowledge Representation - Computational Foundations of Conceptual Graphs. Chapter 11 (pp. 311–334), Springer-Verlag London, 428p.
Corby O., Faron-Zucker C. (2015). STTL: A SPARQL-based Transformation Language for RDF. In: WEBIST 2015, 11th International Conference on Web Information Systems and Technologies, Lisbon, Portugal.
Corby O., Faron-Zucker C., Gazzotti R. (2016). Validating Ontologies against OWL 2 Profiles with the SPARQL Template Transformation Language. In: RR 2016, 10th International Conference on Web Reasoning and Rule Systems, LNCS 9898, pp. 39–45, Springer, Aberdeen, UK.
Corby O., Faron-Zucker C., Gandon F. (2017). LDScript: a Linked Data Script Language. In: ISWC 2017, Vienna, Austria.
Djakhdjakha L., Mounir H., Boufaïda Z. (2014). Towards a representation for multi-viewpoints ontology alignments. In: IJMSO, International Journal of Metadata, Semantics and Ontologies, 9(2), pp. 91–102, Inderscience Publishers, Geneva.
Dromey R.G. (2006). Scaleable Formalization of Imperfect Knowledge. In: AWCVS 2006, 1st Asian Working Conference on Verified Software, pp. 29–31, Macao SAR, China.
Galárraga L., Hose K., Razniewski S. (2017). Enabling completeness-aware querying in SPARQL. In: WebDB 2017, pp. 19–22, Chicago, IL, USA.
Genesereth M.R., Fikes R.E. (1992). Knowledge Interchange Format, Version 3.0, Reference Manual. Report Logic 92-1, Logic Group, Stanford University, California.
ftp://www.ksl.stanford.edu/pub/KSL_Reports/KSL-92-86.ps.gz
https://pdfs.semanticscholar.org/590c/e95343370578dd1d357319c84d1dc7ff8fac.pdf
Guizzardi G. (2005). Ontological foundations for structural conceptual models. PhD thesis (441p), Telematica Instituut / CTIT.
Marino O., Rechenmann F., Uvietta P. (1990). Multiple Perspectives and Classification Mechanism in Object-Oriented Representation. In: ECAI 1990, pp. 425–430, Pitman Publishing London, Stockholm, Sweden.
Martin Ph., Bénard J. (2017a). Categorizing or Generating Relation Types and Organizing Ontology Design Patterns. In: KAM'17, 23rd IEEE conference on Knowledge Acquisition and Management, Prague, Czech Republic.
Martin Ph., Bénard J. (2017b). Creating and Using various Knowledge Representation Model and Notations. In: ECKM 2017, 18th European Conference on Knowledge Management, Barcelona, Spain.
Mizoguchi R, Kozaki K., Kitamura Y. (2012). Ontological Analyses of Roles. In: IEEE FedCSIS 2012, pp. 489–496, Wroclaw, Poland.
Rector A., Brandt S., Drummond N., Horridge M., Pulestin C., Stevens R. (2012). Engineering use cases for modular development of ontologies in OWL. Applied Ontology, 7(2), pp. 113–132, IOS Press.
Roussey C., Zamazal A. (2013). Antipattern detection: How to debug an ontology without a reasoner. In: WoDOOM 2013, 2nd International Workshop on Debugging Ontologies and Ontology Mappings, pp. 45–56, Montpellier, France.
Sowa J.F. (1992). Conceptual Graphs Summary. Conceptual Structures: Current Research and Practice, Ellis Horwood, pp. 3–51.
Swan R. (2016). Querying Existential Rule Knowledge Bases: Decidability and Complexity. PhD thesis (159p), University of Montpellier.
Tao J., Sirin E., Bao J., McGuinness D. (2010). Integrity Constraints in OWL. In: AAAI'10, pp. 1443–1448, Atlanta, USA.
Zamazal O., Svátek V. (2015). PatOMat – Versatile Framework for Pattern-Based Ontology Transformation. Computing and Informatics, 34(2), pp. 305–336.
Zaveri A., Rula A., Maurino A., Pietrobon R., Lehmann J., Auer S. (2016). Quality assessment for linked data: A survey. Semantic Web, 7(1), pp. 63–93.