OWL2-ER Compatible Prescriptive Constraints to Evaluate Ontology Completeness

Dr Philippe A. MARTIN (www.phmartin.info;  article@phmartin.info),   Jun H. Jo

This unpublished article extends our article for EGC 2018.

Abstract. This article first slightly extends the classic definition of ontology completeness, relates this notion to the one of prescriptive constraint and shows that classic (i.e. descriptive) statements – e.g., assertions, definitions or constraints – can be easily, yet formally, flagged and reused as prescriptive constraints. Thus, it becomes apparent that various tasks from different research fields have subtasks checking ontology completeness, that these subtasks can use the same kind of tools, and that many existing inference engines can be reused for such subtasks. This enables the selection of the most relevant one for a domain or application. Examples of such tasks include: i) knowledge elicitation and modelling methodologies, ii) ontology design patterns (ODPs) or best practices, and iii) criteria or measures for evaluating ontologies. This article shows how prescriptive constraints can be represented via the simple OWL2-ER KRL model and implemented with popular tools such as SPARQL1.1 and JavaScript, how they can be applied and to what extent given their simplicity. In doing so, this article also generalizes a single inheritance based ODP for subtypeOf relations into a multi-inheritance based ODP that works on any transitive relations and keeps the advantages of the single inheritance based ODP but that is more flexible and not dependent on special tools.

Keywords:  prescriptive constraint, ontology completeness, OWL2, SPARQL.

Table of Contents

1. Introduction
2. OWL2-ER Compatible Prescriptive Constraints: Definition and Exploitation
2.1. Constraints: Positive vs. Negative, Descriptive vs. Prescriptive
2.2. General Method and Types to Represent and Exploit Prescriptive Constraints
2.3. OWL2-ER Compatible Constraints
2.4. Examples
3. OWL2-ER Compatible Prescriptive Constraints in SPARQL and Javascript
3.1. SPARQL Commands for Checking OWL2-ER Compatible Constraints
3.1.1. Example of KB Pre-treatment: Temporarily Removing Individuals For Checking Constraints on Types With Classic Description Logic Inference engines
3.1.2. Other KB pre-treatment: Creating “Clones Without Types” of Objects For Exploiting These Objects Without Inheritance Mechanism
3.1.3. Checking Positive Prescriptive Constraints
3.1.4. Checking Negative Prescriptive Constraints
3.1.5. Checking Binary Relations Instead of Individuals
3.1.6. Evaluating the Completeness of a KB
3.2. JavaScript Function for Ontology Completeness Evaluation
4. Examples of Applications and Limits
4.1. Application to Checking the Design of a Subtype Hierarchy
4.2. Generalization To Check the Use of Any Transitive Relation
4.3. Checking that Certain Relation Types Are Systematically Used
5. Relations to Other Works and Conclusion
6. References

1. Introduction

Knowledge representations (KRs) are formal descriptions of information that enable automatic logical inference, and thus automatic KR comparison, search, merge, etc. KRs are logic formulas, e.g. binary predicates of 1st-order logic (FOL) called triples or property instances in RDF and binary relations in Conceptual Graphs (CGs) [Sowa, 1992]. In this article, for the purpose of clarity, we use the intuitive terminology of CGs: (information) objects are either types or individuals, and types are either relation types or concept types (classes and datatypes in RDF). A formal knowledge base (KB) is a collection of such objects written using a KR language (KRL). If this KB is essentially about types rather than about individuals, it is often called an ontology.

Creating or evaluating a KB is difficult. Thus, a subtask is often the evaluation of some (degree of) completeness of this KB, with respect to some criteria, i.e., more concisely, “its completeness”. Such an evaluation is performed in various fields, albeit not via the same tools and sometimes in implicit and ad-hoc ways. Examples of such fields are: i) the automatic or manual knowledge extraction or KB creation, ii) the exploitation of ontology design patterns (ODPs), and iii) the evaluation of ontologies or, more generally, datasets. In this last field, as noted in [Zaveri et al., 2016], completeness commonly “refers to the degree to which all required information is present” in a dataset. We adopt this definition but with a slight extension since, in the works cited in [Zaveri et al., 2016], the word “required” only refers to “a gold standard dataset” to which the evaluated dataset is compared. In this article, “required” more generally refers to any constraint that the dataset should satisfy, as long as this constraint can be expressed via a KR, e.g. an integrity constraint. Indeed, people creating or reusing ontologies generally do not have a “gold standard ontology”: otherwise, they would directly reuse it. With this extended definition, evaluating KB completeness can be done with any inference engine that can check the given constraints. Thus, whatever the field, the same kind of tools can be reused and, depending on the domain or application, the most relevant tool can be selected.

If checking that all given constraints are satisfied is not sufficient, i.e. if an actual degree of completeness is searched for, one way of computing it is to divide the number of statements satisfying the constraints by the total number of statements. This is simple albeit basic compared to methods that aggregate the results of completeness oracles with respect to a query [Galárraga, Hose & Razniewski, 2017]. Each of these oracles is a rule or query estimating whether or not the KB has the necessary information to answer the query as well as a hypothetical gold standard KB. However, the information used by some of these oracles could be represented as constraints, and this article is about a simple way to represent constraints that could be used (directly or not) for checking or evaluating completeness.
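The ratio described above can be sketched in a few lines of JavaScript. This is only an illustration: the function name completenessRatio, the predicate passed to it and the toy statements are assumptions made for this example, not part of the approach itself.

```javascript
// Hypothetical helper: degree of completeness as the share of statements
// that satisfy the given constraints (represented here by one predicate).
function completenessRatio(statements, satisfiesConstraints) {
  if (statements.length === 0) return null; // ratio undefined for an empty KB
  const nbSatisfying = statements.filter(satisfiesConstraints).length;
  return nbSatisfying / statements.length;
}

// Toy data (assumption): each statement records whether a parent is given.
const statements = [
  { subject: "John", hasParent: true },
  { subject: "Mary", hasParent: false },
  { subject: "Tom", hasParent: true },
  { subject: "Ann", hasParent: true },
];

const ratio = completenessRatio(statements, (s) => s.hasParent);
console.log(ratio); // 0.75
```

Returning null for an empty set of statements is a design choice of this sketch; it avoids reporting a division by zero as a meaningful score.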

Section 2 distinguishes descriptive constraints from positive prescriptive ones and shows that the latter cannot be represented in classic logics, even with the closed world assumption, but are necessary for checking the completeness of a KB. It then introduces i) OWL2-ER, a subset of OWL [OWL 2 profiles], and OWL2-ER compatible constraints as a simple way to represent them in almost any KRL, ii) special types to state that certain statements are constraints, and iii) a KRL independent method to check such constraints.

Section 3 shows how this previous method can be implemented first using a small set of SPARQL commands and then using JavaScript.

Section 4 generalizes some ODPs or best practices (BPs) related to the use of subtypeOf relations and shows how the above solution provides a simple way to specify and use such ODPs. Section 4 also generalizes these results to any transitive relation. It also shows how to prescribe the systematic use of certain relation types. Like Section 3, this section shows how some – but not all – expressiveness limits of OWL2-ER, and hence of our approach, can be overcome.

Section 5 relates our approach with other approaches and concludes.

2. OWL2-ER Compatible Prescriptive Constraints: Definition and Exploitation

2.1. Constraints: Positive vs. Negative, Descriptive vs. Prescriptive

In this article, as in [Chein & Mugnier, 2008], constraints can be positive or negative, respectively expressing statements of the form “if A holds, so must B” or “if A holds, B must not”.

As noted in [Assmann & Wagner, 2006], one common distinction between engineering models is whether they are i) descriptive of some reality, e.g. like most ontologies, or ii) prescriptive, i.e. about what should be in the dataset they apply to, as with system specification or meta-models (XML schemas, database schemas, etc.). Similarly, we distinguish two kinds of constraints. Descriptive constraints are like definitions or axioms: they enable inference engines to check the use of certain formal terms (if they are used). Prescriptive constraints enable inference engines to check that certain formal terms are actually used (not just inferred) or not used, in certain conditions. E.g., prescriptive constraints can be used for checking that if a type is defined as (necessarily) having certain relations, these relations are explicitly given by users whenever they create an instance of such a type. Here, “explicitly” emphasizes that these relations should not exist because they were automatically deduced, e.g. by inheritance, but only because they were set by a user (manually or automatically).

Thus, prescriptive constraints enable checks that descriptive constraints cannot, and they are not equivalent to using the closed world assumption. The techniques presented by this article for defining and checking prescriptive constraints can be performed with open-world assumption. It should be noted that classic logical expressions are only descriptive, e.g., stating that “any Person (necessarily) has a parent” is descriptive. It is not prescriptive since this expression is not normally interpreted by an inference engine or ontology editor to issue an error if a knowledge provider describing a person does not also explicitly state the existence of a parent for that person. A special interpretation is necessary, e.g. via a special menu or command, hence by a special procedure.

Few KRLs allow the use of constraints, and we found that even such KRLs do not necessarily enable the specification of positive prescriptive constraints. Indeed, some relations may be automatically added by inheritance or a rule mechanism instead of being explicitly entered. For example, if a KB includes the rule “if X is a person, X has a parent” and a user enters the fact “John is a Person”, this KB will satisfy the positive constraint “if X is a Person, X must have a parent” without the user having to describe a parent for John. In other words, if some mechanism automatically associates relations to some of the checked objects – e.g., by dynamic lookup for inherited relations during each object matching or via one forward chaining saturation – this mechanism must be adequately temporarily disabled or taken into account for checking positive prescriptive constraints. For negative constraints, there is no distinction between descriptive or prescriptive since negative constraints enable the detection of incorrect KRs whether or not they have been added automatically.
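The “John” example can be reproduced with a toy forward-chaining step. In the JavaScript sketch below, the triple layout, the function names saturate and personsWithoutParent, and the blank-node-like identifier are all assumptions made for illustration; no real inference engine is involved.

```javascript
// Toy KB (assumption): facts are [subject, relation, object] triples.
const assertedFacts = [["John", "type", "Person"]];

// Descriptive rule "if X is a person, X has a parent", applied once by
// forward chaining: adds an anonymous parent to each Person lacking one.
function saturate(facts) {
  const result = facts.slice();
  for (const [s, r, o] of facts) {
    const hasParent = facts.some(([s2, r2]) => s2 === s && r2 === "parent");
    if (r === "type" && o === "Person" && !hasParent)
      result.push([s, "parent", "_:parentOf" + s]);
  }
  return result;
}

// Positive constraint "if X is a Person, X must have a parent":
// returns the persons for which no parent relation is found in `facts`.
function personsWithoutParent(facts) {
  return facts
    .filter(([, r, o]) => r === "type" && o === "Person")
    .map(([s]) => s)
    .filter((s) => !facts.some(([s2, r2]) => s2 === s && r2 === "parent"));
}

const violationsAfterInference = personsWithoutParent(saturate(assertedFacts));
const violationsOnAssertedOnly = personsWithoutParent(assertedFacts);
console.log(violationsAfterInference); // []
console.log(violationsOnAssertedOnly); // ["John"]
```

After saturation the constraint finds no violation, although no parent was ever entered for John. Only the check on the asserted facts reports him, which is the behaviour expected from a prescriptive reading.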

2.2. General Method and Types to Represent and Exploit Prescriptive Constraints

[Tao et al., 2010] shows that representing and checking Integrity Constraints exploiting some forms of the Unique Name Assumption and Closed World Assumption can be done via SPARQL queries, different ones for different constraints. Instead, we wish to enable the representation of constraints in any KB, whichever the used KRL, for increasing the possibilities of exploitation and reuse of these constraints, e.g. via a few predefined SPARQL queries. Furthermore, we also and especially wish to take into account prescriptive constraints.

To that end, our approach is to introduce types for constraints. By setting instanceOf relations from certain KRs to those types, KB authors can state that these KRs are constraints. Thus, they can be retrieved or interpreted in a special way by inference engines. The same approach is used with OWL [OWL 2 profiles] to augment the expressiveness of RDF. The name of our ontology of constraint types is CSTR. Its supertype for all constraint types is cstr:Constraint. The “cstr:” part of its name is an XML namespace abbreviation for http://www.webkb.org/kb/it/CSTR. Stating that a type is instance of cstr:Constraint is sufficient to state that all definitions of this type are also constraints. Similarly, the type cstr:Prescriptive_constraint, a subtype of cstr:Constraint, enables the specification or retrieval of prescriptive (only) constraints.

For adequately checking positive prescriptive constraints, the previous subsection introduced the need for temporarily disabling or bypassing inference mechanisms that automatically associate relations to objects. Yet, for inferencing completeness purposes, these mechanisms are needed for adequately checking the condition of each positive prescriptive constraint. Thus, they should be disabled or bypassed only for the main (and hence first) object of the conclusion of the constraint, i.e. the object whose relations are mandatory for all objects matching the condition of the constraint. We propose the following bypassing method: statically via a pre-treatment of the KB or dynamically during the checking of such constraints, create a “clone without type” of each object matching the condition of such a constraint and then, when checking its conclusion, do it on this clone. Such a clone has the same relations as the original object except for instanceOf relations (it has none; furthermore, if it is a non-anonymous individual, it must have an identifier different from the original object). Thus, with the clone, “inferences exploiting types to associate relations to an object” will be avoided. As an abbreviation, from now on, we shall write that this method makes it possible to avoid inheritance. This method does not work for inferences not exploiting types (e.g., duck typing instead of inheritance), nor if forward chaining saturation is automatically made before the above cited pre-treatment, but both cases are rare.
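A minimal in-memory sketch of the “clone without type” step is given below in JavaScript. The triple-array representation of the KB and the function name cloneWithoutType are assumptions of this example; as in the method described above, only the relations starting from the object are copied and its instanceOf (here rdf:type) relations are dropped.

```javascript
// Hypothetical in-memory KB: an array of [subject, relation, object] triples.
function cloneWithoutType(triples, objectId) {
  // The clone needs an identifier different from the original object.
  const cloneId = objectId + "_cloneWithoutType";
  const cloneTriples = triples
    .filter(([s, r]) => s === objectId && r !== "rdf:type") // drop type relations
    .map(([, r, o]) => [cloneId, r, o]);                    // re-attach the others
  return { cloneId, cloneTriples };
}

const kb = [
  ["John", "rdf:type", "Person"],
  ["John", "name", "John Doe"],
  ["Mary", "rdf:type", "Person"],
];

const { cloneId, cloneTriples } = cloneWithoutType(kb, "John");
console.log(cloneId);      // John_cloneWithoutType
console.log(cloneTriples); // [["John_cloneWithoutType", "name", "John Doe"]]
```

The original array is left untouched, so the clone can be discarded after the check, which matches the temporary nature of the modification.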

This method relies on a temporary modification of the KRs before their checking by an inference engine. Thus, this method does not rely on a particular KRL, inference engine or tool feature. In other words, this solution is KRL independent and tool independent: it can be used with any KRL and any tool. Hence, depending on the domain and application, a different inference engine can be reused to check or evaluate ontology completeness. However, with some query languages such as current standard versions of SPARQL, the temporary modification cannot be done dynamically: a pre-treatment of the KB is necessary. This is a limitation since few KB servers, e.g. few SPARQL endpoints, allow (most of) their users to modify the KB for checking it.

2.3. OWL2-ER Compatible Constraints

Since constraints are special kinds of rules and since we wish to represent constraints in a simple and KRL independent way, we first considered OWL2-RL [OWL 2 profiles], the OWL2 profile that can be defined using only definite Horn rules with equality, hence with Datalog, a purely declarative subset of Prolog. However, in such rules, the conclusion (or head) cannot include anonymous existentially quantified objects. This is possible with Existential Rules [Baget et al., 2015] or Datalog+. [Baget et al., 2015] shows that a sublanguage of OWL2 called OWL2-ER enables the representation of many kinds of existential rules (hence the suffix “-ER”) via the use of an rdfs:subClassOf relation [RDFS] between two OWL2 class expressions, the rule conclusion being expressed by the superclass expression. (rdfs:subClassOf is one subtypeOf relation type and also a supertype of rdfs:subClassOf since rdfs:Class is a supertype of owl:Class.) In other words, the relation rdfs:subClassOf(C1,C2), here expressed in a functional notation, can be translated into the following rule in the modern Peano-Russell-like notation for first-order logic: ∀x(C1(x) ⇒ C2(x)). Thus, OWL2-ER is roughly the part of Datalog+ that can be expressed using only OWL2, hence with binary relations only and without variables shared by both the condition and the conclusion of a rule. In OWL2-ER, negative constraints can be represented in two ways: i) by using a class expression equivalent to the type owl:Nothing in a rule conclusion, hence a rule of the form ∀x (ClassExpression(x) ⇒ ⊥), ii) by using the type owl:NegativeObjectPropertyAssertion to express negated facts of the form ¬ ∃x ClassExpression(x). Neither OWL2-ER nor Datalog+ can directly represent a positive constraint but, as explained in the previous subsection, this can be expressed by stating that a rule is an instance of cstr:Constraint.
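The logical reading of rdfs:subClassOf(C1,C2) as ∀x(C1(x) ⇒ C2(x)), and of a negative constraint as a rule whose conclusion is owl:Nothing, can be illustrated as follows in JavaScript. This sketch only evaluates the formulas over a small, fully listed set of individuals; it does not implement OWL semantics, and the predicates, the function counterExamples and the data are hypothetical.

```javascript
// Class expressions are modelled (assumption) as predicates over individuals.
const Person = (x) => x.kinds.includes("person");
const Dog = (x) => x.kinds.includes("dog");
const HasParent = (x) => x.parents.length > 0;
const Nothing = () => false; // plays the role of owl:Nothing

// Individuals for which the rule "for all x, C1(x) => C2(x)" fails.
function counterExamples(individuals, C1, C2) {
  return individuals.filter((x) => C1(x) && !C2(x));
}

const individuals = [
  { id: "John", kinds: ["person"], parents: ["Mary"] },
  { id: "Tom", kinds: ["person"], parents: [] },
  { id: "Rex", kinds: ["dog", "person"], parents: ["Fido"] },
];

// Positive form: every Person has a parent.
const positive = counterExamples(individuals, Person, HasParent).map((x) => x.id);

// Negative form: "for all x, (Person(x) and Dog(x)) => Nothing".
const PersonAndDog = (x) => Person(x) && Dog(x);
const negative = counterExamples(individuals, PersonAndDog, Nothing).map((x) => x.id);

console.log(positive); // ["Tom"]
console.log(negative); // ["Rex"]
```

With Nothing as conclusion, every individual matching the condition is a counter-example, which is why a negative constraint needs no separate checking function in this sketch.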

In this article, for the sake of simplicity, constraints are only represented as rules: we do not exploit the negated fact form for expressing negative constraints. Rules are represented using rdfs:subClassOf relations, as in OWL2-ER, because this is a simple (albeit restricting) way to represent rules. [Baget et al., 2015] shows that OWL2-ER can be translated into Datalog+ (even though not all of OWL2 can be represented by Datalog+) and then into RuleML. By relying only on rdfs:subClassOf relations for the representation of rules, our techniques can work with any KRL model that has at least the expressivity of RDFS or, like RDF, that permits the reuse of RDFS. This is what the expression “OWL2-ER Compatible Constraints” means. In other words, these techniques actually do not require that at least or at most OWL2-ER be used in the exploited KRs. Thus, more formal details on OWL2-ER – or a formalization of what it means to represent a rule using rdfs:subClassOf – would be irrelevant in this article. See [Baget et al., 2015] and [Swan, 2016] for those formal points. The SPARQL code given in the next section provides the necessary formal details. Similarly, in this article, a formalization of prescriptive constraints other than the formalization given via this SPARQL code would not be useful.

OWL2-ER compatible constraints could be generalized using a subtypeOf relation instead of an rdfs:subClassOf relation between the condition and conclusion of a constraint. Conforming to the terminology given in the first paragraph of the introduction, this would permit the use of relation types and datatypes, hence not just classes, in the condition and conclusion of a constraint. However, checking objects such as relations and instances of datatypes can most often be done via constraints on class instances related to these objects. For example, relations can generally be checked via their sources or destinations, or via the assertions that include these relations, i.e. instances of classes such as owl:NegativeObjectPropertyAssertion. Hence, in this article, we only use rdfs:subClassOf relations between conditions and conclusions of constraints.

We use the name cstr:Constraint_in_OWL2-ER for referring to the type of OWL2-ER compatible constraints. More precisely, each instance of that type is the condition of a constraint. However, since this condition has for superclass the conclusion of the constraint, we shall also simply write that the instances of this type refer to (whole) constraints.

To express an OWL2-ER compatible constraint on a type definition, at least one 2nd-order type must be used (since the defined type must be instance of the condition of the constraint). However, most Description Logic inference engines do not handle ontologies requiring both the matching of individuals to 1st-order classes and the matching of 1st-order classes to 2nd-order classes since such matchings could be undecidable (on the other hand, such ontologies are handled by many other inference engines). Thus, to check type definitions with such Description Logic engines, a pre-treatment that removes individuals (for making 1st-order types the new individuals) is required. In such a case, the subtypeOf relations between these new individuals must also be removed. The first SPARQL1.1 update command below gives an example of how to do so. Conversely, before checking individuals with such Description Logic engines, 2nd-order types must be removed.

2.4. Examples

In this article, for clarity purposes, the names of relation types have a lowercase initial while other names have an uppercase initial. For expressing statements, we use the Turtle notation, a notation usable for RDF. SPARQL also reuses this notation. As in other graph-based notations, a statement of the form “ SourceConcept relationName1 DestConcept1ForRelName1 , DestConcept2ForRelName1 ; relationName2 DestConcept1ForRelName2 , DestConcept2ForRelName2 . ” can be read “SourceConcept has for relationName1 DestConcept1ForRelName1 as well as DestConcept2ForRelName1, and has for relationName2 DestConcept1ForRelName2, and has for relationName2 DestConcept2ForRelName2”.


3. OWL2-ER Compatible Prescriptive Constraints in SPARQL and Javascript

3.1. SPARQL Commands for Checking OWL2-ER Compatible Constraints

We now give query/update commands in SPARQL1.1 to check OWL2-ER compatible constraints and evaluate ontology completeness. These commands can be grouped into a procedure, for instance using extensions of SPARQL1.1 enabling the definition of functions, e.g., LDScript [Corby, Faron-Zucker & Gandon, 2017]. In SPARQL, variable names begin with “?”.

3.1.1. Example of KB Pre-treatment: Temporarily Removing Individuals For Checking Constraints on Types With Classic Description Logic Inference engines

The WHERE clause of this command selects each object ?o that does not have for type rdfs:Class, hence that is an individual. The DELETE clause removes the rdf:type relations from ?o and, from the destinations of these relations, removes the rdfs:subClassOf relations when they exist. To replace these rdfs:subClassOf relations, the INSERT clause sets cstr:type and cstr:subClassOf relations. Indeed, these relations have no special meaning for an inference engine and hence do not prevent the matching of objects to classes, i.e. the inferring of cstr:subClassOf relations. Furthermore, they later permit the re-assertion of the initial rdf:type and rdfs:subClassOf relations, when needed, via a very similar command.

DELETE { ?o rdf:type  ?t .  ?t rdfs:subClassOf ?superClass . }
 INSERT{ ?o cstr:type ?t .  ?t cstr:subClassOf ?superClass . }
 WHERE { ?o rdf:type  ?t .  #?o: an individual (see the FILTER below)
         OPTIONAL { ?t rdfs:subClassOf ?superClass }  #handled only when it exists
         FILTER NOT EXISTS { ?o rdf:type rdfs:Class }        }

3.1.2. Other KB pre-treatment: Creating “Clones Without Types” of Objects For Exploiting These Objects Without Inheritance Mechanism

SPARQL does not provide a way to suppress inheritance within a query. E.g., it does not allow the selection of a particular entailment regime within a query. However, the inheritance bypassing method given in Section 2.2 can be implemented in SPARQL. The command below shows how, assuming for the sake of clarity that there are no 2nd-order types nor anonymous individuals in the KB. For every object ?o in the KB, if this object is an individual, this command creates ?o2, a partial copy of ?o that has the same relations except for rdf:type relations. This partial copy has for identifier the one of ?o but with the suffix “_cloneWithoutType”. This command also relates ?o to ?o2 by a relation of type cstr:cloneWithoutType. With a SPARQL1.1 extension such as STTL [Corby & Faron-Zucker, 2015], such a pre-treatment of the KB is not needed: as illustrated below, a SPARQL CONSTRUCT query can be embedded into a SELECT query and thus permits the creation of the clones “on the fly”.

INSERT { ?o cstr:cloneWithoutType ?o2 . ?o2 ?r ?dest } WHERE
{ ?o ?r ?dest .   FILTER (?r != rdf:type) 
  FILTER NOT EXISTS { ?o rdf:type rdfs:Class }
  BIND (uri(concat(str(?o), "_cloneWithoutType")) as ?o2)
}

3.1.3. Checking Positive Prescriptive Constraints

The next command is a query. It lists every object violating a positive prescriptive constraint. As shown by the code, such an object satisfies two conditions. First, it matches – and hence has for type – the condition of a constraint ?posConstr that is a subclass of cstr:OWL-ER_prescriptive_constraint_condition and that does not have owl:Nothing in its conclusion. Second, such an object does not match – and hence does not have for type – the conclusion of the constraint, i.e. its superclass. Thus, this query requires a SPARQL engine that has an entailment regime enabling the matching (alias, categorization) of an individual with respect to a class expression and thence the deduction of an rdf:type relation between them. In the code of the commands below, such deduced rdf:type relations are highlighted in bold. If, for example, all individuals, constraint conditions and constraint conclusions are described in OWL2-QL, then an OWL2-QL entailment regime is required and sufficient. In that case, an inference engine that can handle the expressiveness of OWL2-QL is required.

SELECT ?objectNotMatchingPosConstr ?posConstr  WHERE
{ #this is the body of this 2nd command: between this first '{' and the last '}'
  ?posConstr rdfs:subClassOf cstr:OWL-ER_prescriptive_constraint_condition ,
                             ?posConstr_conclusion .  #initializes ?posConstr
  FILTER NOT EXISTS { ?posConstr rdfs:subClassOf owl:Nothing }
  ?objectNotMatchingPosConstr rdf:type ?posConstr. #matches condition
  FILTER NOT EXISTS  #objects satisfying the conclusion must not be listed
  { ?objectNotMatchingPosConstr rdf:type ?posConstr_conclusion }
}

Here is the same query in STTL, with an embedded query for creating the above cited partial copies “on the fly” and thus removing the necessity to modify the KB.

SELECT ?objectNotMatchingPosConstr ?posConstr WHERE
 { ?posConstr rdfs:subClassOf cstr:OWL-ER_prescriptive_constraint_condition ,
                              ?posConstr_conclusion .
   FILTER NOT EXISTS { ?posConstr rdfs:subClassOf owl:Nothing }
   ?objectNotMatchingPosConstr rdf:type ?posConstr. #matches condition

   #No object with a "clone without type" satisfying the conclusion must be listed:
   BIND (QUERY(CONSTRUCT { ?o2 ?r ?dest } WHERE     #?o2: cloneWithoutType
         { ?objectNotMatchingPosConstr ?r ?dest.  FILTER (?r != rdf:type)
           BIND(uri(concat(str(?objectNotMatchingPosConstr),
                           "_cloneWithoutType")) as ?o2)
         }) as ?g)
     #Note: in STTL, the above 5 lines can also be written in the next way
     #  LET (?g = (CONSTRUCT { ?o2 ?r ?dest } WHERE    #?o2: cloneWithoutType
     #  { ?objectNotMatchingPosConstr ?r ?dest.  FILTER (?r != rdf:type)
     #    BIND(uri(concat(str(?objectNotMatchingPosConstr),
     #                    "_cloneWithoutType")) as ?o2)
     #  })
   FILTER NOT EXISTS { GRAPH ?g { ?o2 rdf:type ?posConstr_conclusion } }
 }

3.1.4. Checking Negative Constraints

The next command lists every object violating a negative constraint, i.e. every object that matches – and hence has for type – a type ?negConstr that is an instance of the type cstr:Constraint_in_OWL2-ER and that has owl:Nothing as conclusion. It is much simpler than the previous command since, as noted above, for negative constraints, there is no distinction between descriptive or prescriptive. Thus, there is no need to check that the object is an instance of cstr:Prescriptive_constraint, but cstr:Constraint_in_OWL2-ER is useful to find constraints and hence to test whether the object violates a negative one.

SELECT ?objectMatchingNegConstr ?negConstr 
 WHERE { ?negConstr rdfs:subClassOf cstr:OWL-ER_compatible_constraint_condition ,
                                    owl:Nothing .
         ?objectMatchingNegConstr rdf:type ?negConstr .
       }

3.1.5. Checking Binary Relations Instead of Individuals

To list binary relations violating prescriptive constraints instead of individuals (that have some relations violating constraints), it is sufficient to replace rdf:type by the “logical implication relation between statements” in the previous two queries. For referring to such a relation, Tim Berners-Lee uses the name log:implies [Berners-Lee et al., 2008] in his Notation3 KRL. For this to work, the SPARQL engine used must exploit an inference engine that can deduce the existence of such a relation when it exists between the matched statements. Like queries on individuals, queries on relations can use additional filters. For example, for the previous command to operate only on negative statements, one may add at the end of its body:

?objectMatchingNegConstr rdf:type owl:NegativeObjectPropertyAssertion .

3.1.6. Evaluating the Completeness of a KB

As an alternative to listing objects violating constraints, the next command calculates the number of objects not violating prescriptive constraints and divides it by the number of objects (that are related to another one), thus giving the completeness of the ontology with respect to all prescriptive constraints.

SELECT ( ((?nbObjs - ?nbAgainstPosCs - ?nbMatchingNegCs) / ?nbObjs)
         AS ?completeness)
WHERE
{ {SELECT (COUNT(DISTINCT ?o) AS ?nbObjs)
    WHERE { ?o ?r ?o2 } } #any object related to another
        # For considering only objects that have a type:
        # { {?o rdf:type ?t1} UNION {?o cstr:type ?t2} }
  {SELECT (COUNT(DISTINCT ?objectNotMatchingPosConstr) AS ?nbAgainstPosCs)
    WHERE { ... #the body of the 2nd command should be copied here
          } }
  {SELECT (COUNT(DISTINCT ?objectMatchingNegConstr) AS ?nbMatchingNegCs)
    WHERE { ... #the body of the 3rd command should be copied here
          } }
}
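The arithmetic performed by this command can be checked on a toy case with the JavaScript sketch below. The function name completeness and the three input lists are assumptions made for this example; they stand for the results that the three sub-queries would return.

```javascript
// Hypothetical inputs: identifiers of all objects related to another one,
// of the objects violating a positive prescriptive constraint, and of the
// objects matching a negative constraint.
function completeness(allObjects, againstPositive, matchingNegative) {
  const nbObjs = new Set(allObjects).size; // like COUNT(DISTINCT ...)
  if (nbObjs === 0) return null;           // ratio undefined for an empty KB
  const nbAgainstPosCs = new Set(againstPositive).size;
  const nbMatchingNegCs = new Set(matchingNegative).size;
  return (nbObjs - nbAgainstPosCs - nbMatchingNegCs) / nbObjs;
}

// Ten objects o1..o10, two against positive constraints, one matching a
// negative constraint.
const allObjects = Array.from({ length: 10 }, (_, i) => "o" + (i + 1));
const score = completeness(allObjects, ["o1", "o2", "o2"], ["o3"]);
console.log(score); // 0.7
```

As in the command, an object that violates both a positive and a negative constraint is subtracted twice; a variant counting the union of the two violation sets would avoid this.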

3.2. JavaScript Function for Ontology Completeness Evaluation

Like the previous SPARQL commands, the next function provides a way to list each constraint violation – and/or calculate the completeness of the ontology if the constraints are prescriptive – but there are a few differences. First, the completeness is now given with respect to each constraint in a particular set, not to all of them at once, and for the instances of a particular set of classes, not all individuals. Second, error handling is included. Third, the method “specializes” is called to perform the matching between relations associated to an object and relations in a constraint condition or conclusion, i.e., to test if each relation in the second set of relations has a specialization in the first set (details on such a specialization test can be found in [Zaveri et al., 2016] [Chein & Mugnier, 2008] [Swan, 2016]). This method can ignore the instanceOf relations from the handled object, and it needs to do so if the checked constraint is a prescriptive one, i.e., if its parameter is instance of cstr:Prescriptive_constraint. Hence, the code below need not include a temporary removal of instanceOf relations from the objects.

JavaScript is used instead of pseudo-code since its syntactic sugar is rather well-known or intuitive and is object-oriented. The latter feature is required here to have a code that is short, clear and generic with respect to error handling. The objects in the ontology are assumed to be accessible in the object-oriented way used in the code.

The complexity of the next function is the complexity of the object matching multiplied by the number of given constraints, multiplied by the number of given classes, multiplied by the average number of instances per class.

function checkThatInstancesOfTheseClassesSatisfyTheseConstraints
         (setOfClasses,     //array, e.g.: [Man, Woman]
          setOfConstraints, //e.g.: ["if X is a Person, X has a parent"]
          errorHandling) //contains a class for generic error handling
{ if (setOfConstraints.length == 0)
    return errorHandling.returnCodeIfNoPropertyToCheck;
  if (setOfClasses.length == 0)
    return errorHandling.returnCodeIfNoClassToCheck;
  for (var co of setOfConstraints)  //co may be positive or negative
  { var numberOfObjsSatisfyingCo= 0, numberOfObjsNotSatisfyingCo= 0;
    for (var cl of setOfClasses)
      for (var obj of cl.instances())
        if (obj.specializes(co.condition)) //otherwise co does not apply
        { if (co.isPositiveConstr() && obj.specializes(co.conclusion))
            numberOfObjsSatisfyingCo++;
          else //positive constraint not satisfied, or negative
          {    //  constraint whose condition is matched
            numberOfObjsNotSatisfyingCo++;
            if (errorHandling.alertAtEachError)
              alert("Relations from '" + obj.toString() +
                    "' do not satisfy:\n  " + co.toString());
              //e.g.: Relations from 'Tom' do not satisfy:
          }   //        if X is a Person, X has a parent
        }
    var numberOfCheckedObjs= numberOfObjsSatisfyingCo +
                             numberOfObjsNotSatisfyingCo;
    var completenessForThatConstraint= //1 if co applies to no object
          (numberOfCheckedObjs == 0) ? 1
          : numberOfObjsSatisfyingCo / numberOfCheckedObjs;
    if (completenessForThatConstraint < 1)
    { if (errorHandling.alertAtEachIncompleteness)
        alert("Only " + (completenessForThatConstraint * 100) +
              "% of instances of the given classes satisfy the" +
              " constraint:\n  " + co.toString());
      if (errorHandling.returnCodeIfIncompletenessForSomeConstraint)
        return errorHandling.returnCodeIfIncompletenessForSomeConstraint;
    }
  }
  return errorHandling.returnCodeIfNoError;
}
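
The object-oriented access assumed by the above function can be illustrated with a deliberately simplified, self-contained sketch. Everything in it is a hypothetical stand-in: the classes Obj and Constraint, the helper completenessOf and the reduction of an object to a set of relation type names are assumptions made for illustration only, and the subset test used for specializes is far weaker than the matching performed by an inference engine.

```javascript
// Hypothetical, simplified object model: an object is a name plus the set of
// relation type names it is the source of (an assumption of this sketch).
class Obj {
  constructor(name, relations) {
    this.name = name;
    this.relations = new Set(relations);
  }
  // True if every relation type required by the pattern is present.
  // A real engine would also exploit type and relation specialization.
  specializes(pattern) {
    return pattern.every((rel) => this.relations.has(rel));
  }
  toString() { return this.name; }
}

class Constraint {
  constructor(condition, conclusion, positive = true) {
    this.condition = condition;   // [] means "any instance of the given classes"
    this.conclusion = conclusion; // relation types that must then be present
    this.positive = positive;
  }
  isPositiveConstr() { return this.positive; }
}

// Ratio of checked objects satisfying the constraint (1 if none is checked).
function completenessOf(constraint, classes) {
  let ok = 0, ko = 0;
  for (const cl of classes)
    for (const obj of cl.instances())
      if (obj.specializes(constraint.condition)) {
        if (constraint.isPositiveConstr() &&
            obj.specializes(constraint.conclusion)) ok++;
        else ko++;
      }
  return ok + ko === 0 ? 1 : ok / (ok + ko);
}

// Example KB: two persons, only one of which has a parent relation.
const tom = new Obj("Tom", []);
const ann = new Obj("Ann", ["parent"]);
const Person = { instances: () => [tom, ann] };
const hasParent = new Constraint([], ["parent"]);

const ratio = completenessOf(hasParent, [Person]);
console.log("Completeness: " + (ratio * 100) + "%"); // prints "Completeness: 50%"
```

In this sketch, Tom is the only object violating the constraint, hence the 50% completeness for the class Person.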

4. Examples of Applications and Limits

4.1. Application to Checking the Design of a Subtype Hierarchy

Several research works in knowledge acquisition, model-driven engineering or ontology engineering, e.g. [Marino, Rechenmann & Uvietta, 1990] [Bachimont, Isaac & Troncy, 2002] [Dromey, 2006] [Rector et al., 2012], have advocated the use of tree structures when designing a subtype hierarchy, hence the use of i) single inheritance only, and ii) multiple views or viewpoints, one for each tree. Indeed, they argue that objects have a unique place in such trees and hence that such trees can be used as decision trees or ways to avoid redundancies, normalize KRs and ease KR search/handling. This is true but subtyping exclusively via subtype partitions, i.e. disjoint unions of types, also has the same advantages (for the same reasons) while i) still allowing multi-inheritance, ii) not requiring a special inference engine to handle views (e.g. [Marino, Rechenmann & Uvietta, 1990] [Djakhdjakha, Hemam & Boufaïda, 2014]), and iii) being less cumbersome for knowledge providers since with such partitions they do not have to specify or use views each time they create subtypes.

We have realized that, in order to keep the above cited advantages, it is necessary and sufficient that whenever two types are exclusive – i.e., whenever they should not share common subtypes or instances – the exclusion relation that links these two types should be specified, explicitly or implicitly. With tree structures, such exclusion relations are implicit but still specified. By systematically using subtype partitions, all exclusion relations can be easily and explicitly specified, e.g. via the use of binary partitions such as “Spatial_entity / Non-spatial_entity”. Exclusion relations are a simple and efficient way to support ontology checking and knowledge retrieval. For example, when searching for particular “achievements” in a KB, people may not know which exact type to use and then which query to use: whether a query for specializing objects, for generalizing ones, for doing both with a maximal number of specializing/generalizing relations, etc. In this case, it may be easier for them to search for instances of all types not exclusive with some type they have in mind. This query may well collect all and only the objects they are interested in, if the ontology is systematically structured using subtype partitions.
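
As a rough illustration of the kind of query mentioned above, the following sketch lists the types that are not exclusive with a given type. The mini-ontology, the partition and the function names (selfAndAncestors, areExclusive, typesNotExclusiveWith) are hypothetical, and the sketch returns types rather than their instances.

```javascript
// Hypothetical mini-ontology: each type lists its direct supertypes.
const supertypes = {
  Thing: [],
  Spatial_entity: ["Thing"],
  "Non-spatial_entity": ["Thing"],
  Process: ["Non-spatial_entity"],
  Achievement: ["Process"],
  Building: ["Spatial_entity"],
};
// Exclusion relations stated via subtype partitions (here, one binary partition).
const partitions = [["Spatial_entity", "Non-spatial_entity"]];

// A type together with all its direct or indirect supertypes.
function selfAndAncestors(type, seen = new Set()) {
  if (seen.has(type)) return seen;
  seen.add(type);
  for (const sup of supertypes[type]) selfAndAncestors(sup, seen);
  return seen;
}

// Two types are exclusive if they specialize two distinct members of a partition.
function areExclusive(t1, t2) {
  const a1 = selfAndAncestors(t1), a2 = selfAndAncestors(t2);
  return partitions.some((members) =>
    members.some((m1) => a1.has(m1) &&
      members.some((m2) => m2 !== m1 && a2.has(m2))));
}

function typesNotExclusiveWith(type) {
  return Object.keys(supertypes).filter((t) => !areExclusive(type, t));
}

console.log(typesNotExclusiveWith("Process"));
// [ 'Thing', 'Non-spatial_entity', 'Process', 'Achievement' ]
```

Collecting the instances of the returned types would then retrieve the generalizations and the specializations of the given type, while skipping everything placed under an exclusive type.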

However, unlike for natural types (or natural kinds), using subtype partitions is a bit cumbersome for non-natural types [Guizzardi, 2005] [Mizoguchi, Kozaki & Kitamura, 2012] such as role types and phase types. Examples of general role types are Agent, Experiencer, Recipient, Cause and Instrument. Examples of more specific ones are Driver, Employee and King. Examples of phase types are Beginning and Child. Examples of role type partitions are “Cause / Non-cause” and “King / Non-king”. Systematically specifying such partitions is often cumbersome and useless since role types such as Non-cause and Non-king will probably not be instantiated. The manual or automatic instantiation of non-natural types such as Child, King or Taxi-driver may be useful but, for reusability purposes, should be done in a way that contextualizes that instantiation to some region of time or space. Such contextualization is not needed for natural types. More generally, for ontology checking or search purposes, it is interesting to distinguish non-natural types from natural ones. Using subtype partitions for non-natural types would not help and the classic solution of connecting each non-natural type to a second-order type via an instanceOf relation is quite cumbersome too. Instead, to create a non-natural subtype, we propose the use of the relation type cstr:nonNaturalSubclass or subtypes of it. This relation type is the inverse of the relation type cstr:nonNaturalSubclassOf which is a subtype of rdfs:subClassOf. It has a range restricted to cstr:NonNaturalClass which is a subtype of rdfs:Class since non-natural types are concept types which are not datatypes. When – and only when – an ontology using cstr:nonNaturalSubclass relation types has to be exploited by an inference engine that does not handle subtypes of rdfs:subClassOf, a pre-treatment that expands the definition of these subtypes – and thus removes these subtypes – must first be applied on the ontology.

To sum up, for reusability purposes, we propose that subtyping should only be done via subtype partitions or via cstr:nonNaturalSubclass (or subtypes of it). In the OWL2 model, “via subtype partitions” means stating that the subtyped class is equivalent to – or is a subclass of – a subtype partition. The relation types cstr:equivDisjointUnion and cstr:subclassOfDisjointUnion can be defined to refer to these two ways. The next assertion shows how this last type can be defined using the OWL2 model and the Turtle notation. The definition of cstr:equivDisjointUnion is identical except that rdfs:subClassOf is replaced by owl:equivalentClass.

cstr:subclassOfDisjointUnion #from a class to a set of classes
  rdfs:domain owl:Class ;  #the source class
  rdfs:range [rdf:type owl:AllDisjointClasses] ;  #the set of classes
  owl:propertyChainAxiom ( rdfs:subClassOf owl:unionOf ).

Then, to refer to the relation between a class and any of its direct subtypes specified via the use of a subtype partition or cstr:nonNaturalSubclass, we can define cstr:nonNaturalOrPartitionSubclass. Using Turtle and the OWL2 model:

cstr:subclass owl:inverseOf rdfs:subClassOf .

cstr:nonNaturalSubclass
  rdfs:subPropertyOf cstr:nonNaturalOrPartitionSubclass ;
  rdfs:domain owl:Class ;    rdfs:range cstr:NonNaturalClass.

  rdfs:subPropertyOf cstr:nonNaturalOrPartitionSubclass ; 
  owl:propertyChainAxiom ( cstr:subclassOfDisjointUnion rdfs:member ).

  rdfs:subPropertyOf cstr:nonNaturalOrPartitionSubclass ; 
  owl:propertyChainAxiom ( cstr:equivDisjointUnion rdfs:member ).

Now, a positive constraint can be defined to ensure that if a subtype relation connects a type to another one, this relation must be of type cstr:nonNaturalOrPartitionSubclass. That constraint can equivalently be phrased: if a type generalizes another one, it must be connected to this other one via a relation of type cstr:nonNaturalOrPartitionSubclass. An OWL2-ER compatible definition of that constraint is given below, using Turtle. However, since such a definition has to use a rule with OWL2-ER class expressions for its condition and its conclusion, this constraint actually only states that if a type generalizes another one, it must be connected to some type (not necessarily the same one) via a cstr:nonNaturalOrPartitionSubclass relation. Yet, this constraint can still be used to perform the intended checking: to do so, the SPARQL1.1 command (specified in Section 3.1) that checks each relation, one by one, must be used, i.e. the command that uses log:implies. The command that checks all relations from an object and uses rdf:type must not be used. When classic Description Logic inference engines are used, there is another reason why this latter command cannot work with this particular constraint. Indeed, these engines first require the removal of instanceOf and subclassOf relations between the first-order types to make them individuals, as noted in Section 3.1, but without such subclassOf relations this positive constraint has nothing to check.

cstr:Subclass  #class - and constraint (condition) - referring to
               # i) a class that has a subclass, and hence also
               # ii) a subclassOf relation from a class
  rdf:type cstr:Prescriptive_constraint;  #constraint (condition)
  owl:equivalentClass             #this defines the condition:
    [rdf:type owl:Restriction ;   #  "any class that has a subclass"
     owl:onProperty cstr:subclass ;   owl:someValuesFrom rdfs:Class ];
  rdfs:subClassOf #conclusion (types of relations that must be present):
    [rdf:type owl:Restriction ;   #any class satisfying the next 2 lines
     owl:onProperty cstr:nonNaturalOrPartitionSubclass;
     owl:someValuesFrom rdfs:Class ].
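
The intended relation-by-relation check can be sketched procedurally as follows. This is only an illustration under strong assumptions: the in-memory structure kb and the function subclassRelationsToFix are hypothetical, and no inference engine is involved.

```javascript
// Hypothetical KB: for each class, how its direct subclasses were declared.
const kb = {
  Entity: {
    partitions: [["Spatial_entity", "Non-spatial_entity"]], // subtype partitions
    nonNaturalSubclasses: ["Agent"],  // cf. cstr:nonNaturalSubclass relations
    plainSubclasses: ["Artifact"],    // stated with a bare subclass relation
  },
  Person: {
    partitions: [],
    nonNaturalSubclasses: ["Driver", "King"],
    plainSubclasses: ["King"],        // also stated as non-natural: accepted
  },
};

// Returns the [class, subclass] pairs for which the subclass relation is not
// also stated via a subtype partition or a non-natural subclass relation.
function subclassRelationsToFix(kb) {
  const violations = [];
  for (const [cls, decl] of Object.entries(kb)) {
    const allowed = new Set([...decl.partitions.flat(),
                             ...decl.nonNaturalSubclasses]);
    for (const sub of decl.plainSubclasses)
      if (!allowed.has(sub)) violations.push([cls, sub]);
  }
  return violations;
}

console.log(subclassRelationsToFix(kb)); // [ [ 'Entity', 'Artifact' ] ]
```

Each reported pair corresponds to a subclass relation that should be restated via a subtype partition or via cstr:nonNaturalSubclass.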

4.2. Generalization To Check the Use of Any Transitive Relation

We believe that the above positive constraint is worth generalizing and applying to all transitive relations used in an ontology, e.g. part relations. The reasons why the destinations of subtype relations are worth being organized into subtype partitions are also the reasons why the destinations of part relations are worth being organized into part partitions. The definition of a part partition for an object mirrors the one for a subtype partition: it is a complete set of disjoint parts of the object; “disjoint” meaning that the parts cannot share common subparts, and “complete” meaning that the union of the parts is equal to the object. The notion of non-natural subtype does not seem to have an intuitive equivalent for part relations but this is not a problem.
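
The definition of a part partition can be made concrete with a small sketch. It assumes, for illustration only, that an object and each of its parts are described by the set of their atomic subparts; the function name isPartPartition and the example data are hypothetical.

```javascript
// Assumption: an object and each of its parts are described by the set of
// their atomic subparts (a strong simplification made only for this sketch).
function isPartPartition(whole, parts) {
  const covered = new Set();
  for (const part of parts)
    for (const atom of part) {
      if (covered.has(atom)) return false; // two parts share a subpart
      covered.add(atom);
    }
  // Complete: the union of the parts is exactly the whole.
  return covered.size === whole.size &&
         [...whole].every((atom) => covered.has(atom));
}

const car = new Set(["wheels", "engine", "body"]);
const wheels = new Set(["wheels"]);
const rest = new Set(["engine", "body"]);

console.log(isPartPartition(car, [wheels, rest]));                  // true
console.log(isPartPartition(car, [wheels, new Set(["engine"])]));   // false
```

The second call returns false because the given parts are disjoint but not complete: the body is missing.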

Since OWL2-ER does not enable the use of variables for relation types, it does not enable the specification of a unique prescriptive constraint for all transitive relations used in an ontology. One solution then is, for each transitive relation type that is not a subclass relation and that needs to be checked, to copy and adapt the above subclass related definitions and positive constraint. The creation of such positive constraints can be automated using the next SPARQL1.1 command. It takes as a parameter each instance of owl:TransitiveProperty specified in the ontology, i.e. each transitive relation type ?trRelType, e.g. cstr:subclass and cstr:part which is the inverse of the general partOf relation type. From each one, via a cstr:nonNaturalOrPartitionTrRelType relation, it accesses the recommended kind of relation types to use for such transitive relations. For example, for cstr:subclass relations, we have seen that this recommended supertype can be cstr:nonNaturalOrPartitionSubclass. For cstr:part relations, this supertype can be cstr:partitionPart. OWL2-ER does not support a full definition of this last type but still enables its declaration, along with its domain and range, and hence permits its use and some checking of its uses.

CONSTRUCT  #or INSERT, to add the generated constraints to the KB
{ ?TrRelTypeDomain  #class for (the source of) a transitive relation
    rdf:type cstr:Prescriptive_constraint;  #constraint (condition)
    owl:equivalentClass             #this defines the condition:
      [rdf:type owl:Restriction ;   #  "source of a transitive relation"
       owl:onProperty ?trRelType; #e.g., cstr:part
       owl:someValuesFrom owl:Thing]; #any thing, class or individual
    rdfs:subClassOf  #constraint conclusion:
      [rdf:type owl:Restriction ; #any class satisfying the next 2 lines
       owl:onProperty ?nonNaturalOrPartitionTrRelType; #e.g.,
       owl:someValuesFrom owl:Thing] }               #cstr:partitionPart
WHERE { ?trRelType 
          rdf:type owl:TransitiveProperty ;
          cstr:nonNaturalOrPartitionTrRelType
              ?nonNaturalOrPartitionTrRelType ;
          rdfs:domain ?TrRelTypeDomain }     #e.g., owl:Thing (see below)

Thus, based on the next two statements, here in Turtle, the above command generates a prescriptive constraint for checking cstr:subclass relations as well as the corresponding prescriptive constraint for checking cstr:part relations. The generated prescriptive constraint for checking cstr:subclass relations is not identical to the one given in the previous sub-section since owl:Thing is used instead of rdfs:Class as destination of an owl:someValuesFrom relation. However, this loss of precision is not a problem for checking cstr:subclass relations. On the other hand, as illustrated here with cstr:part, the above command may make owl:Thing the condition of a constraint. If so, owl:Thing is restricted via owl:equivalentClass and rdfs:subClassOf relations. This may be a problem for some classic Description Logic inference engines. For example, owl:Thing cannot be used as source or destination of an rdfs:subClassOf relation in OWL-RL [OWL 2 profiles]. However, it is often easy and relevant to use a class other than owl:Thing as the domain of a transitive relation type, e.g. cstr:Spatial_entity as the domain of cstr:spatial_part.

cstr:subclass rdf:type owl:TransitiveProperty; rdfs:domain rdfs:Class ;
  cstr:nonNaturalOrPartitionTrRelType cstr:nonNaturalOrPartitionSubclass .
cstr:part rdf:type owl:TransitiveProperty;   rdfs:domain owl:Thing;
          cstr:nonNaturalOrPartitionTrRelType cstr:partitionPart .
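
The generation step performed by the above command can also be sketched in JavaScript. The in-memory representation of relation types and constraints used here (the fields transitive, domain, recommendedSubtype, condition and conclusion, and the function generateConstraints) is hypothetical and only mirrors the two Turtle statements above; cstr:parent is an added, non-transitive example.

```javascript
// Hypothetical in-memory declarations of relation types.
const relationTypes = [
  { name: "cstr:subclass", transitive: true, domain: "rdfs:Class",
    recommendedSubtype: "cstr:nonNaturalOrPartitionSubclass" },
  { name: "cstr:part", transitive: true, domain: "owl:Thing",
    recommendedSubtype: "cstr:partitionPart" },
  { name: "cstr:parent", transitive: false, domain: "cstr:Person" },
];

// One prescriptive constraint per transitive relation type that has a
// recommended kind of subtype; other relation types are skipped.
function generateConstraints(relTypes) {
  return relTypes
    .filter((rt) => rt.transitive && rt.recommendedSubtype)
    .map((rt) => ({
      constrainedClass: rt.domain,
      prescriptive: true,
      // condition: "source of a relation of type rt.name"
      condition: { onProperty: rt.name, someValuesFrom: "owl:Thing" },
      // conclusion: "source of a relation of the recommended type"
      conclusion: { onProperty: rt.recommendedSubtype,
                    someValuesFrom: "owl:Thing" },
    }));
}

const generated = generateConstraints(relationTypes);
for (const c of generated)
  console.log(c.condition.onProperty + " => " + c.conclusion.onProperty);
```

As with the SPARQL command, the destination of each restriction is owl:Thing rather than a more precise class.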

4.3. Checking that Certain Relation Types Are Systematically Used

The above kinds of prescriptive constraints check only that if a transitive relation type is used, it is used via certain subtypes of it. A complementary kind of prescriptive constraint is needed to check that if certain kinds of relation types are declared in an ontology, every object instance of the domain (resp. range) of such relation types should be the source (resp. destination) of such relations. The 2nd-order type cstr:MandatoryOutRelationType (resp. cstr:MandatoryInRelationType) can be used to specify or access such relation types. Then, from such types, relations of type cstr:leafObjectType can be used to indicate the type that can be used to specify or access the objects (classes or individuals) that, as an exception, should not be checked. For example, the following statement is sufficient to state that every dividable object – i.e. every instance of cstr:DividableThing – should be the source of a cstr:part relation except for each object instance of cstr:PartDestLeaf. This exception makes explicit the choice of an ontology creator not to use an object as source of a cstr:part relation even when it is possible.

cstr:part rdf:type cstr:MandatoryOutRelationType ; 
          cstr:leafObjectType cstr:PartDestLeaf ;
          rdfs:domain cstr:DividableThing .

Below is a SPARQL1.1 command that generates such an OWL2-ER compatible prescriptive constraint for each instance of cstr:MandatoryOutRelationType. In this command, each occurrence of “out” should be replaced by “in” to do the same for each instance of cstr:MandatoryInRelationType. If SPARQL1.1 is used for checking these generated prescriptive constraints, the command (specified in Section 3.1) that checks all relations from an object should be used, not the command that checks each relation one by one. (This is the reverse when checking the use of certain relations, as noted in Section 4.1.) When the prescriptive constraint generated by this command is about relations which are not between types, e.g. cstr:part relations, a classic Description Logic inference engine can be reused to check this constraint, although this may require a temporary removal of relations from types such as cstr:leafObjectType.

CONSTRUCT  #or INSERT, to add the generated constraints to the KB
{ ?MandatoryOutRelTypeDomain  #source class of mandatory "out relation"
    rdf:type cstr:Prescriptive_constraint;  #constraint (condition)
    owl:equivalentClass   #condition: ?MandatoryOutRelTypeDomain minus
      [rdf:type owl:Class ;                  # ?OutRelLeafDestType
       owl:intersectionOf ( ?MandatoryOutRelTypeDomain
                            [rdf:type owl:Class ;
                             owl:complementOf ?OutRelLeafDestType] ) ];
    rdfs:subClassOf  #constraint conclusion: there must be a relation of
      [rdf:type owl:Restriction;                     #  type ?outRelType
       owl:onProperty ?outRelType;   owl:someValuesFrom owl:Thing ] }
WHERE { ?outRelType rdf:type cstr:MandatoryOutRelationType ;
                    cstr:leafObjectType ?OutRelLeafDestType ;
                    rdfs:domain ?MandatoryOutRelTypeDomain }
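
The check expressed by such a generated constraint can be sketched as follows, here for cstr:part. The object representation (fields types and relations), the example objects and the function objectsMissingMandatoryOutRelation are hypothetical, and types are assumed to be already inherited, i.e. listed explicitly for each object.

```javascript
// Hypothetical objects with their (already inherited) types and relations.
const objects = [
  { name: "Car", types: ["cstr:DividableThing"],
    relations: { "cstr:part": ["Wheel", "Engine"] } },
  { name: "Wheel", types: ["cstr:DividableThing", "cstr:PartDestLeaf"],
    relations: {} },   // declared as a leaf: deliberately not decomposed
  { name: "Engine", types: ["cstr:DividableThing"],
    relations: {} },   // neither decomposed nor declared as a leaf
];

// Mirrors the Turtle statement about cstr:part given earlier.
const mandatoryOut = { relType: "cstr:part",
                       domain: "cstr:DividableThing",
                       leafObjectType: "cstr:PartDestLeaf" };

// Names of the objects in the domain, minus the declared leaves, that are
// not the source of any relation of the mandatory type.
function objectsMissingMandatoryOutRelation(objs, c) {
  return objs
    .filter((o) => o.types.includes(c.domain) &&
                   !o.types.includes(c.leafObjectType))
    .filter((o) => (o.relations[c.relType] || []).length === 0)
    .map((o) => o.name);
}

console.log(objectsMissingMandatoryOutRelation(objects, mandatoryOut));
// [ 'Engine' ]
```

Here, Engine is reported since it is a dividable object that has no part and is not declared as an instance of cstr:PartDestLeaf.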

5. Relations to Other Works and Conclusion

A first contribution of this article is to slightly extend the classic definition of ontology completeness, relate this notion to the one of prescriptive constraint and show that classic (i.e. descriptive) statements – e.g., assertions, definitions or constraints – can be easily, yet formally, flagged and reused as prescriptive constraints. Thus, it becomes apparent that various tasks from different research fields have subtasks checking ontology completeness, that these subtasks can use the same kind of tools, and that many existing inference engines can then be reused for such subtasks. This enables the selection of the most relevant inference engine for a domain or application. Examples of such tasks include: i) knowledge elicitation and modelling methodologies, ii) ontology design patterns (ODPs) or best practices, and iii) criteria or measures for evaluating ontologies. For example, a knowledge engineer can represent some ODPs or generic task models [Breuker & van de Velde, 1994] as prescriptive constraints, use an advanced inference engine to check the completeness of his ontology and, guided by the results, elicit the missing knowledge from experts.

A second contribution is to generalize a single inheritance based ODP for subtypeOf relations into a multi-inheritance based ODP that i) works on any transitive relation, ii) keeps the advantages of the single inheritance based ODP, and iii) is more flexible and not dependent on special tools.

The main contribution is to show how prescriptive constraints can be represented in a simple and OWL2-ER compatible way, how such constraints can be implemented with popular tools such as SPARQL1.1 and JavaScript, how they can be applied and to what extent given their simplicity.

Some transformation languages or systems are KR-focused. [Zamazal & Svátek, 2015] and [Corby & Faron-Zucker, 2015] present such systems. Although few of them explicitly have a function to detect KR patterns without also transforming the matched KRs (PatOMat [Zamazal & Svátek, 2015] is one exception), these languages or systems could easily be adapted to have such a function and hence be used for handling prescriptive constraints. However, to our knowledge, all such systems use languages more expressive than OWL2-ER. E.g., they generally use rule-based languages allowing the use of variables to relate objects shared by both the condition and conclusion of a rule. Using such languages can simplify the writing of prescriptive constraints. However, regarding what can be expressed and checked for prescriptive constraints, this article and its companion article show that i) much can be achieved simply using OWL2-ER and SPARQL1.1, and ii) the power of our approach then relies on the power of the inference engine used for object matching (rather than on the used language).

Some transformation systems, like PatOMat [Zamazal & Svátek, 2015], issue SPARQL queries for detecting patterns, based on specifications for patterns and their transformations in another language. Some other transformation systems directly propose an extension of SPARQL such as STTL [Corby & Faron-Zucker, 2015] to write specifications for patterns and their transformations. For instance, as shown in [Corby, Faron-Zucker & Gazzotti, 2016], STTL can be combined with LDScript [Corby, Faron-Zucker & Gandon, 2017] to specify STTL queries (compiled into SPARQL queries) detecting patterns and then transforming the results. (On the other hand, [Corby, Faron-Zucker & Gazzotti, 2016] does not discuss the exploitation of object matching capabilities of inference engines and it does not distinguish between prescriptive constraints and non-prescriptive ones.) Our SPARQL commands, including those generating prescriptive constraints, could thus also be reused in these transformation systems, in an adapted form. Alternatively, using scripts may be more flexible, as shown with JavaScript in Section 3.2. We will continue to work in both directions, respectively using STTL+LDScript and JavaScript, especially for prescriptive constraints related to the use of given KRL models and KRL notations. To date, bases for this future work are mostly unrelated and discussed in [Corby, Faron-Zucker & Gazzotti, 2016] and [Martin & Bénard, 2017a, 2017b].

6. References

  1. Assmann U., Wagner G. (2006). Ontologies, metamodels and model-driven paradigm. In: Ontologies for Software Engineering and Technology, Springer-Verlag, Berlin, chapter 9.
  2. Bachimont B., Isaac A., Troncy R. (2002). Semantic Commitment for Designing Ontologies: A Proposal. In: EKAW 2002, Knowledge Engineering and Knowledge Management: Ontologies and the Semantic Web, LNCS, vol 2473, pp. 114–121, Springer Berlin, Siguenza, Spain.
  3. Baget J., Gutierrez A., Leclère M., Mugnier M., Rocher S., Sipieter C. (2015). Datalog+, RuleML and OWL 2: Formats and Translations for Existential Rules. In: Challenge+DC@RuleML 2015, 9th International Web Rule Symposium (RuleML), Berlin, Germany.
  4. Berners-Lee T., Connolly D., Kagal L., Scharf Y., Hendler J. (2008). N3Logic: A logical framework for the world wide web. Journal Theory and Practice of Logic Programming, 8(3), pp. 249–269, Cambridge University Press New York, NY, USA.
  5. Breuker J., van de Velde W. (1994). CommonKADS Library for Expertise Modelling: Reusable Problem Solving Components. IOS Press, Amsterdam.
  6. Chein M., Mugnier M. (2008). The BG Family: Facts, Rules and Constraints. Graph-based Knowledge Representation - Computational Foundations of Conceptual Graphs. Chapter 11 (pp. 311-334), Springer-Verlag London, 428p.
  7. Corby O., Faron-Zucker C. (2015). STTL: A SPARQL-based Transformation Language for RDF. In: WEBIST 2015, 11th International Conference on Web Information Systems and Technologies, Lisbon, Portugal.
  8. Corby O., Faron-Zucker C., Gazzotti R. (2016). Validating Ontologies against OWL 2 Profiles with the SPARQL Template Transformation Language. In: RR 2016, 10th International Conference on Web Reasoning and Rule Systems, LNCS, vol 9898, pp. 39–45, Springer, Aberdeen, UK.
  9. Corby O., Faron-Zucker C., Gandon F. (2017). LDScript: a Linked Data Script Language. In: ISWC 2017, Vienna, Austria.
  10. Djakhdjakha L., Hemam M., Boufaïda Z. (2014). Towards a representation for multi-viewpoints ontology alignments. In: IJMSO, International Journal of Metadata, Semantics and Ontologies, 9(2), pp. 91–102, Inderscience Publishers, Geneva.
  11. Dromey R.G. (2006). Scaleable Formalization of Imperfect Knowledge. In: AWCVS 2006, 1st Asian Working Conference on Verified Software, pp. 29–31, Macao SAR, China.
  12. Galárraga L., Hose K., Razniewski S. (2017). Enabling completeness-aware querying in SPARQL. In: WebDB 2017, pp. 19–22, Chicago, IL, USA.
  13. Guizzardi G. (2005). Ontological foundations for structural conceptual models. PhD thesis (441p), Telematica Instituut / CTIT.
  14. Marino O., Rechenmann F., Uvietta P. (1990). Multiple Perspectives and Classification Mechanism in Object-Oriented Representation. In: ECAI 1990, pp. 425–430, Pitman Publishing London, Stockholm, Sweden.
  15. Martin Ph., Bénard J. (2017a). Categorizing or Generating Relation Types and Organizing Ontology Design Patterns. In: KAM'17, 23rd IEEE conference on Knowledge Acquisition and Management, Prague, Czech Republic.
  16. Martin Ph., Bénard J. (2017b). Creating and Using various Knowledge Representation Model and Notation. In: ECKM 2017, 18th European Conference on Knowledge Management, Barcelona, Spain.
  17. Mizoguchi R., Kozaki K., Kitamura Y. (2012). Ontological Analyses of Roles. In: IEEE FedCSIS 2012, pp. 489–496, Wroclaw, Poland.
  18. Rector A., Brandt S., Drummond N., Horridge M., Puleston C., Stevens R. (2012). Engineering use cases for modular development of ontologies in OWL. Applied Ontology, 7(2), pp. 113–132, IOS Press.
  19. Sowa J.F. (1992). Conceptual Graphs Summary. Conceptual Structures: Current Research and Practice, Ellis Horwood, pp. 3-51.
  20. Swan R. (2016). Querying Existential Rule Knowledge Bases: Decidability and Complexity. PhD thesis (159p), University of Montpellier.
  21. Tao J., Sirin E., Bao J., McGuinness D. (2010). Integrity Constraints in OWL. In: AAAI'10, pp. 1443–1448, Atlanta, USA.
  22. Zamazal O., Svátek V. (2015). PatOMat – Versatile Framework for Pattern-Based Ontology Transformation. Computing and Informatics, 34(2), pp. 305–336.
  23. Zaveri A., Rula A., Maurino A., Pietrobon R., Lehmann J., Auer S. (2016). Quality assessment for linked data: A survey. Semantic Web, 7(1), pp. 63–93.