This document synthesizes the ideas of my research work. It gives
many examples but does not detail the applications in which these ideas are used.
The home page of this thesis is at
http://www.webkb.org/doc/papers/hdr/
(a PDF version is given for printing purposes).
Abstract.
This document first introduces reasons why a
"collaboratively-built&evaluated global well-organized secure Semantic Web"
is needed to support scalable information retrieval, sharing and management
processes (within an intranet or on the Web) that are both precision-oriented
and completeness-oriented.
With respect to that goal, current approaches for the Semantic Web and,
more generally, the sharing and retrieval of knowledge
– information organized in a (semi-)formal way –
are insufficient. Indeed, these approaches exploit but
do not try to minimize the creation of documents or knowledge bases (KBs)
that are mostly informal or independently created and hence with few formal
semantic relations between their content.
In other words, they do not minimize implicit redundancies/contradictions nor
support the incremental refinement, organization and evaluation of knowledge by
intranet/Web users.
The main parts of this document propose various elements of solutions (that complement those of current approaches and that are partly or fully implemented by the "personal"/"shared" knowledge servers WebKB-1 and WebKB-2, both usable at www.webkb.org):
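Title in French. Vers une base de connaissances - construite de manière
collaborative -
sur&pour des partages et recherches de connaissances passant à l'échelle
(literally: "Towards a knowledge base, built collaboratively, on and for scalable knowledge sharing and retrieval").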
Abstract in French. Ce document explique tout d'abord pourquoi un "Web sémantique, global, construit de manière collaborative, bien organisé et sécurisé" est requis pour un passage à l'échelle de processus de partage, recherche et gestion d'informations (dans un intranet ou sur le Web) qui sont orientés à la fois vers la précision et la complétude. Vis-à-vis de ce but, les approches actuelles pour le Web Sémantique et, plus généralement, pour le partage et la recherche de connaissances - informations organisées de manière (semi-)formelle - sont insuffisantes. En effet, ces approches exploitent - mais n'incitent pas à minimiser la création de - documents ou de bases de connaissances (BCs) essentiellement informels ou indépendamment créés et donc avec peu de relations formelles sémantiques entre leurs contenus. En d'autres termes, elles ne minimisent pas les redondances/contradictions implicites ni ne permettent le raffinement, l'organisation et l'évaluation de connaissances de manière incrémentale par des personnes dans un intranet ou sur le Web.
Les parties principales de ce document proposent divers éléments de solutions (lesquels complètent ceux des approches actuelles et sont partiellement ou entièrement mis en oeuvre dans les serveurs de connaissances "personnels"/"partagés" WebKB-1 et WebKB-2 qui sont utilisables à www.webkb.org) :
Keywords:
semantic modeling/indexation/web, knowledge/ontology/semantic based collaboration,
knowledge/ontology modeling/(re-)presentation/sharing/integration/retrieval/management/tool/server,
language/top-level/lexical/domain ontology, controlled languages.
Acknowledgments.
Thanks to the following persons for having made this HDR thesis possible this
year or having contributed to making my post-PhD years enjoyable.
♠ The jury of this thesis
and especially its reviewers - in alphabetical order, Pr. Jérôme Euzenat,
Pr. Ollivier Haemmerlé, Pr. Marie-Laure Mugnier
and Pr. Nhan Le Thanh - for having accepted
and found time in their busy schedules to review this thesis.
♦ Pr. Refik Molva, Dr. Yves Roudier and, more generally, Eurecom.
Dr. Michael Blumenstein and, more generally, Griffith University.
Pr. Melfyn Lloyd and, more generally, the DSTC.
Dr. Rose Dieng, Dr. Olivier Corby and, more generally, the INRIA.
♥ P-L Domi.
♣ The tax payers of France and Australia.
1. Introduction and High-level Summary
  1.1. Selected Approach for Knowledge Management (KM)
    1.1.1. Non-technical Description of the General Goal of the Selected Approach
    1.1.2. General Approach or Vision
    1.1.3. The Choice of a Knowledge Representation (KR) Intensive Approach
    1.1.4. Quick Comparison with some Main Approaches for KM and Collaboration
  1.2. Selected Research Directions and their Guidelines, Hypothesis or Difficulties
    1.2.1. Towards a Process-focused Ontology of KM (Processes, Structures, Tools, ...)
    1.2.2. Towards a General Ontology for KM
    1.2.3. Towards a Language Ontology for KM; Intuitive, Expressive and Personalizable Languages
    1.2.4. Towards an Ontology of Knowledge Presentation
    1.2.5. About this document: its Goal, Decomposition and Amount of Formal Descriptions
2. Towards a Process-focused Ontology of Knowledge Management
  2.1. Some Concepts and Techniques of Knowledge Management
    2.1.1. Introduction to the Main KR Notation Used in this Document
    2.1.2. Representing and Avoiding to Represent or Organize Subject Areas
    2.1.3. Some Description Content/Mediums/Containers in KM
    2.1.4. Top-level Processes of Knowledge Management and Acquisition
  2.2. Knowledge Sharing (KS): Modularization, Indexation, Distribution, Collaboration, ...
    2.2.1. Unscalability of KS Approaches Based on the Indexation of Resources
    2.2.2. Unscalability of KS Approaches Based on Either Fully Formal or Mostly Informal Resources
    2.2.3. Unscalability of KS Approaches Based on Mostly Independently Created Resources
    2.2.4. Supporting Knowledge Sharing Between KBs (or: Combining the Advantages of Centralization and Distribution)
    2.2.5. Supporting Collaborative Knowledge Editions Within a KB
    2.2.6. Supporting the Valuation and Filtering of Knowledge or Knowledge Sources
  2.3. Following Normalization Rules or Best Practices When Representing Knowledge
    2.3.1. Lexical Normalization
    2.3.2. Structural or Semantic Normalization
    2.3.3. Application for Correcting some Examples or Advices from W3C People
    2.3.4. Normalization of Input Files
  2.4. Knowledge Comparison and Knowledge-based Indexation and Retrieval
    2.4.1. Knowledge-based Indexing of Any Document Element And Document Generation
    2.4.2. Operators For Searching or Comparing Categories Or Statements
    2.4.3. Examples of Static Interfaces Proposed by WebKB-2 For Search and Presentation
    2.4.4. Generated Search/Entering Interfaces
    2.4.5. Use or Generation of Scalable Comparison Tables – Example with the Beginning of an Ontology of CG Tools
3. Towards a General Ontology for Knowledge Representation, Sharing and Retrieval
  3.1. A General Top-level Ontology of Concepts and Relations
    3.1.1. Overview and Approach
    3.1.2. Minimizing re-categorization – Examples with DOLCE
    3.1.3. General Categories for Situations (States and Processes)
    3.1.4. Organizing Processes w.r.t. their Inputs/Outputs
    3.1.5. General Categories for Entities
    3.1.6. General Categories for Spatial Objects (Including Physical Objects)
    3.1.7. General Categories for Non-Spatial Objects
    3.1.8. General Categories for Temporal/Spatial/Physical/Psychological/... Attributes and Measures
    3.1.9. General Categories for Description Content/Mediums/Containers
    3.1.10. General Categories for Collections and Types
    3.1.11. Things w.r.t. their Roles
    3.1.12. Some Other Categorizations For Things: Continuants/Occurrents, Divisible/Indivisible, ...
    3.1.13. Categorization of Relations w.r.t. Their Roles or Ontological Nature
    3.1.14. Categorization of Relations w.r.t. What/Who/Why/.../How Questions
  3.2. Integrating WordNet-like Resources
    3.2.1. Generating Intuitive Identifiers
    3.2.2. Distinguishing Types from Individuals
    3.2.3. Correcting Lexical and Semantic Problems
    3.2.4. Making Some Domain-independent Additions
4. Towards a Language Ontology and a Knowledge Presentation Ontology
  4.1. Example of Semi-Formal Discussion about RDF+OWL and the Need For More Expressiveness
  4.2. Comparison of Three Main Notations of WebKB With Other Knowledge Representation Languages
    4.2.1. Existential Quantification, Conjunction, Difference
    4.2.2. Simple Contextualizations Or Meta-statements
    4.2.3. Identities, Names and Authorship
    4.2.4. Relation Signatures and Cardinalities
    4.2.5. Universal Quantification, Definitions and Lambda Abstractions
    4.2.6. Relation Cardinalities (a Restricted Kind of Numerical Quantification?)
    4.2.7. Qualifiers and Numerical Quantification via Percentages
    4.2.8. Simple Negations (Exclusions, Complements, Inverses, ...) and (X)OR-Collections
    4.2.9. Function Calls, Actors and Ordered Collections
    4.2.10. Higher-order Statements
    4.2.11. Relations from Collections, Collection Interpretations and Quantifier Precedence
    4.2.12. Quantitative Valuation (Measures, Intervals, Temporal Entities, ...)
    4.2.13. Use of Concept Types in Relations; Generation of Relation Types From Concept Types
  4.3. Towards a Shared LR(1) Grammar For Parsing FL, FCG, FE, CGLF, CGIF and KIF
  4.4. Summary of the Future Data Model of WebKB-2
  4.5. A General Ontology for Notations and Knowledge Presentation
    4.5.1. Ontology (or Meta-model) of FS and hence of Most Kinds of Knowledge Representations
    4.5.2. Parsing, Presenting and their Parameters
    4.5.3. Presentation Ontology of Code, Commands and Graphs
    4.5.4. Presentation Ontology of Relation Nodes
    4.5.5. Start of Presentation Ontology of Concept Nodes and Examples of Parsing/Presentation Control
5. Conclusion and Possible Future Works
6. References
Depending on their embedding level, the sections of this document are, from now on,
called "chapter", "section" or "subsection".
The KM approach explored in this document is precision-oriented: it focuses on the representation, organization, sharing, retrieval, understanding and evaluation of individual statements, preferably undecomposable ones, not groups of statements such as whole documents. This KM approach implies giving people the possibility to
Although this approach is first aimed at easing knowledge representation and sharing by knowledge engineers, progressing towards the previously mentioned goal is of interest to everyone who regularly searches, reads, writes or evaluates documents, e.g., researchers, lecturers, students, project writers and project evaluators such as those of the European Commission. As this introduction and the next chapter will show in progressively more technical ways, achieving this goal requires many people to learn and use certain reading and writing techniques that they will at first find complex. However, the benefits outweigh these problems and such a process will probably begin with communities or funding bodies asking their members or applicants to follow increasingly structured writing techniques.
The goals listed in the bullets of the previous subsection may look like the goals of most projects aiming to ease "knowledge management" in its most general sense. However, the general hypothesis of the selected approach, argued for in the following chapter, is that genericity, scalability and efficiency can only be achieved by an approach centered on interconnected knowledge servers supporting one "collaboratively-built&evaluated global well-organized secure Semantic Web" (cgosSW), not on loosely interconnected knowledge servers or static formal documents (e.g., documents storing knowledge bases), let alone informal documents. Each of the next paragraphs defines one of the attributes of a "cgosSW".
"Secure" refers to the possibility for any person to hide her queries and information – or restrict operations on (and usage of) that information – from any (particular or kind of) person/agent she wants. Supporting this is not explored in this document.
"Well-organized" means that most of the concepts and statements in the information are represented in a formal way and explicitly related (and hence ordered) by semantic relations – such as temporal, spatial, mereological, corrective, argumentation and specialization relations (e.g., subprocess and physical_part for the mereological relations) – in a normalized way, and hence in a non-redundant and easy to search&compare way. Modules of information grouping sets of statements (e.g., documents or long paragraphs) cannot be organized by such relations; for the most part, they can only be related by structural or indexation relations (e.g., inclusion, version, author, keywords, concepts, summary).
"Global" means that people would either not have to choose a particular node of the cgosSW (e.g., a database, Web site or community/company intranet) for making queries or additions, or that all choices would be equivalent: to that end, whenever useful, (parts of) the query/addition should be forwarded to all the nodes that committed to store the kind of information contained in that query/addition. That way, the cgosSW is virtually a single global Knowledge Base (KB). Subsection 2.2.4 details this point.
"Collaboratively-built&evaluated" means that any piece of information can be annotated – e.g., precisely evaluated, corrected or completed without risk of deletion – by any person, and this contributes to building the cgosSW. For scalability purpose, building a cgosSW should not require coordination/selection committees nor agreements or even discussions between people. In this document, the term "collaboration" rather than "cooperation" is used but no particular distinction is made: it does not imply any centralized task repartition and the information providers may even compete for fame or other things.
The annotations – or in other words, the whole content of the cgosSW – should then be exploitable to permit anyone to
Indeed, the writing of (informal) documents is necessarily affected by presentation constraints (e.g., space limits and, most importantly, linear informal writing) as well as assumptions about the expectations of the readers. Hence, this writing results in redundant and non-explicitly organized statements. Such precise evaluations would also make it possible to reward – and hence encourage – "good" contributions and contributors in less arbitrary ways. Thus, the vision of a cgosSW generalizes the vision of [Hillis, 2004] about a "Knowledge Web" to which people can "add isolated ideas and single explanations at the right place" and having "mechanisms for credit assignment, usage tracking, and annotation that the Web lacks". Such a Web would for example support a much better re-use and evaluation of the work of a researcher than the current system of article publishing and reviewing.
Considering their above description, it is now clear that the selected goal and approach depart from those of projects for "KM" in its most general sense (and common sense in the industry), that is, KM based on documents or classic databases. They also depart from those of most works for "KM" in its academic sense, that is, KM based on formal knowledge representations. Indeed, research in this domain is nowadays often focused on i) automatic knowledge extraction, merging or retrieval from loosely interconnected formal or barely formal knowledge bases (KBs) in servers or documents, and ii) helping people create such KBs (that is, without support for strong knowledge sharing) as is the case with current ontology editors, semantic wikis and social semantic web tools [Erétéo et al., 2009].
Nowadays and in the medium term, techniques for automatic knowledge extraction, merging or retrieval are not able to understand and precisely represent the meaning of informal sentences or other data, and hence cannot create a well-organized Semantic Web. Actually, a rather safe hypothesis is that if people do not directly insert information in a cgosSW, some pieces of information in their sentences will often never be fully understood by machines or even other people and hence will often never be inserted into a cgosSW.
Another safe hypothesis is that people will always need to learn simple KR languages (KRLs) to visualize, navigate and refine parts of this semantic network in an efficient, precise or scalable way. Since people are reluctant to learn KRLs and KR best practices, and since representing knowledge is more difficult and time consuming than writing it informally, most KM projects restrict the expressiveness of the models and notations they propose or exploit, in order not to scare away potential users or in order to exploit already stored information. This is a safe choice for the short term but not one that will make it possible to create a cgosSW. To that end, one approach is to maximize this expressiveness and its exploitation while minimizing a person's effort to learn, read and manage these notations and models. This document introduces expressive – yet rather intuitive and normalizing – notations as well as general KBs (ontologies), methodologies and tools to support and exploit these notations. In this document, "normalizing" means "helping to reduce and compare the various statements with which people can represent the same object".
The main hypothesis behind this choice for a KR intensive approach – i.e., that a sufficiently large number of researchers will sooner or later be led to create tools following such an approach, and that a sufficiently large number of persons will be led to (correctly) use such tools – will also be discussed. One of the arguments is that in the medium term some professionals or amateurs in KR or Information Technology will be led to represent their own knowledge into a cgosSW – especially researchers, lecturers and their students – and that the number of people using and extending it will then steadily grow. Although this approach is first aimed at manual knowledge modeling, it can also guide future research on advanced automatic knowledge extraction, merging or retrieval by offering it i) better KBs to exploit, and ii) new guidelines about the kind of KB it should generate or contribute to for knowledge sharing purposes.
The currently existing or foreseen Semantic Web (SW) – now also called Web 3.0 (Web 2.0 + Semantics) – is not a cgosSW. The SW – as described by the W3C [semArchi-Shadbolt 06] and in most current works claiming to be on this subject – is not planned to be "well-organized", "global" and "collaboratively-built&evaluated" in the above described senses. Like the current Web, the SW is most often seen as a Web of data albeit indexed by generally very lightweight semantic representations (e.g., simple categories from ontologies, thesauri or folksonomies) stored and organized by Web users in more or less independently created KBs, hence heterogeneous and loosely interrelated KBs. (This also applies to ontologies, the parts of each KB that define and organize the formal terms it uses). Like "data" – as opposed to "genuine knowledge" (homogeneous formal semantic representations) – the information in these KBs is hard to find, match, merge and exploit in logical/relevant ways; hence, only short – and not necessarily logical – chains of inferences are expected to be automatically performed on it. Current knowledge sharing approaches try to re-use independently developed ontologies (which is understandably difficult and does little to ease the work of knowledge seekers and knowledge providers) or propose a shared knowledge base with no edition protocols nor guidelines to keep it organized. Finally, the KRLs proposed by the W3C for the Semantic Web are currently designed to have good properties for inferencing purposes but are not yet expressive and normalizing enough to be adequate for the representation and sharing of non-simple kinds of knowledge. Various arguments are given for this last point in this document, including in Chapter 4 which compares various KRLs; [Kalfoglou et al., 2004] and [Patel-Schneider, 2005] give some complementary arguments.
Even in Semantic Grids and semantic-based Peer-to-Peer networks (with generally, one KB per node/peer), the partial redundancies and inconsistencies between the KBs are not made explicit and hence the replication of queries or knowledge among the KBs is restricted; thus, the search and exploitation of their knowledge is restricted too. Even in such networks, knowledge sharing, ontology evolution and collaboration are not based on "encouraging a collaboratively-built&evaluated global KB and then on the querying or filtering of these KBs by each user" but on "a more or less extended selection committee accepting or not to include/keep knowledge in a shared KB based on the assumed expectations of the users".
Similarly, current tools for direct/indirect/liquid e-democracy focus on voting-related issues, proxy chains and workflows, but do not permit their users to freely add to – and thus collaboratively build – a non-redundant well-organized semantic/argumentation network of statements (typically of interconnected hypotheses, observations, preferences and decisions) and evaluate these statements in precise ways that other users can exploit for precise search or comparison of statements and hence well-justified decisions.
More generally, as in current social networks, the search and exploitation of information is limited by its lack of explicit semantic content and organization [Erétéo et al., 2009].
The following subsections introduce the next chapters, except for the last subsection which further explains the aim of this document, its content, format and decomposition into chapters. Some general research questions that these chapters aim to answer are: what are the criteria for judging the quality of knowledge representations, notations and libraries; what kinds of techniques can satisfy those criteria and help design or generate these artifacts; and how to semantically organize these criteria, techniques and artifacts. The general guideline of the research directions presented by these chapters is to reduce the implicit redundancies or inconsistencies between the knowledge objects and, more generally, increase the semantic organization of these objects. Some more specialized formulations of this guideline are: i) the objects should be represented as precisely and uniformly as possible, and ii) the objects should be as small and as explicitly interconnected by semantic relations as possible, at least by identity and specialization relations. A related hypothesis is: the bigger and more organized the KBs, the easier it is for software to align/merge these KBs or guide the users in entering precise and re-usable knowledge.
Nowadays, the word "ontology" refers to a set of formal terms – i.e., unambiguous identifiers for certain objects with unique meanings (concepts, statements, ...) – and to their associated formal (logic-based) or informal descriptions (partial/complete definitions, other statements) which relate the objects with respect to each other. An ontology may also include informal terms and connect them to other terms, typically via lexical relations. A collaboratively-extendable well-organized core ontology of KM (cooKM) is one necessary element to permit people (researchers, lecturers, students, engineers, ...) to index, share, organize, compare or retrieve KM information (tools, techniques, ...) relevant to their needs and in a scalable or efficient way. Then, the approach could be extended to other domains. Since this ontology would organize KM best practices and other resources, it would also be a guide for KM in general and hence for any cgosSW.
No such ontology currently exists in any domain (and the results presented below are only steps "towards" achieving it, hence the title of Chapter 2). The major works of the early 1990s on libraries of models for generic knowledge modeling tasks – in particular the library of CommonKADS [Breuker & van de Velde, 1994] – are of course very interesting resources to initialize it but, since they are focused on knowledge modeling, do not provide categories for a large percentage of important KM processes. In any domain (not just KM), process categories are represented in topic hierarchies or poorly organized ontologies of "subject areas" (topics), e.g., Yahoo's topic hierarchies and – in KM – the "Semantic Web Topics Ontology" of ISWC 2006 [www-SWTO, 2006]. In this document, the expression "topic hierarchy" refers to a list of formal or informal terms organized with only one kind of relation which is hierarchical and does not distinguish the various semantic (or lexical) relations that exist between the terms. With a topic hierarchy or a poorly organized ontology, different users are likely to insert or look for a same piece of information at different places (or represent it in different ways), thus quickly leading to implicit inconsistencies and redundancies; this decreases the organization of the hierarchy or ontology and makes knowledge sharing or retrieval progressively more and more difficult.
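The difference between a topic hierarchy and a semantically organized ontology can be illustrated with a small Python sketch. The terms and the relation names (`subtype_of`, `instrument_of`, `subprocess_of`) are hypothetical examples chosen for this illustration, not categories taken from the ontology presented in this document.

```python
# A topic hierarchy: every term is placed "under" another one via a single,
# untyped link, whatever the actual semantic relation between the two terms.
topic_hierarchy = {
    "knowledge_retrieval": "knowledge_management",
    "retrieval_tool": "knowledge_retrieval",
    "cg_based_retrieval": "knowledge_retrieval",
}

# The same terms with the relation made explicit (hypothetical relation names).
typed_links = [
    ("knowledge_retrieval", "subprocess_of", "knowledge_management"),
    ("retrieval_tool", "instrument_of", "knowledge_retrieval"),
    ("cg_based_retrieval", "subtype_of", "knowledge_retrieval"),
]


def descendants(parent_of, root):
    """All terms placed, directly or not, under `root` in the untyped hierarchy."""
    found, frontier = set(), [root]
    while frontier:
        current = frontier.pop()
        for child, parent in parent_of.items():
            if parent == current and child not in found:
                found.add(child)
                frontier.append(child)
    return found


def specializations(links, root):
    """All terms reachable from `root` by following only `subtype_of` links."""
    found, frontier = set(), [root]
    while frontier:
        current = frontier.pop()
        for source, relation, target in links:
            if relation == "subtype_of" and target == current and source not in found:
                found.add(source)
                frontier.append(source)
    return found


under_retrieval = descendants(topic_hierarchy, "knowledge_retrieval")
kinds_of_retrieval = specializations(typed_links, "knowledge_retrieval")
```

With the untyped hierarchy, a search for kinds of knowledge retrieval also returns `retrieval_tool`, which is a tool rather than a process; with typed links the same search returns only `cg_based_retrieval`. This is one way to see why an untyped hierarchy gives users no principled "right place" for a new term.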
Chapter 2 illustrates parts of a core for a cooKM. Indeed, it i) describes and organizes research contributions about certain KM processes, i.e., about techniques, best practices and applications for knowledge sharing, modeling, comparison, retrieval and evaluation, ii) situates them with respect to related processes or resources, and iii) follows the above cited best practices. This knowledge essentially is (or, for some parts, will be) represented in the knowledge server WebKB-2, thus allowing any Web user to navigate, query, extend or correct this knowledge.
The difficulties related to the content of Chapter 2 were to come up with these contributions, implement them in tools, and finally represent, organize and argue for them in a scalable way. Among these contributions are:
Chapter 3 presents an ontology useful for any KR intensive KM process. This chapter illustrates the content – as well as the rationales and techniques for the design – of an ontology created by i) transforming the noun-related part of WordNet [Miller, 1995] into a genuine (semantically correct) lexical ontology with short intuitive identifiers, and ii) integrating this lexical ontology in a loss-less manner with many top-level ontologies (e.g., DOLCE, SUMO, LIS, NSM) and some domain ontologies, including the above cited core ontology for KM. [Sowa, 2003] named the result "the Multi Source Ontology" (MSO), acknowledged it was "a necessary part of any suitable, scalable knowledge sharing effort" and recommended it as a "candidate material for a standard". It was voted as such a "candidate material" by the IEEE Standard Upper Ontology Working Group [IEEE-SUO-MSO, 2004] after having examined it. This ontology is the one proposed by the main WebKB-2 server which permits any Web user to extend it.
All ontology integrations in the MSO have to be "loss-less", that is, the categories are associated with their source ontologies (more precisely, the category identifiers include identifiers for their source ontologies) and the meaning of the categories in the source ontologies is not changed (over-interpreted) except when internal inconsistencies in the sources are detected and hence have to be fixed before integration. This is a requirement for the above cited collaboration techniques and permits the users to re-generate any source ontology or generate combinations of (parts of) the source ontologies if they wish to. This also makes it possible to integrate subsequent versions of the source ontologies. Nowadays, (automatic or manual) loss-less integrations are still rare. WordNet has often been re-used in ontology related works, especially for information retrieval, but the MSO still seems to be the only work that has converted WordNet into a genuine consistent ontology with intuitive identifiers and that has not over-interpreted it (as was for example the case when OntoWordNet [Gangemi et al., 2003] was created).
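The loss-less requirement can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `source#name` identifier syntax, the `MergedOntology` class and the toy sources `onto_a` and `onto_b` are hypothetical and do not reproduce the actual identifiers or data model of the MSO.

```python
from collections import defaultdict


def make_id(source, name):
    """Hypothetical identifier syntax that keeps the source ontology visible."""
    return f"{source}#{name}"


class MergedOntology:
    """Toy merged ontology that records which source asserted each link."""

    def __init__(self):
        # (subtype id, supertype id) -> set of sources that asserted this link
        self.link_sources = defaultdict(set)

    def add_link(self, subtype, supertype, asserted_by):
        self.link_sources[(subtype, supertype)].add(asserted_by)

    def regenerate(self, source):
        """Return, sorted, the links that `source` originally asserted."""
        return sorted(link for link, sources in self.link_sources.items()
                      if source in sources)


merged = MergedOntology()
# Links coming from two toy source ontologies, stored unchanged.
merged.add_link(make_id("onto_a", "dog"), make_id("onto_a", "animal"), "onto_a")
merged.add_link(make_id("onto_b", "organism"), make_id("onto_b", "entity"), "onto_b")
# Link added during integration: it connects the sources without altering them.
merged.add_link(make_id("onto_a", "animal"), make_id("onto_b", "organism"), "integration")

only_a = merged.regenerate("onto_a")
only_b = merged.regenerate("onto_b")
```

Because each link keeps its provenance and the integration link is recorded separately, either source can be regenerated exactly as it was given, which is the property that loss-less integration is meant to guarantee in this sketch.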
One difficulty of the integrations in the MSO lies in making explicit the often implicit semantic relations between the concepts of the source vocabularies and finding a "right place" for these concepts in the MSO. One rather safe hypothesis behind these integrations is that different, internally consistent, ontologies do not have to be modified to be integrated into a unique consistent semantic network. A stronger hypothesis is that the categories of these different ontologies can always be inter-related into a well-organized semantic network, especially via specialization relations. It appeared that the difficult cases can be solved via "extended specialization" relations. A hypothesis that generalizes the two previous ones is that semantic conflicts can always be solved by adding more precision or making explicit how they boil down to mere "preferences", and hence that solving conflicts increases the organization of the KB. A related hypothesis is that solving conflicts (adding precision) can be done incrementally (when a new piece of information is added) and that people do not have to meet nor compromise to merge their ontologies – this is the main hypothesis behind the above cited collaboration protocols. These hypotheses will be discussed or illustrated.
Compared to many other KRLs, Conceptual Graphs (CGs) are more expressive, intuitive, concise and normalizing. These characteristics are generally the main reasons why those who adopted CGs did so. The first three characteristics explain why CGs are often said to be relatively "close to natural language". The last three characteristics come from their graph-based nature and from two of their textual and graphic notations (namely, CGLF and CGDF) which have similarities. The CG model – and CGIF, another textual notation for CGs – are part of Common Logic (CL) [ISO/IEC 24707, 2007], an interlingua framework for logic languages based on first-order logic or subsets of it. However, the above characteristics can be improved on and it is very valuable to do so. This led to the creation of Frame-CGs (FCG), For-Links (FL) and Formalized-English (FE), three formal notations of complementary kinds that
There are currently very few formal CNLs (FCNLs) having the expressiveness of at least the CL model. Apart from FE, one such FCNL seems to be Attempto Controlled English (ACE) [Fuchs et al., 1999] and only its expressiveness may be claimed to be improved on by FE. Another one, if (vs. when) implemented, is (vs. will be) Common Logic Controlled English (CLCE) [Sowa, 2007]. FE is a bit less readable than ACE and CLCE but has the advantage of having an explicit logical structure and the same one as FCG. Comparing FCNLs with other kinds of notations on concision or intuitiveness criteria is not really relevant because i) FCNLs are purposely less concise to be more intuitive for beginners, and ii) they are also less "visually structured", that is, they make it much more difficult to see if and how a certain number of things are related together.
As shown in Section 2.1, FL is probably one of the most concise and "visually structured" textual notations "with the expressiveness of at least the CL model" that can be designed; however, for very complex statements, FL is not the most intuitive or handy notation.
Furthermore, a general textual KR notation called "For-Structuring" (FS), having all the above cited notations as sub-languages, was also created. More precisely, CGLF, FCG, FL and FE are currently included, and CLIF, KIF, CGIF and RDF+OWL/XML will also hopefully be fully included in the future (currently, only parts of these last notations are parsed). All these notations can be used to write (formal or semi-formal) logic-based statements. FS permits using them along with query operators and procedural control structures to create queries or programs. Thus, in an FS input/output file, an ontology or KB is an "ordered" set of assertion/querying/control commands. Once parsed, the KB is, as usual, a set of logic formulas or functions, possibly ordered by relations between them or between the terms.
The work on notations presented in this document is an exploration of which notational features are needed to ease knowledge representation and sharing, which ones are lacking in current notations, and how to include them in a notation statically or dynamically.
As hinted in the previous paragraph, the above cited core language ontology is also a core for a knowledge presentation ontology. Thus, these two issues are merged in Section 4.5. Subsection 4.5.5 shows how arbitrarily complex parsing/presentation directives can be built with the current model. However, this ontology does not yet include most of the knowledge presentation features of WebKB (WebKB-1 and WebKB-2):
WebKB is composed of two complementary KB servers.
The following chapters contain a lot of information in a formal and semi-formal format rather than via informal sentences. Both are important since, mainly due to time constraints, not all the important information presented informally in this document is currently also represented in a (semi-)formal way. However, this will come and the basis for this is presented. Indeed, to illustrate, follow and extend the results of the selected KM approach, one motivation for this document was to (re-)present, organize, justify and generalize the main ideas of the author's post-PhD research into a core for a cgosSW, thus also allowing any reader to correct or complement these ideas (via WebKB-2) in a scalable, easily retrievable way and without having to introduce redundancies by summarizing or re-situating the new materials. As noted earlier and argued for below, traditional writing and publishing do not permit this and render the review of information or information providers quite arbitrary since they link it to the review of documents and the way these documents are presented.
Comparison between this document and the input files for the MSO. At its inception, this document was meant to be an "onto-book", that is, a book that is both i) a "relatively" intuitive but formally structured research description that can accommodate many additions without its structure having to be changed, and ii) a file that can be used as an input file or backup file for a large ontology (in this case, the MSO) and be regenerated from it to include new additions made in the ontology. In the end, this goal had to be left for a subsequent version. However, this document contains a lot of formal descriptions and is scalable to a limited extent: there are "right places" for inserting new (representations of) ideas and categories related to those already represented in this document. Nevertheless, the modularity (and hence scalability) of this document is limited by its important informal parts and structure. Hence, it cannot be used as one of the input files of the MSO. Subsection 2.3.4 gives one approach for the scalable decomposition of input files. To sum up, the ideas and categories represented in this document are (or will be) also stored in actual input files for the MSO. This acknowledges the unfortunate manual duplication of formal content between this document and these files. However, like input files, this document can be seen as a static view on parts of the MSO, with additional informal information and an ordering of the information that cannot be re-generated via queries. This is one of the interests of keeping input files and permitting their access to the users while they navigate the MSO.
FL, FCG and FE share many features, can be used for assertions or queries within FS and, to ease readability, can be used within each other provided that sub-statements in a different notation are embedded within their distinctive delimiters. The main KR notation used in this document is FL because it is the most concise, the most structured (i.e., it maximizes the possibilities to join statements and hence to avoid repeating parts of them) and often the simplest to read. This is due to the following six features.
The next titled paragraphs illustrate some features of FS and FL, mainly those that are needed to understand the representations of the next sections. The text in courier font shows formal code accepted by the FS parser. Its translation into first-order logic is given in Section 4.2. FL, FCG and FE are first-order logic notations (with contexts and collections): they do not depend on any particular model (Description Logics, Conceptual Graphs, Common Logic, etc.). The next paragraphs are numbered because they are referred to in other sections of this document.
2.1.1.1. Strings, variables and code delimiters. In FS and its sub-languages, strings can be single quoted, double quoted or delimited by "$(" and ")$". This eases the embedding of strings. The escape character is '\', e.g., 'It\'s a string.' is equivalent to "It's a string.". Juxtaposed strings are concatenated, e.g., "ab" 'cd' is equivalent to "abcd" or $(abcd)$.
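The string rules of this paragraph can be illustrated outside FS. The following Python function is only a minimal sketch of them, not the FS parser: the name read_fs_strings is hypothetical, and applying the '\' escape only inside quoted strings (not inside "$(" ... ")$") is an assumption.

```python
def read_fs_strings(text):
    """Parse juxtaposed FS-style string literals and return their concatenation.

    Delimiters handled: '...', "..." and $(...)$ (taken from the paragraph above).
    Assumption: the backslash escape applies only inside quoted strings.
    """
    out, i, n = [], 0, len(text)
    while i < n:
        if text[i].isspace():
            i += 1  # whitespace between juxtaposed strings is skipped
        elif text.startswith("$(", i):
            end = text.find(")$", i + 2)
            if end < 0:
                raise ValueError("unterminated $( ... )$ string")
            out.append(text[i + 2:end])
            i = end + 2
        elif text[i] in "'\"":
            quote, i = text[i], i + 1
            chunk = []
            while i < n and text[i] != quote:
                if text[i] == "\\" and i + 1 < n:
                    i += 1  # drop the escape character, keep the next one
                chunk.append(text[i])
                i += 1
            if i >= n:
                raise ValueError("unterminated quoted string")
            out.append("".join(chunk))
            i += 1  # skip the closing quote
        else:
            raise ValueError(f"unexpected character at {i}: {text[i]!r}")
    return "".join(out)


escaped = read_fs_strings(r"'It\'s a string.'")
joined = read_fs_strings("\"ab\" 'cd'")
delimited = read_fs_strings("$(abcd)$")
```

With this sketch, the three values correspond to the equivalences stated above: the escaped single-quoted string equals "It's a string.", and both "ab" 'cd' and $(abcd)$ yield "abcd".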
Variable names are prefixed by '?', '*', '$' or '^'. Variables with a '?', '*' or '^' prefix behave like variables in KIF, CGIF or CGLF (the '^' prefix is for free variables, i.e., variables that are implicitly universally quantified). Variables with a '$' prefix behave like the variables of shell scripts and can be set to a value as in shell scripts. All variables are interpreted in double-quoted strings or strings delimited by "$(" and ")$". With respect to variables with a '?', '*' or '^' prefix, these two kinds of strings are equivalent to KIF quoted expressions where variables are preceded by commas. Interpreting a variable with a '$' prefix means replacing it with its value. For example, the following two statements or commands are valid in FS.
max=8; for n=1 to $max { print "here is a number: $n"; }
Apart from the interpretation of variables, the content of strings is not interpreted by the FS parser, except for a string delimited by "$(" and ")$" when it is used as a delimiter for FS code within informal text. In an HTML file, FS code can also be isolated within the strings (HTML tags) "<script language='FS'>" and "</script>". This is one way to mix formal and informal information in FS.
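For readers more familiar with general-purpose languages, the shell-like interpretation of '$' variables inside strings can be approximated in Python with string.Template. This is only an analogy: the function names are hypothetical, and the inclusive upper bound of the loop is an assumption about the FS "for ... to" construct.

```python
from string import Template


def interpolate(text, env):
    """Replace each $name in text with the string form of env[name]."""
    return Template(text).substitute(env)


def run_loop(env):
    """Mimic: for n=1 to $max { print "here is a number: $n"; }"""
    upper = int(interpolate("$max", env))
    # Assumption: the upper bound of the loop is inclusive.
    return [interpolate("here is a number: $n", {**env, "n": n})
            for n in range(1, upper + 1)]


lines = run_loop({"max": 8})
```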
2.1.1.2. Comments and annotations.
/* This is a multi-line "comment". //This is an in-line comment within a multi-line comment. */
Comments are discarded by the FS parser, unlike "annotations" which are informal notes (explicitly or implicitly) related to an object via an "annotation" relation (even though in RDFS, the type of such relations has for identifier "comment").
HTML comment delimiters are recognized and ignored: the content of an HTML comment is not made visible by a Web browser but is still parsed by the FS parser. More generally, the parsers of FS, FL, FE and FCG ignore HTML tags; hence, these tags can be used within representations in input files to highlight certain parts.
Before mixing formal and informal parts via comments or code delimiters, a knowledge provider should first consider relating them with precise relations (hence preferably not annotation relations) and structuring the informal parts by splitting them into shorter strings and connecting them using languages such as FL or FE. This is of course not always appropriate (e.g., this document is mostly composed of informal text).
2.1.1.3. Identifiers, names and relation from a type.
pm#thing pm#name: "something";
The above line is an FL assertion (i.e., a statement asserted in FL) that relates the formal term pm#thing to the string "something" by a relation of type pm#name. (In FS, both pm#name and pm#thing are predefined. This last term refers to the (non-strict) supertype and type of all imaginable types or things.) Thus, this line asserts that (and should be read as) "pm#thing has for name 'something'". Hence, given the semantics of pm#name, "something" becomes another name for pm#thing: "thing" and "something" are informal terms/names (which may be names for other things) while pm#thing is a unique identifier (a formal term). Since the main WebKB server (www.webkb.org) accepts conceptual queries as GET parameters, different parameters (and hence different URLs) can be used to query different information on the referred object. For example, the first URL below asks for all the direct relations from pm#thing in RDF/XML. The second URL asks for the direct relations from pm#thing and all its subtypes via a recursive exploration on three levels, in a format similar to FL but relying on indentation.
http://www.webkb.org/bin/categSearch.cgi?categ=pm%23thing&format=RDF
http://www.webkb.org/bin/categSearch.cgi?=&categ=%23thing&recursLink=%3E&depth=3
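Since identifiers such as pm#thing contain a '#', they must be percent-encoded when placed in such URLs. The Python sketch below shows one way a client could build the query strings; it only reuses the parameter names visible in the two URLs above (categ, format, recursLink, depth), the helper name categ_search_url is hypothetical, and no other parameter of the server is assumed. The second URL above also starts with an empty "=" parameter, which this sketch does not reproduce.

```python
from urllib.parse import urlencode

# Base address taken from the two URLs above.
BASE = "http://www.webkb.org/bin/categSearch.cgi"


def categ_search_url(**params):
    """Build a category-search URL; urlencode percent-encodes '#' and '>'."""
    return BASE + "?" + urlencode(params)


rdf_url = categ_search_url(categ="pm#thing", format="RDF")
recursive_url = categ_search_url(categ="#thing", recursLink=">", depth=3)
```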
In FS, and in the rest of this document, when referring to terms from an ontology, informal terms (strings) are enclosed within double quotes while formal terms are either not quoted or quoted within a single backquote and the single right quote (as with `pm#thing´). Indeed, such quotes can be used as delimiters for terms and statements in FE. As their prefix indicates, pm#thing and pm#name have been created by `pm´, some user (person or software agent) of FS (and, as it happens, an identifier in the KB of WebKB-2 for the author of this document). In this document, formal things which are not necessarily terms in an ontology (e.g. query operators and syntactic sugar) are single quoted.
2.1.1.4. Introduction to contexts. In the first example of the previous paragraph, the author of the relation/statement has been left implicit: according to the default presentation/parsing rules of FS, this author is `pm´ because the source term has for author `pm´. FL statements equivalent to the above one but showing the relation creator are:
pm#thing pm#name _[pm#author: pm]: "something" ;
pm#thing pm#name: "something" __[pm#author: pm];
[pm#thing pm#name: "something"] _[pm#author: pm];
pm#thing [_ pm#author: pm] pm#name: "something";
[_ pm#author: pm] [pm#thing pm#name: "something"];
"something" [_ pm#author: pm] pm#name of: pm#thing;
"something" pm#name of: _[pm#author: pm] pm#thing;
"something" pm#name of: pm#thing __[pm#author: pm];
[ [ [ [ ["something" pm#name of: pm#thing] _[pm#author: pm] ] ] ] ];
These equivalent statements illustrate the following three points.
As illustrated by the next FL statement, a relation may have various "believers". By default, an author is assumed to be a believer (more precisely, pm#author is a subtype of pm#believer).
pm#thing pm#name: "something" __[pm#author: pm, believer: oc];
When a term has many names from the same creator, instead of explicitly using the relation pm#name, the names can be concatenated using three underscores, as in pm#thing___something which is another identifier for the concept type referred to by pm#thing.
2.1.1.5. Introduction to quantifiers, definitions and relation signatures. FL, FCG and FE use various kinds of quantifiers and various syntactic forms for them:
Numerical quantifiers that are not percentages are like cardinalities in entity-relationship diagrams but they can be used in both the source node and the destination node of a relation. As with cardinalities, the default quantifier for the source node is 'any' and the default quantifier for the destination node is
This default is necessary to avoid the specification of quantifiers when displaying parts of lexical ontologies such as WordNet and hence displaying these parts in a non-cumbersome way. Thus, the next statements are equivalent and, unless the signature of the relation type pm#name changes the default interpretation, they mean 'any (instance of) pm#thing has for name "something", and "something" is name of 0 to any number of instances of pm#thing'.
pm#thing pm#name: "something";
pm#thing pm#name: "something" __[any->1..*, 0..*<-any];
pm#thing pm#name: "something" __[any->?];
pm#thing pm#name: "something" __[any->?, pm#author: pm];
pm#thing pm#name: "something" __[any->? _[pm#author: pm], pm#author: pm];
pm#thing pm#name: "something" __[any->? _[pm#author: pm] ];
pm#thing pm#name _[any->?]: "something";
any pm#thing pm#name: the pm#string "something";
"something" pm#name of: pm#thing __[any->0..*, 1..*<-any];
//The order of the quantifiers is important: 'any->?' is different from '?->any'
However, the interpretation that one would intuitively expect is "the type pm#thing has for name "something", and "something" is the name of the type pm#thing and maybe other types". This can be expressed by using '.' in any of the following ways.
pm#thing pm#name: "something" __[.->?, .<-?];
pm#thing pm#name: "something" __[.<->?];
pm#thing pm#name: "something" __[.<->.]; //this form is to be used with caution
The '.' makes it possible to state that the relation is about a category, not about its instances. For an individual, it would seem that using '.' or not does not matter, but Paragraph 2.1.1.10 shows that any term (including individuals and strings) may have "extended specializations". Thus, it is better to use '.' only when one particular type needs to be referred to.
To avoid forcing the use of '.', the signature of the relation type pm#name may specify that this should be the default interpretation, as in the following equivalent relation type declarations.
pm#name .(pm#thing . [0..*], pm#string [1..*]);
pm#name .(., pm#string [1..*]);
'.(' introduces a relation signature. The first identifier after '.(', here pm#thing, specifies that the source node of the relation should be of type pm#thing, i.e., that any thing may have a name (here, pm#thing may be abbreviated by '?' or left implicit since '.' is specified). The following '.' specifies that the relation applies to the object used in the source node of the relation, not to its instances. If pm#term or a 2nd order type such as pm#type is used in the signature (or an Nth-order type with N greater than 2), the '.' need not be specified: it is the default. Thus, relation signatures very rarely need to use the '.'. Then, '[0..*]' is the reverse cardinality for the relation: any string may be the name of 0 to many types (since such a cardinality is the default, it can be left implicit). Then, the type that any destination of such a relation may have is specified: pm#string. Then, '[1..*]' is the direct cardinality for the relation: any (declared) thing has 1 to many names.
If, given the above signature, `pm´ still wanted to state that any (declared) thing has, by definition, "something" as a name, he would simply have to make the relevant quantifier explicit, as follows.
pm#thing pm#name: "something" __[any->?];
Since this last statement is a definition, it is "neither true nor false" in the sense that no one may contradict it, but it is "true by definition" in the sense that it states a relation (and a necessary condition) that `pm´ associates to pm#thing and its subtypes. If another person (say, `oc´) does not like this definition, he has to declare another term (say, oc#thing) and relate it to pm#thing (and, if needed, its subtypes) via some relation, e.g., a generalization relation. On the other hand, the following statement asserts that according to `pm´ every (declared) thing happens to have "something" as a name. This following statement (which is clearly false) uses a classic universal quantification and hence may be "corrected" by other people using a relation such as pm#corrective_specialization (the supertype for all relation types having for destination an `extended_specialization´ which is also a `correction´).
pm#thing pm#name: "something" __[every->?, 1..*<-.];
To state that having the name "something" is a sufficient condition for being a thing, first by assertion and then by definition:
pm#thing pm#name: "something" __[every<-.];
pm#thing pm#name: "something" __[any<-.];
If a type was used instead of "something", `__[any<-.]´ would not be enough to state "sufficient conditions". Hence, to state them, it is safer to use one of the two following equivalent forms (the first one is needed because it permits various quantifications on a same relation by different users to be concisely expressed, but the second one will be used in the rest of this document because it makes the "sufficient conditions" mark easier to spot by the reader):
pm#thing pm#name: "something" __[.<=.];
pm#thing pm#name<= "something" __[.<-.];
To state that pm#thing can only have "something" as a name:
pm#thing pm#name: "something" __[.=>.];
pm#thing pm#name=> "something" __[.->?];
pm#thing pm#name=> "something" __[.=>.];
If a type was used instead of "something" (e.g., if `[.=>1..*]´ had been used), the `=>´ could be translated in RDF/XML using owl#allValuesFrom (`owl:allValuesFrom´ in the RDF/XML syntax).
To state that at least two things (that are not connected by an identity relation) have "something" as a name:
pm#thing pm#name: "something" __[2..*->?];
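The numerical quantifiers used in this paragraph (e.g., '0..*', '1..*' and '2..*') behave like cardinality ranges. As an illustration only, the following Python sketch parses such a range and checks a count against it. The function names are hypothetical, the single-number and 'N..M' forms are assumptions (only 'N..*' forms appear above), and the non-numerical quantifiers ('any', 'every', '?', '.') are deliberately not handled.

```python
def parse_range(spec):
    """Parse 'N', 'N..M' or 'N..*' into (low, high); high is None for '*'."""
    low, sep, high = spec.partition("..")
    if not sep:
        # Assumption: a lone number means "exactly that many".
        return int(low), int(low)
    return int(low), (None if high == "*" else int(high))


def allows(spec, count):
    """True if `count` related objects satisfy the cardinality range `spec`."""
    low, high = parse_range(spec)
    return count >= low and (high is None or count <= high)


one_or_more = parse_range("1..*")
zero_names_ok = allows("1..*", 0)       # a thing without any name
unused_string_ok = allows("0..*", 0)    # a string naming nothing
```

Here, the first check fails and the second succeeds, which mirrors the default signature of pm#name: any (declared) thing has 1 to many names, while a string may be the name of 0 to many things.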
2.1.1.6. Relations and order of parameters in FL.
pm#Tom pm#kind: pm#person __[pm#author: oc] pm#document_creator __[pm],
pm#identifier: oc;
The above line asserts that pm#Tom has for pm#kind a pm#person according to `oc´, has for pm#kind a pm#document_creator according to `pm´, and has for pm#identifier `oc´ according to `pm´. This example shows that i) relations and relation destinations can be added without repetitions, and ii) within a context, the pm#author relation can be left implicit (in any case, the author must be a registered user). pm#kind refers to the usual meaning of an "instanceOf" relation type: the source node of the relation (the instance) must conform to (and "inherits from") the characteristics that were associated to the type via a partial/total definition of this type or via a statement using this type with a universal quantifier. Here are FL statements that are equivalent to the above one.
pm#Tom pm#kind: pm#person __[oc] pm#document_creator, pm#identifier: pm;
pm#person pm#instance: (pm#Tom pm#kind: pm#document_creator __[oc], pm#identifier: pm);
These equivalent statements show that i) pm#instance is the inverse relation type of pm#kind, and ii) parentheses can be used for attaching relations to an object within a statement without introducing a new statement (unlike when square brackets are used). The use of '_', '.' or ':' before or after '(' makes it possible to change the aspect or order of the list composed of the relation name and its parameters, which is sometimes handy. These last two points are illustrated by the fact that the following statements are equivalent.
pm#Philippe pm#kind: pm#person; [pm#Philippe pm#kind: pm#person];
(pm#Philippe pm#kind: pm#person); [(pm#Philippe pm#kind: pm#person)];
pm#Philippe _(pm#kind: pm#person); (_ pm#kind: pm#person) pm#Philippe;
pm#kind _(pm#Philippe pm#person); (_ pm#Philippe pm#person) pm#kind;
(: pm#kind: pm#Philippe pm#person); (: pm#kind pm#Philippe pm#person);
(: pm#Philippe pm#kind: pm#person); (: pm#Philippe pm#person pm#kind:);
The "_(", "(_" and "(:" prefixes of FL are shared by FCG and can make these notations look like and be used like lisp-based notations, which is interesting in certain cases, for example when functional relations or functions are used. Furthermore, although this has not yet been implemented in WebKB-2 but is prepared for via the notation/presentation ontology of Section 4.5, such prefixes and related syntactic sugar could be changed by each user (and, in certain cases be made optional) by selecting values from this ontology or changing the default values that define the selected language (Table 4.5.5.3 gives examples). This will be a major step to permit the re-use of the same import/export programs to parse or generate knowledge in some other notations. Having different prefixes for contexts/statements, groups of relations and relation/function parameters – e.g., by default in FL, FE and FCG, "_(", "(_" or "(:" for function calls and ".(" for relation signatures – permits FL, FE and FCG to have a Lex&Yacc parser – hence a LALR(1) grammar – and eases the reading of complex statements. The shared Lex parser of FL, FE and FCG can be made generic enough to allow small and coherent changes of the prefixes to be dynamically made by each user, within an input file. Then, the Yacc parser of FCG can be extended to parse FL, FE, CGIF, KIF and hence probably all well-known graph-based and logic-based KRLs. Section 4.3 presents a shared grammar in preparation of this. The parser for RDF/XML currently appears too different to be worth being integrated into this unique parser. The FE parser can then be independently extended to handle more English looking statements. Thus, the FS parser will continue to call different sub-parsers: the FE parser, the RDF/XML parser and another one for all the other languages. This shared parser will also take into account the selected (re-)presentation options, not just the selected language and used statement delimiters.
The current FL parser of WebKB-2 does not yet fully handle the '_', '.' or ':' before or after the parentheses and square brackets (hence, the default order must often be followed), and expects parentheses instead of brackets as context delimiters. Most of the other syntactic features presented in this document (and all of those presented for FE and FCG) are implemented.
2.1.1.7. User/category declarations in FL. In FL, information sources can only be declared in this rather peculiar way:
_user@thisKB pm#instance: anonymous_user@thisKB _[u]
spamOnly@phmartin.info _[pm pw001xyKtDq2k];
_user@thisKB is predefined in FS (since a type for all registered users is needed) and hence the parser interprets the context information as i) the short identifier for the newly declared user, followed by ii) the encryption of its password with the Unix function "crypt". Thus, the user declarations (which, like other knowledge representations, can be generated by the interfaces of WebKB-2) can be included in import/export files in an (at least minimally) secure way. Except in this special case, user identifiers must be declared before they are used.
If an identifier such as pm#thing has been declared, FS accepts `thing%pm´ as an alternative identifier, and conversely. In both cases, `pm´ must have been declared as a user. By default, the FS parser only accepts identifiers that are either i) predefined, ii) a registered user, iii) prefixed or postfixed as indicated above, or iv) common URIs (typically, URLs and email addresses; these identifiers have no author). By default, to prevent some lexical errors and then semantic problems, the FS parser does not allow an identifier to refer to different objects that are not related by an identity relation, and it does not accept the use of yet unknown identifiers, unless they are in a statement that declares them. Declarations are made by relating the new identifier to an existing one by a relation of type pm#equivalent_object or pm#extended_specialization (and hence also any subtype of it, e.g., pm#instance or pm#subtype).
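The two equivalent identifier forms (pm#thing and thing%pm) and the three-underscore concatenation of names seen in Paragraph 2.1.1.4 (pm#thing___something) can be illustrated by the following Python sketch. It is not the FS lexer: the function names are hypothetical, URIs (which may themselves contain '#' or '%') are ignored, and combining '___' with the postfixed form is an assumption.

```python
def split_identifier(identifier):
    """Return (creator, [names]) for 'pm#thing', 'thing%pm' or 'pm#thing___something'."""
    if "#" in identifier:
        creator, _, local = identifier.partition("#")      # prefixed form
    elif "%" in identifier:
        local, _, creator = identifier.rpartition("%")     # postfixed form
    else:
        raise ValueError("no creator prefix or postfix: " + identifier)
    return creator, local.split("___")


def to_postfixed(identifier):
    """'pm#thing' -> 'thing%pm' (assumption: '___' is kept as is)."""
    creator, names = split_identifier(identifier)
    return "___".join(names) + "%" + creator


def to_prefixed(identifier):
    """'thing%pm' -> 'pm#thing'."""
    creator, names = split_identifier(identifier)
    return creator + "#" + "___".join(names)


same_term = split_identifier("pm#thing") == split_identifier("thing%pm")
```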
2.1.1.8. Some predefined types of relations from types.
pm#supertype_or_equal .(pm#type, pm#type) pm#supertype: pm#relation_from_type pm#generalizing_category, pm#subtype: pm#supertype___strict_supertype pm#direct_supertype___direct_strict_supertype, pm#inverse: pm#subtype_or_equal, pm#equivalent_object: rdfs#subClassOf owl#subClassOf ontolingua#subclass-of;
The Multi Source Ontology (MSO; the default ontology of WebKB-2) relates categories from many ontologies, including language ontologies such as RDFS, OWL and the Frame Ontology of Ontolingua [www-Ontolingua-FO 1994]. The above statement defines pm#supertype_or_equal as being equivalent to rdfs#subClassOf, owl#subClassOf and ontolingua#subclass-of, and asserts that these last three types are equivalent according to `pm´ (before doing so, pm#type was defined as being equivalent to rdfs#Class, owl#Class and ontolingua#Class). These equivalence relationships may not be strictly correct if pm#equivalent_object is interpreted as the identity or equivalence relations of certain logics but the MSO can be complemented by other users of WebKB-2 to make this explicit. The use of relations such as pm#supertype_or_equal is discouraged by WebKB-2 because the "equal" part reduces possible validations. In FL, FE and FCG, many relation types such as the following ones are predefined to allow inferences or validations and have abbreviations:
Only the first eleven abbreviations will be used in this document and the abbreviation of pm#inverse (i.e., '-') will only be used in Chapter 3. The relation type pm#definition will be used for relating a term to an informal definition of it.
pm#definition .(pm#term, pm#description)
> pm#definition_of_necessary_condition pm#definition_of_sufficient_condition,
pm#definition_of_necessary_and_sufficient_condition;
2.1.1.9. The pm#supertype relation defines necessary conditions for the source term and sufficient conditions for the destination term. This is a consequence of i) the definition of the "supertype" relation with respect to the "instance" relation (in FE: ` `a pm#type *x has for pm#supertype a pm#type *y´ iff `any (pm#thing that is pm#instance of *x) is pm#instance of *y´ ´), and ii) the meaning of the "instance" relation (any definition or universal statement associated to a type applies to its instances).
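The definition of the "supertype" relation with respect to the "instance" relation can be made concrete with a small Python sketch, given here only as an illustration and not as the inference procedure of WebKB-2. The function names and the `ex#` identifiers are hypothetical; the tiny hierarchy is invented for the example.

```python
def supertypes(t, direct_supertypes):
    """All strict supertypes of t, by transitive closure of the direct relations."""
    seen, stack = set(), list(direct_supertypes.get(t, ()))
    while stack:
        s = stack.pop()
        if s not in seen:
            seen.add(s)
            stack.extend(direct_supertypes.get(s, ()))
    return seen


def is_instance(x, t, direct_kinds, direct_supertypes):
    """x is an instance of t if one of its direct types is t or a subtype of t."""
    return any(k == t or t in supertypes(k, direct_supertypes)
               for k in direct_kinds.get(x, ()))


# Hypothetical data: ex#cat < pm#animal < pm#thing, and ex#Garfield is an ex#cat.
direct_supertypes = {"ex#cat": ["pm#animal"], "pm#animal": ["pm#thing"]}
direct_kinds = {"ex#Garfield": ["ex#cat"]}

inherited = is_instance("ex#Garfield", "pm#animal", direct_kinds, direct_supertypes)
```

In this reading, being an instance of ex#cat is sufficient for being an instance of pm#animal, and whatever is necessarily true of any pm#animal is necessarily true of any ex#cat.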
In WebKB-2, this consequence holds as long as no conflict is (manually or automatically) detected, even if the creator of the pm#supertype relation is not the same as the creator of the source and destination terms. If a conflict is automatically detected between statements belonging to the same creator, the last entered statement (the one causing the conflict) is rejected. If a conflict is automatically detected when a creator connects a pm#supertype relation from/to at least one term that she has not created, this relation is rejected. If, when the creator of a term (say, pm#X) associates a definition or universal statement to pm#X, a conflict is automatically detected between the statements associated to pm#X and statements associated to another term (say, pm#Y) from another creator (say `oc´), X or Y is "cloned", that is, its identifier is automatically changed (e.g., into oc#X) but its previous relations are kept (in order to keep its meaning) and the clone is connected to its source (e.g., oc#X becomes a supertype of pm#X). Core techniques for this cloning (i.e., in the previous example, for selecting an appropriate oc#X or oc#Y and connecting it to pm#X or pm#Y) are described in Annex 2 of [Martin, 1996] and are not repeated in this document. However, the conflict resolution protocols (which may or may not rely on such a cloning) will be described in Subsection 2.2.5.
In WebKB-2, these rules apply to all the subtypes of the pm#extended_generalization relation type and hence also, symmetrically, to all the subtypes of the pm#extended_specialization relation type.
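The three cases described above can be summarized as a small decision function. The Python sketch below is only one possible reading of these rules, not the code of WebKB-2: the class, the field names and the returned labels are all hypothetical, and cases that the text does not cover are returned as "unspecified" rather than guessed.

```python
from dataclasses import dataclass


@dataclass
class Conflict:
    statement_kind: str      # "supertype_relation" or "definition" (hypothetical labels)
    author: str              # creator of the newly entered statement
    term_creators: tuple     # creators of the terms that the new statement connects or defines
    conflicting_author: str  # creator of the statement it conflicts with


def decide(c):
    """Map a detected conflict to an action, following the three cases of the text."""
    if c.conflicting_author == c.author:
        return "reject_new_statement"   # same creator: the last entered statement is rejected
    if c.statement_kind == "supertype_relation" and any(
            creator != c.author for creator in c.term_creators):
        return "reject_new_statement"   # relation from/to a term created by someone else
    if c.statement_kind == "definition" and all(
            creator == c.author for creator in c.term_creators):
        return "clone_one_term"         # own term versus a term from another creator
    return "unspecified"                # not covered by the rules above


same_creator = decide(Conflict("definition", "pm", ("pm",), "pm"))
foreign_term = decide(Conflict("supertype_relation", "oc", ("pm", "oc"), "pm"))
cloning = decide(Conflict("definition", "pm", ("pm",), "oc"))
```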
2.1.1.10. The pm#extended_specialization relation type. This type is predefined in FS and refers to relations of "specialization" in its general sense between terms (e.g., strings and category name/identifiers) or statements. This general sense of "specialization" is that the destination node of this relation includes more information than the source node. Extended specialization relations can be manually set. The 'ext-spec' graph matching operator (described in Section 2.4) permits to discover extended specialization relations between many kinds of (semi-)formal statements. The extended_specialization relation is a supertype of the (logical) generalization – the inverse of a specialization – which corresponds to a logical deduction. For example, generalizing an existential conceptual graph (or a logical formula) can be done by cutting any of its branches or generalizing any of the categories it contains. In WebKB-2, the 'gen' operator looks for recorded statements that generalize a query graph or, if the parameter is not a graph but a category, looks for its types and supertypes. Its inverse operator is 'spec'. The 'ext-spec' operator does not just look for types/statements that are more constrained (i.e., have more information) in a logical sense but in a general sense. For example, it considers that any context (e.g., using modalities or temporal relations) is a constraint, that the universal quantifier is more constraining than the existential quantifier, and that the identifier of a type is more constrained than a name of this type (since categories may share names but not identifiers). More generally, 'ext-spec' can also take into account manually set pm#extended_specialization relations between formal or informal terms. Table 2.1.1.10.1 shows some subtypes of pm#extended_specialization.
Here are examples of use. (The meaning of the '{(' and ')}' delimiters is explained in the next paragraph).
"animal related concept"
.> ("animal right"
.> (pm#right_of_an_animal // < pm#right, //commented since redundant with next line
:<=> (pm#right owner=> some pm#animal),
//pm#owner=> a pm#animal, //implied by the definition of the previous line
.> (pm#right_of_every_animal
pm#owner=> every pm#animal,
.> (pm#right_of_any_animal owner=> any pm#animal)
(pm#right_of_Garfield owner=> Garfield)
) ) );
pm#term
> {( (pm#informal_term < pm#string) (pm#formal_term .> pm#informal_term) )}
{( (pm#term_for_an_individual .> pm#individual,
> {(pm#informal_term_for_an_individual pm#formal_term_for_an_individual)} )
(pm#term_for_a_type > {(pm#informal_term_for_a_type pm#type)} )
)};
pm#individual > {(pm#formal_term_for_an_individual pm#statement)};
pm#formal_term_for_an_individual > pm#formal_term_for_a_statement;
pm#extended_specialization___extended_strict_specialization .(., .)
pm#abbreviation: '.>',
pm#inverse: (pm#extended_generalization pm#abbreviation: '.<'),
> (pm#extended_specialization_from_formal_or_informal_term .(pm#term, ?)
> (pm#type_specialization .(pm#type, pm#formal_term) > pm#instance pm#subtype)
(pm#term_specialization .(pm#term, ?)
pm#inverse:
(pm#term_generalization .(pm#term, pm#term)
> (pm#formal_term_generalization .(pm#formal_term, pm#term)
> (pm#type_name .(pm#type, pm#informal_term) < pm#name) )
pm#string_generalization .(pm#string, pm#string) ))
//A string X generalizes a string Y if X is a regular expression that
// i) describes Y if Y is not a regular expression, or
// ii) describes more strings than Y if Y is a regular expression.
//The other case for a string X to be considered as a generalization
// of a string Y is when X has the same content as Y except for some spaces
// at the beginning or end of Y (the user may decide which characters
// should be considered as spaces; see Subsection 4.5.2 for details)
pm#definition //.(pm#term, pm#description)
pm#subdomain //.(domain, domain) //declared in next subsection
)
(pm#extended_specialization_from_statement .(pm#description, pm#description)
> (pm#specializing_statement
pm#abbreviation: '>' '<=',
pm#inverse: (pm#generalizing_statement pm#abbreviation: '<' '=>') )
(pm#corrective_specialization___corrective_restriction
> pm#corrective_existential_specialization)
pm#overriding_specialization
);
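The comments attached to pm#string_generalization describe when a string X generalizes a string Y. The Python sketch below illustrates case i) and the trimming case only; it is a hedged approximation, not the 'ext-spec' operator. The function names are hypothetical, Python's `re` syntax stands in for whatever regular expression syntax FS uses, the default set of "space" characters is an assumption, and case ii) (deciding that one regular expression describes more strings than another) is not attempted.

```python
import re


def generalizes_by_pattern(x, y):
    """Case i): x, read as a regular expression, describes the plain string y."""
    try:
        return re.fullmatch(x, y) is not None
    except re.error:
        return False  # x is not a valid pattern in Python's syntax


def generalizes_by_trimming(x, y, spaces=" \t\n"):
    """Other case: x equals y once leading/trailing 'space' characters of y are removed."""
    return x == y.strip(spaces)


def string_generalizes(x, y):
    """True if x generalizes y under one of the two cases sketched here."""
    return generalizes_by_pattern(x, y) or generalizes_by_trimming(x, y)


by_pattern = string_generalizes("an.*", "animal")
by_trimming = string_generalizes("animal", "  animal ")
unrelated = string_generalizes("animal", "plant")
```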
2.1.1.11. Introduction to collections.
pm#thing
> {(pm#situation pm#entity)} {(pm#thing_playing_some_role sowa#independent_thing)}
{(sumo#physical sumo#abstract)} {(pm#indivisible_thing pm#divisible_thing)}
{(pm#individual pm#type)} {(dolce#particular dolce#universal dolce#world)}
{(sowa#continuant sowa#occurrent)} {(cyc#partially_tangible cyc#intangible)}
{(cyc#temporal_thing pm#non-temporal_thing)} {(cyc#partially_intangible cyc#tangible)}
{(pm#domain pm#thing_that_is_not_a_domain)} {3D#thing 4D#thing},
pm#closed_exclusion: owl#nothing,
= owl#thing cyc#thing akts#thing sumo#entity sowa#entity rdfs#resource;
In FL, FCG and FE, curly brackets are used for delimiting collections. By default, their type is pm#exclusive_AND-set (the members of a set of that type are "exclusive" according to the pm#subtype relation and the pm#equivalent_object relation; exclusive types cannot have common subtypes or instances, and other exclusive objects cannot be identical or equivalent; the type pm#part_exclusive_AND-set is needed to specify disjoint types according to the pm#part relation, i.e., to specify that the objects cannot have common parts). Thus, given the default parsing rules, the next two statements are equivalent.
pm#thing > {3D#thing, 4D#thing}; //the "," is optional (see Table 4.3.7)
pm#thing > {3D#thing __[pm] 4D#thing __[pm]}_[pm#kind: pm#exclusive_AND-set _[pm]]
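The exclusivity constraint carried by a default collection can be sketched in Python as follows. This is only an illustrative sketch under stated assumptions, not the WebKB-2 implementation: the `hierarchy` dictionary, the functions `descendants` and `is_exclusive_set`, and the `ex#...` types are hypothetical, and only common subtypes (not common instances) are checked.

```python
from itertools import combinations

def descendants(subtypes, t):
    """Return t together with all its direct and indirect subtypes."""
    seen, stack = set(), [t]
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(subtypes.get(current, ()))
    return seen

def is_exclusive_set(subtypes, members):
    """True if no two members have a type in common in their subtype closures."""
    return all(descendants(subtypes, a).isdisjoint(descendants(subtypes, b))
               for a, b in combinations(members, 2))

# Hypothetical toy hierarchy: each type maps to its direct subtypes.
hierarchy = {"pm#thing": ["3D#thing", "4D#thing"],
             "3D#thing": ["ex#object"],
             "4D#thing": ["ex#event"]}
ok = is_exclusive_set(hierarchy, ["3D#thing", "4D#thing"])

# Declaring a common subtype violates the exclusion between the two members.
hierarchy["3D#thing"].append("ex#spacetime_worm")
hierarchy["4D#thing"].append("ex#spacetime_worm")
violated = not is_exclusive_set(hierarchy, ["3D#thing", "4D#thing"])
```

A knowledge server enforcing the exclusion would presumably reject the last two declarations rather than merely detect the violation afterwards.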
By default, a collection is "distributive", i.e., each of its (implicit or explicit) members is an individual source/destination of the relations connected to the collection. The distributive, collective or cumulative interpretation of a collection may be specified using the keywords coll and cuml in the context of the relation. For example, in the next statement the relation of type pm#collection_size can be used because a cumulative interpretation of the set is specified.
{3D#thing 4D#thing}_[pm#collection_size: 2 __[. cuml -> ., * <- .]];
Thus, by default, a set of types refers to a (not necessarily complete) type partition. To specify that a set is complete with respect to a certain relation, the keyword `complete´ can be used in the cardinality/quantifier part of the context of this relation. Alternatively, if within a statement (i.e., until its final ";") the set is complete with respect to all the relations it is connected to, the "{(" and ")}" delimiters can be used as an abbreviation. For example, the next two statements are equivalent.
pm#thing > {pm#situation pm#entity} __[. -> . complete];
pm#thing > {(pm#situation pm#entity)};
//Note: when the destination set of a relation of type pm#subtype or pm#part is complete
// this set represents the type of the source node of the relation.
By default, the creators of the relations to the types inside a partition are the same as the creator of the partition but different creators may also be explicitly specified. The creator of the partition is the creator of the exclusion relations between the types in the partition. In a subtype partition of a type X, at least one type must be a direct subtype of X. Two partitions of X can group nearly the same types but can still be both required to express the specializations of X according to two different viewpoints. This viewpoint can be expressed in the context of the partition, as in the following example:
pm#thing
> {(pm#indivisible_thing pm#divisible_thing)}
_[pm#relation_on_which_the_source_partition_is_based: pm#part __[. cuml -> .] ];
2.1.1.12. Introduction to some top-level concept types. The above subtype partitions of pm#thing list some important top-level distinctions:
In this document, the terms "objects" or "categories" refer to types (concept/relation types) as well as individuals (including statements and collections). The term "node" refers to the syntactic structure composed of an object and its associated quantifier if it has one (examples of concept nodes in FE, with `wn´ referring to WordNet: `some wn#cat´, `at least 2 wn#cat´, `the wn#cat "Tom"´ and `wn#cat´).
2.1.1.13. Situations (states or processes) and descriptions. A (real or imaginary) situation is described (represented) by a statement (description, i.e., a definition, a belief or a preference). In theory, a type of "relation from a situation" (i.e., a relation type having a signature which specifies that a relation of that type can only have for source a node of type pm#situation) cannot have for source a description, and conversely, a type of "relation from a description content/medium/container" cannot have for source a situation. In both cases, one should theoretically explicitly use an intermediary relation such as pm#descr to connect the situation node to the description node, and indeed one must do so in most languages, e.g., Conceptual Graphs. Since this is tedious, does not bring any information and leads to bigger representations, such relations can be left implicit in all sub-languages of FS, and (formal or informal) statements do not have to be typed: indeed, when needed, for each relation connected to the statement, the relevant type (situation or description) can automatically be inferred from the signature of this relation.
As shown by the next statements, in the MSO there are types for description content (e.g., belief, narration), description mediums (e.g., languages, abstract data types) and description containers (e.g., documents), and one type that generalizes all of them. The main reasons for that type are: i) there are many types of relations that apply to description content as well as description mediums or containers, ii) distinguishing the actual type of such objects is tedious, sometimes difficult and hence leads users to make different representation choices (thus, this is not a task that people should have to do), and iii) many ontologies do not distinguish between these different types. However, as detailed in the next chapter, the MSO also has relations specific to each type.
pm#situation pm#description_content/medium/container:
pm#description_content/medium/container __[*<->*];
pm#description_content/medium/container > {pm#description pm#description_container};
pm#description > pm#description_content pm#description_medium;
2.1.1.14. Things that can be seen as relations. In all sub-languages of FS, certain concept types can be used in binary relation nodes as if they were binary relation types, as illustrated by the above statement about pm#situation. This is a rare but not exceptional feature (e.g., it has been used in Ontoseek [Guarino et al., 1999]) since it is handy and, as later detailed, avoids "duplicating concept types" in the relation type hierarchy and thus eases knowledge sharing. In this document, the expression "duplicating types" (duplicating types elsewhere in the ontology) means creating other types with similar names and with structurally similar relationships. As a reminder of the special meaning of this expression, the word 'duplicate' will always be single quoted in this document. The conditions for a concept type X to be used as a relation type are that: i) X is declared as a subtype of pm#thing_that_can_be_seen_as_a_relation (which is predefined in FS), ii) the destination node is of type X, and iii) if X has an associated signature, the source node conforms to that signature. The following statements show some of the subtypes of pm#thing_that_can_be_seen_as_a_relation and an example of a relation signature associated to a concept type (although this was not necessary in this particular case). When exporting to languages that do not have such a feature, for each subtype of pm#thing_that_can_be_seen_as_a_relation, the declaration of a corresponding relation type (including the relation signature) can be generated (see Subsection 4.2.13 for a specification of this in KIF). This corresponding relation type may also be explicitly declared, e.g., in this document, pm#definition is declared as a relation type in Paragraph 2.1.1.10 (since its supertype is a relation type) while pm#Definition is declared as a concept type in Table 2.1.3.3 (since its supertype is a concept type).
In such a case, a relation of type pm#manually_set_corresponding_relation_type must be set between the concept type and the corresponding relation type. A relation type cannot be given a signature different from the one given to its corresponding concept type, and conversely. If, in a statement, the signatures of the used relations are consistent with the use of both a concept type and its corresponding relation type, by default the concept type is selected. A way to lift the ambiguity is to use more specific relations with this used type.
pm#thing_that_can_be_seen_as_a_relation
< pm#thing_playing_some_role,
> pm#thing_that_can_be_seen_as_a_function pm#attribute_or_quality_or_measure
pm#description_content/medium/container pm#entity_playing_some_role
wn#relation;
pm#description_content/medium/container .(situation, pm#description_content/medium/container);
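The three conditions for using a concept type as a relation type can be sketched as a simple check. The following Python code is a hypothetical illustration, not the FS parser: `supertypes`, `signatures`, `is_subtype` and `usable_as_relation` are assumed names, and a signature is simplified to the single type required for the source node.

```python
ROOT = "pm#thing_that_can_be_seen_as_a_relation"
DCMC = "pm#description_content/medium/container"

def is_subtype(supertypes, t, ancestor):
    """True if t equals ancestor or is a direct/indirect subtype of it."""
    stack, seen = [t], set()
    while stack:
        current = stack.pop()
        if current == ancestor:
            return True
        if current not in seen:
            seen.add(current)
            stack.extend(supertypes.get(current, ()))
    return False

def usable_as_relation(supertypes, signatures, x, source_type, dest_type):
    """Check conditions i), ii) and iii) for using concept type x as a relation type."""
    if not is_subtype(supertypes, x, ROOT):         # i) declared under the predefined type
        return False
    if not is_subtype(supertypes, dest_type, x):    # ii) destination node is of type x
        return False
    required = signatures.get(x)                    # iii) source conforms to the signature
    return required is None or is_subtype(supertypes, source_type, required)

# Hypothetical fragment: each type maps to its direct supertypes.
supertypes = {DCMC: [ROOT],
              "pm#description": [DCMC],
              "pm#situation": ["pm#thing"],
              "pm#entity": ["pm#thing"]}
signatures = {DCMC: "pm#situation"}  # simplified: required type of the source node

accepted = usable_as_relation(supertypes, signatures, DCMC, "pm#situation", "pm#description")
bad_source = usable_as_relation(supertypes, signatures, DCMC, "pm#entity", "pm#description")
not_declared = usable_as_relation(supertypes, signatures, "pm#entity", "pm#situation", "pm#entity")
```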
2.1.1.15. Signatures of processes in order to make certain relations implicit. Subtypes of pm#process cannot be subtypes of pm#thing_that_can_be_seen_as_a_relation but can have some kind of signature to allow their use in relation nodes. Here is an example of declaration (note: '*x' and '?x' are equivalent ways to declare or refer to a variable named "x" and hence to an unknown object which, if no quantifier is explicitly given, is existentially quantified; in FL, FCG and FE, the scope of a variable covers the whole statement in which it has been declared, including its context; in other words, the scope ends with the ';' ending the statement):
wn#indexing .(pm#input: ?x, pm#output: ?y)
  input: pm#thing ?x, //anything, not just a description_content/medium/container
  output: pm#description ?y; //an indexing output (an index) is a description
This declaration makes the following two statements equivalent and permits the first one to be accepted:
"Tom is on a mat" wn#indexing: "Tom" "mat";
"Tom is on a mat" input of: (a wn#indexing output: "Tom" "mat");
The default signature for processes is ".(pm#object: *x, pm#result: *y)". Since pm#input is a subtype of pm#object and pm#output is a subtype of pm#result, the above signature could have been omitted, and the following two statements would have been equivalent.
"Tom is on a mat" wn#indexing: "Tom" "mat";
"Tom is on a mat" object of: (wn#indexing result: "Tom" "mat");
2.1.1.16. Special keywords in relation nodes/contexts: `more'/`less', `main', `1st', `2nd´, ..., `last´. Here are self-explanatory equivalent examples about the use of `more´/`less´.
pm#informal_term has less pm#constraint_on_meaning than: any pm#formal_term;
pm#informal_term less pm#constraint_on_meaning than: any pm#formal_term;
pm#formal_term more pm#constraint_on_meaning than: any pm#informal_term;
Here are self-explanatory equivalent examples about the use of `main´.
pm#process_with_a_process_as_main_input main pm#input: process;
pm#process_with_a_process_as_main_input pm#input: process __[main];
Here are self-explanatory equivalent examples about `1st', `2nd´, ..., `last´ and the use of a variable as a member index (here, the variable is `?Nth´; unless quantified elsewhere, such variables are implicitly existentially quantified).
pm#changing_a_tire .(pm#agent: *x, pm#object: {wn#tire *t1, wn#tire *t2})
1st pm#subprocess: pm#unscrewing_all_screws_of_a_tire .(pm#agent: *x, pm#object: *t1),
(?Nth != last) pm#subprocess: pm#disposing_of_a_tire .(pm#agent: *x, pm#object: *t1),
last pm#subprocess: pm#screwing_all_screws_on_a_tire .(pm#agent: *x, pm#object: *t2);
pm#changing_a_tire .(pm#agent: *x, pm#object: {wn#tire *t1, wn#tire *t2})
pm#subprocess: { pm#unscrewing_all_screws_of_a_tire .(pm#agent: *x,pm#object: *t1)__[?ct->?]
(pm#disposing_of_a_tire .(pm#agent: *x, pm#object: *t1)
(?Nth != last) pm#member of: ?subprocesses ) __[?ct->?]
pm#screwing_all_screws_on_a_tire .(pm#agent: *x, pm#object: *t2) __[any->?]
}_[pm#kind: pm#sequence] __[any ?ct->?subprocesses];
//note: the use of ?ct is optional (destinations in a collection must have the same source)
The use of these keywords avoids the need to define relation types such as pm#main_input or pm#less_constraint_on_meaning_than. The type pm#constraint_on_meaning is actually a subtype of wn#precision which is an indirect subtype of pm#attribute_or_quality_or_measure which is a subtype of pm#thing_that_can_be_seen_as_a_relation. Relations such as `less pm#constraint_on_meaning than' make it possible to order types of descriptions (e.g., terms and statements) with respect to how well-defined their meanings are. This in turn makes it possible to order indexing processes (creating such descriptions) with respect to precision thanks to the rule "the more precise the indexation, the more precise the indexing process that produces it". The representation of this rule in FL and FE will be given later.
2.1.1.17. The pm#part relation type. This type refers to various kinds of binary mereological relations. It only connects objects of the same type. Hence, it is not necessary to declare a specific mereological transitive relation for every concept type: pm#part can be used directly. Below are some definitions. Table 3.1.13.2 categorizes additional mereological relation types.
pm#part .(*x, *y)
pm#relation_source: (*x pm#kind: *t __[.<->?]) __[.->?],
pm#relation_destination: (*y pm#kind: *t __[.<->?]) __[.->?],
:= [ [?x direct_part ?y] or: [?x part: ?y] ]; //or:
//:= [?x (direct_part: a description_medium)* direct_part: ?y];
< pm#part_or_equal //the next 3 lines will be explained later
(pm#relation_instance_of_transitiveProperty_unless_directly_overrided
kind: owl#transitive_property,
type: pm#type_instance_of_a_certain_second_order_type_unless_directly_overrided),
> {pm#sub-situation pm#spatial_part pm#sub-attribute pm#sub-description pm#subdomain}
pm#sub-collection (pm#member > pm#partner);
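Assuming that the above definition is read as "pm#part holds between ?x and ?y when a chain of direct_part relations leads from ?x to ?y" (the reading suggested by the commented alternative definition), the set of parts of an object is a transitive closure. The following Python sketch illustrates this reading only; the `direct_part` dictionary, the `parts` function and the `ex#...` objects are hypothetical.

```python
def parts(direct_part, x):
    """Return every y reachable from x through one or more direct_part links."""
    found, stack = set(), list(direct_part.get(x, ()))
    while stack:
        y = stack.pop()
        if y not in found:
            found.add(y)
            stack.extend(direct_part.get(y, ()))
    return found

# Hypothetical direct_part assertions: each object maps to its direct parts.
direct_part = {"ex#car": ["ex#engine", "ex#wheel"],
               "ex#engine": ["ex#piston"]}

car_parts = parts(direct_part, "ex#car")
piston_parts = parts(direct_part, "ex#piston")
```

Here `car_parts` includes `ex#piston` although only `ex#engine` and `ex#wheel` were declared as direct parts of `ex#car`, which is the effect of declaring pm#part as transitive.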
Examples of subtypes:
pm#sub-situation .(pm#situation, pm#situation)
> pm#subprocess pm#substate;
pm#spatial_part .(pm#spatial_object, pm#spatial_object)
> {(pm#physical_sub-area .(pm#physical_entity, pm#physical_entity)
pm#non-physical_sub-area .(pm#spatial_object, pm#spatial_object) )}
(pm#physical_part .(pm#physical_entity, pm#physical_entity)
> pm#matter___stuff
(pm#physical_sub-area
> pm#attached_physical_component pm#removed_physical_piece) );
The relation types pm#relation_source and pm#relation_destination are predefined in FL, FCG and FE. The above listed examples of subtypes of pm#part already include more distinctions than those that Rogers and Rector report having found in the literature [Rogers & Rector, 2000]: component, stuff, portion, area, member, partner and piece.
2.1.1.18. Mixing FL, FCG and FE.
[_ pm, pm#language: pm#FCG _[@<->?]] [2 wn#cat, pm#place: (a wn#mat, wn#color: some wn#red)];
The above formal sentence begins with a context which asserts that the contextualized statement has `pm´ for author and is written in FCG. The above `@´ quantifier specifies that the pm#language relation has for source the abstract/concrete model (a description medium) of the contextualized node, not the description content of this contextualized node. Here are equivalent declarations of pm#language with a signature that makes it possible to omit the above "_[@<->?]" part since the '@' is specified:
pm#language .(pm#description_content/medium/container @ [0..*], pm#description_medium [1]);
pm#language .(pm#description_content/medium/container @ -> pm#description_medium);
//'->' indicates a functional relation
This facility also applies to the content of strings. In the following example, the content is a double-quoted string that contains a regular expression, and this expression is declared as being interpretable in Perl.
$("<script .* language='[A-Za-z0-9/_-]*'.*>")$ _[pm#language: Perl]
//this Perl expression matches the content of the string "<script language='FS'>"
Here are equivalent FE statements for the FL statement at the beginning of this paragraph:
[_ pm, pm#language: pm#FE] [2 wn#cat have for pm#place a wn#mat which has for
wn#color some wn#red];
[_ pm] `2 wn#cat have for pm#place a wn#mat with wn#color some wn#red´;
`2 wn#cat have for pm#place a wn#mat which has for wn#color some wn#red´_[pm];
`2 wn#cat have for pm#place a wn#mat with wn#color some wn#red´_[with pm#author pm];
These examples illustrate that FCG, FE and FL can be mixed, that they have common features, that FCG and FE have a similar structure but different syntactic sugar and that the delimiters of FCG statements are square brackets while those of FE statements are the single backquote and the single right quote.
FE and FCG can use multiple types in a concept node. The objects referred to by such a node must conform to each of these types. As illustrated by the next example, an exception is that when at least one of the types is not a subtype of pm#attribute_or_quality_or_measure, the objects do not need to conform to those types that are subtypes of pm#attribute_or_quality_or_measure. Instead, in such a case, the objects must be considered as being related to instances of those types by a pm#attribute_or_quality_or_measure relation. The following equivalent FCG statements illustrate this. The rationale for this rule is given in Paragraph 2.3.1.2.
[a wn#red wn#mat];
[a wn#mat, pm#attribute_or_quality_or_measure: some wn#red];
[a wn#mat, wn#color: some wn#red]; //since wn#color is a supertype of wn#red and an attribute
[a wn#mat, wn#red___redness: some wn#red];
2.1.1.19. Parsing indications/directives.
[_ parsing][(pm#formal_term pm#part of: pm#relation_node) pm#default_creator: pm __[any->?]];
[_ parsing][(pm#formal_term pm#part of: pm#concept_node) pm#default_creator: pm wn];
[_ parsing][pm#term_being_declared pm#default_creator: pm];
[_ parsing][pm#new_term pm#default_creator: pm];
[_ pm#kind: pm#parsing_macro][pm#statement pm#default_creator: pm];
The above five statements are indications to the FS parser that (respectively)
All the terms in the above statements are predefined in FS; indeed, the FS parser needs to know their special meanings to take the parsing indications into account. These terms are part of the presentation/parsing ontology that is proposed in Chapter 4. The keyword `parsing´ is an abbreviation for "pm#kind: pm#parsing_macro". Parsing indications are particular kinds of defeasible statements, i.e., statements that can be overridden by other statements. The FS parser accepts statement overriding only if those statements have been marked as defeasible (e.g., via the keyword `defeasible´ or `parsing´ in their context; properties associated to a type "X" subtype of pm#type_instance_of_second_order_type_unless_directly_overrided can also be overridden by the subtypes of "X" as in [Y supertype: X __[.->0] ]). Overriding a statement does not override any other statement, not even the more specialized statements. This is why the above 4th parsing indication does not override the above 2nd parsing indication. To ease the reading and writing or generation of informal statements, lexical parsing indications/abbreviations are also taken into account in the content of strings that are double quoted or delimited by "$(" and ")$", as for the expansion of variables. The current FS parser of WebKB-2 does not yet accept parsing indications written in FL but accepts and takes into account many predefined abbreviations for them.
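How the first two parsing indications above might be applied to an unprefixed term can be sketched as follows. This Python code is a hypothetical illustration, not the FS parser: the names `DEFAULT_CREATORS`, `resolve_term` and `declared` are assumptions, and so is the reading of "pm wn" as an order of preference between creators.

```python
# Assumed reading of the first two indications: default creators by kind of node.
DEFAULT_CREATORS = {"relation_node": ["pm"],
                    "concept_node": ["pm", "wn"]}

def resolve_term(term, node_kind, declared):
    """Return the prefixed identifier for a term, or None if no default creator fits."""
    if "#" in term:                        # already prefixed: nothing to resolve
        return term
    for creator in DEFAULT_CREATORS[node_kind]:
        candidate = f"{creator}#{term}"
        if candidate in declared:
            return candidate
    return None

# Hypothetical set of already declared categories.
declared = {"pm#place", "wn#cat", "wn#mat", "wn#color"}

cat_in_concept_node = resolve_term("cat", "concept_node", declared)
place_in_relation_node = resolve_term("place", "relation_node", declared)
cat_in_relation_node = resolve_term("cat", "relation_node", declared)
already_prefixed = resolve_term("wn#mat", "relation_node", declared)
```

With such defaults, a statement such as `cat place: mat` can be written without the 'pm#' and 'wn#' prefixes.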
2.1.1.20. Relations from/to quantified categories.
cat place: (mat color: red __[?m<->?]) __[2->?m]; //a non-free variable in a cardinality is
cat place: (mat color: some red) __[2->?]; // by default existentially quantified
cat place _[2->?]: (mat color: red);
2 cat place: (a mat color: some red);
The above equivalent statements are translations in FL of "there are 2 cats on a red mat", with the previous parsing indications taken into account, hence without the 'pm#' and 'wn#' prefixes. Since the concept node of type 'mat' is existentially quantified and given the variable ?m due to `__[2->?m]´ for the `place´ relation, there is no need to repeat this existential quantifier and this variable for the `color´ relation.
Assuming that the user `oc´ has stated that "any cat is on a red mat" and hence has used '_[any->?]' instead of '_[2->?]', the two statements could be merged into any of the following equivalent statements:
cat place: (mat color: some red) __[2->?, any->? _[oc]];
cat place: (mat color: some red) __[any->? _[oc], 2->? _[pm]];
cat place: (mat color: some red) __[oc, 2->? _[pm]];
Expressing that "there are 2 cats on a mat and that any mat may be on a floor" can be done in the following equivalent ways.
cat place: (mat place: floor __[any->?]) __[2->?];
cat place: (. mat place: floor __[any->?]) __[2->?];
cat place: (. mat place: a floor) __[2->?]; //here, the '.' is not optional
2 cat place: (. mat place: a floor);
2 cat place: a (. mat place: a floor);
//2 cat place: a (mat place: a floor); //this one uses a lambda-abstraction and states that
// 2 cats are on a mat-on-a-floor
The `.´ specifies that a concept node does not have the quantifier or implicit/explicit variable that it was given by a previous relation. This feature avoids having to specify quantifiers that are the default ones. Subsection 4.2.1 presents complementary examples.
These last examples show how FL makes it possible to express in one concise statement what would need to be expressed in several statements (and sometimes long statements) in other notations. This eases knowledge comparison, especially the comparison of knowledge from various users.
2.1.1.21. Contextual relations from/to (descriptions of) processes. Consider the following FCG statements.
[a sitting, agent: a cat, time: 21/01/2009];
[ [a sitting, agent: a cat], time: 21/01/2009];
In WebKB-2, these statements are considered equivalent. This can be seen as an extension of the above described features for i) not distinguishing contexts on situations from contexts on descriptions, and ii) allowing the scope of a variable to extend until the end of the whole statement. Without this extension, the first statement asserts that there exists a sitting cat and that it is sitting on the 21/01/2009. With this extension, like the second statement, it asserts that on the 21/01/2009 there exists a cat and that it is sitting. This extension means considering that "truth restricting relations from/to a context" connected to a process also apply to all the objects directly or indirectly connected to this process within the same context (i.e., in FCG, within the nearest pair of square brackets). Without this extension the two above statements would not be automatically comparable.
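The effect of this extension can be sketched as a normalization step that moves truth-restricting relations from a process node to its enclosing context. The Python code below is only a hypothetical illustration: the dictionary encoding of statements, the `normalize` function and the choice of `time` as the only contextual relation are assumptions, not the WebKB-2 data model.

```python
CONTEXTUAL_RELATIONS = {"time"}  # assumption: relations treated as truth-restricting

def normalize(statement):
    """Lift contextual relations of the process node into the statement context."""
    relations = dict(statement.get("relations", {}))
    context = dict(statement.get("context", {}))
    for name in list(relations):
        if name in CONTEXTUAL_RELATIONS:
            context[name] = relations.pop(name)
    return {"node": statement["node"], "relations": relations, "context": context}

# [a sitting, agent: a cat, time: 21/01/2009];
first = {"node": "a sitting",
         "relations": {"agent": "a cat", "time": "21/01/2009"}}
# [ [a sitting, agent: a cat], time: 21/01/2009];
second = {"node": "a sitting",
          "relations": {"agent": "a cat"},
          "context": {"time": "21/01/2009"}}

comparable = normalize(first) == normalize(second)
```

Once both statements are normalized in this way they have the same structure, which is what makes them automatically comparable.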
2.1.1.22. Complex definitions. Below are three equivalent representations (one in FE and two in FL) of the rule "the more precise the indexation, the more precise the indexing process that produces it". For clarity, the authors of certain categories are made explicit even though the content of the MSO and the above parsing indications allow them to be left implicit.
`any `wn#indexing *i with result a pm#description *y´ has less wn#precision than any `wn#indexing *i2 with result a pm#description that has less wn#precision than *y´ ´;
wn#indexing output: pm#description ?y, less wn#precision than: any (wn#indexing output: (a pm#description less wn#precision than: ?y));
wn#indexing .(pm#input: ?x, pm#output: ?y) less wn#precision than: any (wn#indexing output: (a pm#description less wn#precision than: ?y));
Given this rule and the following statements in FL, the conclusion in FE is `any pm#indexing_by_an_informal_term has less pm#precision than any pm#indexing_by_a_formal_term'.
pm#constraint_on_meaning < wn#precision;
pm#informal_term has less pm#constraint_on_meaning than: pm#formal_term __[any<->any];
pm#indexing_with_a_formal_term result: a pm#formal_term;
pm#indexing_with_an_informal_term result: a pm#informal_term;
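The expected inference can be sketched in a few lines of Python. This is a hypothetical illustration of the rule, not a description of any existing WebKB-2 mechanism: the function `derive_process_ordering` and the encoding of the statements as a set of pairs and a dictionary are assumptions, and the step from pm#constraint_on_meaning to wn#precision (via the subtype relation) is taken as already applied.

```python
def derive_process_ordering(less_precise, result_of):
    """Return pairs (p1, p2) meaning 'process p1 is less precise than process p2'.

    less_precise: set of pairs (d1, d2), description type d1 less precise than d2.
    result_of: maps each indexing process type to the type of its result.
    """
    return {(p1, p2)
            for p1, r1 in result_of.items()
            for p2, r2 in result_of.items()
            if (r1, r2) in less_precise}

# Encoding of the statements above.
less_precise = {("pm#informal_term", "pm#formal_term")}
result_of = {"pm#indexing_with_a_formal_term": "pm#formal_term",
             "pm#indexing_with_an_informal_term": "pm#informal_term"}

derived = derive_process_ordering(less_precise, result_of)
```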
However, WebKB-2 is not yet able to draw such conclusions. Similarly, it cannot see the logical equivalence between the above FE representation of the rule and this one:
` `a pm#description *y has more wn#precision than a pm#description *y2´ has for pm#implication `a wn#indexing with output *y has more wn#precision than a wn#indexing with output *y2´ ´;
Unlike formulations via definitions or using universal quantifiers, formulations using pm#implication between statements do not have a structure that can be directly translated into a frame-like notation such as FL and hence be accepted by frame-based systems or description logic based systems. On the other hand, theorem provers would have no problem translating between the two formulations. Hence, for knowledge sharing purposes, it seems better for information providers not to use the last kind of formulation (in other words, the first kind can be seen as the normalized formulation).
2.1.1.23. Syntactic sugar for collections, to ease the reading of argumentation structures. FL can be used for representing argumentation structures or "semi-formal discussions": formal or informal sentences created by various persons and related by argumentation relations, specialization relations and corrective relations. It makes it possible to clearly distinguish "a relation Y on the destination sentence of a relation X" from "a relation Y on a relation X" (in the last case, nothing is said on the destination of the relation X). Few argumentation systems make such a distinction and, more generally, accept meta-statements. ArguMed [Verheij, 1999] is one of the exceptions. Hence, most of them force incorrect representations of argumentation structures. This is clearly true of systems restricted to the famous argumentation schema of Toulmin, as for example noted by [Newman & Marshall, 1992]. Even fewer provide a textual notation that is not XML-based. Such a notation is nonetheless necessary whenever the use of an XML parser, editor or viewer is impossible or not desirable (this is for example the case in many text-based email editors, in text-based browsers, and in PDF or HTML documents). Finally, FL is restricted neither to argumentation structures nor to informal sentences: it lets users resort to them only when more precise representations are not handy. It also does not restrict the user to a predefined ontology of only a few types of concepts or relations.
To ease the reading of "joint arguments/objections" (e.g., a rule and its premise), instead of using the normal syntactic sugar for a collective AND-set ("{x, y, ...} __[coll]") dashes are used as shown by the next example. This example mentions three sources: `pm´, `fg´ and `tbl´ (for Tim Berners-Lee). The identifier `pm/tbl´ refers to `tbl´ as interpreted by `pm´: `pm´ is the creator but `tbl´ is the initial source. Thus, '__[pm/tbl]' is an abbreviation for '__[pm, source: tbl]'. Such an abbreviation will also be used in later subsections. The informal sentences in this example have no associated creator.
"knowledge_sharing_with_an_XML-based_language is advantageous"
.< ("knowledge_sharing_with_an_XML-based_language is possible"
.< knowledge_sharing_with_an_XML-based_language __[pm]
) __[pm],
argument: – "XML is a standard" __[pm]
– ("knowledge_management_with_classic_XML_tools is possible"
corrective_specialization:
"syntactic_knowledge_management_with_classic_XML_tools is possible"
__[pm, argument: "classic XML tools only manage structures not semantics"__[pm]
]
)__[pm],
argument: "the use of URIs and Unicode is possible in XML"
__[fg, objection: "the use of URIs and Unicode can easily be made possible in
most syntaxes" __[pm/tbl]
],
objection: – ("the use_of_XML_by_KBSs implies several tasks to manage"
argument: "the internal_model_of_KBSs is rarely XML" __[pm]
)__[pm]
– ` "an increase of the number of tasks *t to_manage" has for consequence
"an increase of the difficulty to develop software to manage *t"
' __[pm],
objection: – "knowledge_sharing_with_an_XML-based_language will force many persons
(developers, specialists, etc.) to learn how to understand
complex_XML-based_knowledge_representations" __[pm]
– ("understanding complex_XML-based_knowledge_representations is difficult"
argument: "XML is verbose" __[pm]
)__[pm];
2.1.1.24. Category naming style and lexical facilities. For readability purposes and to ease translations into FE or other kinds of controlled English, the users of WebKB-2 are discouraged (e.g., via warnings) from declaring new identifiers that do not follow the English spelling and capitalization of the words used in the identifiers. The "loss-less category naming style" is to re-use those words as such, separated by '_' or '-'. Unfortunately, the currently most commonly used naming style (called the "W3C category naming style" in this document) is to follow the Intercap style (as in the identifier rdfs#subClassOf) and use an uppercase character for the first letter of the identifier if it is not for a relation type (as in rdfs#Resource). Indeed, this last style became a well-followed convention when it was adopted in the RDF examples given by the W3C and, much earlier, in the Ontolingua library [www-Ontolingua-library 1994]. However, when this naming style is used, an English dictionary is needed for recovering the exact case of the first letter of the English words composing an identifier and also for isolating these words when several uppercase letters are used consecutively. RDFS proposes the rdfs#label relation to let knowledge providers specify the source English words but this is understandably rarely used and not used in a uniform way.
For the MSO to have a coherent naming style, each new identifier following the "W3C category naming style" is converted into the "loss-less category naming style" whenever i) there is a way to quickly convert it back to the "W3C category naming style" for knowledge export purposes, and ii) this does not lead to a lexical conflict with another already declared category from the same source/user. By default, for search purposes, WebKB-2 first converts the given words or category identifiers to lowercase and ignores the '_' or '-' characters as well as the final 's'; if this leads to lexical ambiguities (i.e., several possible identifiers), the given spelling and capitalization are used. In an expression or statement in FE or FCG, it is possible to add a final 's' to an unprefixed category identifier when it is used with a numerical quantifier, as in `75% of cats´ and as long as there is no lexical ambiguity. However, the use of this last facility is generally avoided in this document.
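The default search behavior described above can be sketched as follows. This Python code is a minimal illustration under stated assumptions, not the WebKB-2 code: the functions `search_key` and `lookup` and the sample identifiers are hypothetical, and creator prefixes are left out.

```python
def search_key(word):
    """Lowercase the word, drop '_' and '-', then drop one final 's' if present."""
    key = word.lower().replace("_", "").replace("-", "")
    return key[:-1] if key.endswith("s") else key

def lookup(query, identifiers):
    """Return matching identifiers; on ambiguity, fall back to the exact spelling."""
    matches = [i for i in identifiers if search_key(i) == search_key(query)]
    if len(matches) > 1:
        return [i for i in identifiers if i == query]
    return matches

# Hypothetical unprefixed identifiers.
identifiers = ["cat", "knowledge_base", "Knowledge-bases"]

plural_query = lookup("Cats", identifiers)
ambiguous_exact = lookup("knowledge_base", identifiers)
ambiguous_inexact = lookup("KnowledgeBase", identifiers)
```

In this sketch, a query that remains ambiguous and matches no identifier exactly returns an empty list; how such a case is actually reported to the user is not specified here.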
2.1.1.25. Generation of categories for organization purposes. Consider the sentence "governments should enforce animal rights" and its normalized representation in FE: `any `enforcement with object `any `right with owner an animal´ ´ ´ should have for agent any government´. In a KB storing knowledge about animals or rights, several statements may be recorded about notions such as "animal rights", "enforcement by government", "enforcement of animal rights" or "enforcement of animal rights by governments". Then, for knowledge organization purposes (and hence for knowledge sharing, comparison and retrieval purposes), it is useful to have such notions inserted in the specialization hierarchy and such statements accessible from these represented notions. Ideally, the representation of the sentence and the insertion of the notions would be (semi-)automatic but WebKB-2 does not yet perform such tasks. However, it can propose identifiers for such notions and the MSO includes the pm#extended_specialization relation type which makes it possible to organize them and relate them to their associated statements. Here are examples.
"right"
.> (wn#right
> (right_with_an_owner
> (right_with_owner_an_animal
.> (`enforcement with object `any `right with owner an animal´´
and with agent any government´
.> `any `enforcement with object `any `right with owner an animal´´´
should have for agent any government´) ) ) );
"enforcement"
.> (wn#enforcement
> (enforcement_with_an_agent
> (enforcement_with_agent_a_government
.> (enforcement_with_agent_every_government
.> (enforcement_with_agent_any_government
.> `enforcement with object `any `right with owner an animal´´
and with agent any government´) ) ) )
(`enforcement with object some thing´
> (`enforcement with object some right with owner an animal´
.> (`enforcement with object any `right with owner an animal´´
.> `enforcement with object `any `right with owner an animal´´
and with agent any government´) ) );
The use of a unique relation type such as ".>" is handy for certain presentation purposes but more specialized relation types can be used, e.g., the pm#statement_directly_using_this_term relation type to connect the type
`enforcement with object `any `right with owner an animal´´ and with agent any government´
to the statement
`any `enforcement with object `any `right with owner an animal´´´ should have for agent any government´.
Which relation type is used for presentation purposes depends on presentation options.
For organization purposes, the most important and easiest-to-generate categories in these examples are pm#right_with_an_owner, pm#enforcement_with_an_agent and `enforcement with object some thing´. Indeed, such categories are based on basic types of relations, permit to partition the specializations of a type, and subtyping them is only needed if there are many specializations.
After discussing various topic hierarchies (subject hierarchies) for document classification purposes (e.g., the Dewey Decimal System, LC, COLON, Bliss classifications, thesauri, subject headings, taxonomies), [Welty & Jenkins, 1999] concludes that placing a topic (subject area) into a topic hierarchy is quite arbitrary (in the sense that it depends on each person's goals, knowledge and preferences) and that the subtopic relation of such hierarchies is not a specialization relation (and is not even always transitive) but can be seen as a union of various relations. To bring a bit more determinism to the manual process of classifying documents (or other things such as persons' interests or competencies) and/or their related topics while still only relying on relations between topics, i.e., without representing (some of) the statements that the books contain or that the topics cover, Welty & Jenkins propose to replace the classic (and rather meaningless) subtopic relation by a "located-under" relation based on five mereo-topological relations that should be distinguished: "contains" (a transitive and specialized kind of "partOf" relation between topics), "overlaps", "borders", "near" and "far". They give axioms to relate these six relations.
Welty & Jenkins see topics as individuals. Indeed, relations between "shoe" and "heel" or "shoe-making" are clearly not subtype relations. However, Welty & Jenkins do not specify the nature of these individuals. The MSO categorizes topics as "fields of study" (e.g., Physics) and, more generally, as particular kinds of sets (like theories and ontologies) which include i) categories for processes and things related to them, and ii) descriptions associated with these categories, processes and other things. Indeed, with such an interpretation, if these statements were represented it would be possible to derive the "contains" relations from the inclusion relationship between the sets and the specialization relations between the things contained in those sets. As a reference to this mix of partOf and specialization, in the MSO the "contains" relation (which has for identifier pm#subdomain) has for abbreviation ">part".
subdomain___true_subdomain (domain, domain) abbreviation: ">part", < subdomain_or_equal part; //extended_specialization //this was already stated as a subtype in the previous subsection
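The derivation mentioned above can be sketched as follows in Python. This is only one possible reading of "deriving the contains relations from set inclusion and specialization", not the MSO axiomatization: a domain is modeled as a plain set of category identifiers, and the function names as well as the two process categories used in the example are hypothetical.

```python
def generalizations(category, supertypes):
    """Return the category together with all its direct and indirect supertypes."""
    seen = {category}
    stack = [category]
    while stack:
        current = stack.pop()
        for parent in supertypes.get(current, ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def is_subdomain(sub_objects, super_objects, supertypes):
    """True if every object of the first domain is, or specializes, an object of
    the second domain, and the two domains are not identical ("true" subdomain)."""
    if sub_objects == super_objects:
        return False
    return all(generalizations(obj, supertypes) & super_objects for obj in sub_objects)


# Hypothetical data: one specialization relation and two domains.
SUPERTYPES = {"knowledge_acquisition_process": ["knowledge_engineering_process"]}
KE = {"knowledge_engineering_process"}   # objects of "Knowledge Engineering"
KA = {"knowledge_acquisition_process"}   # objects of "Knowledge Acquisition"

print(is_subdomain(KA, KE, SUPERTYPES))
print(is_subdomain(KE, KA, SUPERTYPES))
```

With such a reading, a subdomain relation does not need to be asserted manually when the objects of the two domains and their specialization relations are represented.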
Most modern representations for topic hierarchies do not specify whether a topic is an individual or a type. For example, SKOS (Simple Knowledge Organization System [www-SKOS, 2009]), the ontology recommended by the W3C "to represent thesauri, classification schemes, subject heading systems and taxonomies within the framework of the Semantic Web", defines the type skos#Concept (topics must be instances of that type but nothing prevents topics from being types themselves) and relations such as skos#broader and skos#related which connect instances of skos#Concept. Another example is the "Semantic Web Topics Ontology" of ISWC 2006 [www-SWTO, 2006] which defines and uses relations such as "topic_subtopic", "topic_requires", "topic_relatedTo" and "topic_relatedProjects" between instances of types "Topic", "Technique" and "Project".
In any case, representing topics and subtopic relations between them – instead of representing relations between processes and the things they use or that use them – is detrimental to knowledge sharing and retrieval (KS&R). First, it leads to partially 'duplicating' the representations of those processes, things and relations into those of topics and their relations (these two kinds of representations have similar names but are unconnected). Second, the subtopic relation is semantically extremely poor and hence placing a topic into a topic hierarchy is quite arbitrary and can hardly be checked or guided – the slightly more restricted "contains" relation of Welty & Jenkins does not really change this point.
To sum up, since "true" fields of study (e.g., Physics) – or subkinds of them (e.g., Knowledge Engineering which may be seen as a mere set of processes) – exist, they should be represented as such but, for KS&R purposes, other things should not be represented as topics or fields of study (in the MSO, the term for such "areas of interest" is pm#domain). A relation such as pm#domain_object is needed to connect a domain to the things it includes (relation types such as "topic_relatedProjects" are not needed). Since (descriptions of) processes make it possible to relate most of those things and since specialization and subprocess relations between process categories are far less arbitrary than subtopic relations, the backbone for the representation of the content of a domain should be an ontology of processes in this domain, and it is not necessary (and hence, to avoid 'duplications', not advised) to relate a domain category to more than a few top-level process categories ("top-level" according to the specialization and subprocess relations).
However, what should be done to integrate existing ontologies that use topics? For example, how to extend the MSO with the ReSIST ontology [www-ReSIST, 2006]? This ontology describes top-level concepts related to dependable and secure computing by reusing the Reference Ontology of the AKT project [www-AKT, 2006]. AKT is a very well funded Semantic Web project aiming to ease knowledge acquisition and retrieval, but its Reference Ontology is small (a hundred categories), is not normalized, includes types for some fields of computer science and allows those of the ACM (Association for Computing Machinery) to be reused. In accordance with the principles described in this document, to integrate the ReSIST ontology, the MSO i) integrates the (useful part of the) AKT Reference Ontology, ii) represents the ACM thesauri (about 1500 categories), iii) integrates the ReSIST ontology while only correcting the semantically incorrect representations (e.g., the MSO represents the fields of research as individuals instead of types and replaces their subtype relations by pm#domain relations), and iv) generalizes most of the new types from ReSIST, AKT or ACM with existing types in the MSO, essentially types from the WordNet ontology. Regarding the categories from ReSIST, the result is described in the MSO input file titled "The ReSIST ontology" [www-ReSIST-in-FL, 2008]. The following table gives a typology of domain-related concept/relation types that were used for these three integrations. (Note: the relation types that are instances of pm#non-directly_usable_type cannot be used directly in relation nodes; only their subtypes can.)
domain < non-spatial_collection___collection_of_categories_or_statements,
> wn#field_of_study wn#theory,
= aktp#Generic-Area-Of-Interest, //aktp: the AKT Portal part of the AKT Reference Ontology
closed_exclusion: thing_that_is_not_a_domain___object,
domain_object: wn#knowledge_domain;
domain_related_relation___domain-related-thing .(?,?)
kind: non-directly_usable_type,
< relation_playing_a_special_role,
> {(relation_from_a_domain relation_from_a_thing_that_is_not_a_domain)}
{(relation_to_a_domain relation_to_a_thing_that_is_not_a_domain)};
relation_from_a_domain___thing_in_domain .(domain, ?) kind: non-directly_usable_type,
> subdomain domain_object;
//subdomain___true_subdomain .(domain, domain); //already stated above
domain_object .(domain, thing_that_is_not_a_domain) < member object_relation;
relation_from_a_thing_that_is_not_a_domain .(thing_that_is_not_a_domain, ?)
name: "thing_in_domain_of_object",
kind: non-directly_usable_type,
> object_subdomain object_in_domain_of_object;
object_subdomain___subdomain_of_object .(thing_that_is_not_a_domain, domain)
inverse: domain_object;
object_in_domain_of_object .(thing_that_is_not_a_domain, thing_that_is_not_a_domain)
kind: non-directly_usable_type;
relation_to_a_domain___sub-domain_or_object-subdomain .(?, domain)
kind: non-directly_usable_type,
> subdomain subdomain_of_object;
relation_to_a_thing_that_is_not_a_domain .(?, thing_that_is_not_a_domain)
name: "domain-related-object",
kind: non-directly_usable_type,
> domain_object object_in_domain;
|
WordNet is nowadays commonly used as an ontology: its synsets are directly interpreted as conceptual categories, even though types and individuals are not distinguished. The integration of WordNet into the MSO included the transformation of the noun-related part of WordNet into a genuine "lexical ontology" by removing internal inconsistencies, giving short intuitive identifiers to the categories and making the distinction between types and individuals (about 3000 individuals were isolated). This meant splitting the hyponym relationship (the main specialization relationship in WordNet) into subtype, instance and subdomain relations. These last relation types were declared as subtypes of pm#hyponym so that the change is a refinement of WordNet, which minimizes the number of changes inconsistent with the source.
WordNet uses hyponym relations as well as part relations between its categories for fields of study. These part relations were kept and these hyponym relations were transformed into subdomain relations. More precisely, the very general kinds of fields of study – those with annotations beginning with "any of the branches of ..." – have been represented as types but their direct subdomains (which have annotations beginning with "the discipline ...") have been represented as their instances. From these instances, most of the other categories for fields of study (with annotations beginning with "the branch of ...") are reachable via "subdomain" relations.
The next two examples show direct relations from wn#field_of_study and wn#computer_science. These relations come from WordNet or the MSO input file titled "Fields of Study". (All MSO input files are accessible from http://www.webkb.org/kb/). The source/user/category `is´ represents "Information Sciences". A term such as pm/is#software_engineering_science is a term created by `pm´ but that "belongs to" (or "has for source" or "has been stated by") `is´ according to `pm´. Given the previously stated presentation/parsing indication that `pm´ prefixes are not shown, all the occurrences of `pm/is´ are abbreviated into `is´.
wn#field_of_study___discipline___subject_area___subject___subject_field___field___study
name: "bailiwick" "branch_of_knowledge", //these relations are implicitly from `wn´
annotation: 'a branch of knowledge; "in what discipline is his doctorate?";
"teachers should be well trained in their subject";
"anthropology is the study of human beings"',
//< pm#domain __[pm], //already stated
> wn#major.field_of_study wn#frontier.field_of_study wn#ology
wn#scientific_discipline wn#humanistic_discipline
aktp#Research-Area __[aktp] aktp#Business-Area __[aktp],
instance: wn#allometry wn#bibliotics wn#architecture.field_of_study #numerology
#engineering_science wn#theology.field_of_study wn#military_science;
wn#computer_science___computational_science
annotation: "the branch of engineering science that studies
computable processes and structures",
subdomain: wn#artificial_intelligence
is#software_engineering_science __[is] is#database_management_science __[is]
acm#field_of_study_in_the_ACM_classification __[is],
subdomain of: wn#engineering_science,
part: wn#information_theory,
part of: wn#information_science;
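Since the "contains" (subdomain) relation is transitive, the categories reachable from a field of study can be collected by a simple graph traversal. The following Python sketch illustrates this with the subdomain relations shown in the above example; the dictionary encoding and the function name `reachable_subdomains` are assumptions made for this illustration and are not part of WebKB-2.

```python
from collections import deque

# Direct subdomain relations taken from the example above
# (domain -> list of its direct subdomains).
SUBDOMAIN = {
    "wn#engineering_science": ["wn#computer_science"],
    "wn#computer_science": [
        "wn#artificial_intelligence",
        "is#software_engineering_science",
        "is#database_management_science",
        "acm#field_of_study_in_the_ACM_classification",
    ],
}


def reachable_subdomains(domain, subdomain=SUBDOMAIN):
    """Return all direct and indirect subdomains of a domain (breadth-first)."""
    seen = set()
    queue = deque([domain])
    while queue:
        current = queue.popleft()
        for child in subdomain.get(current, ()):
            if child not in seen:  # also protects against cycles
                seen.add(child)
                queue.append(child)
    return seen


for name in sorted(reachable_subdomains("wn#engineering_science")):
    print(name)
```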
Below are two tables respectively showing extracts from the beginning of the representations in the MSO input file titled "Classification of the ACM" [www-ACM-in-FL, 2008] and then from the beginning of the MSO input file titled "The ReSIST ontology" [www-ReSIST-in-FL, 2008]. The first table shows that in addition to representing the ACM classification, the MSO structured it and also indexed it with WordNet categories. To ease the understanding of who created what, the prefix "pm#" is often used even though it could have been left implicit.
pm#field_of_study_in_the_ACM_classification
kind: wn#field_of_study,
subdomain of: wn#computer_science,
subdomain: pm#ACM_domains_from_the_hierarchy_with_the_A_to_K_categories_at_the_top
pm#ACM_domains_from_an_hierarchy_with_general_terms_at_the_top;
pm#ACM_domains_from_the_hierarchy_with_the_A_to_K_categories_at_the_top
definition: "union of the classic ACM hierarchies with the A to K
categories at the top, e.g., the 1998 ACM hierarchy",
subdomain: acm#A acm#B acm#C acm#D acm#E acm#F acm#G acm#H acm#I acm#J acm#K;
pm#ACM_domains_from_an_hierarchy_with_general_terms_at_the_top
annotation: "hierarchy based on 41 general terms, including the officially
listed 16 general terms",
subdomain: pm#ACM_situation_related_domain pm#ACM_entity_related_domain;
pm#ACM_situation_related_domain
subdomain: pm#ACM_state_related_domain pm#ACM_process_related_domain;
pm#ACM_state_related_domain
subdomain: acm#Security;
acm#Security domain_object: wn#security __[main],
subdomain: acm#Reliability;
acm#Reliability core_domain_object: wn#dependability,
subdomain: acm#Verification; //developed later in the original file
pm#ACM_process_related_domain
subdomain: pm#Design pm#Experimentation pm#Management pm#Measurement
pm#Performance pm#Standardization
pm#Verification //note: a subdomain of it is pm#Reliability
pm#ACM_Recovery pm#ACM_method pm#ACM_simulation
pm#ACM_evaluation pm#ACM_development pm#ACM_optimization
pm#ACM_analysis pm#ACM_synthesis pm#ACM_communication
pm#ACM_education pm#ACM_application;
pm#Design domain_object: wn#designing wn#design,
subdomain: pm#ACM_Redundant_design pm#ACM_model;
pm#ACM_Redundant_design
subdomain: acm#B.1.3.2 acm#B.2.3.2 acm#B.3.4.2 acm#B.4.5.4
acm#B.5.3.2 acm#B.6.2.2 acm#B.7.3.2;
//These domains are developed later in the original file; examples:
// acm#B.1.3.2__Hardware_Control_Structure_Redundant_design
// acm#B.2.3.2__Hardware_ARITHMETIC_AND_LOGIC_STRUCTURES_Redundant_design
pm#ACM_model
subdomain: acm#B.1.2.1 acm#B.3.3.0 acm#B.4.4.0 acm#C.0.2
acm#C.2.0.1 acm#C.4.3 acm#D.2.4.4 acm#D.2.9.6
acm#D.2.13.2 acm#D.4.8.1 acm#E.4.2 acm#F.1.1
acm#F.3.2.4 acm#F.4.1.6 acm#H.1 acm#H.2.1.0
acm#H.3.3.4 acm#H.5.5.1 acm#I.2.7.2 acm#I.2.10.3
acm#I.3.5 acm#I.5.1 acm#I.6;
|
resist#ReSIST_ontology < ontology,
part: resist#Dependability-And-Security resist#Entity;
resist#Dependability-And-Security
kind: aktp#Research-Area __[pm],
domain_object: pm#dependency_or_information_security_attribute __[pm]
pm#dependency_or_information_security_related_process __[pm];
pm#dependency_or_information_security_attribute
> resist#Dependability __[pm] resist#Security __[pm],
< pm#security_attribute;
resist#Dependability___High-Confidence___Survivability
definition: "ability to avoid service failures that
are more frequent and more severe than is acceptable",
> resist#Dependence resist#Trust resist#Attribute-Of-Dependable-Systems,
< wn#dependability __[pm];
|
Systems of logic can be seen as "fields of study" or as "theories". Given the selected interpretation of "fields of study", there is little difference. As illustrated above, wn#field_of_study and wn#theory were represented as subtypes of pm#domain. Thus, the pm#subdomain relation can apply to them. The next table shows examples. The source/user/category `km´ represents the "Knowledge Management" domain. `pm/km´ is used for `km´ concepts as interpreted and represented by `pm´. Thus, '__[pm/km]' is an abbreviation for '__[pm, source: km]'. The represented categories (and most of the relations that were associated to them) are informally described in the Wikipedia pages indicated for these categories. The represented relations offer a precise, structured and browsable presentation of some of the informal descriptions found in these Wikipedia pages or in the Wikipedia taxonomy (which is not semantically structured and sometimes contains cycles).
//from now on, `pm/km´ is the first default creator in all given examples
// except in relation nodes where `pm´ stays the first default creator
// (new parsing directives will be specified in Chapter 3)
[_ parsing][(pm#formal_term pm#part of: pm#concept_node) pm#default_creator: pm/km pm wn];
[_ parsing][pm#term_being_declared pm#default_creator: pm/km];
[_ parsing][pm#new_term pm#default_creator: pm/km];
[_ pm#kind: pm#parsing_macro][pm#statement pm#default_creator: pm/km];
semantics_of_logic
subdomain of: wn#semantics,
subdomain: model-theoretic_semantics proof-theoretic_semantics,
url: http://en.wikipedia.org/wiki/Semantics_of_logic;
wn#logic.philosophy
annotation: "the branch of philosophy that analyzes inference",
subdomain: inductive_reasoning_system deductive_reasoning_system
binary_logic multi-valued_logic
informal_logic formal_logic,
url: http://en.wikipedia.org/wiki/Logic;
formal_logic
subdomain: philosophical_logic mathematical_logic
{(non-modal_logic modal_logic)}
{(classical_logic non-classical_logic)}
type_theory term_logic dialectical_logic,
url: http://en.wikipedia.org/wiki/Formal_logic;
non-classical_logic___intuitionist_logic
subdomain: substructural_logic paraconsistent_logic natural_deduction
intuitionistic_logic;
substructural_logic
annotation: "one system of propositional calculus that is
weaker than the conventional one",
subdomain: relevance_logic linear_logic,
url: http://en.wikipedia.org/wiki/Intuitionistic_logic;
relevance_logic
annotation: "a kind of system developed to avoid the paradoxes
of material and strict implication",
url: http://plato.stanford.edu/entries/logic-relevance/;
|
Indexing and organizing KM resources (techniques, projects, conferences, ...) is the focus of several projects. The simplest ones use static Web pages, e.g., Peter Clark's Web page titled "Some Ongoing KBS/Ontology Projects and Groups" [Clark, 2008]. There exist several wikis related to this subject (e.g., semanticweb.org) and Wikipedia also has many pages related to it. On the other hand, there is no directory about or close to this subject in Yahoo's Directory nor in the Open Directory Project (Google's Topics Directory) [www-ODP, 2009]. The ACM thesaurus has about 90 categories related to "Artificial Intelligence" (a good part of which are related to KM, e.g., acm#I.2.6.4__Knowledge_acquisition and acm#I.2.4__Knowledge_Representation_Formalisms_and_Methods) and about 50 categories which are related to "information interfaces and presentation" and hence also to KM. The KA2 project [Benjamins et al., 1998] proposed a predefined ontology for some KM domains, asked KM researchers to register their Web pages and annotate some of the content using categories from this ontology, and automatically imported this indexed content into a Web-accessible knowledge base. However, the main part of this ontology was composed of a hierarchy of only 37 Knowledge Acquisition (KA) domains (the names of which also alluded to tasks; thousands of well-organized KM process categories would have actually been needed to permit a useful indexation of the works of researchers). E.g., this hierarchy included:
reuse_in_KA > ontologies problem_solving_methods;
problem_solving_methods > Sysiphus-III_experiment;
Despite the large publicity that surrounded this project and the good reception it received, few researchers actually indexed their Web page content. The reason may be that even simple indexations are not something that researchers are actually prepared to do but a more probable reason is that the approach was unscalable and hence quite discouraging for the users. Indeed,
The "Semantic Web (SW) Topics Ontology" of ISWC 2006 [www-SWTO, 2006] has about 250 categories about SW-related topics, techniques and projects but it too is unscalable and an inefficient support for information retrieval (IR), even for document indexation, since it is static, arbitrary (there is no "right place" to find or insert a new concept) and does not follow knowledge representation/sharing best practices.
Ontopedia [www-Ontopedia, 2009] is a project whose purpose is to represent and relate subjects of information and knowledge management using Topic maps. As noted in the Web page of the Ontopedia server, only "a small number of subject identifiers" have so far been entered.
An efficient support for IR in KM would need to have an initial well-organized core of hundreds of categories, be updatable in a scalable way by people and, as noted above, have an ontology of processes as its backbone. Then, the question becomes: how to collaboratively organize processes and their related elements in a scalable and easy-to-retrieve way?
The next subsection organizes types of description objects used by KM processes.
The type pm#description_content/medium/container and its rationales have been introduced in Paragraph 2.1.1.13. Reminder:
pm#description_content/medium/container
> {(pm#description pm#description_container)};
The type pm#description has also been introduced. Below are more details. pm#data is defined as any pm#description that is not a pm#formal_or_semi-formal_well-formed_statement. Informal statements (e.g., natural language sentences), informal terms and numbers are data.
From now on in this document, a category identifier is in bold characters when specialization relations from the category are presented in a subsequent table, i.e., when the category is really presented. For emphasis purposes, bold italic characters are used, not plain bold characters.
pm#description___information
annotation: "description (content/medium) of an entity or a situation",
> pm#description_content pm#description_medium
{(pm#data pm#formal_or_semi-formal_well-formed_statement___knowledge)};
Document elements, the units of information considered by structured document editors and hypertext systems, are description containers:
pm#document_element___document___DE
< (pm#description_container
annotation: "e.g., file, image, ... but not a disk or a piece of paper"),
definition: "part of a document or whole document",
> wn#document;
Words such as "models" and "statements" refer to concepts about description content and to other concepts about description mediums. To avoid representing both interpretations and hence avoid 'duplications', since the description medium interpretations seem to have more "interesting to represent" relations with other description medium categories than the description content interpretations, only the description medium interpretations have so far been represented in the MSO. This is not a very important modeling choice since there is no exclusion relation between pm#description_content and pm#description_medium but this prepares the work for a future formal distinction of the two aspects using more specialized exclusive categories. The next tables show important description medium types in Information Sciences and Knowledge Management.
pm#description_medium
annotation: "e.g., a syntax, a language, a script, a structure, a term",
> {(pm#non-symbolic_description_medium pm#symbolic_description_medium)},
wn#structure
sumo#content_bearing_object; // > (sumo#linguistic_expression > sumo#language)
pm#non-symbolic_description_medium > pm#connexionist_structure;
pm#symbolic_description_medium
> {(pm#atomic_symbolic_description_medium pm#divisible_symbolic_description_medium)}
{(pm#fully_formal_description_medium pm#non-fully_formal_description_medium)}
pm#abstract_data_type wn#symbolic_representation wn#language_unit;
pm#atomic_symbolic_description_medium
> pm#term
(pm#atomic_abstract_data_type < pm#abstract_data_type,
> sumo#number pm#boolean);
pm#divisible_symbolic_description_medium
< pm#non-spatial_collection,
> (pm#structured_abstract_data_type < pm#abstract_data_type,
> sumo#list pm#array pm#queue pm#stack pm#keyed_collection_ADT
pm#graph_ADT wn#lattice pm#number_container xmls#block_set
owl#data_range)
{(pm#fully_formal_structure pm#non-fully_formal_structure)}
pm#statement;
pm#graph_ADT
> pm#graph_ADT_storing_knowledge-representations;
pm#fully_formal_description_medium
> {(pm#formal_term pm#fully_formal_structure)};
pm#fully_formal_structure
definition: "mathematically or logically defined structure",
> pm#well-formed_formal_statement;
pm#non-fully_formal_description_medium
> {(pm#informal_term pm#informal_or_partially_formal_structure)};
wn#symbolic_representation
> (pm#knowledge_representation
> {pm#formal_term pm#formal_or_semi-formal_well-formed_statement});
|
pm#term
definition: "sign (character string, icon, sound, ...) that has been given at least one
meaning, and hence that refers to one or several types or individuals",
> {(pm#informal_term pm#formal_term)} wn#symbol;
pm#informal_term //already stated: < pm#non-fully_formal_description_medium,
definition: "term without known creator or having several meanings for its creator",
> term_from_informal_hierarchy_of_terms_such_as_the_DMOZ_topic_hierarchy,
less pm#constraint_on_meaning than: pm#formal_term __[any<->any];
pm#formal_term //already stated: < pm#fully_formal_description_medium,
description of: 1 pm#thing __[1..*<-any],
|
pm#statement
> {(pm#semantically-atomic_statement pm#non-semantically-atomic_statement)}
{(pm#statement_having_a_meaning_for_at_least_someone
pm#statement_having_a_meaning_for_nobody)}
{( (pm#assertable_statement
> {( (pm#Definition pm#manually_set_corresponding_relation_type: pm#definition)
(pm#belief
> (pm#Annotation pm#manually_set_corresponding_relation_type: pm#annotation)
pm#observation pm#preference
(pm#interpretation > pm#deduction pm#assumption)
(task_description description of: task __[*<->*])
(domain_description description of: domain __[*<->*])
)
)} )
(pm#non-assertable_statement > pm#query) )},
member: 1..* term 0..* statement;
//statements are particular collections of terms and, possibly, of sub-statements
pm#semantically-atomic_statement
> {pm#OR-set_of_statements pm#XOR-set_of_statements
pm#universally_quantified_statements pm#definition
pm#statement_with_a_restrictive_context}
pm#semantically-atomic_formal_or_semi-formal_statement;
pm#non-semantically-atomic_statement
> (pm#AND-set_of_statements_without_restrictive_context_on_the_set
> (pm#PCEF_statement___positive_conjunctive_existential_formula
wn#expressiveness=> some PCEF_logic
) ); //a pm#PCEF_statement is an AND-set of relations (atomic statements)
|
pm#statement_having_a_meaning_for_at_least_someone
definition: "set of quantified terms having at least one logical or semantic meaning",
member: 1..* pm#term,
pm#description_content: 1..* pm#description_content __[*<-any],
> {pm#well-formed_formal_statement pm#partially_formal_statement_with_some_meaning}
pm#AND-set_of_statements_having_a_meaning_and_with_or_without_context_on_the_set
{(pm#semantically-atomic_statement_with_some_meaning
pm#non-semantically-atomic_statement_with_some_meaning)}
pm#formal_or_semi-formal_well-formed_statement;
pm#formal_or_semi-formal_well-formed_statement___knowledge___knowledge_statement
> pm#semantically-atomic_formal_or_semi-formal_statement
pm#well-formed_formal_statement;
pm#well-formed_formal_statement //already stated: < pm#fully_formal_structure,
< pm#formal_or_semi-formal_well-formed_statement,
definition: "statement using only formal terms and with a formal grammar
having a logic-based interpretation",
member=> 1..* pm#formal_term, //"only formal terms"
> rdf#statement pm#PCEF_statement;
pm#partially_formal_statement_with_some_meaning
> {pm#fully_informal_statement_with_some_meaning
pm#semi-formal_statement_with_some_meaning},
less pm#constraint_on_meaning than: pm#well-formed_formal_statement __[any<->any];
pm#fully_informal_statement_with_some_meaning
definition: "statement not using formal terms but having a
logical or semantic meaning for at least someone",
member: 0 pm#formal_term __[0<-?],
less pm#constraint_on_meaning than:
pm#semi-formal_statement_with_some_meaning __[any<->any];
pm#semi-formal_statement_with_some_meaning
< pm#formal_or_semi-formal_well-formed_statement,
definition: "statement that would be formal if it did not use at least
one informal term",
member: 1..* pm#formal_term 1..* pm#informal_term,
> pm#non-semantically-atomic_semi-formal_statement;
|
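The distinctions made in the above table between formal, semi-formal and fully informal statements rely on the kinds of terms that a statement has for members. The following Python sketch illustrates this. It assumes that the statement has a meaning for at least someone and that its grammar has a logic-based interpretation (these conditions are not checked), and the names `StatementKind` and `classify_statement` are hypothetical.

```python
from enum import Enum
from typing import Sequence


class StatementKind(Enum):
    WELL_FORMED_FORMAL = "pm#well-formed_formal_statement"
    SEMI_FORMAL = "pm#semi-formal_statement_with_some_meaning"
    FULLY_INFORMAL = "pm#fully_informal_statement_with_some_meaning"


def classify_statement(term_is_formal: Sequence[bool]) -> StatementKind:
    """Classify a statement from the formality of each of its member terms.

    term_is_formal holds one boolean per term: True for a formal term,
    False for an informal one. A statement has at least one term.
    """
    if len(term_is_formal) == 0:
        raise ValueError("a statement has at least one term")
    formal_count = sum(1 for is_formal in term_is_formal if is_formal)
    if formal_count == len(term_is_formal):
        return StatementKind.WELL_FORMED_FORMAL   # only formal terms
    if formal_count == 0:
        return StatementKind.FULLY_INFORMAL       # no formal term at all
    return StatementKind.SEMI_FORMAL              # both kinds of terms


print(classify_statement([True, True]).value)
print(classify_statement([True, False]).value)
print(classify_statement([False]).value)
```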
pm#AND-set_of_statements_having_a_meaning_and_with_or_without_context_on_the_set
> {pm#AND-set_of_formal_or_semi-formal_well-formed_statements
pm#AND-set_mostly_composed_of_formal_or_semi-formal_well-formed_statements
pm#AND-set_of_mostly_informal_statements_having_a_meaning_for_at_least_someone}
model,
member: 1..* pm#statement_having_a_meaning_for_at_least_someone;
pm#AND-set_of_formal_or_semi-formal_well-formed_statements
> ontology //default creator for concept nodes: `pm/km´
pm#AND-set_of_statements_in_FS;
ontology //this is the interpretation of an "ontology" by `pm´
annotation: "an ontology is a set of formal terms which can be considered
as a `pm#non-semantically-atomic_statement´ asserting that
i) the terms are formal terms, and that
ii) they have certain associated statements (axioms/definitions)",
member: 1..* pm#formal_term;
pm#AND-set_mostly_composed_of_formal_or_semi-formal_well-formed_statements
> pm#AND-set_only_composed_of_formal_well-formed_statements
KB_or_KBS,
member<= most pm#formal_or_semi-formal_well-formed_statements;
pm#AND-set_only_composed_of_formal_well-formed_statements
> fully_formal_knowledge_base
(pm#AND-set_of_formal_well-formed_statements_in_FS
< pm#AND-set_of_statements_in_FS,
member: 1..* pm#well-formed_formal_statement_in_FS),
member: 1..* (pm#well-formed_formal_statement
> (pm#well-formed_formal_statement_in_FS pm#language: a pm#FS) );
KB_or_KBS
> (knowledge_base //reminder: according to `pm´ and for KM
> fully_formal_knowledge_base knowledge_base_at_least_minimally_organized,
subset_or_equal<= 1 ontology;
knowledge_base_at_least_minimally_organized
annotation: "KB where each term and each statement has been manually
– or can be automatically – connected to a formal term or statement
by a logic-based generalization relation of some kind",
member: 1..* (pm#knowledge_representation_with_generalization
< pm#knowledge_representation,
pm#extended_specialization of<=
1..* pm#knowledge_representation))
(knowledge_base_system
subset: 1..* knowledge_base);
pm#AND-set_of_mostly_informal_statements_having_a_meaning_for_at_least_someone
member<= most pm#fully_informal_statement_with_some_meaning;
model
> {conceptual_model_for_a_KB_or_KBS design_model_for_a_KB_or_KBS}
{generic_model instantiated_model}
{task_description domain_description},
subset_or_equal: task_description __[*<->*] domain_description __[*<->*];
|
The next table represents some supports of description in the Knowledge Acquisition methodology KADS (referred to via the source/user `kads´). They are referred to in some tables of the next subsection.
description_medium_in_KADS
< pm#description_medium,
> description_with_KADS_inference_structure kads#role;
description_with_KADS_inference_structure
definition: "dataflow graph of 'inferences' (tasks);
the inputs/outputs of the inferences are described by 'roles'",
< structured_abstract_data_type,
description of: 1..* kads#inference,
part: 1..* kads#role;
kads#role
> kads#hypothesis kads#observable kads#finding kads#complaint kads#norm
kads#difference kads#discrepancy_class kads#diagnosis_result
kads#parameter kads#system_model kads#historical_data;
|
knowledge_management
acronym: "KM",
definition: "set of processes aimed to design, maintain or exploit a `KB´ or `KBS´",
part of: wn#information_science,
subdomain of: wn#engineering_science,
subdomain: knowledge_engineering,
domain_object: creating_or_exploiting_knowledge __[.<->any];
knowledge_engineering
definition: "part of `knowledge_management´ aimed to
design or maintain a `KB´ or `KBS´",
subdomain: knowledge_acquisition;
creating_or_exploiting_knowledge
< is#information_sciences_process,
> {creating_or_selecting_knowledge_from_data_or_knowledge
associating_data_or_new_knowledge_to_knowledge}
language/structure_specific_knowledge-based_process
creating_or_exploiting_knowledge_for_a_non-knowledge-engineering-goal,
object: KM_structure __[.<->any];
creating_or_selecting_knowledge_from_data_or_knowledge
< process_creating_copies/versions/selections_of_its_main_inputs,
> knowledge_search/retrieval knowledge_extraction/modeling/representation
knowledge_importing/exporting/translation
knowledge_distribution/sharing/merging
knowledge_inference/reasoning/generation knowledge_classification
(storing_knowledge_assertions_or_queries_in_a_document
subprocess: knowledge_extraction/modeling/representation);
associating_data_or_knowledge_to_knowledge
< process_associating_values_to_its_main_inputs,
> knowledge_comparison knowledge_annotation knowledge_based_indexation
knowledge_evaluation;
knowledge_evaluation
> knowledge_validation;
knowledge_based_indexation/annotation .(pm#input: *x, pm#output: *y)
< wn#indexing,
input: pm#description_content/medium/container *x,
output: pm#knowledge_representation *y,
subprocess: knowledge_extraction/modeling/representation .(*x, *y);
language/structure_specific_knowledge-based_process
< process_with_a_main_input/object/output_of_a_particular_kind,
> {Formal_Concept_Analysis_specific_process Conceptual_Graph_specific_process};
creating_or_exploiting_knowledge_for_a_non-knowledge-engineering-goal
< process_that_is_input_or_output_of_another_process,
> teaching_knowledge_management;
|
Table 2.1.4.1 represents and relates four domains: Knowledge Management, Information Sciences, Knowledge Engineering and Knowledge Acquisition. The supertype of all processes in these domains is pm#creating_or_exploiting_knowledge, a subtype of is#information_sciences_process. Each of these domains may also be seen as a collection of processes: a representation as a domain is compatible with that interpretation, e.g., pm#domain_object is a subtype of pm#member. The direct subtypes of pm#creating_or_exploiting_knowledge specialize the "distinctions among processes based on their inputs/outputs" presented in Subsection 3.1.4 and thus propose three (intuitive and easy to distinguish) kinds of decomposition, two of which are "according to goals" and "according to languages or data structures". The following sections of this chapter introduce specializations for these subtypes. As in the previous subsections, bold characters are used for the identifiers of categories that are specialized in a subsequent table.
The next tables represent Knowledge Acquisition (KA) and top-level processes in it. In the descriptions of common KA methodologies (e.g., KADS [Breuker & van de Velde, 1994]), such processes are not explicitly represented and organized. Furthermore, the subprocess relations are here distinguished from specialization relations. If a process type "X" specializes a process type "Y" then "X is one particular way to do Y"; WordNet uses such patterns for the classification of types of processes and, in WordNet terminology, the "troponym" relation is the "hyponym" (specialization) relation between types of processes. If a process "X" is a subprocess of "Y" then "X is a part of Y" and, generally, "the duration of X is a strict part of the duration of Y". Choosing between a specialization relation and a subprocess relation is not always easy. For example, should kads#analysis_in_knowledge_acquisition_with_KADS be the source node of subprocess relations or subtype relations to kads#environment_analysis_with_KADS and kads#problem_analysis_with_KADS? In the MSO, subprocess relations have been used. This makes it possible to see the source node of these relations as the sum of all analysis tasks in KADS rather than as a supertype for each of them.
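The distinction between the two kinds of relations can be illustrated with a minimal Python sketch. It is not part of the MSO or of WebKB: the dictionaries and the function names (ancestors, is_way_of_doing, is_part_of) are hypothetical, and only a few of the relations given in the tables of this subsection are copied.

```python
# Hypothetical mini-model: two distinct relation kinds between process types.
# "X is one particular way to do Y" (specialization)
SUBTYPE_OF = {
    "kads#analysis_in_knowledge_acquisition_with_KADS": {"analysis_in_knowledge_acquisition"},
    "analysis_in_knowledge_acquisition": {"designing_a_KB_or_KBS"},
}
# "X is a part of Y" (subprocess)
SUBPROCESS_OF = {
    "kads#environment_analysis_with_KADS": {"kads#analysis_in_knowledge_acquisition_with_KADS"},
    "kads#problem_analysis_with_KADS": {"kads#analysis_in_knowledge_acquisition_with_KADS"},
}

def ancestors(relation, node):
    """Return every node reachable from `node` by following `relation` upward."""
    seen, stack = set(), [node]
    while stack:
        for parent in relation.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def is_way_of_doing(x, y):
    """True if x (transitively) specializes y."""
    return y in ancestors(SUBTYPE_OF, x)

def is_part_of(x, y):
    """True if x is (transitively) a subprocess of y."""
    return y in ancestors(SUBPROCESS_OF, x)
```

Keeping the two relations in separate structures means that a query such as is_way_of_doing("kads#environment_analysis_with_KADS", "kads#analysis_in_knowledge_acquisition_with_KADS") is answered negatively, while is_part_of with the same arguments is answered positively.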
KADS refers to tasks rather than processes and distinguishes tasks from goals, problem definitions and problem solving methods (PSMs). A long analysis of their differences and relationships can be found in [Breuker, 1994]. A one-line summary of this analysis could be: "a task makes it possible to solve a well-defined problem by applying PSMs". However, this long analysis does not fully allow the reader to ascertain what tasks and PSMs really are with respect to classic distinctions: are they
Because representing them as descriptions would lead to either complicated representations or a duplication of many of the basic relations associated to processes, the MSO represents pm/km#task and pm/km#problem_solving_method as subtypes of pm#problem_solving_process (on the other hand, the "inference structures" of PSMs clearly are descriptions). This has the additional advantage of not requiring many duplications between types for tasks and types for PSMs.
knowledge_acquisition
acronym: "KA",
definition: "part of knowledge engineering aimed to design a `KB´ or `KBS´",
domain_object: designing_a_KB_or_KBS;
designing_a_KB_or_KBS
> {KBS_design_via_quick_prototyping model_directed_design_of_KB_or_KBS}
analysis_in_knowledge_acquisition
designing_a_KB_or_KBS_with_a_particular_methodology,
subprocess: collecting_data_for_building_a_KB_or_KBS;
collecting_data_for_building_a_KB_or_KBS
input: knowledge_source environment_of_system_to_build,
output: data_for_building_a_KB_or_KBS;
KBS_design_via_quick_prototyping
input: data_for_building_a_KB_or_KBS,
output: executable_KB_or_KBS;
|
model_directed_design_of_KB_or_KBS
subprocess: building_a_conceptual_model_for_a_KB_or_KBS
building_a_design_model_for_a_KB_or_KBS
operationalizing_a_KB_or_KBS_from_its_design_model,
input: data_for_building_a_KB_or_KBS,
output: KB_or_KBS;
building_a_conceptual_model_for_a_KB_or_KBS < knowledge_modeling,
subprocess: compose_and_instantiate_generic_task_models
compose_and_instantiate_task_independent_models
map_instantiated_task_model_and_task_independent_model,
input_material: data_for_building_a_KB_or_KBS,
input_parameter: knowledge_modeling_principle,
input-output: generic_conceptual_model_for_a_KB_or_KBS,
output: instantiated_conceptual_model_for_a_KB_or_KBS;
building_a_design_model_for_a_KB_or_KBS
input: data_for_building_a_KB_or_KBS,
input_parameter: KB_design_principle,
output: design_model_for_a_KB_or_KBS;
operationalizing_a_KB_or_KBS_from_its_design_model
input: design_model_for_a_KB_or_KBS,
output: KB_or_KBS;
|
kads#knowledge_acquisition_with_KADS
< designing_a_KB_or_KBS_with_a_particular_methodology,
subprocess: kads#real_life_process kads#generic_process;
kads#real_life_process_in_knowledge_acquisition
> kads#analysis_in_knowledge_acquisition_with_KADS knowledge_engineering,
description: (kads#real_life_process_description part: kads#generic_model);
kads#analysis_in_knowledge_acquisition_with_KADS
< analysis_in_knowledge_acquisition,
subprocess: {kads#environment_analysis_with_KADS kads#problem_analysis_with_KADS
kads#task_analysis_with_KADS kads#function_analysis_with_KADS
kads#implementation_analysis_with_KADS};
kads#generic_process
> {kads#problem_solving_method kads#problem_solving_task kads#inference},
description: (kads#generic_model < generic_model);
kads#problem_solving_method___PSM
< (problem_solving_method___PSM < pm#method pm#problem_solving_process),
input: kads#problem_solving_task,
output: inference_structure;
|
kads#problem_solving_task___PST___generic_PST < task problem_solving_process,
> {(kads#non-primitive_PST kads#primitive_PST)}
kads#system_synthesis kads#system_modification kads#system_analysis,
subprocess: kads#inference,
input: kads#role, //subtypes given in the previous subsection
output: kads#role;
kads#non-primitive_PST
subprocess: 0..* kads#problem_solving_task;
kads#primitive_PST
> kads#assignment kads#planning_and_reconstruction kads#modeling
kads#design kads#prediction kads#assessment
kads#correction kads#monitoring kads#diagnostic kads#repair,
subprocess: kads#inference;
/* //Relations of succession between KADS primitive PSTs:
kads#assignment
succ of: (planning_and_reconstruction succ of: modeling)
(design succ of: modeling)
succ: (prediction succ: (assessment succ: correction)
(prediction succ: (monitoring succ:
(diagnostic succ: repair)))
);
*/
kads#system_synthesis < creating_knowledge,
comment: "operation without known system model"
"multi-dimensional and generally involves 'space'",
> kads#modeling kads#design kads#planning/reconstruction;
kads#system_modification < modifying_knowledge,
> kads#assignment;
kads#assignment
> kads#scheduling kads#configuration;
kads#system_analysis
< exploiting_knowledge_without_modifying_the_main_inputs,
comment: "operation on a known system model",
> kads#prediction kads#monitoring kads#diagnosis kads#assessment;
|
kads#inference___primitive_non-problem-solving_task
> kads#collect kads#generalize kads#specialize kads#compare kads#modify
kads#determine_truth_or_relevance kads#explore_option kads#reduce_working_set;
kads#generalize___finding_a_kind_or_supertype
> (kads#classify___finding_a_kind_or_supertype_in_a_given_type_hierarchy
> kads#abstract_class___finding_a_supertype_in_a_given_type_hierarchy
kads#identify___finding_a_type_from_an_individual)
kads#cluster___discovering_types_from_examples;
kads#specialize___finding_subtypes_a_subtype_or_an_individual
> kads#refine___finding_a_subtype_for_a_type
kads#instantiate___finding_an_individual_for_a_type
kads#select___finding_a_subset_of_things_in_a_set_of_things;
kads#compare
> kads#compare_values kads#match___compare_structures;
kads#modify
> kads#assign___assigning_a_value_to_an_attribute_of_something
kads#valuate___producing_a_concept_regarding_the_structure_of_something
(kads#transform > kads#sort kads#restructure)
kads#assemble
kads#decompose;
kads#determine_truth_or_relevance
> kads#establish kads#cover kads#verify;
kads#explore_option
> kads#make_decision___select_decision kads#propose_solution___generate_solution;
kads#reduce_working_set
> kads#anticipate kads#prefer kads#rule_out;
|
Information sharing approaches can be partitioned according to the kind of information they allow to be shared (e.g., data versus knowledge) and the kinds of processes they rely on (e.g., "(semi-)independent creation of resources to be indexed and possibly merged" versus "collaborative edition of a same resource"). Table 2.2.1.1 shows how such distinctions can be categorized based on the description concept types introduced in Subsection 2.1.3. Since this categorization is rather superficial, it is mainly i) an illustration or initial guideline for the categorization of knowledge sharing approaches and the processes they rely on (e.g., indexation processes), and ii) a way to categorize the advocated approaches with respect to other approaches.
is#information_sharing
< wn#sharing,
> {(is#asynchronous_information_sharing is#synchronous_information_sharing)}
is#information_sharing_based_on_the_creation_and_indexation_of_multi-statement_resources
{is#information_sharing_based_on_the_semi-independent_creation_of_resources
is#information_sharing_based_on_the_collaborative_edition_of_a_same_resource}
{(knowledge_sharing_based_on_at_least_one_formal_or_semi-formal_knowledge_base
is#information_sharing_based_on_at_least_one_non-formal-or-semi-formal_resource)}
knowledge_sharing,
object: 1..* pm#description,
subprocess: is#information_retrieval;
is#information_sharing_based_on_the_creation_and_indexation_of_multi-statement_resources
> {is#information_sharing_based_on_the_indexation_of_mostly_informal_resources
knowledge_sharing_based_on_the_creation_and_indexation_of_mostly_formal_resources}
is#info_sharing_based_on_the_indexation_of_mostly_independently_created_resources,
subprocess: 1..* (is#indexing_of_multi-statement_resources < wn#indexing,
object: 1..* pm#non-semantically-atomic_statement);
is#information_sharing_based_on_the_indexation_of_mostly_informal_resources
subprocess: 1..* (is#indexing_of_mostly_informal_resources < wn#indexing,
object: 1..*
pm#AND-set_of_mostly_informal_statements_having_a_meaning_for_at_least_someone);
knowledge_sharing
< knowledge_distribution/sharing/merging,
> (knowledge_sharing_based_on_at_least_one_formal_or_semi-formal_knowledge_base
> knowledge_sharing_based_on_the_indexation_of_mostly_formal_resources
knowledge_sharing_based_on_a_knowledge_base_for_problem_solving_purposes
sharing_of_a_knowledge_base_that_is_physically_distributed_or_not),
object: 1..* knowledge_base,
subprocess: supporting_knowledge_sharing_between_KBs
supporting_the_valuation_of_knowledge_or_knowledge_authors;
sharing_of_a_knowledge_base_that_is_physically_distributed_or_not
> (sharing_of_a_knowledge_base_at_least_minimally_organized
object: 1 knowledge_base_at_least_minimally_organized)
),
subprocess:
(supporting_loss_less_knowledge_sharing_between_KBs
< supporting_knowledge_sharing_between_KBs,
subprocess: integrating_all_published_information_specified_as_parameter)
(supporting_the_collaborative_building_of_the_KB_specified_as_output
subprocess:
supporting_collaborative_knowledge_editions_in_a_shared_KB
(supporting_a_precise_valuation_of_knowledge_statements_or_authors
< supporting_the_valuation_of_knowledge_or_knowledge_authors)
);
|
Based on these distinctions, the next three subsections argue that almost all current approaches for the sharing of information are not scalable. The argumentation has not yet been represented or organized in a semi-formal way. However, most of the arguments are directly or indirectly derived from
The type pm#constraint_on_meaning is a subtype of wn#precision, and knowledge precision is important for the scalability of knowledge sharing. Here is what the next three subsections argue for:
sharing_of_a_knowledge_base_at_least_minimally_organized
more wn#scalability than:
is#information_sharing_based_on_the_creation_and_indexation_of_multi-statement_resources
knowledge_sharing_based_on_a_knowledge_base_for_problem_solving_purposes;
After these next three subsections, the three subsequent subsections present an approach for the following kinds of processes:
Only asynchronous information sharing is considered here since its techniques underlie (and are more scalable than) those supporting the exchange of information between co-temporal users of a system.
Here are some reminders or clarifications about two terms used in the next three sections (and at some other places in this document):
Some metadata related to some resource or created by some person(s) can also be considered as a resource.
The more statements a resource contains, and the more resources there are, the more these resources contain similar and/or complementary pieces of information, and hence the less the metadata for each resource can be useful: queries will return lists of resources that are partially redundant or complementary with each other and that need to be manually searched, compared or aggregated by each user.
The more statements a resource contains, the more its metadata have to be information selective, and hence the less such metadata are representative of the contained pieces of information and the more the indexation methods and usefulness are task/user/domain dependent and somewhat arbitrary.
Finally, the more statements some resources contain, and the less formal the statements are (or the more context-dependent their interpretations are),
For example, the statement "some animal sits above some artifact" is a generalization of both "Tom (a cat) sits on a blue mat" and "any animal sits above some artifact" because all the objects and quantifiers of the first statement are identical or generalize those of the second and third statements (such relations can be automatically inferred if the statements are formal or semi-formal). However, such relations rarely hold between two collections of statements, and especially between any two documents.
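The following Python sketch illustrates how such a generalization relation between semi-formal statements could be checked automatically. It is only a toy under stated assumptions: the type hierarchy, the three quantifier labels ("some", "any", "individual") and all function names are hypothetical, a statement is reduced to a (subject, relation, object) triple, and universally quantified nodes are conservatively required to match exactly.

```python
# Assumed toy hierarchy: each term maps to its direct generalization.
SUPERTYPES = {
    "Tom": "cat", "cat": "animal",
    "blue_mat": "mat", "mat": "artifact",
    "sits_on": "sits_above",
}

def generalizes(general, specific):
    """True if `general` equals `specific` or is one of its transitive supertypes."""
    while specific is not None:
        if specific == general:
            return True
        specific = SUPERTYPES.get(specific)
    return False

def node_generalizes(g, s):
    """Compare two (quantifier, term) nodes."""
    (g_quant, g_term), (s_quant, s_term) = g, s
    if g_quant == "some":
        # "some X" is implied by "any X" (assuming X has instances) and by a named X.
        return generalizes(g_term, s_term)
    # Conservative choice: "any" and "individual" nodes must be identical.
    return g_quant == s_quant and g_term == s_term

def statement_generalizes(g, s):
    """True if every node and the relation of g are identical to or generalize those of s."""
    return (node_generalizes(g[0], s[0])
            and generalizes(g[1], s[1])
            and node_generalizes(g[2], s[2]))

some_animal = (("some", "animal"), "sits_above", ("some", "artifact"))
tom_on_mat = (("individual", "Tom"), "sits_on", ("some", "blue_mat"))
any_animal = (("any", "animal"), "sits_above", ("some", "artifact"))
```

With these definitions, statement_generalizes(some_animal, tom_on_mat) and statement_generalizes(some_animal, any_animal) both hold, whereas the reverse comparisons do not. No comparable check is available between two whole documents, which is the point made above.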
Statistical similarity/distance measures between documents, ontologies or metadata are useful for many purposes [Euzenat et al., 2009] but, like all statistical measures, have no intuitive "semantic" meaning: they are experimentally or mathematically designed to be of some help for some specific kinds of data, tasks or users. For example, Knowledge Zone [Lewen et al., 2006] i) allows each of its users to rate ontologies with numerical or free text values for criteria such as "usage", "coverage", "correctness" and "mappings to other ontologies", ii) allows them to rate other users' ratings, and iii) uses all these ratings to retrieve and rank ontologies. This approach compounds several problems:
To sum up, however sophisticated, techniques that index resources are inherently limited in their possibilities and usefulness for information seekers. Furthermore, since they do not provide re-use mechanisms, they force information providers to repeat or re-describe information that is described elsewhere and thus add to the volume of redundant data that information seekers have to sift through. Yet, techniques to index data or people form the bulk of Learning Object retrieval/management techniques and Semantic Web related techniques, for example in the Semantic Learning Web [Stutt & Motta, 2004] and the Educational Semantic Web [Devedzic, 2004]. Although the number and seeming variety of these techniques are large, our definitions make it possible to categorize most of them in the following few categories (with, in each of the two groups, the listed kinds of techniques or tools being more or less ordered by increasing indexation precision).
Illustration with learning objects (LOs) [www-LO 2009]. Many researchers in (e-)learning focus on techniques to create, index, retrieve or manage LOs (learning materials aimed to be "re-used and combined" for creating a course, or browsed for learning purposes). It is now well accepted by such researchers that the smaller and less contextual the "LOs available for re-use" are, and the more precisely indexed or inter-connected via metadata they are, the more easily they can be semi-automatically retrieved and combined to create i) "LOs to teach with" that are adapted to particular course objectives or kinds of users, and hence ii) "context-dependent LOs" [Downes, 2001] [Hodgins, 2006].
However, in current LO repositories [www-LOR, 2009], an LO is almost never a "semantically meaningful LO", i.e., either a formal term or an "un-decomposable semantic statement" (an instance of pm#semantically-atomic_statement_with_some_meaning; typically, one semantic relation between semantically meaningful LOs, with some formal information about the context of this relation, such as its creator and temporal, spatial or modal constraints on its validity). For example, a typical LO about Java is an "Introduction to Java" listing some features of Java and giving an example of code, instead of being a relation between Java and one of its features. Since LOs are not atomic knowledge representations and since LO repositories are not cgosSWs, retrieving and combining them is difficult.
Current LO related standards [www-LOS, 2009] (e.g., AICC, SCORM, IMS, IEEE WG12) and projects (e.g., CANDLE, GEODE, MERLOT, VLORN) essentially focus on associating simple metadata to whole documents or big parts of them (e.g., author, owner, terms of distribution, presentation format, and pedagogical attributes such as teaching or interaction style, grade level, mastery level and prerequisites). Such an LO cannot actually be a "truly re-usable LO": it is a package of objects selected and ordered to satisfy a certain curriculum. Although such packages are useful for pedagogical purposes and ease the task of most course designers since they are ready-made, they are black-box packages, that is, their decomposition into objects of a cgosSW has not been made explicit. Hence, they cannot be easily modified, compared or efficiently retrieved: i) they can only be retrieved via keywords, not via arbitrarily complex conceptual queries on the objects they contain, and ii) from a browsing viewpoint or a conceptual querying efficiency viewpoint, they cannot be organized into a lattice (partial order) according to the objects they combine.
More details on the interest of a cgosSW for LOs and (e-)learning can be found in [Martin & Eboueya, 2008]. Details on the use of WebKB-2 related techniques for (e-)learning can be found in [Martin, 2009]. [Yessad et al., 2009] introduces an ontology-based semantic distance for detecting the relevance of LOs.
Some information repository projects use formal KBs, e.g., the OpenGALEN project which created a KB of medical knowledge, the QED Project [www-QED, 1995] which aimed to build a formal KB of all important, established mathematical knowledge, and the Halo project [Friedland et al., 2004] whose very long-term goal is a system capable of teaching much of the world's scientific knowledge by preparing and solving test questions for students according to their knowledge and preferences. Such formal KBs support problem solving but they are not meant to be directly read or browsed, and designing them is difficult even for teams of trained knowledge engineers, e.g., the six-month pilot phase of Project Halo was restricted to 70 pages of a chemistry book and had encouraging but far-from-ideal results. Hence, such fully formal KBs are not adequate for scalable information sharing or retrieval.
Informal documents (articles, emails, wikis, etc.), that is, documents mainly written using natural languages such as English, as opposed to knowledge representation languages (KRLs), do not allow objects to be explicitly referred to and interconnected by semantic relations. This forces document authors to summarize what has been described elsewhere and make choices about which objects to describe and how: level of detail, presentation order, etc. This makes document writing a time-consuming task. Furthermore, the lack of detail often makes it difficult for people or software to understand the precise semantic relations between objects implicitly referred to within and across documents. This leads to interpretation or understanding problems, and limits the depth and speed of learning since retrieving or comparing precise information has to be done mostly manually. The automatic indexation of sentences within documents makes it possible to retrieve sentences that may contain all or parts of some required information (this process is often called "question answering"; tools supporting it, e.g., WeboPedia and Ask Jeeves, are evaluated by the TREC-9 workbenches) but the lack of formalization in the sentences often prevents the extraction and merging of their underlying objects and relations.
Cognitive maps [www-COM, 2009] and concept maps [www-CM, 2009] [Novak, 2004] – or their ISO version, topic maps [www-XTM, 2009] – have often been used for teaching purposes (e.g., many examples in biology can be found in [Leung, 2005]). However, they are overly permissive and hence do not guide the user into creating a principled, scalable and automatically exploitable semantic network. For example, they can use relations such as "of" and nodes such as "other substances" instead of semantic relations such as "agent" and "subtask", and concept names such as "non-essential_food_nutrient". Thus, concept maps are often more difficult to understand or retrieve, aggregate and exploit than regular informal sentences (from which, unlike deeper representations, they can currently be automatically generated); [Sowa, 2006] gives commented examples of many shortcomings of concept maps.
Similarly, the modeling of the preferences and knowledge of students or other people is often very poor, e.g., a mere keyword for each learned subject or LO (e.g., "Java") and a learning level for it (e.g., "advanced"). This is for example the case with the CoAKTinG project [Page et al., 2005] which aims to facilitate collaboration and data exchange during or after virtual meetings on a semantic grid, and the Grid-E-Card project [Gouardères et al. 2005] which manages a model of certification for each LO and each student on a grid to facilitate the learning and insertion of this student within relevant communities. For efficacy and scalability purposes, it is necessary to use a more fine-grained approach in which all the statements on which a student has been successfully tested are recorded.
Some reasons why more knowledge-oriented solutions are not developed can be listed as follows:
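As an illustration of such a fine-grained student model, here is a minimal Python sketch. It is an assumption-laden toy rather than a description of any cited project: the class StudentModel, its methods, and the example statements (triples relating Java to some of its features) are all hypothetical.

```python
from collections import defaultdict

class StudentModel:
    """Records, per student, each statement on which the student was successfully tested."""

    def __init__(self):
        self._passed = defaultdict(set)   # student name -> set of statements

    def record_success(self, student, statement):
        """Store one successfully tested statement for this student."""
        self._passed[student].add(statement)

    def not_yet_mastered(self, student, statements):
        """Return, in order, the statements the student has not yet been tested on."""
        passed = self._passed[student]
        return [s for s in statements if s not in passed]

# Hypothetical atomic statements, each one relation between two terms.
java_gc = ("Java", "feature", "garbage_collection")
java_typing = ("Java", "feature", "static_typing")
```

Compared with a keyword and a level such as ("Java", "advanced"), a model of this kind can answer which precise statements of a course remain to be learned by a given student.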
Point 2 was the reason given by many creators of "knowledge-oriented" hypermedia systems or repositories to explain the limited expressiveness of the formal features or notations proposed to the users. This was for example the case for the creators of SYNVIEW [Lowe, 1985], AAA [Schuler & Smith, 1992], ScholOnto [Buckingham-Shum et al., 1999] and the Text Outline project [Sanger, 2006]. [Shipman & Marshall, 1999] notes that the restrictions of knowledge-based hypermedia tools often lead people not to use them or to use them in biased ways. Although this fact appears to be presented as an argument against knowledge-based tools, it is actually an argument against expressiveness restrictions set by tool developers to ease their tasks (especially for designing graphical interfaces) and, supposedly, to avoid confusing the users. As in [Shipman & Marshall, 1999], it can be concluded that annotation tools should provide users with generic and expressive structuring features but also convenient default options, and that the users should be allowed to describe their knowledge at various levels of detail, from totally informal to totally formal, so that they can invest time in knowledge representation incrementally, collaboratively and only when they feel that the benefits outweigh the costs.
Although the points 1 to 5 of the previous paragraph are valid, it has been previously argued that effective or scalable knowledge sharing and retrieval cannot be achieved without a "global virtual KB" (a cgosSW) which, to a large extent, is collaboratively updated by the information providers themselves. Although using this cgosSW requires the learning and use of graphical or textual notations for representing information precisely, in the long term this probably won't be a problem. Here are some rationales.
For the following reasons, a global virtual KB is likely to be first adopted by (e-)learning researchers:
[Martin, 2009] describes an experiment to validate the sixth point. The most important concepts and statements of learning materials of three courses at Griffith University were represented into three semantic networks in FL (this did not require making quantifiers explicit). For each course, its related network was presented to the students and, as a replacement for an informal learning journal, the students were invited to extend the network (first separately and then collaboratively) by adding not-yet-represented concepts and statements from the informal supports for the courses. The input files for the semantic networks are (like all Information Sciences related input files of WebKB-2) directly or indirectly accessible from http://www.webkb.org/kb/it/. The sixth point was acknowledged by many of the students after they had learned how to read the semantic networks.
Like the previously presented distributed knowledge sharing strategies, the W3C's strategy is minimal: the W3C only proposes a low-level KRL (RDF+OWL and the Rule Interchange Format) and some optional rudimentary "best practices" [Swick et al., 2006], and envisages the Semantic Web to be composed of many small KBs (RDF documents) which are more or less independently developed and hence partially redundant, competing and very loosely interconnected since the knowledge provider is expected to select, import, merge and extend other people's KBs into her own [Hendler, 2001] [Rousset, 2004]. This approach, which relies on "formal documents", has problems that are analogous to those listed above for "informal documents":
Most current Semantic Web related approaches focus on supporting the manual setting or automatic discovery of relations between formal terms from different ontologies. They are quite understandably imperfect but can be sufficient for certain applications and are necessary to re-use existing ontologies. Comparisons of ontology matching tools are carried out yearly by the "Ontology Alignment Evaluation Initiative", e.g., see [Euzenat et al., 2005] and [Caraciolo et al., 2008]. Although more interested in "ontology matching" [Euzenat & Shvaiko, 2007], Euzenat has acknowledged the interest of (semi-)formal KB servers that let both people and software agents directly exploit and save new knowledge or object alignments, that is, query, complement, annotate and evaluate the existing objects, guided by large, shared and well-organized KBs.
The next subsections and sections present some avenues to support an approach that would be efficient and scalable for knowledge sharing and retrieval on the Internet or within large intranets: the collaborative creation of a cgosSW, hence a global virtual well-organized (semi-)formal KB without redundancies or implicit inconsistencies.
A cbwoKB lifts some governance and scalability issues but not the dependence of the users with respect to the owner(s) of these servers. Furthermore, one cbwoKB server cannot support knowledge sharing for all communities. For scalability purposes, the cbwoKB servers of different communities or persons should be able to interact for creating a global virtual cbwoKB (gv-cbwoKB) without a central brokering system, without restrictions on the content of each KB, and without the individual cbwoKB servers necessarily having to register to a particular super-community or peer-to-peer (P2P) network. For several cbwoKB servers to be collectively seen as such a gv-cbwoKB, it should not matter which KB a user or agent chooses to query or update first. Hence, object additions/updates made in one KB should be replicated into all the other KBs that have a scope which covers these objects. Similarly, when relevant, a server should forward (parts of) a query to other servers in order to give a more complete answer.
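The replication and forwarding behaviour specified above can be sketched in Python as follows. This is only an illustrative toy, not the WebKB-2 protocol: the class KBServer, the modelling of a scope as a plain set of terms, the triple format of statements and the `_seen` parameter (used here to stop forwarding loops) are all assumptions.

```python
class KBServer:
    """Toy stand-in for a cbwoKB server; `scope` is modelled as a plain set of terms."""

    def __init__(self, name, scope):
        self.name = name
        self.scope = set(scope)
        self.objects = set()   # stored statements: (subject, relation, object) triples
        self.peers = []        # other servers this one knows about (no central broker)

    def covers(self, statement):
        """Assumed rule: a statement is in scope if its subject or object is."""
        return statement[0] in self.scope or statement[2] in self.scope

    def add(self, statement, _seen=None):
        """Store the statement if it is in scope, then pass it on to known peers."""
        seen = _seen if _seen is not None else set()
        if self.name in seen:
            return
        seen.add(self.name)
        if self.covers(statement):
            self.objects.add(statement)
        for peer in self.peers:
            peer.add(statement, seen)

    def query(self, term, _seen=None):
        """Answer locally, then forward the query to peers for a more complete answer."""
        seen = _seen if _seen is not None else set()
        if self.name in seen:
            return set()
        seen.add(self.name)
        found = {s for s in self.objects if term in (s[0], s[2])}
        for peer in self.peers:
            found |= peer.query(term, seen)
        return found
```

In this sketch, an update sent to a server whose scope does not cover the statement still reaches every connected server whose scope does, and a query sent to any server returns the same set of statements, so that it does not matter which KB is queried or updated first.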
With respect to these specifications, current approaches for distributed querying amongst a few KB servers (e.g., [Lee et al., 2010]) or for collaboration between KB servers/owners seem insufficient. Examples of such collaboration approaches are those of [Casanovas et al., 2007] [Noy & Tudorache, 2008] which are based on integrating changes made in other KBs, and those of [Palma et al., 2008] which also use a workflow system. Indeed, even in Semantic Grids and semantic-based P2P networks, both kinds of approaches are based on partial descriptions of the content of each KB or on predefined roles for each user, and the redundancies or inconsistencies between the KBs are not made explicit. This often makes it difficult for people or systems i) to find relevant KBs to search or update, and ii) to integrate query results. Furthermore, no semantic-based Peer-to-Peer architecture (e.g., see [p2p-Pitoura 2006] [p2p-semanticQuerying 2006]) is yet flexible enough to allow the semantic querying – or (cross-)indexation of knowledge among peers – required by a gv-cgosSW.
As in the previous sections, a solution is to let the knowledge indexation and distribution be made at the object level instead of the file/KB/community/owner level. More precisely, to satisfy the above cited specifications, for a cbwoKB server to be part of one or many gv-cbwoKBs, the requirement is that for every term T it stores in its KB, this server must either
Thus, via forwards between servers, all objects using T can be added or found in each nexus for T. This requirement is an adaptation and refinement of the 4th rule of the Linked Data approach [Bizer et al., 2010]: "link things to their related ones in some other data sets". Indeed, to obtain a gv-cbwoKB, the data sets must be managed via cbwoKB servers and there must be at least one nexus for each term. A consequence is that when the scopes of two nexus overlap, they share the same common knowledge and there are no inconsistencies or implicit redundancies between them. Thus, the gv-cbwoKB has a unique ontology distributed on the various cbwoKB servers. This approach would work with cbwoKB servers on the Web but also in a P2P network (or semantic grid) where each user has his own KB server. The main difference is that a P2P network makes it possible to implement systematic push/pull mechanisms instead of relying on KB servers to regularly check the KBs of other servers and integrate new updates.
For the owners of a cbwoKB server wanting it to be part of one or several gv-cbwoKBs, the difficult task is to integrate the ontology of their server into the global one(s). More precisely, for each term T of their ontology they must first find via a Web/ontology search engine if some other server has advertised itself as a nexus for T. Once they have found a nexus for one of the terms, by navigation between nexus, finding a nexus for other terms may be easier. This search is somewhat similar in its approach to the search of a "right place" to insert a new term in a cbwoKB. When all the nexus are found, via the forwarding of updates between them, the protocol described in the previous section could be applied at the scale of the whole gv-cbwoKB constituted by this network of nexus. An integration task is at the core of most knowledge sharing or re-use approaches. Their methods can be re-used for the loss-less integration of the ontology of a cbwoKB into the one of a gv-cbwoKB. An important point is that this integration of an ontology has to be done only by the creators of this ontology (or owners of the cbwoKB) who know what the objects in this ontology mean. Once this integration is made, regularly and (semi-)automatically integrating new knowledge from other nexus is much easier since a common ontology is shared. Thus, it can be envisaged that one initial cbwoKB server be progressively joined by other ones to form a more and more general gv-cbwoKB.
The cornerstone of the approach is the formal commitment of being a nexus for a term (and hence of being a cbwoKB since direct searches or updates by people must be allowed). This formal commitment should be advertised in a standard way. To do so, the medium and KRL to use are not an issue: a public RDF+OWL/XML file should at least be used (since this is what is recommended by the W3C and hence is a de facto standard) even though, as illustrated in Chapter 4 (e.g., in Subsection 4.2.6), RDF+OWL and RDF/XML are respectively a low-level model and syntax which, whenever the reference collections are not simple to describe, lead knowledge engineers to come up with ad-hoc representations that are difficult to compare, merge or exploit.
What is an issue is that there is no standard term to describe this scope and commitment. The W3C does not recommend any formal term to assert that a particular server commits to continually i) search for other resources (e.g., database/KB servers or static files) that provide knowledge relevant to its scope, and ii) include this knowledge in its KB. Ideally, the specification should also indicate the list of discovered relevant resources, the kind of knowledge that could or could not be copied from these sources or integrated in the KB, and the polling period. The Dublin Core relation type named "Coverage" is often used for specifying the content of a resource but is not meant to also specify that the resource is in a certain sense "complete" with respect to the described scope. This is why the MSO proposes the following process type.
integrating_all_published_information_specified_as_parameter
.(agent: *x, parameter: ontology *y)
//subprocess of: supporting_loss-less_knowledge_sharing_between_KBs, //already stated
agent: 1..* pm#causal_entity *x,
parameter: (KB_reference_collection < pm#ontology //a set of formal terms and definitions
) __[any->1..* *y],
output: 1..* pm#description_content/medium/container,
input: 1..* pm#description_content/medium/container,
period: 1 pm#time_measure;
Any other type of "relation from/to a process" can be used from a node of this type. Here are two examples of its use:
pm#Ray_White_real_estate_in_Southport_Queensland_Australia
url: http://www.raywhite.realestate.com.au/southport,
agent of:
(any pm#Ray_White_integration_of_all_real_estate_information_in_Southport_Queensland
< integrating_all_published_information_specified_as_parameter,
parameter: { (wn#real_property place: QLD#Southport) },
output: http://www.raywhite.realestate.com.au/southport,
input: every (. pm#real_property_agency_in_southport < #enterprise,
agent of: 1..* (. wn#selling_real_estate_in_Southport_Queensland
< wn#selling,
location: QLD#Southport,
object: 1..* wn#real_property
) ),
period: every pm#Thursday
);
pm#Ray_White_real_estate_in_Southport_Queensland_Australia
integrating_all_published_information_specified_as_parameter:
{ pm#real_property_agency_in_southport };
The actual coverage of a particular domain by a knowledge server X can be checked by automatically exploring other servers related to this domain and seeing which percentage of all their "knowledge relevant to the specified domain" is also in the knowledge server X. Such information may for example be published by competitors. In a business environment, it is in the interest of a competitive company to check what its competitors or related companies offer and to integrate (and hence compare) their public information in its public files (Web pages, database, KB) and/or refer to their Web pages. It is also in its interest to refer to the most comprehensive KBs of its related companies.
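As a rough illustration of this coverage check, the following Python sketch computes the share of the relevant statements found on other servers that server X also holds. The function name `coverage`, the representation of statements as plain strings and the sample data are assumptions made for this example; an actual check would compare statements semantically (e.g., via equivalence or specialization relations) rather than by string equality.

```python
def coverage(own_statements, other_servers_statements):
    """Fraction of the statements found on other servers that are also in our KB.

    own_statements: set of statements held by server X
    other_servers_statements: list of sets, one per explored server,
        each already restricted to the domain of interest
    """
    relevant = set().union(*other_servers_statements)
    if not relevant:
        return 1.0  # nothing to cover: trivially complete
    return len(relevant & own_statements) / len(relevant)


# Hypothetical sample data: three statements held by X, two explored competitors
server_x = {"house A for sale", "house B for sale", "flat C for rent"}
competitors = [
    {"house A for sale", "house D for sale"},
    {"flat C for rent", "house B for sale"},
]
ratio = coverage(server_x, competitors)  # 3 of the 4 relevant statements are covered
```

Here the ratio is 0.75 since only "house D for sale" is missing from server X.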
Furthermore, if a formal term such as pm/km#integrating_all_published_information_specified_as_parameter was made a recommendation by the W3C, it would be in the interest of a competitive company to create such an integration and use such a term to advertise this integration (and hence be listed prominently by Web search engines), thereby advertising its competitiveness and thus attracting clients. The next two subsections and, more generally, all the chapters of this document, show how these clients could be allowed to contribute information or feedback to such an integration, while keeping it well ordered and easy to filter (by end-users or the company) according to certain viewpoints. Thus, it may be in the interest of a competitive company to also allow this. Indeed, many companies not only survey what people write about them in their blogs but also offer people a place on their official Web sites to blog about them, even if this often means that the company offers a well-visited platform for people to complain about its practices or products. Supporting the techniques described in the next subsections permits the feedback to be semantically ordered and collaboratively evaluated (and hence retrievable and comparable, with a reduced impact of spam or dishonest messages) instead of being mostly long unordered lists of messages. Thus, supporting such techniques would likely ease the gathering of information and increase the trust that people have in the collected information. To distinguish servers that allow a collaborative building of their KB from other servers, the MSO proposes the following process type.
supporting_the_collaborative_building_of_the_KB_specified_as_output
.(agent: *x, output: KB *y) //the next line has already been stated
//subprocess of: sharing_of_a_knowledge_base_that_is_physically_distributed_or_not,
agent: 1..* pm#causal_entity *x,
output: 1 KB;
Here is an example of its use:
official_WebKB
kind: WebKB,
url: http://www.webkb.org,
supporting_the_collaborative_building_of_the_KB_specified_as_output:
(MSO_of_WebKB-2___MSO___Multi-Source_Ontology_of_WebKB-2
< ontology, subset of=> WebKB __[?mso->any]
) __[.->?mso];
To conclude, if the W3C recommended a small ontology that included the last two introduced process types along with the common basic types of relations associated with processes or situations (see Table 3.1.3.3 to Table 3.1.3.5), this would likely go a long way towards popularizing techniques similar to those described in this subsection and hence towards much better knowledge sharing.
The approach described in this section seems the simplest knowledge distribution approach because
This approach would work with servers on the Web but also in a peer-to-peer (P2P) network (or semantic grid) where each user has her own KB server: the main difference is that a P2P network makes it possible to implement systematic push/pull mechanisms instead of relying on KB servers to regularly check the KBs of other servers and integrate new additions.
None of the current P2P architectures (e.g., [p2p-semanticRouting 2004] [p2p-semanticDrivenHashing 2004] [p2p-semantic 2005] [p2p-semanticQuerying 2006]), even RDF-oriented P2P architectures [p2p-Pitoura 2006], are flexible enough to permit the semantic querying – or the (cross-)indexation of information from the knowledge sources – required by a cgosSW. However, two complementary approaches seem possible to ensure the cross-indexation of knowledge among peers. One would be to implement the replication mechanism described in this subsection. A more restricted but complementary approach would be to redesign the Distributed Hash Table (DHT) routing/indexation algorithm to use formal terms (semantic categories) – and their specialization relationships – instead of informal terms (words). One idea is to encode the specialization hierarchy of the virtual ontology into a DHT structure with nodes also representing concepts and node neighborhood relationships representing specialization relations between these concepts. This idea is an extension of the idea used in [p2p-semanticRouting 2004].
Integrating knowledge from other servers of large KBs is not easy but it is easier than integrating dozens or hundreds of (semi-)independently created small KBs. Furthermore, since in the proposed approach the first integration from a server is loss-less, the subsequent integrations from this server are much easier. A more fundamental obstacle to the widespread use of this approach is that many industry-related servers are likely to make it difficult or illegal to mirror their KBs; however, this problem hampers all integration approaches.
The above described replication mechanism complements works on the distributed querying of KBs (e.g., [Gandon et al., 2008]). It is also a way to combine advantages commonly attributed to "distributed approaches" and "centralized approaches". Indeed, distribution and centralization only have opposite meanings when applied to actions and physical things, not when applied to easy-to-copy information. Decentralizing actions (and in particular, decisions), i.e., distributing manpower and decision power to individual persons, is generally a good thing. Centralizing information only implies gathering and relating information; this can be done without centralized storage and management and, at least under those conditions and when security and privacy are ensured, is also generally a good thing: this is how data is progressively transformed into knowledge. More generally, every standard is a centralization tool; the Internet and the Web currently work because they make everyone use very few protocols (TCP/IP, HTTP, ...) and languages (HTML, XML, RDF+OWL, ...). Yet, most Semantic Web related researchers seem to believe that a Web of "mostly independently created resources" is the only possible way to achieve the distribution of actions and information. This section shows (hopefully) that this is not the case. The next two subsections highlight other related misconceptions. They show that "collaboratively editing the same KB" (i.e., centralization) does not imply that the users have to agree on or even discuss terminological issues or beliefs, nor that a committee making content selection or conflict resolution for the users is necessary.
Most knowledge servers support concurrency control and users' permissions on files/KBs but only two knowledge servers seem to have special protocols to support collaboration between users: Co4 [Euzenat, 1996] and WebKB-2. (Freebase, Ontolingua, Ontosaurus, Ontowiki, DBpedia ... have no collaboration protocols; Wikipedia is not a knowledge server and has no such protocol either). More generally, WebKB-2 seems to be the only knowledge server having editing protocols that permit, enforce or encourage people to interconnect their knowledge into a shared KB, without having to discuss and agree on terminology or beliefs, and while keeping the KB consistent. Co4 had knowledge sharing protocols based on peer-reviewing for finding consensual knowledge; the result was a hierarchy of KBs, the uppermost ones containing the most consensual knowledge while the lowermost ones were the private KBs of contributing users. Starting from a shared KB where each statement has many associated creators or believers, a similar hierarchy of KBs could also be generated. The fact that a protocol exploits many KBs or one shared KB is only an implementation issue. It can be changed without changing the spirit of the protocols. However, implementations are likely to be easier and more efficient in a shared KB. Protocols used in other knowledge servers [Lausen et al., 2005] or in knowledge oriented approaches in peer-to-peer networks [Rousset, 2004] or Semantic Grids [Page et al., 2005] focus on managing the integration of a source KB into a private/shared target KB: these protocols do not permit the users of the two involved KBs to tightly interconnect their knowledge.
The cbwoKB editing protocol used in WebKB-2 is not tied to any particular knowledge representation (KR) language or inference mechanism (hence, in this document, no comparison is made on such mechanisms). This protocol only requires that conflicts between knowledge representations – i.e., partial redundancies or inconsistencies between terms or statements – are detected by some inference mechanism or by people. (Hence, the protocol also works with informal pieces of knowledge as long as they can be inter-related by semantic relations). The more conflicts are detected, the more the KB is kept organized and hence exploitable. Section 2.2.5.1 recalls and summarizes the language model for the KB editing protocol used in WebKB-2. Section 2.2.5.2 details this protocol, i.e., the approach used for letting people edit the shared KB. Section 2.2.5.3 quickly compares the approach with some other ones. Section 2.2.5.4 illustrates some techniques used in WebKB-2 for detecting conflicts and, more generally, comparing statements.
The model for the protocols
– i.e., their view on a KB (whichever KR language it actually uses) –
is a set of objects which are either terms or statements.
Every object must have at least one associated source (e.g., creator,
interpreter or source ontology) represented by a formal term.
A formal object is one that has a unique meaning according to its source.
Any non-contextual identifier for a formal object must include an
identifier for its source.
A formal term refers to either a concept type ("class" in RDF),
a relation type ("property" in RDF) or an "individual" (an instance
of a first-order type, e.g., a particular set, relation, statement or concept).
A formal statement has an interpretation in some logic and hence has a
logic-based syntax.
A statement that has a logic-based syntax but includes at least one
informal object is semi-formal.
Every asserted statement is either a definition or a belief.
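The model above can be sketched as a pair of Python data classes. This is only an illustration under simplifying assumptions: the class and attribute names (`Term`, `Statement`, `has_informal_part`, ...) are hypothetical, the content of a statement is kept as an uninterpreted string, and fully informal statements are not modeled.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Term:
    source: str  # formal term identifying the creator or source ontology, e.g. "pm"
    name: str    # name that is unique for this source, e.g. "bird"

    def identifier(self) -> str:
        # A non-contextual identifier includes an identifier for the source.
        return f"{self.source}#{self.name}"


@dataclass(frozen=True)
class Statement:
    source: str                    # user who created (or represented) the statement
    content: str                   # the statement itself, in some notation
    kind: str                      # "definition" or "belief"
    terms: Tuple[Term, ...] = ()   # formal terms used by the statement
    has_informal_part: bool = False

    def __post_init__(self):
        # Every asserted statement is either a definition or a belief.
        if self.kind not in ("definition", "belief"):
            raise ValueError("an asserted statement is a definition or a belief")

    def formality(self) -> str:
        # Logic-based syntax plus at least one informal object: semi-formal.
        return "semi-formal" if self.has_informal_part else "formal"


bird = Term("u1", "bird")
definition = Statement("u1", "any u1#bird is pm#agent of a pm#flight",
                       "definition", (bird,))
belief = Statement("u2", "every u1#bird is agent of a flight",
                   "belief", (bird,), has_informal_part=True)
```

Keeping the source inside every object is what later allows the editing protocol to decide who may modify an object and for whom a term or belief must be cloned.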
Here are some examples in FE, an English-like but formal notation used
in WebKB-2.
pm#bird and wn#bird are formal terms for
concept types respectively created by pm and wn,
two formal abbreviations (contextual identifiers)
in the default KB of WebKB-2 for respectively the author of this document
and WordNet 2.0.
"bird", en#"bird" and
pm#"bird" respectively
refer to the string "bird", the English word "bird", and one or
several concept types created by pm and having for informal name "bird".
u1#u2#"birds fly" is an informal statement from u2 that is
represented by u1.
u1#`any u1#bird is pm#agent of a pm#flight´ is a formal statement and
partial definition by u1 of u1#bird :
it states that to be a bird, a necessary condition is to (constantly) fly.
u2#`every u1#bird is agent of a flight´ is a semi-formal
belief by u2 that "every u1#bird is flying".
u3#` `every daylight ?t `every u1#bird is agent of a flight with time ?t´ ´
= u3#edbfdi = u3#every_daylight__birds_flies_during_it ´
is a definition by u3 of the terms u3#edbfdi and
u3#every_daylight__birds_flies_during_it as referring to a
non-asserted statement representing in a semi-formal way that
"every day, every bird is flying during all daylight".
u3#` u3#edbfdi is a pm#corrective_specialization of
u2#`every u1#bird is agent of a flight´ ´ is a semi-formal belief
by u3 that his previous statement is a correction and a specialization
of u2's belief.
In KIF [Genesereth, 1998], a knowledge representation language (KRL) with a second-order notation that can be interpreted in first-order logic, these example statements can respectively be represented as follows.
(pm#creator_of u1 '(defconcept u1#bird (?b) :=> (exists ((?f pm#flight)) (pm#agent ?f ?b))))
(pm#believer_of u2 '(forall ((?b u1#bird)) (exists ((?f flight)) (agent ?f ?b))))
(pm#creator_of u3 '(= u3#edbfdi u3#every_daylight__birds_flies_during_it
'(forall ((?t daylight) (?b u1#bird))
(exists ((?f flight)) (and (agent ?f ?b) (time ?f ?t))))))
(pm#believer_of u3 '(pm#corrective_specialization
'(forall ((?b u1#bird)) (exists ((?f flight)) (agent ?f ?b))) u3#edbfdi)
)
u3#edbfdi specializes u2's belief.
A statement Y specializes a statement X if it uses more specialized terms or
structurally contains more information (and hence
either contradicts it or makes it redundant).
Thus, negating a statement is also specializing it.
Hence, "generalization between formal statements" does not solely refer to
"logical implication". The general
"specialization" relation type (for which one identifier is
pm#specialization) is applicable to organize all kinds of objects,
formal or not, and has many subtypes, including the classic subtype relation type.
As illustrated in Section 2.2.5.4, there are efficient (polynomial) – but incomplete – ways to test if there exists a relation of specialization between two statements, whatever their expressiveness [Mugnier & Chein, 1992] [Chein & Mugnier, 1997]. Searching a specialization hierarchy can be done in O(N^3) comparisons, where N is the number of nodes in the compared statements [Levinson & Ellis, 1992].
bagID keyword, thus no way
to represent contexts and hence believers or beliefs. XML name-space
prefixes (e.g., u1:bird), Dublin Core relations and statement
reification do not permit this. This is likely only a temporary
constraint since many RDF-related languages or systems extend RDF in
this direction: Notation3 (N3), Sesame, Virtuoso, ...
u1#cat as a subtype of wn#chair;
there is no inconsistency as long as the ways u1#cat is
further defined or used respect the constraints associated with
wn#chair. A definition may be changed by
its creator but then the meaning of the defined term is changed rather
than corrected. This distinction is important for a cbwoKB editing
protocol since it leads to different conflict resolution strategies:
"term cloning" and "loss-less correction" (Rule R4 and Rule R8 of the
next subsection).
A shared KB is "ideally well organized with respect to the objects it includes" if, given any two objects in this KB, given the meaning of each of the two objects in its creator's head and given their various possible relationships, i) all the relationships that can be represented using only already represented objects are represented, and ii) to that end, all already represented objects relevant for representing these relationships are directly or indirectly used. Thus, the most precise objects (e.g., terms) have to be used and there must exist a specialization hierarchy (alias, generalization hierarchy) organizing all the objects. This also entails that the bigger such a KB is, the more precise the objects tend to be. By supporting and guiding knowledge providers in representing knowledge as explicitly as they are willing to, the best practices of Section 2 are a help in aiming towards that ideal KB.
The KB editing protocol described below is meant to keep a shared KB at-least-minimally-well-organized via the next two "minimal requirements": i) each term (resp. statement) – except the most general one, which may be generated if it is not given – must be explicitly related to at least another term (resp. "belief", i.e., asserted statement that is not a definition) via inferred or manually-set relations of generalization or equivalence as well as exclusion (resp. equivalence, argumentation or correction, and if possible, corrective_generalization or corrective_specialization), and ii) manually or automatically detected inconsistencies or partial/total redundancies are prevented or made explicit via these relations. Indeed, these two requirements imply that every object of the KB has a unique place in the global, fully connected, specialization hierarchy of the KB and in the network composed of relations of the above cited kinds. This "unique place" in the specialization hierarchy is a minimal requirement for knowledge insertion and retrieval to be done in a scalable way in this hierarchy and thus in the KB of which it is a backbone [Dromey, 2006]. Rejecting an action introducing an implicit partial/total redundancy is also important because this often permits the author of the action to detect a mistake, a bad interpretation or a lack of precision (on his part or not). At the very least, this reminds the users that they should check what has already been represented on a subject before adding something on this subject.
This protocol is not tied to any particular language or inference mechanism. However, it requires the language to support the previously described language model and hence, for example, to distinguish between beliefs and definitions. Regarding inference mechanisms, this protocol only requires that i) the source of each object is explicitly represented in some way, and ii) at least one inference engine tells it if, when an object addition, modification or removal is made in the KB, this creates a conflict in this KB and if this conflict is implicit or not. A conflict is an inconsistency between statements or a partial/complete redundancy between statements that are not both definitions. Indeed, a detected redundancy between terms or term definitions is not a conflict, it is an inferred equivalence or specialization relation. Apart from this case, if a statement Y is detected as a specialization of a statement X and if the main object of Y is not an instance of the main object of X, there is a detected conflict. Implicit means not explicitly represented by an inferred or manually set relation which i) between beliefs, may be a relation of equivalence or correction, and ii) between definitions, and hence between terms too, may be a relation of equivalence, specialization or exclusion. Since the integration is loss-less, there is no reason why the order in which users enter knowledge should affect conflict detection and hence should be of importance.
Table 2.2.5.2.1 gives the high-level algorithm of the protocol in
an object-oriented Java-like syntax (for clarity purposes). Then,
explanations are given via a list of informal rules. Table 2.2.5.2.2
summarizes the cases related to conflict detections. This algorithm
checks on a user's attempt to remove or add a statement, and rejects
the action ("return false") or accepts it. It may perform some
automatic repair step before accepting the action. It only allows
statement removal and addition since
i) an update is considered as a removal followed by an addition
(WebKB-2 and the informal rules also consider direct updates),
ii) reading or re-using an object is always accepted (privacy control is
not dealt with in this document), and
iii) term removal or addition must be made via the removal or addition of a
statement (Rule R2 below).
Like Rule 2, Rule 12 is implicitly enforced because the algorithm only uses
the generic knowledge model described in Section 1.1.
From now on, the word "user" is used as a synonym for "source".
Furthermore, u1 and u2 are used as examples of different users.
//The informal rules enforced by some steps of the following functions are referred to via comments
boolean statement.removal_by (User agent)  //true is returned to accept the removal
{ if (this.creator != agent) return false;  //rule R1
  if (agent.created_statements_without(this).are_conflicting()) return false;  //R3
  if (KB.statements_without(this).are_conflicting())
  { if (this.is_definition_of_existing_term_created_by(agent))
    { s= KB.statements_of_other_users_using_this_term_and_conflicting_with_this(agent,this.defined_term(),this);
      s.clone_this_term_for_and_in_these_statements(this.defined_term());  //R4
    }
    else
    { s= KB.statements_of_other_users_using_this_belief_and_conflicting_without_it(agent,this);
      s.clone_this_belief_for_and_in_these_statements(this);  //R9
    }
  }
  KB.remove(this);  return true;
}
boolean statement.adding_by (User agent)  //true is returned to accept the addition
{ if (agent.created_statements_with(this).are_conflicting()) return false;  //R3
  if (KB.statements_with(this).are_conflicting())
  { if (this.is_definition_of_existing_term_created_by(agent))
    { s= KB.statements_of_other_users_using_this_term_and_conflicting_with_this(agent,this.defined_term(),this);
      s.clone_this_term_for_and_in_these_statements(this.defined_term());  //R4
    }
    else if (! KB.statements_with(this).have_only_explicit_conflicts_between_beliefs())
      return false;  //R5, R6, R7 or R8
  }
  else if (this.is_definition_of_a_new_term())
  { if (!this.is_connected_to_at_least_another_term_via_a_relation_of_one_of_the_following_types
              ("pm#equivalence", "pm#generalization", "pm#exclusion"))
      return false;  //R11
  }
  else if (this.is_a_belief())
  { if (!this.is_connected_to_at_least_another_belief_via_a_relation_of_one_of_the_following_types
              ("pm#equivalence", "pm#correction", "pm#argumentation"))
      return false;  //R10
  }
  KB.add(this,agent);  return true;
}
| Action by u1 | Conflicting with → outcome | Rule |
| add/delete s1 | a statement of u1 → reject | R3 |
| add/delete definition of already existing t1 | s2 → clone t1 for u2 (and in s2 if needed) | R4 |
| add definition to new t1 | s2 → reject | R5 |
| add definition to t2 | s2 → reject | R6 |
| add belief b1 | definition of t2 → reject | R7 |
| add belief b1 | s2 being a belief → reject if the conflict is implicit | R8 |
| delete belief b1 | s2 → clone b1 for u2 | R9 |
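The cases of this table can be read as a small decision function. The Python sketch below is only a transcription of the table under assumptions: the function name `resolve`, its parameters and the string labels used for the cases are hypothetical, and the detection of the conflict itself (the job of the inference engine) is taken as given.

```python
def resolve(action, kind, conflict_with, term_status=None, implicit=True):
    """Return (rule, outcome) for a conflict detected on an action by user u1.

    action:        "add" or "delete"
    kind:          "definition" or "belief" (kind of u1's statement)
    conflict_with: "own statement", "other definition" or "other belief"
    term_status:   for definitions only: "own existing", "new" or "other's"
    implicit:      True if no relation makes the conflict explicit
    """
    if conflict_with == "own statement":
        return ("R3", "reject")
    if kind == "definition":
        if term_status == "own existing":
            return ("R4", "clone term")
        if action == "add" and term_status == "new":
            return ("R5", "reject")
        if action == "add" and term_status == "other's":
            return ("R6", "reject")
    if kind == "belief":
        if action == "delete":
            return ("R9", "clone belief")
        if conflict_with == "other definition":
            return ("R7", "reject")
        if conflict_with == "other belief":
            return ("R8", "reject" if implicit else "accept")
    raise ValueError("case not covered by the table")


# A belief that corrects another user's belief via an explicit relation is accepted.
explicit_case = resolve("add", "belief", "other belief", implicit=False)
```

Only the R8 case depends on whether the conflict is implicit, which matches the single conditional rejection in the table.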
Here are the informal rules enforced by this algorithm.
R1 (object update only by its creator). Any user may add and use any object, and hence may for example add a relation between objects he has not created, but an object may only be modified or removed by its creator.
R2 (operations on terms via operations on statements). Adding, modifying or removing a term is done by adding, modifying or removing at least one statement (e.g., one relation) that uses this term.
R3 (no conflict between statements of a user). If a user adds, modifies or removes a statement that introduces a detected conflict between statements believed by this user, this action is rejected by the system. Thus, in the case of an addition, the user must refine his statement before trying to add it again or he must first modify at least one of his already entered statements.
R4 (cloning of updated terms used by other users in a way not consistent with the new definition). If the addition, modification or removal of a statement defining an already existing term u1#T by a user u1 introduces an inconsistency involving statements directly or indirectly re-using u1#T and created or believed by other users (i.e., users different from u1), u1#T is automatically cloned to solve this conflict and ensure that the original interpretation of u1#T by these other users is still represented. Indeed, such a conflict reveals that these other users had a more general interpretation of u1#T than u1 had or now has. Assuming that u2 is this other user or one of these other users, the term cloning of u1#T consists in creating u2#T with the same definitions as u1#T except for one, and then replacing u1#T by u2#T in the statements of u2. The difficulty is to choose a relevant definition to remove for the overall change of the KB to be minimal. In the case of term removal by u1, term cloning simply means changing the creator's identifier in this term by the identifier of one of the other users (if this generated term already exists, some suffix can be added). In a cbwoKB server, since statements point to the terms they use, changing an identifier does not require changing the statements. In a global virtual cbwoKB distributed on several servers, identifier changes in one server need to be replicated to other servers that use this identifier. [Djedidi & Aufaur, 2009] proposes an ontology pattern using term cloning for knowledge integrations that are not loss-less.
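The term cloning of Rule R4 can be sketched as follows in Python. The data layout (a dictionary from term identifiers to sets of definitions, and statements as dictionaries listing the terms they use), the function name `clone_term` and the `_clone` suffix are assumptions made for this example. The hard part mentioned above, choosing which definition to drop, is left to the caller.

```python
def clone_term(definitions, statements, term, other_user, dropped_definition):
    """Clone `term` (e.g. "u1#T") for `other_user` (e.g. "u2").

    definitions: dict mapping a term identifier to the set of its definitions
    statements:  list of dicts with keys "source" and "terms"
    dropped_definition: the definition that the clone must not inherit
    Returns the identifier of the clone.
    """
    name = term.split("#", 1)[1]
    clone = f"{other_user}#{name}"
    while clone in definitions:   # identifier already taken: add a suffix
        clone += "_clone"
    # Same definitions as the original term except for one.
    definitions[clone] = definitions[term] - {dropped_definition}
    # Replace the original term by its clone in the statements of the other user.
    for s in statements:
        if s["source"] == other_user:
            s["terms"] = [clone if t == term else t for t in s["terms"]]
    return clone


definitions = {"u1#bird": {"subtype of animal", "agent of a flight"}}
statements = [
    {"source": "u2", "terms": ["u1#bird", "wn#penguin"]},
    {"source": "u1", "terms": ["u1#bird"]},
]
new_id = clone_term(definitions, statements, "u1#bird", "u2", "agent of a flight")
```

After the call, u2's statement uses u2#bird, which keeps the more general interpretation, while u1's own statements and definitions are untouched.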
R5 (no new term with a definition conflicting with statements of other users). If the addition of a new term u1#T by a user u1 introduces an inconsistency with statements of other users, this action is rejected by the system. Indeed, such a conflict reveals that u1 has directly or indirectly used – and misunderstood – at least one term from another user in his definition of u1#T. Hence this conflict reveals an incorrect belief of u1 and thus, as in R8, this action should be rejected.
R6 (no new definition to another user's term if the definition conflicts with another statement). The addition by a user u2 of a definition to u1#T implies a belief of u2 about the meaning of u1#T. Thus, this addition should be rejected if it conflicts with another statement (from u1 or not, since term cloning is not worth doing here: u2 can simply create and/or use another term).
R7 (no conflict between a new belief and a definition). If adding or modifying a belief introduces an implicit conflict with a definition, this action is rejected, whether or not the definition is about a term created by the author of the definition. Indeed, as in the previous rule, term cloning is not worth doing here.
R8 (no implicit conflict when adding a belief). If adding a belief introduces an implicit conflict involving beliefs created by other users, this action is rejected. However, a user may still represent his belief (say, b1) – and thus "loss-less correct" another user's belief that he does not believe in (say, b2) – by explicitly connecting b1 to b2 via a corrective relation. This has already been illustrated but here are two other examples of "corrections to solve conflicts":
i) u2#` u1#`every bird is agent of a flight´ has for pm#corrective_specialization
u2#`every healthy flying_bird is able to be agent of a flight´ ´, and
ii) u2#` u1#`every bird can be agent of a flight´ has for pm#corrective_generalization
u2#`75% of bird can be agent of a flight´ ´.
In the first case, u2's belief specializes u1's belief and corrects it. In the second case, u2's belief generalizes u1's belief and corrects it.
R9 (cloning of a belief if its update conflicts with other users' statements). If modifying or removing a belief introduces an implicit conflict with other users' statements, the belief is cloned: its associated creator becomes one of these other users.
R10 (no addition of a belief without at least one relation of equivalence, correction or argumentation to another belief).
This is particularly important for informal beliefs since conflicts between them and
the rest of the KB are difficult to detect. It is preferable to use relations of type
pm#corrective_generalization or pm#corrective_specialization
rather than non-transitive correction relations, and preferable to avoid using argumentation
relations whenever possible. A most general belief is predefined:
`a pm#thing´ ("there exists something").
R11 (no addition of a term without at least one relation of equivalence or generalization as well as exclusion to another term).
If these two relations can be automatically inferred based on the definitions of the
new term, there is no need to add them manually.
A most general term (pm#thing) is predefined.
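The connectivity checks of Rules R10 and R11 can be sketched in Python as follows. The sketch follows the relation types listed in the algorithm of Table 2.2.5.2.1 (one relation of any of the listed types suffices); the function name `is_connected` and the representation of relations as triples are assumptions, and subtypes of the listed relation types (e.g., pm#corrective_specialization) are not taken into account.

```python
TERM_RELATIONS = {"pm#equivalence", "pm#generalization", "pm#exclusion"}       # R11
BELIEF_RELATIONS = {"pm#equivalence", "pm#correction", "pm#argumentation"}     # R10


def is_connected(obj, kind, relations):
    """True if `obj` is linked to another object by an allowed relation.

    kind:      "term" or "belief"
    relations: iterable of (source_object, relation_type, target_object)
    """
    allowed = TERM_RELATIONS if kind == "term" else BELIEF_RELATIONS
    return any(rel in allowed and obj in (src, dst) and src != dst
               for src, rel, dst in relations)


# Hypothetical relations already inferred or manually set in the KB
relations = [
    ("pm#thing", "pm#generalization", "u1#bird"),
    ("u2#b1", "pm#correction", "u1#b0"),
]
bird_ok = is_connected("u1#bird", "term", relations)
cat_ok = is_connected("u1#cat", "term", relations)
```

With these relations, the addition of u1#bird would pass the check whereas the addition of u1#cat would be rejected until it is related to another term.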
R12 (no other asserted objects than beliefs or term definitions).
If a set or collection is asserted, it should be via the definition
of an instance to a collection type (e.g., pm#set).
A query – and even a query operator – can be described via
a term definition. The purpose of this rule is for the protocol to
cover all objects of the KB, for it to be
at-least-minimally-well-organized, for the same mechanisms to work
on all objects and for knowledge to be represented in an explicit
way, even if some objects are represented informally.
In particular, evaluations by individual users on the interest (originality, importance, veracity, ...) of other users' objects should be done via meta-statements. Hence, these evaluations can be exploited in user-defined queries to filter knowledge when browsing the KB or when generating modules for an application. Section 2.2.6 gives a framework for possible "default evaluation measures".
More generally, methods to measure the average interest of an object based on these evaluations – or the interest of a knowledge provider based on the average interest of its statements – should be represented in an explicit way, ideally in a declarative way. Thus, each user can easily create measures tailored to his needs by deriving them from other measures (e.g., the default measures proposed by a cbwoKB) and these measures can be organized into the specialization hierarchy (they are function definitions, hence term definitions).
This approach may be seen as the beginning of a technical vision for the very general "model of discursive practice" of [Brandom, 2000]. These measures do not have to be in a rule-based language, they may be in a functional language or in a command-based script language. WebKB-2 proposes the language FC (For Control) which permits users to combine query/assertion commands via shell script like control structures (loops, pipe, procedure declaration). The arguments of its query/assertion operators can be knowledge objects in any of the accepted notations (FE, FL, FCG, KIF, ...).
Ideally, the knowledge presentation used by the above cited measures or by the query operators of a cbwoKB should be specifiable by the user. This is partly the case in WebKB-2: users can use FC to change the value of predefined "presentation variables".
The approach of this protocol leads to a loss-less integration of knowledge, based on the interconnection of objects, preferably undecomposable ones. Relating multi-statement objects has not been forbidden by a rule in order to ease incremental knowledge refinement but this constraint could be added. In any case, an object can be decomposed into several objects whenever correction relations or other meta-statements have to be specified about a statement that is part of this object. Thus, unlike approaches based on physically separated multi-statement modules, this approach does not have to deal with the problems associated with the use of different versions for such modules. This approach may be used by current knowledge integration methods to save their results into a cbwoKB instead of creating new KBs partially redundant with the ones they use as sources. Conversely, these methods can guide loss-less integration.
The approach followed by the protocol is unrelated to the approaches of works based on defeasible logics.
This approach also solves some governance and scalability problems since the KB stays "at least minimally well organized" without restricting users on content or terminology choice. Previous sections have presented problems that current shared KB servers (semantic wikis included) suffer from due to not having such a protocol.
The approach of the protocol assumes that all beliefs can be argued
against and hence be "corrected". This is true only in a certain
sense. Indeed, among beliefs, one can distinguish
"observations",
"interpretations" ("deductions" or "assumptions") and
"preferences".
Although all these kinds of beliefs can be false (their authors can lie,
make a mistake or assume a wrong fact), most people
would be reluctant to argue against self-referencing beliefs such as
u2#"u2 likes flowers" and
u2#"u2 authored this sentence".
The editing protocol of WebKB-2 relies on this reluctance to argue against such
beliefs that should generally not be argued against.
Before browsing or querying the shared KB, a user can set "filters for certain objects (categories or statements) not to be displayed". These filters may set conditions on statements about these objects or on the creators of these objects. Filters are useful whenever the KB is not sufficiently organized for a user to avoid being overwhelmed by the large amount of information on a certain point.
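The filters described above can be sketched as simple predicates applied before display. The following is only a minimal illustration, not the WebKB-2 implementation: the names KBObject, make_creator_filter and visible_objects are hypothetical, and only a condition on the creators of objects is shown.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class KBObject:
    """Hypothetical minimal view of a KB object (category or statement)."""
    identifier: str
    creator: str
    kind: str  # "category" or "statement"

def make_creator_filter(hidden_creators):
    """Build a predicate that rejects objects created by the given users."""
    hidden = set(hidden_creators)
    return lambda obj: obj.creator not in hidden

def visible_objects(objects, filters):
    """Keep only the objects accepted by every filter set by the user."""
    return [obj for obj in objects if all(accept(obj) for accept in filters)]

objects = [
    KBObject("b1", "u1", "statement"),
    KBObject("b2", "u2", "statement"),
]
# The user chooses not to display objects created by u2.
shown = visible_objects(objects, [make_creator_filter(["u2"])])
```

Conditions on statements about the objects could be added as further predicates in the same list.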
The above described editing protocols – especially via the specialization and corrective relations that they lead users to set – encourage or enforce a minimal re-use, precision or connectivity between objects: they ensure the "minimal semantic organization" described in Subsection 1.1.2. They also permit a loss-less information integration approach. Interesting aspects of this approach are that it works for semi-formal KBs and is incremental:
This approach can be seen as a precise asynchronous dialog between the knowledge providers. The summarizing expression used at the beginning of this subsection is now more understandable: "editing protocols that permit, enforce or encourage people to interconnect their knowledge into a shared KB, without having to discuss and agree on terminology or beliefs, and while keeping the KB consistent".
The next subsection complements this one by allowing the exploitation of argumentation relations (e.g., corrective relations) in valuations of creators and created statements. Thus, the remarks on the interest and possible applications of the above described approach also apply to – and are reinforced by – the (results of the) techniques of the next subsection.
Since the approach described in this subsection and the next one works on semi-formal KBs, it is a solution to some problems that most shared information repositories have, e.g. wikis.
From an application viewpoint, the approach seems interesting for collaboratively-built states of the art, corporate memories, catalogs, e-learning, e-government, e-science, research, etc.
This approach – which exploits and encourages the representation of undecomposable objects – presents a viable and better alternative to the classic "module based approach", i.e., the storage of knowledge in separate files, KBs or large contexts, in formal or informal information repositories. It is viable because the information stays minimally organized and hence information overload can be avoided. It is a better alternative because it leads to more relations between objects. Indeed, in knowledge libraries, e.g., existing ones such as the Ontolingua library or imagined ones such as "The Lattice of Theories" [Sowa, 2000, 2005], the "module based approach" is intended to create "minimal and internally consistent theories" to maximize their re-use; however, this also leads to few relations between objects of different modules, as well as implicit redundancies or inconsistencies between them, and hence more difficulties for module creators or module users to compare, merge or relate (objects of) different modules. Furthermore, the isolation of knowledge into large modules is often arbitrary. On the other hand, if needed for some applications, such modules and their inclusion relationships could be automatically generated from the above described shared KB, based on the relations between objects and some criteria for dividing knowledge into modules. This was acknowledged by [Sowa, 2003]. Regarding module generation for inferencing purposes, see [Le Pham et al., 2008].
This shared KB approach removes or reduces many problems related to the existence of (module) versions. If a user adds or removes a definition to a category, she associates a new meaning to the same category identifier. If this leads to manually or automatically detected redundancies or inconsistencies, manual or automatic cloning can solve the problems. To permit this cloning to be replicated in other KBs (e.g., via the mechanism described in the previous subsection), the occurrence of this cloning should be represented. For example:
pm#category_cloning input: wn#bird, output: oc#bird, time: 21/01/2005;
If the cloning has been done automatically, such a statement can be generated automatically. However, to prevent such cloning, whenever possible it is preferable to use precise category identifiers from the beginning, e.g., wn#carinate___carinate_bird___flying_bird. However, since contexts can be used to set constraints on a relation, it is not necessary to declare identifiers such as pm#Paris_as_the_capital_of_France_from_1990_to_2000. It is actually better to avoid such an identifier since, although it is possible to declare it as an extended specialization of wn#Paris___French_capital___capital_of_France (which is an instance of the first order type wn#national_capital), very few tools allow and handle specializations of individuals.
The hypotheses that this shared KB approach relies on are that:
The author of this document has not encountered any ontology integration or mapping that invalidated these hypotheses (in each case, they seemed true). The least conceptually straightforward mapping that he made was between DOLCE and OCHRE, two general top-level ontologies that were somewhat similar but based on different ontological assumptions. Mapping them required the introduction of intermediary categories, as illustrated in the following example. Its approach was validated by the authors of DOLCE.
metaOchre#set_of_DOLCE/OCHRE_types___structure_for_translation_from_OCHRE_to_DOLCE
< pm#non-empty_set_of_types,
member: {(metaOchre#PT metaOchre#P metaOchre#F metaOchre#C metaOchre#A metaOchre#SI
metaOchre#CM)}
{(metaDolce#ED metaDolce#PD metaDolce#Q metaDolce#T metaDolce#T
metaDolce#P_2 metaDolce#P_3 metaDolce#K metaDolce#PC metaDolce#qt
metaDolce#ql)};
metaOchre#PT
definition: "non-empty set of OCHRE's particulars",
member: ochre#PT __[any->?, ?<-any];
metaOchre#TH
definition: "non-empty set of DOLCE's endurants",
= metaDolce#ED,
member: (pm#ochreTH_member_of_metaOchrePT < ochre#TH,
member of=> metaOchre#PT __[any->?]) __[any->?, ?<-any],
member: dolce#ED __[any->?, ?<-any];
metaOchre#TK
member: (pm#ochreTK_member_of_metaOchrePT < ochre#TK,
member of=> metaOchre#PT __[any->?]) __[any->?, ?<-any];
metaDolce#PD
definition: "non-empty set of DOLCE's perdurants",
member: (pm#something_with_member_a_member_of_metaOchreTK
member=> (pm#something_member_of_metaOchreTK
member of=> metaOchre#TK __[any->?]
) __[any->?] ) __[any->?, ?<-any];
//and so on for the other members of metaOchre#set_of_DOLCE/OCHRE_types
WebKB-2 detects potential conflicts between two statements by detecting exclusion and specialization relations between (parts of) them. If one of the statements specializes the other, there is a potential conflict. In WebKB-2, a statement Y is detected as being a specialization of a statement X (i.e., Y includes the information of X and hence either contradicts it or makes it redundant) if X structurally matches a part of Y and if each of the terms in this part of Y is identical or a specialization of its counterpart term in X. Six examples of relations of specialization or exclusion (between statements) that WebKB-2 can detect that way are given by Table 2.2.5.4.1. Two of the six examples illustrate an "instantiation" relation, i.e., a specialization relation where one of the types in the source statement is replaced by an instance in the destination statement. Such a relation does not reveal an inconsistency or a total/partial redundancy that needs to be made explicit, since adding an instantiation can be seen as giving an example for a more general statement. Hence, its detection does not lead the protocol to reject a statement. Apart from this case, if a new belief specializes, generalizes or is equivalent to an already stored belief, and if they are not connected by a correction relation, there is an implicit "inconsistency or partial/total redundancy" between them and the new belief is rejected.
##### : exclusion relation; -------- : specialization relation; ===== : instantiation relation
`no animal can be agent of an action´
# #
# #
`50% of bird can be agent of an action´ `at least 1 bird can be agent of a flight´
| |
| |
`every bird can be agent of a flight´ `1 bird can be agent of a flight´
|| ||
|| ||
`Tweety can be agent of a flight with duration at least 0.5 hour´
In KIF, `Tweety can be agent of a flight´ may be written:
The above cited matching takes into account numerical quantifiers and measures, not just existential and universal quantifiers. Apart from this, it is similar to the classic graph matching (or "projection") in Conceptual Graphs, which is performed on "positive conjunctive existential formulas, with or without non-restrictive associated outer-contexts". When performed on such statements, this classic graph matching is sound and complete, and can be computed with a polynomial complexity if the query graph (i.e., X in the above description) has no cycle [Mugnier & Chein, 1992] [Chein & Mugnier, 1997]. When performed on other kinds of statements, graph matching for detecting a specialization is not always sound and complete. However, this operation works with languages of any expressiveness and the results of searches for specializations of a query graph are always "relevant for knowledge retrieval purposes". For example, if a query statement is "a bird that is agent of a flight", the answer "no bird can be agent of a flight" is a relevant one.
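The structural matching described above can be illustrated by a small sketch. It is only a simplified approximation and not the WebKB-2 algorithm: statements are assumed to be lists of (source, relation, destination) triples, each node is identified by its term, quantifiers, measures and contexts are ignored, and the names SUPERTYPES, is_same_or_specialization and specializes are hypothetical. A statement Y is reported as specializing X if every triple of X can be mapped onto a triple of Y whose terms are identical to, or specializations of, their counterparts in X.

```python
# Hypothetical toy hierarchy: each term maps to its direct generalizations
# (an instance such as Tweety is related to its type in the same way).
SUPERTYPES = {
    "flight": {"action"},
    "bird": {"animal"},
    "Tweety": {"bird"},
}

def is_same_or_specialization(term, other):
    """True if `term` is `other` or a (transitive) specialization of it."""
    if term == other:
        return True
    return any(is_same_or_specialization(parent, other)
               for parent in SUPERTYPES.get(term, ()))

def specializes(y, x):
    """True if x structurally matches a part of y (projection of x into y)."""
    def extend(remaining, mapping):
        if not remaining:
            return True
        (xs, xr, xd), rest = remaining[0], remaining[1:]
        for ys, yr, yd in y:
            if yr != xr:
                continue
            if not (is_same_or_specialization(ys, xs)
                    and is_same_or_specialization(yd, xd)):
                continue
            if xs == xd and ys != yd:
                continue
            # Each node of x must be mapped to a single node of y.
            if mapping.get(xs, ys) != ys or mapping.get(xd, yd) != yd:
                continue
            if extend(rest, {**mapping, xs: ys, xd: yd}):
                return True
        return False
    return extend(list(x), {})

x = [("bird", "agent of", "flight")]
y = [("Tweety", "agent of", "flight"), ("flight", "duration", "half_hour")]
y_specializes_x = specializes(y, x)
```

With this toy hierarchy, the statement about Tweety is found to specialize the statement about birds, while the converse does not hold. The backtracking search is exponential in the worst case; the polynomial bound cited above concerns acyclic query graphs and dedicated algorithms.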
If two types are not related by exclusion relations, any specialization of one type may also be a specialization of the other. Since it is often not easy to explicitly relate a term to all its generalizations in a KB, a search for the specializations of an object may not retrieve all the possible specializations of that object. This is why, for knowledge retrieval purposes, it is often interesting to search objects which are not exclusive with the query object, rather than looking for its specializations and generalizations. For such querying and other knowledge inferencing purposes, ideally, all the exclusion relations between the direct subtypes of a type should be made explicit. The simplest way to do that is to create direct subtypes only via the use of open/complete subtype partitions. This is one of the "knowledge representation best practices" advocated to the users of WebKB-2. It has not been included as a rule to be enforced by the previously described protocol but it could be.
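The search for objects that are "not exclusive with the query object" can be sketched as follows. This is a minimal illustration under stated assumptions, not the WebKB-2 implementation: the toy hierarchy, the EXCLUSIONS set and the function names are hypothetical, and two types are considered exclusive as soon as one of them (or one of its generalizations) is declared exclusive with the other (or one of its generalizations).

```python
# Hypothetical toy KB: direct generalizations of each type.
SUPERTYPES = {
    "animal": set(),
    "plant": set(),
    "bird": {"animal"},
    "flying_bird": {"bird"},
    "pet": {"animal"},
    "tree": {"plant"},
}
# Explicit exclusion relations (unordered pairs of types).
EXCLUSIONS = {frozenset({"animal", "plant"})}

def ancestors_or_self(type_name):
    """The type itself plus all its (transitive) generalizations."""
    result = {type_name}
    for parent in SUPERTYPES.get(type_name, ()):
        result |= ancestors_or_self(parent)
    return result

def are_exclusive(a, b):
    """True if an exclusion is declared between a, b or their generalizations."""
    return any(frozenset({x, y}) in EXCLUSIONS
               for x in ancestors_or_self(a)
               for y in ancestors_or_self(b))

def non_exclusive_with(query):
    """All other types that may share specializations with the query type."""
    return sorted(t for t in SUPERTYPES
                  if t != query and not are_exclusive(t, query))

candidates = non_exclusive_with("bird")
```

In this example, "pet" is retrieved for the query "bird" although it is neither a specialization nor a generalization of it, since nothing states that a bird cannot be a pet. The more exclusion relations are made explicit (e.g., via subtype partitions), the more precise such a search becomes.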
To establish and display a valuation for an object (product, text, statement, person, etc.) current information management systems
Subsection 2.2.1 cited Knowledge Zone as an example for a system grading ontologies, and listed many reasons why grading multi-statement resources and making statistical measures about these gradings compounds several problems. Some of these problems apply to "multi-criteria decision making" since they order objects according to many criteria, except in the rare case where an object is better than all the other objects for all the considered criteria. Here are some relations between some knowledge valuation related processes.
supporting_the_valuation_of_knowledge_or_knowledge_authors
> {supporting_a_precise_valuation_of_knowledge_statements_or_authors
supporting_free_text_annotations}
supporting_the_association_of_grades_to_knowledge_or_knowledge_authors
ordering_knowledge_or_knowledge_authors_according_to_their_valuations,
subprocess:
exploiting_knowledge_representations_for_valuating_knowledge_statements_or_authors
valuating_how_consensual_the_input_statement_is
valuating_the_interest_of_the_input_statement
valuating_the_usefulness_of_the_input_statement
valuating_the_usefulness_of_the_input_user;
It is not possible to avoid losing information when creating values synthesizing the opinions of several persons. Hence, as opposed to the techniques of the two previous subsections, the approach presented in this subsection is not loss-less. However, it is meant to reduce unintended loss of information in the valuation process. First, it exploits valuations via knowledge representations and starts from "individual valuations" (i.e., valuations by each person as opposed to "global evaluations" which are already statistical measures on individual evaluations). Second, it encourages the creation of true and precise "individual valuations", via (preferably formal) statements rather than via votes. Third, it allows each user to specify his own method to calculate a global evaluation instead of using the default global evaluation method. Thus, this approach is more a template than a fixed method. It, and more generally all the approaches useful to create a cgosSW, are complementary to multi-criteria decision making techniques in the sense that they permit the gathering of more precise information than other methods and hence provide better inputs to these techniques when these techniques remain useful for making decisions.
It is important to distinguish three complementary kinds of techniques that are called "argumentation techniques" as well as "decision support techniques". One kind is about multi-criteria decision making techniques. A second kind is about logics supporting (or permitting to explain) certain kinds of argumentations, e.g., defeasible reasoning and logics for legal argumentation. A third kind is about "argumentation systems" based on semantic networks of users' beliefs connected by argumentation relations, such as those used in hypertext argumentation tools (e.g., ArguMed, SIBYL, gIBIS, ArgNoter and the ontology-based ScholOnto) or in the following theories described in Wikipedia: argumentative dialogue, critical thinking, computer-supported collaborative argumentation, argumentation-based design rationale and Brandom's model of discursive practice. Some systems of the third kind are also of the second kind. By allowing the creation of argumentation structures and more generally a cgosSW, and by proposing a technique to value contributions and contributors, WebKB-2 belongs to the third kind.
WebKB-2 permits its users to create meta-statements for arguing for or against a statement or for representing various criteria on a statement, e.g., on its originality.
wn#valuation //an indirect subtype of pm#process
> {pm#individual_valuation __[pm] pm#global_valuation _[pm] } wn#believing
valuating_how_consensual_the_input_statement_is
valuating_the_interest_of_the_input_statement
valuating_the_usefulness_of_the_input_statement
valuating_the_usefulness_of_the_input_user;
wn#believing .(agent: ?a, object: ?o)
> pm#finding_the_destination_statement_interesting,
agent: (pm#causal_entity pm#believer of: pm#description_content/medium/container __[?a->?o]
) __[any ?b -> 1..* ?a],
object: pm#description_content/medium/container __[?b -> 1..* ?o];
valuating_the_usefulness_of_the_input_statement .(input: 1..* ?i, output: ?o)
agent: 1..* pm#causal_entity ?a, //i.e. "agent: pm#causal_entity __[any ?v->1..* ?a]"
input: 1..* pm#description_content/medium/container ?i, //i.e. "... __[?v->1..* ?i]"
output: (wn#usefulness ?o pm#measure of: ?i) __[pm#author: ?a];
However, the shared KB edition protocols do not exploit any measure of the "usefulness" of each statement, a value that would represent its "global interest", acceptation, popularity, originality, etc. Yet, this seems interesting for a knowledge repository, especially for argumentation structures or semi-formal discussions (see the examples given in Table 2.1.1.22.1 and in Section 4.1). Indeed, statements that are obvious, not argued, or for which each argument has been counter-argued, should be marked as such (e.g., via darker colors or smaller fonts) in order to make them less visible (or invisible, depending on the selected display options) and thus discourage the entering of such statements.
General idea. When the statements of a user have individual measures of usefulness associated to them, these measures can be used (among other things) to measure the global usefulness of that user. Conversely, the global usefulness of a user can be used (among other things) to weight the individual measures that she creates. This feedback loop should help highlight "good" contributions (well argued, interesting, ...) and this highlighting should encourage them.
Here are descriptions of the default valuation measures that will first be implemented in WebKB-2. Then, users will be allowed to use FS to modify these measures by making them take into account additional elements and use functions designed by the users.
Global measure of how consensual a statement is. Since it does not make sense to believe in a definition, it cannot "directly" be consensual. However, i) the relation of type pm#name (or pm#extended-specialization) between a word and a (defined) category can be consensual, and ii) a measure of how consensual this relation is can also take into account how many times the category is used in statements not authored by the creator of the category. An observation, interpretation or preference can be shared (i.e., can have many believers and/or argumentation relations connected to it) and hence can be consensual. Any statistical measure to sum and weight the three above cited kinds of relations (the ones highlighted with italic characters) would be arbitrary. Regarding the taking into account of argumentation relations, here is one default function that is considered for implementation in WebKB-2.
Global measure of how interesting a statement is. This measure should take into account the individual measures of interest for this statement by users of the KB. These individual measures can be represented using the above cited pm#finding_the_destination_statement_interesting process type (this can then be considered as a binary kind of votes by the users) or take into account similar kinds of votes (or more precise valuations) on more precise attributes such as "originality" and "acceptation".
The types of the beliefs (typically, pm#observation, pm#deduction, pm#assumption and pm#preference), when they are made explicit, can also be used as a factor to calculate the global usefulness of a statement (e.g., by considering that a deduction is more interesting than an observation).
In any case, the usefulness of the valuating users (see the last point below) may be taken into account to weight their votes or representations. Here is one candidate for a default function to calculate this usefulness.
Global measure of the usefulness of a statement. This measure should take into account the above two measures for this statement. Here is one function that is considered for implementation in WebKB-2.
Global measure of the usefulness of a user. This measure should take into account the global measures of usefulness of the statements of the user and may also take into account her participation to valuating statements of other authors. Here is one function that is considered for implementation in WebKB-2:
sum of the global usefulness of the statements from the user +
square root (number of times the user valuated statements of other users)
The second part of this equation acknowledges the participation of the user in valuations while decreasing the weight of these valuations as their number increases. Functions decreasing more rapidly than square root may perhaps better balance the value of the two processes: contributing information and valuating information.
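The default function given above for the usefulness of a user can be written directly as code. This is only a sketch of that formula: the function and parameter names are hypothetical and the global usefulness of each statement is assumed to have already been computed.

```python
import math

def user_usefulness(statement_usefulness_values, valuations_of_others):
    """Sum of the global usefulness of the user's statements, plus the
    square root of the number of times the user valuated statements
    of other users."""
    return sum(statement_usefulness_values) + math.sqrt(valuations_of_others)

# A user with two statements (usefulness 2.0 and 3.0) and 16 valuations.
score = user_usefulness([2.0, 3.0], 16)
```

Because of the square root, going from 16 to 64 valuations only raises the second term from 4 to 8, which reflects the decreasing weight of valuations as their number increases. A function that grows more slowly (e.g., a logarithm) could be substituted in the second term to test the alternative balance mentioned above.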
It is clear that the above points constitute only a framework that should be refined. However, even if the above described simple functions are used for the default valuations, it may be hoped that this framework incites the users to be more careful and precise in their contributions (affirmations, arguments, counter-arguments, etc.) and give arguments for them. Indeed, unlike in traditional discussions or anonymous reviews, careless statements here penalize their authors. This may lead users not to make statements outside their domain of expertise or without verifying their facts. (Using a different pseudonym when providing low quality statements does not seem to be a helpful strategy to escape the above approach since this reduces the number of authored statements for the first pseudonym). Since counter-arguments must be justified, it may also be hoped that this framework avoids the under-rating of "correct but outside-the-main-stream contributions". Finally, when a belief is counter-argued, the usefulness of its author decreases, and hence this information provider is incited to deepen the discussion or remove the faulty belief. However, there is still a need for some specially privileged users to remove "completely irrelevant statements" (spam) that have been marked as such by some users (and hence that were not prevented by the constraints of the shared KB edition protocols).
In his description of a "Digital Aristotle", [Hillis, 2004] describes a "Knowledge Web" to which researchers could add ideas or explanations of ideas "at the right place", and suggests that this Knowledge Web could and should "include the mechanisms for credit assignment, usage tracking, and annotation that the Web lacks" (pp. 4-5), thus supporting a much better re-use and evaluation of the work of a researcher than what the current system of article publishing and reviewing permits [AFIA, 2002]. However, Hillis does not give any indication on such mechanisms. The approaches presented in this subsection and the two previous ones may be seen as frameworks for such mechanisms. Complementary approaches, e.g., the module-based approach of Co4 for finding consensual knowledge, were also cited.
The above described framework may also be seen as the beginning of a technical vision for Brandom's very general "model of discursive practice" [Brandom, 2000]. To support Brandom's vision, it is necessary that the users are able to define their own valuation functions, and it is necessary to exploit an inference system for allowing each user to test the consequences of adding certain statements to the network and for allowing each user to change the "rules of the game for evaluating the interest and veracity of certain statements". The authors of [Keeler & Majumdar, 2008] work on Brandom's statement valuation "game" based on i) the synthesis of [Holland, 1998] about (natural or artificial) mechanisms valuating, selecting or composing facts, methods or genes, ii) an automatic extraction of simple IF-THEN rules from natural language sentences, and iii) a measure of similarity between these rules.
Although independently developed, the above described framework appears to be an extension of the version designed for SYNVIEW [Lowe, 1985]. In this hypertext system, statements had to be connected by (predefined or user-invented) relations and each statement was valuated by users (this value, and another one calculated from the value of arguments and counter-arguments for the statement, were simply displayed near the statement in order to "summarize the strengths assigned to the various items of evidence within the given contexts"). In 1986, the authors of SYNVIEW removed the constraint of using explicit relations between statements (the statements still had to be organized hierarchically but the relations linking them were unknown) and replaced the possibility of grading each statement by the possibility of ranking them within the list of (sibling) statements having a same direct super-statement. This change was made in the hope of easing information entering and thus permitting the collaborative work of a small community towards the creation of an information repository large enough to interest other people and lead them to participate and store information too. A similar move away from structured representations was described in [Buckingham-Shum et al., 1999] for the same reason and the idea of making the approach more "scalable". Although such a move clearly makes information entering easier, Subsection 2.2.1 showed that it actually makes the system far less scalable. Such moves apparently failed to attract more interest than the original more structured approaches.
The next section lists and organizes "best practices to represent knowledge in more precise, organized and normalized ways".
This section lists some rules to normalize knowledge and thus ease its exploitation with simple methods but without restricting the expressiveness of the languages. These rules can be seen as "best practices" and are complementary to those of [Rector & Rogers, 2006] and those of the "Semantic Web Best Practices and Deployment (SWBPD) Working Group" [Swick et al., 2006]. The lexical or structural normalization rules given below may also be seen as ontological normalization rules since, for example, they cite certain relation types. However, unlike the content of Chapter 3 which organizes many conceptual categories that are important for representing knowledge in a normalized way, these lexical or structural normalization rules do not give any ontology.
representing_knowledge
< knowledge_extraction/modeling/representation,
> (following_a_style/rule/method_when_representing_knowledge
> (following_a_precision-oriented_normalization_rule_when_representing_knowledge
definition: "following a knowledge representation style/rule/method that
i) reduces the number of non-easily-automatically-comparable ways
information can be represented (with the criteria for an
easily automatically comparable way being a unique graph matching
process), or
ii) leads to representing a piece of information in a way that
permits to generate (and hence be compared with) the
representations that are created if that method is not followed.
The more precise or organized the representations, the
more normalized they are because the more they make explicit common
basic components (e.g., basic relations) that permit comparisons
with other representations. This also eases readability and
understanding",
> following_a_lexical_normalization_rule_when_representing_knowledge
following_a_structural_normalization_rule_when_representing_knowledge
)
using_a_normalized_and_scalable_input_file_structure_when_representing_knowledge
);
following_a_lexical_normalization_rule_when_representing_knowledge
> (following_a_lexical_normalization_rule_for_category_identifiers
> following_a_loss-less_category_naming_style
using_singular_nouns_or_nominal_expressions_for_category_identifiers
ending_second_order_concept_types_by_class_or_type
ending_second_order_relation_types_by_relation/function_type_or_property
)
following_a_lexical_rule_for_an_informal_category_annotation/definition;
2.3.1.1. Following a loss-less category naming style. Paragraph 2.1.1.24 states that any new identifier following the "W3C category naming style" is converted into the "loss-less category naming style" whenever i) there is a way to quickly convert it back to the W3C category naming style for knowledge export purposes, and ii) this does not lead to a lexical conflict. Indeed, the W3C category naming style is not loss-less and is less readable. Furthermore, using the "loss-less category naming style" with European languages often makes it possible to lexically distinguish between types and individuals since these last ones are often denoted by words with a capitalized first letter. This is as important as lexically distinguishing concept types from relation types (as done with the W3C category naming style).
For many reasons, it would be simpler for people if XML – and hence RDF and then most KRLs since most KRLs are now meant to be translated into RDF/XML – were case insensitive and if the '_' or '-' characters were ignored. Similarly, OWL has different lexical scopes for concept types and relation types; this can be seen as a design flaw by some persons since this makes things harder for the programmers (especially since relation types, like any other kinds of types, should be allowed to be described and hence should be allowed to be used in concept nodes) and it does not make life simpler for the users. Protégé-2000, a popular editor for creating ontologies in RDF+OWL, has a single name space (WebKB-2 too). Domain names, HTML and the Meta Content Framework Using XML [www-MCF/XML, 1997] are case insensitive. The main reason for case sensitivity seems to be that the performance cost of mono-casing in Unicode is significant [www-XML-case-sensitive, 2004]. By default, for search purposes, WebKB-2 first converts the query words or category identifiers to lowercase and ignores the '_' or '-' characters as well as the final 's'; if this leads to lexical ambiguities (i.e., several possible identifiers), the given spelling and capitalization are used. In an expression or statement in FE or FCG it is possible to add a final 's' to a name or an unprefixed category identifier when it is used with a numerical quantifier (as in `75% of cats´), as long as there is no lexical ambiguity.
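The default lexical normalization used for search purposes can be sketched as follows. This is only an illustration of the behavior described above, not the code of WebKB-2: the function names and the list of identifiers are hypothetical.

```python
def normalize(word):
    """Lowercase the word, ignore '_' and '-', and drop a final 's'."""
    key = word.lower().replace("_", "").replace("-", "")
    if len(key) > 1 and key.endswith("s"):
        key = key[:-1]
    return key

def lookup(word, identifiers):
    """Return the identifiers matching `word` after normalization; if
    several identifiers match (lexical ambiguity), fall back to the
    given spelling and capitalization."""
    key = normalize(word)
    candidates = [ident for ident in identifiers if normalize(ident) == key]
    if len(candidates) > 1:
        return [ident for ident in candidates if ident == word]
    return candidates

identifiers = ["flying_bird", "Paris", "paris"]
found = lookup("Flying-Birds", identifiers)
```

Here, "Flying-Birds" retrieves "flying_bird", whereas "Paris" is ambiguous after normalization (two identifiers differ only by capitalization) and is therefore matched with its exact spelling.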
2.3.1.2. Using singular nouns or singular nominal expressions for category identifiers. This is a convention in the Meta Content Framework Using XML and there are many reasons why a singular nominal expression – or a gerund (the '-ing' form of an English verb) for referring to an action – should be used for naming a type of concept or relation. For example, the names "defining" (for the action) and "definition" (for the output of the action or for the relation) should be used instead of names such as "hasDefinition", "definitionOf", "define", "defined" and "definitions".
This reduces the use of verbs for naming relation types and hence the use of relation nodes instead of concept nodes for representing processes. It is better to use concept nodes (with a nominal expression for their types) for representing processes because, as opposed to relation nodes, concept nodes can be quantified in various ways (as in `at least 2 hit´ or `75% of hitting´) and connected via many common relations to other concepts, e.g., relations to represent the agent, object, instrument, goal, time and place of the process. This is easier than declaring relation types with names such as "hits", "hits_with_instrument", "hits_with_instrument_for_a_reason" and defining them with respect to a concept type with name "hitting" to permit the statements using these relations to be automatically compared. This also avoids 'duplications' and permits other users to "correct" a particular basic relation (e.g., the one used for representing the instrument) rather than the whole process. Finally, many users would not actually take the time to define relation types such as "hits" or "hits_with_instrument" with respect to a concept type "hitting". For similar reasons, using names such as "define", "defined" or "definedIn1990" for types of processes should be avoided in favor of the nouns "definition" or "defining" ("two definings" is correct in English, even though it sounds odd). Using this last name is better because it clearly refers to a process and this leaves the possibility to use "definition" for the identifier referring to the output of the "defining" process (in the MSO, wn#definition is the output of wn#defining) or for the identifier of a "definition" relation (e.g., pm#definition in the MSO).
wn#defining .(pm#input: *x, pm#output: *y)
< wn#indexing,
input: (pm#thing *x pm#definition: *y),
output: wn#description *y;
This reduces the use of adjectives as category identifiers. Indeed, the use of an adjective is not warranted when the category does not refer to an attribute or measure, as with sumo#abstract which, given its meaning, should actually have been named sumo#abstract_thing (or sumo#abstract_entity since `sumo#entity = pm#thing´). However, (proto-)ontologies such as WordNet include categories with adjectives, verbs or adverbs as names, and it is interesting to integrate such categories, for example to permit a more direct representation of natural language sentences. This is why in WebKB-2 the following statements are equivalent.
[an abstract process];
[a process, kind: abstract];
[a process, kind: abstract_thing]; //there is a process that is an abstract_thing
[an abstract_thing, kind: process]; //there is an abstract_thing that is a process
Using an adjective for naming a subtype of pm#attribute_or_quality_or_measure
(e.g., wn#red) is not a problem: an adjective is a way to express a measure
(Subsection 3.1.2 gives the rationale for the classification of measures,
attributes and qualities; they include minimizing the number of ontology specific
relation types to use and hence creating simpler and more normalized knowledge
representations). Thus, in a conjunction of
types such as the one in the previous example, WebKB-2 interprets differently the use
of a subtype of pm#attribute_or_quality_or_measure. Since
pm#attribute_or_quality_or_measure is a subtype of
pm#thing_that_can_be_seen_as_a_relation (see Paragraph 2.1.1.14),
FCG statements such as the following ones are considered equivalent.
[a wn#red wn#mat];
[a wn#mat, attribute_or_quality_or_measure: some wn#red];
[a wn#mat, wn#color: some wn#red]; //since wn#color is also a supertype of wn#red
[a wn#mat, wn#red___redness: some wn#red];
The end of Subsection 3.2.4 lists the ways the WordNet categories having verbs, adverbs and adjectives as names are planned to be integrated in the ontology of WebKB-2.
A category with an adjective as name should be defined with respect to a category with a nominal expression as name. To allow this, FL, FCG and FE have some predefined keywords: "important", "small", "big", "great", "good" and "bad". For example:
wn#long :<=> important wn#length; //rationale in the subsections 3.1.2 and 3.2.4:
// wn#long < (wn#length < pm#attribute_or_quality_or_measure)
// wn#length is neither a characteristic (a quality) nor a value (a quale) but
// can be specialized by categories of each kind
Since WordNet relates each of its categories with adjectives as names to its nominal counterpart, definitions such as the above one can be generated. Then, the following statements are equivalent.
[Tom, owner of: a wn#long wn#car];
[Tom, owner of: (a wn#car, wn#length: an important wn#length)];
[Tom, owner of: (a wn#car, attribute_or_quality_or_measure: an important wn#length)];
However, for precision purposes, adjectives should still be avoided. For example:
[Tom, owner of: (a wn#car, length: 7 wn#meter)];
[Tom, parents: the set {John,Mary} ]; //or: [Tom, parents: parents_of_Tom];
[Tom, parent: a wn#man];
On the other hand, using collections in their distributive interpretation is not a problem: they are just shortcuts to avoid repeating relations. For example, the FCG statement, [Tom, parent: {John,Mary}] specializes [Tom, parent: a wn#man] if `John´ or `Mary´ is instance of wn#man. The possibility of using numerical quantifiers is important too since numerical quantifiers can be easily compared and they reduce the need of declaring concept types representing collections.
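The distributive reading of a collection can be illustrated with a small Python sketch. The triple encoding, the `INSTANCE_OF` table and both function names are assumptions made for this example; they do not reflect how WebKB-2 stores or compares statements.

```python
# Hypothetical typing facts: the categories each individual is an instance of.
INSTANCE_OF = {"John": {"wn#man"}, "Mary": {"wn#woman"}}


def expand_distributive(source, relation, members):
    """Expand [source, relation: {m1, m2, ...}] into one triple per member,
    which is what the distributive interpretation of the set stands for."""
    return {(source, relation, member) for member in members}


def specializes_typed_statement(triples, source, relation, target_type):
    """True if at least one triple relates `source` via `relation` to an
    individual that is an instance of `target_type`, i.e., if the triples
    specialize the statement [source, relation: a target_type]."""
    return any(
        s == source and r == relation and target_type in INSTANCE_OF.get(t, set())
        for (s, r, t) in triples
    )
```

Here, `expand_distributive("Tom", "parent", {"John", "Mary"})` yields two `parent` relations, and the result specializes `[Tom, parent: a wn#man]` because `John` is declared as an instance of `wn#man`.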
2.3.1.3. Naming of second order types. To help people distinguish second order types – or types of greater order – from first order types, and hence ease their understanding of the ontology organization, it is interesting to end the identifiers of second order types by
This would for example ease the understanding of SUMO and CYC.
2.3.1.4. Informal definitions and annotations of categories. Like all other definition bodies or statements, informal definitions and category annotations should preferably be undecomposable (to that end, several different definitions or annotations can be written) and should either be expressions (e.g., nominal expressions or sentences beginning with a relational expression such as "example: " or "e.g., ") or stand-alone statements. The latter should be understandable without having to know that they are related to a particular category, and hence, they should not use expressions such as "this category". This permits anyone to relate these expressions or statements to other ones (e.g., a formal version or a specialization) or to argue for or against their relations to the category they are associated with. Informal definitions and annotations of types may refer to the category itself or to its instances: it is not possible or handy to either always speak of the instances or always speak of the type. Finally, the more formal the definitions and annotations, the better.
It is better to use a long explicit category identifier than to use an informal definition to compensate for an imprecise short identifier. Indeed, statements should be understandable without having to access the definition of the terms they include (this is a loss of time and may require losing sight of other important information thereby making mental synthesis a more difficult exercise). Hence, for example, declaring the identifier pm/km#sharing_of_a_knowledge_base_that_is_physically_distributed_or_not is at least as good as declaring:
KB_sharing definition: "sharing of a knowledge base, physically distributed or not";
Finally, the "generation of categories for organization purposes" described in
Paragraph 2.1.1.25 contributes to lexical and structural normalization.
following_a_structural_normalization_rule_when_representing_knowledge
> (following_a_normalization_rule_for_the_signatures_of_relation_types
> (following_the_graph-oriented_reading_convention_for_the_signatures_of_binary\
_relation_types
subprocess: using_singular_nouns_or_nominal_expressions_for_category_identifiers
avoiding_the_declaration_or_use_of_non-binary_relation_types)
(maximizing_precision_and_organization_while_minimizing_the_use_of_expressive_constructs
subprocess: (maximizing_precision_and_organization
subprocess: relating_formal_and_informal_categories
connecting_or_adding_to_large_ontologies
relating_to_and_organizing_informal_terms_and_statements)
minimizing_the_use_of_expressive_constructs_but_not_high-level_constructs
avoiding_the_declaration_or_use_of_non-binary_relation_types
defining_non-binary_relation_types_with_respect_to_binary_relation_types
(using_subtype_relations_instead_of_instance_relations
> using_first-order_types_instead_of_second-order_types
using_first-order_types_instead_of_individuals)
(keeping_the_relation_type_hierarchy_small_and_organized
subprocess: avoiding_the_representation_of_processes_via_relation_types
));
2.3.2.1. Following the graph-oriented reading convention for the signatures of binary relation types. This was argued for in the first two points for Subsection 2.3.1.2 ("using singular nouns or singular nominal expressions for category identifiers"; this last process can be seen as a subprocess of this one).
2.3.2.2. Avoiding the declaration or use of non binary relation types. Many KRLs and KR systems do not support non-binary relations and there is no common reading convention for non-binary relations. However, many of these relations can be replaced with binary relations. For example, instead of being declared as ternary relations, relation types with names such as "location_between" and "sum" can be declared as binary relations using a set.
pm#location_between .(pm#spatial_object, {pm#spatial_object});
pm#sum_of_real_numbers .({sumo#real_number} -> sumo#real_number);
Relations with such types can be read with the graph-oriented reading convention. Here are some equivalent statements.
[the set {1,2,3,4}, sum: 10]; [sum({1,2,3,4}) = 10]; //in FCG
{1,2,3,4}_[cuml] sum: 10; sum({1,2,3,4})_[cuml] = 10; //in FL
This use of sets may also be a good alternative to variable-arity relations since the use of sets may be more intuitive and there are probably more KRLs able to support sets than variable-arity relations.
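As an illustration only, the following Python sketch encodes the two examples above as binary statements whose argument is a set. The triple encoding, the function name `binary_statement` and the city names are assumptions made for the example.

```python
def binary_statement(source, relation, target):
    """Build a binary statement (source, relation, target).

    Set arguments are stored as frozensets so that the statement is hashable
    and so that the order in which members were written does not matter.
    """
    def freeze(value):
        return frozenset(value) if isinstance(value, (set, frozenset)) else value

    return (freeze(source), relation, freeze(target))


# "sum" as a binary relation from a set of numbers to a number.
s1 = binary_statement({1, 2, 3, 4}, "sum", 10)
s2 = binary_statement({4, 3, 2, 1}, "sum", 10)

# "location_between" as a binary relation from a spatial object to a set of
# spatial objects (hypothetical individuals).
between = binary_statement("Lyon", "location_between", {"Paris", "Marseille"})
```

Since both arguments of a ternary "location_between" are grouped into one set, the two sum statements above compare as equal and the same binary encoding also covers a variable number of members.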
2.3.2.3. Defining non binary relation types with respect to binary relation types. Most KRLs with a predicate-logic oriented notation allow n-ary relations but not contexts, e.g., CLIF. KRLs with a frame/graph oriented notation more commonly allow contexts and only binary relations. Whenever possible, if n-ary relation types are declared (e.g., to use temporal and modal arguments), they should be defined using binary relations. Examples in FCG:
[definition_3 .(thing *i, description *o, time_measure *t), supertype: pm#definition,
:<=> [a wn#definition, input: *i, output: *o, time_measure: *t] ];
[definition_3 .(input: *i, output: *o, time_measure *t), supertype: pm#definition,
:<=> [ [a wn#definition, input: *i, output: *o], time_measure: *t] ];
In WebKB-2, given the facility described in Paragraph 2.1.1.21 ("Contextual relations from/to (descriptions of) processes"), these two definitions are seen as equivalent. This facility can be seen as a kind of structural normalization.
These two definitions also state that pm#definition_3 is a subtype of the binary relation type pm#definition. It seems reasonable to allow relation types to be declared as subtypes of relation types having fewer arguments. Doing so, or using variable arity signatures, makes it possible to organize relation types into a single specialization hierarchy. In the MSO, only the variable arity signature based approach is currently used. Here are examples in FL (reminder: "*" refers to an unknown number of arguments of unknown types, while "?" refers to a unique argument of unknown type).
relation .(*)
> (relation_from_collection .(collection, *)
> (member .(collection, *)
> kif#first .(kif#list -> ?)
kif#nthrest .(kif#list, kif#natural -> kif#list)
) );
2.3.2.4. Maximizing precision and organization. The more precise and organized the knowledge, the more understandable, retrievable, (correctly) re-usable, scalable and easy to validate it is. Hence, at least until a certain level of formalization is reached, the more it prevents people from entering redundancies and inconsistencies. (When a certain level of formalization or ontological detail is reached, this opens or reveals (onto-)logical cans of worms and increases the number of alternative (onto-)logical ways to represent objects). Here are some of the things that information providers should do when entering a new category or statement:
Before representing objects, the information providers should i) check which categories share a name with these objects, and ii) check which relation types are associated with these kinds of objects (e.g., via definitions or signatures). Whenever possible, objects should not be used before being declared, i.e., before being given at least one type or supertype. Except for the use of certain relation types, this is always possible. This permits early semantic checks and helps normalize the presentation of the code (and thus, to a certain extent, its content).
In WebKB-2, the shared KB edition protocols enforce a (very) minimal organization. They should probably be completed by "certificates of connectivity" delivered by the system for each newly entered object based on the existence or not of the above listed kinds of relations from/to the object. Then, if needed, certificates of "semantic well-formedness" or "semantic normalization" (e.g., in the spirit of Codd's database normalization levels) could also be envisaged for each statement and hence, via statistical measures, for the whole KB. This paragraph and the next ones in this subsection describe advisable subprocesses of `maximizing_precision_and_organization_while_minimizing_the_use_of_expressive_constructs´. Following, enabling or enforcing the processes described in the previous sections (supporting_knowledge_sharing_between_KBs, supporting_the_collaborative_building_of_the_KB_specified_as_output and supporting_the_valuation_of_knowledge_or_knowledge_authors) are complementary advisable subprocesses. Hopefully, in the future, the lexical, syntactic, structural and semantic rules associated to all these processes will be refined to reduce the entering of formal/informal statements that are over-general, false, redundant or poorly structured and inter-related. The next paragraphs focus on structural and semantic rules for (semi-)formal knowledge.
2.3.2.5. Minimizing the use of expressive constructs but maximizing the use of high-level constructs. The more precise and organized the knowledge, the more understandable, retrievable and re-usable it is. However, the more it uses expressive constructs (e.g., second order statements), the less inference engines can use it and the less efficient, complete or consistent the inferences can be. Hence, for example, using exclusive relations or cardinalities instead of using a general negation on a whole statement is a good thing. However, for general knowledge representation and sharing purposes, there is no point in biasing or restricting the precision, organization and readability of knowledge by using low expressiveness constructs (e.g., those of OWL-Lite) when entering it since i) such knowledge is not dedicated to particular kinds of application, and ii) knowledge using expressive constructs can be automatically translated to less precise/correct knowledge using less expressive constructs, or these expressive constructs can be only partially interpreted by inference engines (as is for example the case with the ext-gen or ext-spec operators of WebKB-2 which are efficient and give "relevant" results – but not always complete results – whatever the expressiveness of the knowledge). The next four points list some characteristics of RDF+OWL related to this issue since RDF+OWL is the de-facto standard general model.
En: Any human body has at most 2 arms. Any arm belongs to at most 1 body.
FL: human_body part: wn#arm __[?->0..2, 0..1<-?];
KIF: (forall ((?b pm#human_body)) (atMostN 2 '?a wn#arm (pm#part ?b '?a)))
(forall ((?a wn#arm)) (atMostN 1 '?b pm#human_body (pm#part '?b ?a)))
RODX: <rdf:Property rdf:ID="ArmPart"><rdfs:subPropertyOf rdf:resource="&pm;part"/>
<owl:inverseOf rdf:ID="ArmPartOf"/>
<rdfs:range rdf:resource="&wn;Arm"/> </rdf:Property>
<owl:Class rdf:about="&pm;HumanBody"><rdfs:subClassOf>
<owl:Restriction><owl:onProperty rdf:resource="#ArmPart"/>
<owl:maxCardinality rdf:datatype="&xsd;nonNegativeInteger">2
</owl:maxCardinality></owl:Restriction> </rdfs:subClassOf></owl:Class>
<owl:Class rdf:about="&wn;Arm"><rdfs:subClassOf>
<owl:Restriction><owl:onProperty rdf:resource="#ArmPartOf"/>
<owl:maxCardinality rdf:datatype="&xsd;nonNegativeInteger">1
</owl:maxCardinality></owl:Restriction> </rdfs:subClassOf></owl:Class>
With (in KIF):
(defrelation atMostN (?num ?var ?type ?predicate) :=
(exists ((?s set)(?n)) (and (size ?s ?n) (=< ?n ?num)
(truth ^(forall (,?var) (=> (member ,?var ,?s)
(and (,?type ,?var) ,?predicate)))))))
Using high-level constructs (short expressive constructs such as numerical quantifiers, expressed via quantifiers/relations such as `atMostN´ or via syntactic sugar such as in FL) is needed. Since writing constructs such as `atMostN´ is not easy and since their definitions are unlikely to be exploited by most current engines, the ontology of a would-be general language such as RDF+OWL-Full should include such constructs. Then, inference engines may be adapted to (partially) interpret them or not. Like FE and FCG, the Lisp-based KRL of the Knowledge Machine [www-KM, 2006] uses keywords such as "a", "the" and "every" for its existential and universal quantifiers. FL, FE and FCG will continue to be completed to include more high-level constructs for representing expressions common in natural languages such as "1 by 1" or "3 by 3".
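To make the intent of the two cardinality constraints above concrete, here is a minimal Python sketch that checks them on a set of facts. It is not an implementation of `atMostN´ nor of any inference engine: the triple encoding, the relation name `arm_part` (borrowed from the RODX example) and the function name are assumptions made for the illustration.

```python
from collections import Counter


def cardinality_violations(triples, relation, max_targets, max_sources):
    """Check "at most N" constraints in both directions of a binary relation.

    Returns two sorted lists: the sources related to more than `max_targets`
    distinct targets, and the targets related to more than `max_sources`
    distinct sources.
    """
    # A set removes duplicated facts so that each relation is counted once.
    pairs = {(s, t) for (s, r, t) in triples if r == relation}
    targets_per_source = Counter(s for (s, _) in pairs)
    sources_per_target = Counter(t for (_, t) in pairs)
    too_many_targets = sorted(s for s, n in targets_per_source.items() if n > max_targets)
    too_many_sources = sorted(t for t, n in sources_per_target.items() if n > max_sources)
    return too_many_targets, too_many_sources


# Hypothetical facts: body1 is given three arms and arm3 is given two bodies.
facts = [
    ("body1", "arm_part", "arm1"),
    ("body1", "arm_part", "arm2"),
    ("body1", "arm_part", "arm3"),
    ("body2", "arm_part", "arm3"),
]
violations = cardinality_violations(facts, "arm_part", max_targets=2, max_sources=1)
```

On these hypothetical facts, `violations` is `(["body1"], ["arm3"])`: both the "at most 2 arms" and the "at most 1 body" constraints are broken.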
2.3.2.6. Using first-order types instead of second-order types. Using second-order types makes it possible to associate properties with a type without associating them with its subtypes too. For example, the OntoClean methodology [Guarino & Welty, 2002] encourages the implicit or explicit use of class properties (hence second-order types) such as rigidity and unity in order to reduce misuses of the subtype relation such as i) subtyping wn#water by wn#ocean, ii) subtyping a type representing a role by a type not representing a role, and iii) using subtype relations instead of partOf or instance relations. These second-order types of DOLCE are shown in Table 3.1.10.2.
However, using second-order types when first-order types would be sufficient is not a good idea. For example, declaring a second-order type named "product_type" and relating types of products to it via pm#kind relations, instead of declaring a first-order type named "product" and relating types of products to it using pm#supertype relations, is a kind of problem that occurs in general ontologies such as SUMO (e.g., see Table 3.1.10.2), CYC and TAP. Indeed, using supertype relations between first-order types whenever it is possible
OWL proposes a second-order type owl#transitiveProperty since transitivity is a property of a relation type which is not necessarily shared by its subtypes. However, since it is often shared, to ease the organization of transitive relation types and avoid the tedious setting of a pm#kind relation to owl#transitiveProperty from each of them, the MSO proposes the following two types.
rX supertype: relation_instance_of_transitiveProperty_unless_directly_overrided __[.->0];
A more debatable case is the representation of "species" as second order types which have various types of plants or animals as instances. It seems better to represent them as subtypes (or instances) of pm#collection which have various types of plants or animals as members. This is the case in WordNet and hence in the MSO.
wn#class_Mammalia___Mammalia
member: wn#young_mammal wn#mammal
wn#subclass_Prototheria wn#subclass_Pantotheria wn#subclass_Metatheria
wn#subclass_Eutheria wn#Ungulata wn#Unguiculata,
member of: wn#subphylum_Vertebrata,
supertype: (wn#class < (wn#taxonomic_group < wn#biological_group));
2.3.2.7. Using first-order types instead of individuals. Individuals (i.e., instances of first-order types) may also be over-used. For example, it may be tempting to represent a certain doctrine, language, program or day of the week as an individual, but then what about their variants and their occurrences? For example, "Monday" has a potentially infinite number of occurrences, and so has "Whitmonday" (the day after Whitsunday). Considering the currently existing tools, the simplest solution (for people and for automatic exploitation) is to represent "Whitmonday" as a subtype of "Monday" and its occurrences as individuals (anonymous or not). Similarly, an alphabetic character (seen as a symbol) and the content of a book may also have (existing or potential) variants; for example, the Bible has many versions in many languages and the character 'A' has "versions" too (uppercase, lowercase, ...). There are many ways to view, categorize and relate such "versions" but, as for "Whitmonday", using subtype relations seems the simplest way, including for inferencing purposes. This is the option that was chosen when integrating WordNet into the MSO. Things that intuitively cannot have versions, e.g., persons or cities, were declared as individuals in the MSO even though categories such as pm#Paris_as_the_capital_of_France_from_1990_to_2000 can be declared. To relate such an identifier to wn#Paris, an extended specialization relation (e.g., with type pm#term_specialization) can be used. However, it seems preferable to use contexts on statements than to declare such identifiers.
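The modeling choice argued for above (a variant as a subtype, an occurrence as an individual) has a direct analogue in class-based programming. The following Python sketch is only an analogy under that assumption: the class names and the chosen date are illustrative, and Python classes are not MSO categories.

```python
from datetime import date


class Monday:
    """First-order type: any occurrence of a Monday."""

    def __init__(self, day: date):
        self.day = day


class Whitmonday(Monday):
    """Subtype of Monday rather than an individual: it has many occurrences."""


# One occurrence, represented as an individual (an instance of the subtype).
occurrence = Whitmonday(date(2007, 5, 28))
```

Because `Whitmonday` is a subtype, anything stated about instances of `Monday` also applies to `occurrence`, and further variants or occurrences can be added without changing the existing declarations.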
2.3.2.8. Keeping the relation type hierarchy small and organized. The more organized the relation types, the more understandable they are and the more comparable the statements using them are. The smaller the hierarchy of relations, the less 'duplications' with respect to the concept type hierarchy it has and the more it contains only primitive or common relation types, and hence i) the easier the relation type hierarchy is to understand and use, ii) the easier it is to find and use these relation types, and iii) the more comparable the statements are. To keep the relation hierarchy organized without drawbacks for the precision of knowledge representations, it is necessary to i) allow subtype relations between types of relations of different arities, as specified above, and ii) organize relation types according to their arguments (other methods can also be used in addition). To keep the relation hierarchy small, certain kinds of concept types should be allowed to be used in relation nodes and hence should be allowed to have associated signatures. Subsection 4.2.13 gives definitions in KIF for relations that make it possible to generate and organize relation types based on the signatures associated with concept types and the organization of these types. These KIF definitions may be used for exporting purposes or for defining this relation type generation process.
2.3.2.9. Avoiding the representation of processes via relation types. This is a subprocess of the one described in the previous paragraph. Some rationales and techniques have been cited in the previous subsection and the previous paragraph. Relatively few basic relation types are required for most general knowledge representation, e.g., for the representation of natural language. When the content of courses taught at Griffith University was represented in the MSO, most of the used relation types were (without the "pm#" prefix): subtype, instance, specialization, part (physical_part or subprocess), technique, tool, definition, annotation, use, purpose, rationale, role, origin, example, advantage, disadvantage, argument, objection, requirement, agent, object, input, output, parameter, attribute, characteristic, support and url. (This list is ordered topically, not by frequency of occurrence.) The representations in this document also use a small number of relation types. This eases the automatic comparison of these representations and hence their retrieval or exploitation.
2.3.2.10. Connecting/adding to large ontologies. The bigger the ontology, the more complete, precise and organized it has to be for being manageable, and hence the more it guides or spares knowledge entering, and the more users (e.g., information providers) it is likely to have. For general knowledge sharing and representation purposes, specializing (or connecting to) a large general ontology not only saves effort and improves re-usability and scalability but also eases the comparison and retrieval of objects specializing (or connecting to) this ontology. Here are some quick descriptions of some large general ontologies.
2.3.2.11. Relating to – and organizing – informal terms and statements. Relating formal objects to informal ones improves the retrieval of these formal and informal objects and eases their understanding. An informal object may have fewer meanings or interpretations than another one, and its meanings may be less general. Hence, as with formal objects, it is interesting to organize informal objects via specialization relations (manually for informal terms, and manually or automatically for informal or semi-formal statements). To that end, the MSO proposes the pm#extended_specialization relation type and WebKB-2 exploits it.
Even if the two previous subsections include recommendations that may appear obvious to many knowledge engineers, they were not meant to include simple advice that would be obvious to all knowledge engineers. For such simple advice, a well-referenced document is "Ontology Development 101: A Guide to Creating Your First Ontology" [Noy & McGuinness, 2000]. However, this guide seems more oriented towards creating an ontology for a particular application than towards knowledge sharing purposes. Indeed, it contains statements that have been argued against in the two previous subsections. Here are some examples.
Here are other examples showing that the recommendations of the two previous subsections are unfortunately not obvious to every knowledge engineer.
The "OWL Web Ontology Language Guide" [Smith, Welty & McGuinness, 2004] uses a small ontology as an example (see its translation in FL in Table 2.3.3.1 below). This ontology is extremely restrictive (it only permits stating sub_area relations), semantically shallow and hard to extend or re-use. Table 2.3.3.2 shows a more scalable approach: i) it uses a process based modeling (the given example illustrates what this permits), ii) it uses types from the MSO and thus only wine#Chateau_Margaux has to be declared, and iii) even when the types already declared in the MSO are counted, this new version is not much longer than the original one. In this original version, dividing the sub_area relation into three subtypes serves no purpose: using one "spatial_part" relation is more general and no more ambiguous. Finally, in the original version, several inverse relations have to be declared because the RDF+OWL/XML syntax does not provide any way to change the direction of a relation node.
wine#ProductionArea > wine#Country wine#Region wine#Vineyard;
wine#hasSubArea (?,?) kind: owl#TransitiveProperty, inverse: wine#subAreaOf,
> wine#hasRegion wine#hasSubRegion wine#hasVineyard;
wine#hasRegion (wine#Country <- wine#Region) inverse: wine#regionOf;
wine#hasSubRegion (wine#Region<- wine#Region) inverse: wine#subRegionOf;
wine#hasVineyard (wine#Region<-wine#Vineyard) inverse: wine#vineyardRegion;
//Examples of uses:
wine#Country instance: wine#France wine#Italy;
wine#Region instance: wine#Aquitaine wine#Roussillon;
wine#Vineyard instance: (wine#Chateau_Margaux wine#vineyardRegion: wine#Aquitaine);
(Table 2.3.3.2: the more scalable version, which re-uses types from the MSO)
wn#vineyard instance: wine#Chateau_Margaux;
/* Concept types and relations already declared in the MSO:
wn#process > wn#production;
pm#spatial_object > (wn#location > wn#country wn#region wn#vineyard);
pm#physical_entity > wn#wine;
pm#attribute_or_quality_or_measure > wn#volume;
pm#object .(pm#situation, ?); //or: wine#object .(wn#process, wn#wine);
pm#place (pm#situation -> pm#spatial_object);
pm#time (pm#situation, pm#time_measure)
pm#spatial_part .(pm#spatial_object, pm#spatial_object) kind: owl#transitive_property;
*/
//Example of use in FCG:
[wine#Chateau_Margaux, pm#spatial_part of: wn#Aquitaine,
pm#place of: (a wn#production, pm#time: 2003, pm#object: some wn#wine,
wn#volume: 20000 wn#liter)];
Uses of input files. Creating an ontology, especially a large one, has analogies with programming: it leads one to i) modularize the code in various input files, ii) make comments, iii) submit each file to the parser and correct mistakes, and iv) correct structural and ontological decisions which affect many objects. This last point requires modifications of all the affected objects in one transaction, i.e., by re-submitting one or several input files. Thus, like (good) procedural/declarative programs, (good) ontologies cannot (efficiently) be created via a command-line interface or a graphical interface that is not multi-view, does not include a textual editor or does not permit building several input files. In other words, directly updating a KB via a command-line editor or a "classic" ontology editor does not permit designing a good KB, or at least not solely with that editor. Here, the expression "classic ontology editors" refers to
Such a view is indeed necessary to permit people to compare and hence understand objects and their connections, and then make good design/update choices. Most current ontology editors, including Protégé, are "classic ontology editors". One reason is that their simple design pleases (at least the) beginners in knowledge representation. The focus for the interfaces of WebKB (WebKB-1 and WebKB-2) was on
Like most ontology editors and many ontology servers, WebKB-1 does not have a persistent KB: the user must specify which input files should be loaded into the KB before making queries. Thus, WebKB-1 can only be a "personal" KB server. WebKB-2 has a persistent shared KB. Hence, once an input file loads without problem (i.e., without errors being detected) and is considered as "complete" by its author, the author commits its content to the shared KB (i.e., unlike with the previous parsings of the file, the updates to the KB are not rolled back at the end of the parsing of the file) and then avoids modifying this file. Indeed, to be able to re-submit an input file into the shared KB, its author must first (at least implicitly) ask WebKB-2 to delete the content of the committed previous version of the input file. This entails that the parts of this content that were reused by other users must be cloned: their creator becomes one of these other users. Then, ideally, when the new version of the input file is loaded, the objects that were cloned because of the deletion but that have not been modified in the new version should be un-cloned, i.e., their creator should be changed back. This is easy to describe but is not yet implemented in WebKB-2. Hence, currently, to incorporate changes made to already committed input files, the whole KB should be re-generated. In any case, once committed, an input file should essentially be considered by its author as a backup file for the content it contributes to the shared KB. This is useful since, like all tools, shared KB servers have bugs, which means that the shared KB sooner or later gets corrupted and cannot be fully recovered without re-generating it from backup files.
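The deletion-with-cloning step described above can be sketched in Python as follows. This is a simplified illustration, not the WebKB-2 protocol: the `KBObject` fields, the function name and the rule used to pick the new creator (the first reusing user in alphabetical order) are all assumptions, and the un-cloning step is left out since the text notes it is not implemented.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class KBObject:
    """Hypothetical record for an object of the shared KB."""
    name: str
    creator: str
    source_file: Optional[str]
    reused_by: set = field(default_factory=set)  # other users relying on the object


def delete_file_content(kb: dict, source_file: str) -> None:
    """Delete what a committed input file contributed to the shared KB.

    Objects reused by other users are "cloned": they stay in the KB and one
    of the reusing users becomes their creator. Other objects are removed.
    """
    for name in list(kb):
        obj = kb[name]
        if obj.source_file != source_file:
            continue
        other_users = sorted(obj.reused_by - {obj.creator})
        if other_users:
            obj.creator = other_users[0]  # assumption: arbitrary but deterministic
            obj.reused_by.discard(obj.creator)
            obj.source_file = None  # no longer backed by the deleted file
        else:
            del kb[name]
```

After such a deletion, the author can re-submit the new version of the input file, which is why the committed file mainly serves as a backup of the content it contributed.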
Systematic knowledge modeling and organization of input files. WebKB allows formal and informal statements to be mixed in the same input file; hence, (semi-)formal statements need not be separated from their documentation or from the document elements they index.
A recurrent problem when representing knowledge on a domain is how to organize the input file for grouping related pieces of information together in order to retrieve them efficiently, compare them and thus progressively represent them and make good modeling decisions. Unless one already knows the domain very well, choosing the right domain-dependent conceptual distinctions and corresponding (sub-)sections for grouping information or knowledge representations is difficult. Furthermore, as with any other structuring of a document via sections and subsections, such a choice often depends on the volume of information that one expects to finally have to represent in each (sub-)section. This is very difficult to know in advance and is not a scalable approach.
Fortunately, there exists a systematic, scalable and domain-independent approach since there exist domain-independent conceptual distinctions that are efficient guides for partitioning objects. This approach is simply to use certain partitions of the subtype hierarchy of the MSO (hence, certain levels of it; see Table 3.1.1.1 for the best high-level candidates) for structuring a document in sections and subsections. Which levels are used depends on the volume of information in each section and on the kinds of information it contains. When a section becomes too big for guiding search and comparison, it can be divided according to subtypes. Adding intermediate sections changes the numbering of subsequent or lower sections (if they are numbered) but, even if this is done manually, this does not require much extra work.
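As an illustration of this sectioning rule, here is a minimal Python sketch (it is not part of WebKB). Entries are first grouped under a high-level distinction, and a section is divided according to subtypes only once it exceeds a size threshold. The small hierarchy, the threshold value and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical fragment of a subtype hierarchy: child -> parent.
PARENT = {
    "situation": "thing", "entity": "thing",
    "state": "situation", "process": "situation",
    "agent": "entity", "instrument": "entity",
}

def ancestors(cat):
    """Return the chain from cat up to the root, cat included."""
    chain = [cat]
    while chain[-1] in PARENT:
        chain.append(PARENT[chain[-1]])
    return chain

def section_for(cat, depth):
    """Category found at the given depth below the root on cat's chain."""
    chain = ancestors(cat)[::-1]              # root first
    return chain[min(depth, len(chain) - 1)]

def organize(entries, max_size=2):
    """Group (text, category) entries by depth-1 categories; split big sections."""
    sections = defaultdict(list)
    for text, cat in entries:
        sections[section_for(cat, 1)].append((text, cat))
    result = {}
    for title, items in sections.items():
        if len(items) > max_size:             # too big: divide according to subtypes
            sub = defaultdict(list)
            for text, cat in items:
                sub[section_for(cat, 2)].append(text)
            result[title] = dict(sub)
        else:
            result[title] = [text for text, _ in items]
    return result
```

With this sketch, a "situation" section holding three entries is split into "process" and "state" subsections, while a small "entity" section stays flat.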
All the input files of the MSO are organized that way: no better way was found for managing the large amount of information that had to be represented. The highest-level distinction, which is always used first, is "situations / entities / roles shared by situations and entities". Then, which distinctions are used depends on the domain. As an example, Table 2.3.4.1 shows the table of contents of an input file for high-level information security related concepts. Like many other input files of the MSO, this one was incrementally created by systematically dispatching information from many (partially redundant) Wikipedia pages on the subject into sections representing the most important distinctions of the MSO.
Within older input files on domains such as Conceptual Graphs and Formal Concept Analysis, the top-level decomposition was less systematic since the titles of the top-level sections were similar to the following ones (in this list, the parentheses are used for giving examples of subsections): "Events (Conferences, ...)", "Problem-solving processes (tasks, methodologies, ...)", "Description supports (structures, languages, ontologies, ...)", "Description containers (publications, mailing lists, ...)", "Instruments (inference engines, editors, ...)" and "Agents (organizations, teams, ...)".
For files representing the content of certain courses at Griffith University, only a restricted decomposition was necessary. For example, the titles of the top-level sections for the content of a Workflow Management course were: "Process ('Modeled or performed process', 'Process/method to model business processes')", "Variable ('Methodology-related variable')", "Data structure ('Case description', 'Other structure')", "Instrument (tool)" and "Agent (organizations)".
1. General roles -- (in-)security related roles playable by entities as well as situations;
   threats (errors, failures, faults) are specialized in Section 3 of the Resist ontology
2. Entities -- (in-)security related things that are neither states nor processes
  2.1. Attributes (criteria) -- availability, integrity, confidentiality, ...
  2.2. Descriptions and description supports -- statements, documents, languages, ...
    2.2.1. Regulations
    2.2.2. Policies -- computer security policies/models, insurance policies, ...
    2.2.3. Models and principles or strategies
    2.2.4. Descriptions of controls or measures
    2.2.5. Certifications (digital signatures, ...) and results of evaluations
    2.2.6. Skill qualifications (diplomas, ...)
    2.2.7. Policy languages
  2.3. Agents
    2.3.1. Organizations -- professional organizations, standard sources, ...
    2.3.2. Individuals -- security professionals, hackers, ...
  2.4. Instruments (devices, systems, ...)
    2.4.1. Hardwares -- security evaluated/focused/threatening hardwares
    2.4.2. Softwares -- security evaluated/focused/threatening softwares
3. Situations -- (in-)security related states or processes
  3.1. States -- situations that are not processes, e.g., safety and insecurity
  3.2. Security threatening processes -- attacking, eavesdropping, ...
  -- The remaining sections are about "security supporting" processes
  3.3. Managing -- economic/sanitary/social/information risk management/engineering
  3.4. Evaluating -- security classification/evaluation
  3.5. Designing
    3.5.1. Applying design principles
    3.5.2. Using cryptographic techniques
    3.5.3. Supporting authentication and access controls
  3.6. Satisfying criteria -- supporting security needs
    3.6.1. Supporting confidentiality
    3.6.2. Supporting integrity
    3.6.3. Supporting accessibility and interoperability
    3.6.4. Supporting imputability
  3.7. Fault tolerance/forecasting/prevention/removal -- Section 4 of the Resist ontology
Normalization of input files. Since the above described approach can be seen as a way to normalize input files, in the same way that a certificate of "semantic well-formedness" could be delivered about any statement and thus about a whole KB, a certificate of "semantic-based structure well-formedness" could be delivered about an input file. This would require using a category identifier as the title of a section or using another way to make explicit the association between a section and the category it refers to. In the input files of the MSO, this association is only implicit. Some criteria for "semantic-based structure well-formedness" could for example be:
It is worth noting that the document formatting rules that publishers generally impose, or the "document quality standards" that many organizations impose, are inconsistent with criteria for "semantic-based structure well-formedness". For example, Wikipedia's quality standards recommend avoiding a list format for text that can be presented as prose.
Use of the approach for organizing informal lists or hierarchies (topic hierarchies, FAQs, menus, etc.). Table 2.3.4.2 shows how the entries of the Google/ODP directory [www-ODP, 2009] can be structured using categories from the MSO (see the terms with a lowercase initial) in order to ease information retrieval.
thing
situation
activity_with_agent_a_person
Business: Industries, Finance, Jobs,...
Games: Board, Roleplaying, Video,...
Shopping: Autos, Clothing, Gifts,...
News: Media, Newspapers, Current Events,...
Recreation: Food, Outdoors, Travel,...
Society: Issues (see below), People, Religion,...
Sports: Basketball, Football, Soccer,...
activity_with_object_a_person
Health: Alternative, Fitness, Medicine,...
entity
spatial_entity
Reference: Education, Libraries, Maps,...
Regional: Asia, Europe, North America,...
physical_entity
non-living-entity
Arts: Movies, Music, Television,...
Science & Technology: Biology, Psychology, Physics,...
Computers: Hardware, Internet, Software,...
living-entity
person
Kids and Teens: Computers, Entertainment, School,...
Home: Consumers, Homeowners, Family,...
Here is a categorization of some processes (with some of their related objects) discussed in this section.
knowledge-based_search/retrieval
< (is#information_search/retrieval
> (is#lexical_search/retrieval > is#regular_expression_based_search) )
exclusion: is#lexical_search/retrieval,
> {retrieval_of_data_indexed_by_knowledge knowledge_search/retrieval},
subprocess: document_generation,
instrument: (knowledge-based_search_interface < (search_interface < wn#user_interface)
> {(dynamically_generated_knowledge-based_search_interface
static_knowledge-based_search_interface)} )__[any->1..*];
retrieval_of_data_indexed_by_knowledge
object: a (. pm#description_content/medium/container_indexed_by_knowledge
< pm#description_content/medium/container,
knowledge_based_indexation/annotation: a pm#knowledge_representation),
subprocess: knowledge_retrieval;
knowledge_search/retrieval
> {whole_graph_retrieval graph_portion_retrieval___path_retrieval}
{specialization_retrieval generalization_retrieval comparable_graph_retrieval}
analogy_retrieval
{structure_only_based_retrieval rule_based_retrieval}
{complete_knowledge_retrieval incomplete_knowledge_retrieval}
{consistent_knowledge_retrieval inconsistent_knowledge_retrieval}
knowledge_search/retrieval_in_WebKB,
subprocess: a (. knowledge_inference/reasoning/generation
> {monotonic_reasoning non-monotonic_reasoning}
{consistent_inferencing inconsistent_inferencing}
{complete_inferencing incomplete_inferencing}
{structure-only_based_inferencing rule_based_inferencing} ),
instrument: (search_operator < wn#subroutine) __[*<->*],
object: (a pm#knowledge_representation
object of: a storing_knowledge_assertions_or_queries_in_a_document
a knowledge_comparison );
/* //pm#knowledge_representation was declared at the end of Table 2.1.3.1:
pm#knowledge_representation < wn#symbolic_representation,
> {pm#formal_term pm#formal_or_semi-formal_well-formed_statement};
*/
structure_only_based_retrieval
> { (structure_only_based_specialization_retrieval < specialization_retrieval)
(structure_only_based_generalization_retrieval < generalization_retrieval)
(structure_only_based_comparable_graph_retrieval
< comparable_graph_retrieval) };
knowledge_search/retrieval_in_WebKB-2
instrument:
(search_operator_of_WebKB-2 < search_operator,
> spec gen ext-spec ext-gen specGtypes ext-specGtypes comp ext-comp
spec/noExcl gen/noExcl ext-spec/noExcl ext-gen/noExcl specGtypes/noExcl
ext-specGtypes/noExcl comp/noExcl ext-comp/noExcl
) __[any->1..*];
2.4.1.1. Storing knowledge commands (assertions or queries) within an informal document.
Many Semantic Web tools propose special "semantic tags" to include the result of certain semantic queries within a document (at the place where the queries are), and such tags are often intended for Java programmers, not end-users.
Microformats, RDFa and the language of Semantic Wikipedia are now popular ways to hide knowledge within HTML tags. Their ancestor is the approach of Ontobroker [Decker et al., 1998] which parsed knowledge stored within the (invented) "onto" attribute of the HTML tag "A" as in the following example.
<a onto="page:Researcher"></a> <!-- this HTML page is about a researcher -->
<a href="http://www.iiia.csic.es/" onto="page[affiliation=href]">IIIA</a>
<!-- this researcher is affiliated to IIIA -->
Example with a microformat for geographic information:
The birds roosted at <span class="geo"><span class="latitude">52.48</span>,
<span class="longitude">-1.89</span> </span>.
Example with RDFa:
<p xmlns:dc="http://purl.org/dc/elements/1.1/"
about="http://www.example.com/book/wikinomics">
In his latest book <cite property="dc:title">Wikinomics</cite>,
<span property="dc:creator">Don Tapscott</span> explains ...
The book is due to be published in
<span property="dc:date" content="2006-10-01">October 2006</span>.
</p>
WebKB-1 allows knowledge representations to be stored within the "alt" attribute of HTML tags for non-textual elements (IMG, AREA, APPLET, and INPUT). Indeed, "alt" is meant to specify alternate text describing the content of these elements. This was used in WebKB-1 to index images and then allow their retrieval using conceptual queries that can be composed via a menu (http://www.webkb.org/kb/images/clubMed/). Here is an indexation used for one of the images (the representation is in CGLF since it was part of one of the earliest demonstration examples of WebKB-1, before CGLF was replaced with FCG and FE).
<img src="http://www.webkb.org/kb/images/clubMed/cocoTreeOnBeachNearStraightJetty.jpg"
alt="[Coco_tree]-
{ ->(On)->[Beach]->(Near)->[Jetty:*j]->(Attr)->[Straight];
->(Near)->[*j];
}">
The authors of Ontobroker claimed that their approach avoided repetitions between formal and informal text. This is true at least for simple statements. However, this approach is mostly useful for representing and hiding simple kinds of knowledge representations since i) using expressive KRLs with this approach is not easy, and ii) most of the representations (e.g., the relations) are always hidden and hence cannot be shown for displaying a precise and organized version of the information, nor used for navigating between objects. This is why WebKB does not use this approach but instead allows FS commands and control structures to be isolated within special marks, e.g., HTML marks such as "<script language='FS'>" and "</script>". If needed, the document author may hide these commands using HTML comments.
In certain contexts, WebKB-1 also allows certain HTML marks to be reused for simple knowledge representations, as shown by the next equivalent statements.
FE: The car that has for owner John and has for weight 1750 kg.
FCG: [the car, owner: John, weight: 1750 kg]
HTML: <dl><dt>The car <dd>owner: John
<dd><dl><dt>weight<dd>1750 kg</dl></dl>
To my knowledge, such an integration of a structured document editor with an inference engine has so far not been replicated. However, when Amaya (the W3C/INRIA browser based on Thot but using XML and CSS) finally has data/presentation/event model specification languages and graph editing capabilities as powerful as those of Thot, my work with Thot could be re-used to create an API for Amaya allowing i) the graphical display, editing and embedding of knowledge representations in XML/HTML documents, and ii) the exploitation of external inference engines to check these representations and answer queries. Such an API would likely be very popular since many KR notations currently exist and are not easy to display, edit and embed within XML/HTML documents.
Here are some types and relations to begin categorizing some of the above approaches and thus for example guide an author of a microformat-based approach in its categorization. This is also an example of an organized summary of information (the one given in the first above bullet point).
storing_knowledge_assertions_or_queries_in_a_document
> (creating_a_document_intertwining_formal_representations_and_informal_text
> {( (inserting_formal_representations_in_a_structured_document_language_or_editor
> inserting_XML-based-formal_representations_in_an_XML_document)
inserting_formal_representations_without_structured_document_language/editor )}
(using_a_command_based_KRL_within_informal_text
> using_FS_within_an_informal_text)
(using_a_programming_API_within_an_informal_text
> using_MQL_within_an_informal_text),
subprocess: (embedding_formal_or_informal_representations_between_special_marks
> (hiding_formal_representations_within_XML/HTML_tags
> using_micro_formats using_RDFa)
(embedding_formal_representations_between_marks
> embedding_formal_representations_between_HTML_marks
(embedding_formal_representations_between_FS_marks
object: pm#AND-set_of_statements_in_FS,
description_place: pm#document_element,
instrument: (FS_mark < pm#string,
> (FS_end_mark instance: "</script>" ")$")
) ) ) )
);
//subprocess: knowledge_extraction/modeling/representation; //already stated
2.4.1.2. Indexing any document element (DE) using knowledge. WebKB-1 allowed referring to (and indexing) a DE before languages such as XLink, XPointer, XQuery and XPath appeared and, unlike them, allows referring to any DE. Indeed, even if the DE has not been isolated via XML or HTML tags, WebKB-1 allows referring to it by indicating its content and its occurrence number in the document. Here is an example of two DE indexations in WebKB-1.
$(Indexation
(Context: Language: CG; Ontology: http://www.bar.com/topLevelOntology.html;
Repr_author: phmartin; Creation_date: Mon Sep 14 02:32:21 PDT 1998;
Indexed_doc: http://www.bar.com/example.html;
)
(DE: {2nd occurrence} the red damaged vehicle )
(Repr: [Color: red]<-(Color)<-[Vehicle]->(Attr)->[Damaged] )
)$
$(DEconnection
(Context: Language: CG; Ontology: http://www.bar.com/topLevelOntology.html;
Repr_author: phmartin; Creation_date: Mon Sep 14 02:53:36 PDT 1998;
)
(DE: {Document: http://www.bar.com/example.html} )
(Relation: Summary)
(DE: {Document: http://www.bar.com/example.html} {section title: Abstract} )
)$
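Referring to a DE by its content and occurrence number, as in "{2nd occurrence} the red damaged vehicle", can be sketched in a few lines of Python. This is only an illustration of the idea, not the WebKB-1 code; the function name and its return convention (character offsets) are assumptions.

```python
def locate_element(document, content, occurrence=1):
    """Return the (start, end) offsets of the n-th occurrence of content, or None."""
    if occurrence < 1:
        raise ValueError("occurrence numbers start at 1")
    start = -1
    for _ in range(occurrence):
        # search again just after the previous hit (overlapping hits are counted)
        start = document.find(content, start + 1)
        if start == -1:
            return None
    return start, start + len(content)
```

No tag needs to surround the element: the pair (content, occurrence number) is enough to designate it, at the cost of the reference breaking when the document text changes.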
2.4.1.3. Sending commands with GET parameters, generating virtual documents and associating (menus of) commands to objects. The WebKB-1 and WebKB-2 servers can be sent one or several FS commands using the GET and POST parameter encoding methods of the HTTP protocol. The graphical interfaces for these servers may use HTML forms which send GET/POST commands to these servers. In answer to such requests, these servers always generate and send back an HTML document containing the results of the commands (if they have results or if their execution must be acknowledged), in the given order of these commands, and preceded by the text of the commands when this is specified via a presentation option. To send a command, an alternative to using a form is to use an HTML hyperlink whose destination URL begins with the URL of a server and whose GET parameters include commands: clicking on the hyperlink calls the server and shows the results of the command. This execution of commands by activating a hyperlink is called "dynamic/virtual document generation" by researchers on hypertext/structured documents.
By default, in the results provided by WebKB-2, each displayed category name (identifier or simple name) is within a hyperlink that can call WebKB-2 with a search command showing the direct relations from this category and the hierarchy of all its important supertypes. Each name of a transitive relation type is within a hyperlink that calls WebKB-2 to show the hierarchy of the transitive closure of this relation from this category. Menus at the bottom of each generated page (see Figure 2.4.3.2) permit to change the default search and presentation options (e.g., the exploration depth, the used KRL, the kinds of filtered out relation types, and the kinds of authors the knowledge of whom should be filtered out). Figure 2.4.3.3 shows the current interface for selecting some of these options in WebKB-2. This approach permits
Subsection 2.4.3 illustrates all this. This approach is also very flexible and relatively easy to implement. However, instead of only associating a search command to objects, it would be preferable to associate a cascading pop-up menu showing i) the direct and indirect relations from/to the object, and ii) commands for adding, removing, searching or comparing relations from/to this object or comparing it with a related object, with for each command a sub-menu for changing the default options. This would avoid the necessity to use separate menus for finding or accessing commands that may apply to the selected object. Separate menus require scrolling or opening another window and hence require hiding some of the displayed information. This object-centered interface is currently only partially implemented in WebKB-2 and hence is not yet usable on its main site (www.webkb.org).
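As a rough illustration of such hyperlinks, the following Python sketch builds a destination URL carrying several commands as GET parameters and decodes them again, in order. The server URL used in the tests and the parameter name "command" are hypothetical: the actual parameter names of the WebKB servers are not given here.

```python
from urllib.parse import urlencode, parse_qs, urlsplit

def command_url(server_url, commands, param="command"):
    """Build a hyperlink destination carrying commands as GET parameters."""
    # one GET parameter per command; urlencode percent-encodes ';', quotes, spaces
    query = urlencode([(param, c) for c in commands])
    return f"{server_url}?{query}"

def commands_in(url, param="command"):
    """Decode the commands of such a URL, in their given order."""
    return parse_qs(urlsplit(url).query).get(param, [])
```

Since the whole request is contained in the URL, it can be stored in any document as an ordinary hyperlink.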
2.4.1.4. A language of commands for a RESTful Web service. Despite the advantages of the approach described in the previous paragraph, WebKB-1 and WebKB-2 still seem to be the only knowledge servers that propose a language of commands that can be used with GET parameters and hence
When they can be sent commands, other knowledge servers use verbose messages such as KQML/SOAP/XML messages, or use an API and send calls to the server via protocols such as AJAX or the Java Message Passing Interface. The two approaches (RESTful and message-based) are complementary.
2.4.1.5. Script/shell-like commands, document exploration/exploitation/generation commands and knowledge search/generation commands. WebKB-1 and WebKB-2 provide these different kinds of commands. Here are examples of commands that can be selected (and then sent to WebKB-1 for execution) at http://www.webkb.org/interface/syntaxBasedIR.html
//the next command loads in parallel all the files accessible via hyperlinks (1 level of
//exploration) from http://www.webkb.org/kb/index.html and, via the shell "pipe", applies
//the shell command "grep" on each of them to see the lines including the word "knowledge"
accessibleDocFrom -maxlevel 1 -HTMLonly ../kb/index.html | grep -i knowledge
//example of a simple loop; a loop is a control command:
set a "1 2 3"; for i in $a { echo $i; };
Here are examples that can be selected (and then sent to WebKB-1 for execution) at http://www.webkb.org/interface/knowledgeBasedIR.html
//maximal join on statements that include (a specialization of) KADS1_Model_of_Expertise
load http://www.webkb.org/kb/webkb1/KADS1.html;
spec [KADS1_Model_of_Expertise] | maxjoin;
//display of the subtype hierarchy of thing_needed_for_KADS_knowledge_engineering
spec thing_needed_for_KADS_knowledge_engineering;
The Sisyphus-I problem – an office allocation problem originally used for comparing knowledge acquisition tools and later also used for comparing CG inference tools (Sisyphus-I track of ICCS 1999) – was solved in WebKB-1 using a procedural script written in FS (searches for specializations of types and statements were combined via control structures grouped in procedures) [Martin & Eklund, 1999a]. The three files (ontology, dataset and procedures) for this experiment, the menu for loading and executing them, and the article explaining the solution are accessible at http://webkb.org/kb/webkb1/sisyphus1.html
2.4.1.6. Options to display the indexed document elements. When WebKB-1 retrieves knowledge representations in answer to a query, depending on the selected presentation options, it displays each representation and/or the DE itself. Each displayed DE and representation is associated with a hyperlink (directly, or via a hyperlink that follows the DE/representation and that has "Source" for textual content) that retrieves the Web file, and the place within this Web file, where this element (representation or indexed DE) is stored. This hyperlink does not directly refer to the file (e.g., an input file) where this element is stored since this element may not be referable via a URL. Instead, the hyperlink contains a call to WebKB-1 which makes it generate a copy of the file (opened at the place of the element, when this is an HTML file) with the element preceded by "------------ Source -----------" and followed by "---------------------------" or simply highlighted in pink if it is a block of text (e.g., if it is an indexed sentence or a description in FCG).
2.4.2.1. Background. In this document, "(conceptual) graph" is a synonym of "(semi-)formal statement". This synonymy is used a lot in this subsection, not much in the rest of the document. Graph matching (or "projection") to test if a query statement generalizes another statement is a classic operation at the core of many inferencing mechanisms in CG related works since 1984 and in other works such as Algernon, an inference system based on a tractable reasoning system called Access-Limited Logic [Crawford & Kuipers, 1991]. Projection has been shown to be a sound and complete way to test if the query graph is a logical deduction of another graph when both are "simple graphs" (non-redundant existentially quantified conjunctive formulas) and even when this other graph has a context considered as non-restrictive ("positive context"); in both cases, if the query graph is a tree (i.e., has no cycle), the projection can be computed in polynomial time [Chein & Mugnier, 1997]. Rules with simple graphs as premise and conclusion, interrelated by variables, have been shown to be sound and complete with respect to first-order logic (FOL) [Salvat & Mugnier, 1996]. Full CGs (simple CGs plus classical negation) have also been proved sound and complete with respect to FOL [Kerdiles, 2001]. An algorithmic study of deduction in simple CGs with classical negation can be found in [Leclère & Mugnier, 2008]. These last cited works are implemented in CoGITaNT [Genest & Salvat, 1998], the successor of CoGITo – the CG workbench that was re-used and extended in CGKAT and WebKB-1. Unlike WebKB, various systems – e.g., MIEL [Haemmerlé et al., 2007] – handle simple CGs extended to take into account fuzzy values.
2.4.2.2. Efficiency of searching graphs in WebKB-2. In WebKB, the 'spec'/'gen' and 'ext-spec'/'ext-gen' operators use a graph-matching algorithm on all statements in the KB and these algorithms use a simple depth-first exploration of both graphs in parallel (the query graph and a candidate asserted graph). This exploration starts from their first nodes and continues as long as the types and quantifiers of the nodes match (otherwise the exploration is started from another node in the query graph or then in the other graph). Checks are made to avoid loops.
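The following Python sketch illustrates this kind of parallel depth-first exploration on a deliberately restricted case: graphs are trees, quantifiers and contexts are ignored, and an individual is treated as a leaf of the type hierarchy. It is not the WebKB algorithm; the representation of a node as a pair (type, list of (relation, node)) and the small hierarchy are assumptions made for the example.

```python
# Hypothetical hierarchy: type (or individual) -> direct supertype.
SUPERTYPE = {"apartment": "lodging", "lodging": "thing", "car": "thing",
             "John": "man", "man": "person", "person": "thing"}

def is_a(t, ancestor):
    """True if t is ancestor or one of its (indirect) subtypes."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE.get(t)
    return False

def specializes(candidate, query):
    """Depth-first test that the candidate tree specializes the query tree."""
    c_type, c_rels = candidate
    q_type, q_rels = query
    if not is_a(c_type, q_type):
        return False
    # each relation of the query must be matched in the candidate;
    # the candidate may have more relations and nodes than the query
    return all(
        any(c_rel == q_rel and specializes(c_dest, q_dest)
            for c_rel, c_dest in c_rels)
        for q_rel, q_dest in q_rels)
```

Because trees have no cycle, no loop check is needed in this restricted sketch.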
Hence, these algorithms do not have a polynomial complexity and bring nothing new from a theoretical viewpoint. Indeed, these algorithms are meant to compare any pair of graphs of any expressiveness (level by level when contexts exist and, for example, with a quantifier such as 'at least 85%' specializing the quantifier 'at least 60%' or 'most'). Given the complexity of the structures to compare, writing a polynomial algorithm would have been complex, even though it is possible since the graph-matching approach is no different from the one in a classic projection.
As previously noted, when graphs that are not simple graphs are compared, the graph matching in WebKB does not amount to checking that the query graph is a logical deduction of the other one; it amounts to checking that this other graph is "relevant" to the query.
Given that for their queries people most often use statements without cycles or contexts, the fact that these algorithms do not have a polynomial complexity never proved to be a problem, even in the case of a search of extended specializations with hundreds of potentially relevant (simple) graphs to compare with the query – more precisely, hundreds of potentially relevant graphs amongst about 3000 graphs and with about 200,000 relations between categories. These last two numbers are not very important given the efficient category indexation provided by the underlying OODBMS: the answer was always computed within a second or two. This can be checked by issuing queries at http://www.webkb.org (the computing time calculated using the Unix function 'gettimeofday()' is given at the end of the results). The time to load the database in memory (only for the first call to WebKB-2 if the calls are temporally close to each other) and the network delay (the machine where the main site of WebKB-2 is hosted has so far been in Australia) are more important factors since they each are at least two seconds long. The hosting machine is a classic Unix individual workstation and, so far, the graphs are not even ordered in a specialization hierarchy.
2.4.2.3. Extensions to traditional search operators and implications on the structure of the KB of WebKB-2. Such searches for specializations (or generalizations) of a query graph permit searches "by the content" and are relatively classic, even though an originality of WebKB-2 is to exploit an extended specialization relation on arbitrary graphs and to exploit it in a large KB. Below are some extensions to the methods for these classic searches. These extensions are currently not used in CG related works, except for one: a rule-based generalized form of "the third kind of extension presented below" is used in GALEN and presented in [Rogers & Rector, 2000].
Except for the first extension, the full implementation of these extensions in WebKB-2 has been put on hold because of a near-future structural change of the KB that will seriously ease i) their implementation, ii) knowledge navigation and presentation (especially in FL), and iii) a full implementation of certain fine points of the collaboration supporting mechanisms. This structural change consists in not keeping separate the statements provided by the users but integrating each of their relations into a unique network the way it is presented in FL (thus, with contexts associated to relations rather than to whole statements). Hence, at least internally, this new network will be more similar to those of description logics or other traditional logic-based semantic networks than to CG-based KBs (where users' statements are kept separate, as entered). In addition to easing the full implementation of the extensions, this structural change will speed up graph comparison because the structures will be simpler (the tables/classes ConceptNode and RelationNode will not be needed anymore) and hence faster for an OODBMS to traverse or manage.
Using unions of graphs. Let us assume the KB includes the separately entered FCGs [John, owner of: a car] and [John, owner of: an apartment]. A classic search for graphs specializing the query FCG [a man, owner of: a car, owner of: a lodging] would not retrieve the previous graphs since only their union specializes the query graph. When WebKB-2 tests if a graph G is a specialization of the query graph, it also looks for more information in graphs related to G i) by the same individual (i.e., with the same identifier or coreference variable), ii) that use a type in G with a universal quantifier (with an existential quantifier, there may not be any connection), or iii) that define necessary conditions for a type in G. If the union of G and those related graphs answers the query graph, they are displayed but separately since joining them would often not produce a meaningful graph (e.g., their embedding graphs could not be joined). As another example, the following two FCGs could also be presented in answer to the previous query:
[ [[the IBM_employee Tom, owner of: an apartment], time: 2000], author: Tom]
[ [any IBM_employee, owner of: a car], author: IBM]
However, WebKB-2 is not able to perform type expansion (i.e., replace a type by its definition within a graph) nor exploit rules – unlike in the CG tools CoGITaNT [Genest & Salvat, 1998] and Corese [Corby et al., 2004] for example; see also [Le Duc & Le Thanh, 2003] for a study about the combination of revision production rules and description logics.
With the future network tightly integrating all the statements, searches for query graph specializations will not retrieve separate users' statements (as entered) but portions of the network. If two of these portions overlap, presenting them separately or not will depend on a presentation option. By default, they will not be presented separately.
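The first case of the union-based test above (graphs related by the same individual) can be sketched as follows in Python. This is a simplified illustration, not the WebKB-2 code: each graph is reduced to an individual with a flat list of (relation, type) pairs, universal quantifiers and type definitions are ignored, and all names are hypothetical.

```python
from collections import defaultdict

# Hypothetical hierarchy: type (or individual) -> direct supertype.
SUPERTYPE = {"apartment": "lodging", "John": "man"}

def is_a(t, ancestor):
    """True if t is ancestor or one of its (indirect) subtypes."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE.get(t)
    return False

def answer_with_union(graphs, query):
    """graphs: [(individual, [(relation, type)])]; query: (type, [(relation, type)]).
    Return, per individual, the separate graphs whose union answers the query."""
    q_type, q_rels = query
    by_individual = defaultdict(list)
    for index, (individual, _) in enumerate(graphs):
        by_individual[individual].append(index)
    answers = {}
    for individual, indexes in by_individual.items():
        if not is_a(individual, q_type):
            continue
        used = set()
        for q_rel, q_dest in q_rels:
            hit = next((i for i in indexes
                        for rel, dest in graphs[i][1]
                        if rel == q_rel and is_a(dest, q_dest)), None)
            if hit is None:
                break                      # this query relation cannot be matched
            used.add(hit)
        else:                              # every query relation was matched
            answers[individual] = [graphs[i] for i in sorted(used)]
    return answers
```

The matching graphs are returned as a list rather than joined, mirroring the choice of displaying them separately.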
Exploiting partOf relations along with generalization relations. For instance, let us assume that the categories representing the geographical areas 'Gold Coast' and 'Southport' are connected via a pm#part relation and that the KB includes the following FCG.
[spamOnly@phmartin.info, agent of:
(a renting, object: (an apartment, part: 1 bedroom, place: Southport),
instrument: 140 Australian_dollars, period: a week,
beneficiary: Spirit_Of_Finance)]
The pm#part relation between the categories for 'Gold Coast' and 'Southport' should be exploited to find that this graph specializes the following one.
[an apartment, place: (a district, part of: Gold_Coast)]
In the general case, various rules such as those given in [Rogers & Rector, 2000] are required to state what can be deduced from certain combinations of certain partOf relations and generalization relations.
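For the simple case of the example above, exploiting a part relation during matching can be sketched in Python as follows. Only the transitivity of the part relation is handled; the rules of [Rogers & Rector, 2000] for combining partOf and generalization relations are not modeled, the type "district" is not checked, and the data structures and function names are assumptions made for this illustration.

```python
PART_OF = {"Southport": "Gold_Coast"}   # part -> whole (assumed acyclic)
SUPERTYPE = {"apartment": "lodging"}    # hypothetical type hierarchy

def is_a(t, ancestor):
    """True if t is ancestor or one of its (indirect) subtypes."""
    while t is not None:
        if t == ancestor:
            return True
        t = SUPERTYPE.get(t)
    return False

def part_of(place, whole):
    """True if place is a strict, possibly indirect, part of whole."""
    place = PART_OF.get(place)
    while place is not None:
        if place == whole:
            return True
        place = PART_OF.get(place)
    return False

def matches(candidate, query):
    """candidate: (type, place); query: (type, area the place must be part of)."""
    c_type, c_place = candidate
    q_type, q_whole = query
    return is_a(c_type, q_type) and part_of(c_place, q_whole)
```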
The operators spec, gen, ext-spec, ext-gen, specGtypes, ext-specGtypes, comp and ext-comp. Let us assume that the graph [John, owner of: a housing] is in the KB and that a query graph is [a man, owner of: an apartment]. The first graph is not a specialization of the query graph; it is only a "comparable" graph since wn#housing is a supertype of wn#apartment (hence a comparable type), not a subtype. However, a user may want such a graph to be provided as an answer for the above cited query graph. This is why WebKB-2 provides the search operators 'specGtypes' and 'ext-specGtypes' (alias '?') that respectively extend 'spec' and 'ext-spec' by also taking into account supertypes of the types of nodes in the query graph, not just subtypes or identical types. Except for type comparisons, 'specGtypes' and 'ext-specGtypes' are identical to 'spec' and 'ext-spec': the quantifiers of a matching graph must specialize or be identical to their counterparts in the query graph, and the matching graph may have more relations and nodes than the query graph.
The operators 'comp' and 'ext-comp' lift these last constraints: the first can be seen as a union of 'spec' and its inverse 'gen', while the second can be seen as a union of 'ext-spec' and 'ext-gen'. They proved necessary in an application for the retrieval of accommodations and other tourism-related services on the Sunshine Coast (Australia), since many hotels had not published certain services or their prices but still offered such services. In the interface of this application, a checkbox allowed "listing hotels not providing information to answer the query", and hence the use of the last cited operators.
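The difference between 'spec' and 'specGtypes' on the [John, owner of: a housing] example can be sketched as follows. This is only an illustration under strong assumptions: each type has a single direct supertype, instances are treated like subtypes, a graph is reduced to one (source, relation, target) triple, and quantifiers and extra relations are ignored. The names `SUPER`, `node_ok` and `matches` are hypothetical.

```python
# Toy hierarchy (hypothetical encoding): child -> direct supertype (or type, for John).
SUPER = {"apartment": "housing", "man": "person", "John": "man"}


def ancestors(category):
    """List the transitive generalizations of `category`, nearest first."""
    found = []
    while category in SUPER:
        category = SUPER[category]
        found.append(category)
    return found


def specializes(a, b):
    """True if `a` is `b` or a (transitive) specialization or instance of `b`."""
    return a == b or b in ancestors(a)


def node_ok(kb_type, query_type, op):
    """Compare one KB node with one query node for the given operator."""
    if op == "spec":
        return specializes(kb_type, query_type)
    if op == "gen":
        return specializes(query_type, kb_type)
    if op == "specGtypes":  # also accepts supertypes of the query type
        return specializes(kb_type, query_type) or specializes(query_type, kb_type)
    raise ValueError(f"unsupported operator: {op}")


def matches(kb_triple, query_triple, op):
    """Match two (source, relation, target) triples node by node."""
    return (kb_triple[1] == query_triple[1]
            and node_ok(kb_triple[0], query_triple[0], op)
            and node_ok(kb_triple[2], query_triple[2], op))


KB_GRAPH = ("John", "owner", "housing")
QUERY = ("man", "owner", "apartment")
```

Here `matches(KB_GRAPH, QUERY, "spec")` fails because housing does not specialize apartment, whereas `matches(KB_GRAPH, QUERY, "specGtypes")` succeeds because comparable types are accepted.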
The operators spec/noExcl, gen/noExcl, ext-spec/noExcl, ext-gen/noExcl, specGtypes/noExcl, ext-specGtypes/noExcl, comp/noExcl and ext-comp/noExcl (alias '??'). A category X 'gen/noExcl' a category Y if X generalizes ('gen') Y (i.e., is a supertype of Y, a type of Y or a supertype of a type of Y) or is a specialization of a generalization of Y which is not exclusive with Y and which does not specialize Y. The 'gen/noExcl' operator is like the 'gen' operator except that it uses the above described gen/noExcl relation type instead of the pm#type_generalization relation type for comparing categories. Its inverse operator is 'spec/noExcl'. The other operators cited in the title of this paragraph are similarly constructed by extending those cited in the title of the previous paragraph. The rationales for these operators are similar to the rationales for extending 'spec' with 'specGtypes' and 'comp': even if this is not made explicit in the KB, the above category X could be a generalization of Y or they could have shared specializations, and hence Y is a relevant object for a spec/noExcl query using X. For example, a statement using (specializations of) the role types wn#adventurer or wn#achiever (which are non-exclusive subtypes of wn#person) is a correct answer to 'spec/noExcl [a wn#achiever]'.
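One possible reading of the gen/noExcl relation between categories is sketched below. It assumes that "not exclusive with Y" and "does not specialize Y" both apply to X, and that exclusion relations are inherited by specializations. The categories `ex#top` and `ex#rock`, the exclusion link, and all function names are hypothetical; only wn#person, wn#adventurer and wn#achiever come from the text.

```python
# Toy hierarchy (partly hypothetical): category -> direct generalizations.
PARENTS = {
    "wn#person": {"ex#top"},
    "wn#adventurer": {"wn#person"},
    "wn#achiever": {"wn#person"},
    "ex#rock": {"ex#top"},
}
# Hypothetical direct exclusion relations, stored as unordered pairs.
EXCLUSIVE = {frozenset({"wn#person", "ex#rock"})}


def generalizations(category):
    """Return all direct and indirect generalizations of `category`."""
    found, stack = set(), [category]
    while stack:
        for parent in PARENTS.get(stack.pop(), ()):
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


def gen(x, y):
    """True if x is y or generalizes y."""
    return x == y or x in generalizations(y)


def exclusive(x, y):
    """True if x and y are directly or indirectly related by an exclusion relation."""
    xs = generalizations(x) | {x}
    ys = generalizations(y) | {y}
    return any(frozenset({a, b}) in EXCLUSIVE for a in xs for b in ys)


def gen_no_excl(x, y):
    """x 'gen/noExcl' y under the reading stated in the lead-in."""
    if gen(x, y):
        return True
    shared = generalizations(x) & generalizations(y)
    return bool(shared) and not exclusive(x, y) and not gen(y, x)
```

Under these assumptions, `gen_no_excl("wn#achiever", "wn#adventurer")` holds since both are non-exclusive subtypes of wn#person, while `gen_no_excl("ex#rock", "wn#adventurer")` does not because of the inherited exclusion.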
In many research works, e.g., [Corby et al., 2007], search operators are based on similarities between categories, and these similarities are often computed via statistical measures (sum, ...) based on the number of specialization or generalization relations that connect two categories. The above cited operators could be further specialized by other operators that also exploit such statistical measures. The important point is that categories directly or indirectly related by an exclusion relation are considered not similar. Within the MSO, many direct or indirect exclusion relations have been set between the top-level categories of WordNet (about a hundred categories) and hence between a good part of randomly selected pairs of WordNet categories. Unfortunately, below these top levels, few exclusion relations exist between WordNet categories.
Path expressions. They should be permitted in query graphs. This is partially the case in SPARQL, the query language proposed by the W3C for querying RDF resources. SPARQL has been extended in various directions, e.g., for update purposes [Seaborne et al., 2008], for querying XML embedded in RDF [Corby et al., 2009] and for extending its path description features with regular expression patterns [Alkhateeb et al., 2009]. In any case, the SQL-inspired syntax of SPARQL is hardly a model of flexibility and concision, and is more adequate for retrieving individuals than (portions of) graphs. Using common regular expression operators (with '*' meaning '0, 1 or many times', '+' meaning 'at least 1 time' and '?' meaning '0 or 1 time') in query graphs expressed in a language such as FCG brings flexibility and concision. Let us assume the following FCG is in the KB.
[spamOnly@phmartin.info, agent of: (a research, within_group: KVO_group)]
Users looking for a person conducting research on the "Gold Coast campus of Griffith Uni." (QLD#GCcGU) are unlikely to find this graph via classic searches for specialization only. However, since the category pm#School_of_IT_at_GCcGU is connected via a pm#part relation to pm#KVO_group and via a location relation to QLD#GCcGU and since pm#relation is the uppermost relation type, it should be possible to find this graph with any of the following queries using FCGs.
spec [a person, agent of: (a research, relation ^1..*: GCcGU)];
spec [a person, agent of: (a research, relation ^+: GCcGU)];
spec [a research, (relation: a thing)^1..* place: GCcGU];
spec [a research, relation ^1..3 (part of: a group)^3 place: GCcGU];
//('^3' means that at most 3 relations of the specified type should be traversed).
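The effect of a path expression such as 'relation ^1..*' or 'relation ^1..3' can be approximated by a bounded breadth-first traversal, as in the sketch below. It assumes that relations may be traversed in both directions (which the example above seems to require, since the school has the group as part) and that any relation type is accepted. The triple list `EDGES`, the node name `research_1` and the function names are hypothetical.

```python
from collections import deque

# Toy KB (hypothetical encoding of the relations cited above) as triples.
EDGES = [
    ("research_1", "within_group", "KVO_group"),
    ("School_of_IT_at_GCcGU", "part", "KVO_group"),
    ("School_of_IT_at_GCcGU", "location", "GCcGU"),
]


def neighbours(node):
    """Yield nodes one relation away from `node`, in either direction."""
    for source, _relation, target in EDGES:
        if source == node:
            yield target
        elif target == node:
            yield source


def reachable(start, goal, max_hops=None):
    """True if `goal` is reached from `start` in 1..max_hops relations.

    max_hops=None plays the role of '^1..*' (or '^+').
    """
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if max_hops is not None and hops == max_hops:
            continue  # hop budget used up along this path
        for nxt in neighbours(node):
            if nxt == goal:
                return True
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return False
```

In this toy KB, GCcGU is three relations away from the research node, so `reachable("research_1", "GCcGU")` and `reachable("research_1", "GCcGU", max_hops=3)` hold while a bound of 2 fails.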
2.4.2.4. Searching categories. The above cited operators can be used for searching categories, not just graphs. For example, 'spec "cat" -depth 3' displays all the categories with name "cat" with their direct relations and their specialization hierarchy on three levels – direct relations are displayed (or not) from each of the specializations depending on the selected presentation format. The command 'ext-spec "cat"' displays i) all the recorded direct relations from the string "cat" (this includes its direct extended specializations such as the categories having "cat" as name), and ii) the hierarchy of its extended specializations.
Another way to search categories is to specify variables in a query graph, as in the following example:
spec [a person ?x, agent of: (a research, place: ?y)]
In such cases, instead of returning graphs, the search operators return the set of identifiers of the instances corresponding to these variables in the graphs (e.g., {pm, GCcGU}) or, when an instance is not specified in the graphs, the node is given, as in {pm, [a place]}. There is no difference between such (query) variables and (coreference) variables used for specifying a cycle in the graph; hence, if there is a cycle in the query graph, some variable should be used for both goals.
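The way variables yield identifiers, or whole nodes when no instance is specified, can be sketched as follows. This is an illustrative simplification: graphs are nested dictionaries, type comparisons are skipped, and each relation has a single destination. The dictionary layout and the function `bind` are hypothetical.

```python
def bind(query, graph, bindings):
    """Walk a query and a KB graph in parallel, collecting variable bindings.

    Returns the bindings dict, or None if a query relation is missing in the graph.
    """
    if query.get("var"):
        # Use the instance identifier if there is one, else the node itself.
        bindings[query["var"]] = graph.get("id") or f"[a {graph['type']}]"
    for relation, sub_query in query.get("rels", {}).items():
        sub_graph = graph.get("rels", {}).get(relation)
        if sub_graph is None or bind(sub_query, sub_graph, bindings) is None:
            return None
    return bindings


# Query: spec [a person ?x, agent of: (a research, place: ?y)]
QUERY = {"type": "person", "var": "?x",
         "rels": {"agent of": {"type": "research",
                               "rels": {"place": {"var": "?y"}}}}}

# KB graph where the place is an identified instance.
NAMED = {"type": "person", "id": "pm",
         "rels": {"agent of": {"type": "research",
                               "rels": {"place": {"type": "place", "id": "GCcGU"}}}}}

# KB graph where the place is not an identified instance.
ANONYMOUS = {"type": "person", "id": "pm",
             "rels": {"agent of": {"type": "research",
                                   "rels": {"place": {"type": "place"}}}}}
```

Calling `bind(QUERY, NAMED, {})` gives the identifiers pm and GCcGU, and `bind(QUERY, ANONYMOUS, {})` gives pm and the node [a place], mirroring the two result forms described above.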
2.4.2.5. Filtering categories or graphs. Before it is displayed, knowledge may be filtered out. WebKB-2 permits the use of two filters: one for which the given users, categories or graphs specify that only knowledge believed by these users and specializing these categories or graphs can be displayed, and one for which the given users, categories or graphs specify that all knowledge believed by these users and specializing these categories or graphs cannot be displayed. Both are evaluated, in that order, on any piece of information to display.
Depending on the selected presentation options, a filtered out category or graph is fully hidden or displayed via a simple dot (such dots are useful in long hierarchies of categories or statements in order to better see that some levels have been filtered out). In the near future, this dot will have an associated menu that will show its identifier and – like other categories in this near future – direct relations from/to it and commands that can be applied to it.
Currently, WebKB-2 does not provide a more precise presentation scheme to display certain categories or graphs in smaller/larger font or with particular effects (italic, bold).
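The order in which the two filters are applied, and the "hidden or dot" presentation option, can be sketched as follows. The sketch reduces each filter to a predicate on an item and does not model specialization tests; the item layout, the predicates `ONLY` and `NEVER`, and the functions `visible` and `render` are hypothetical, not WebKB-2 commands.

```python
def visible(item, only=None, never=None):
    """Apply the 'only display' filter first, then the 'never display' filter."""
    if only is not None and not only(item):
        return False
    if never is not None and never(item):
        return False
    return True


def render(items, only=None, never=None, show_dots=True):
    """Return the lines to display; filtered-out items become '.' or vanish."""
    lines = []
    for item in items:
        if visible(item, only, never):
            lines.append(item["text"])
        elif show_dots:
            lines.append(".")
    return lines


# Hypothetical items: a text to show, the believing user and a category.
ITEMS = [
    {"text": "A", "author": "pm", "category": "hotel"},
    {"text": "B", "author": "other", "category": "hotel"},
    {"text": "C", "author": "pm", "category": "spam"},
]
ONLY = lambda item: item["author"] == "pm"        # keep only pm's knowledge
NEVER = lambda item: item["category"] == "spam"   # always hide this category
```

With these toy items, `render(ITEMS, ONLY, NEVER)` keeps "A" and shows a dot for each of the two filtered-out items, while `show_dots=False` hides them fully.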
2.4.2.6. Comparing categories via search commands. Here are two examples using FL graphs.
? [wn#cat < wn#feline]; spec [wn#cat pm#relation 4+: wn#dog];
2.4.2.7. Categorizing operators for knowledge search or generation – future research work. This subsection has proposed five extensions to the classic search of specializations of a query graph, derived 15 operators from the classic "spec" operator, and mentioned other search and presentation options. Various options or parameters can also be associated with knowledge generation operators such as a "maximal join". For example, one of these options can specify whether such a join is maximal with respect to the number of matched nodes or of final (concept or relation) nodes in the result graph. It would be interesting to categorize all these operators – with their extensions and options – as specializations of pm/km#knowledge_search/retrieval.