Jim Schoening's forwarding of my summary of WebKB-2 and its "multi-source ontology"

                        Message 11936 of the SUO list

Subject: SUO: Fw: Multi-Source Ontology (MSO)
Date: Tue, 2 Dec 2003 06:54:36 -0500
From: jim.s3@juno.com
In reply to: msg11841 by John Sowa
Follow-Up: msg11943 by Stefano Borgo, msg11941 by John Sowa, msg11940 by Jon Awbrey


SUO WG,

        Philippe Martin has joined this list and proposed the
Multi-Source Ontology (MSO) as a starter document.  Is there a second?
(Only one needed.)  If so, this is open for discussion.

Jim Schoening


--------- Forwarded message ----------
From: Philippe Martin <phmartin at phmartin dot info>
To: standard-upper-ontology@ieee.org
Cc: spamOnly@phmartin.info, jim.s3@juno.com
Date: Tue, 02 Dec 2003 06:18:11 +1000
Subject: Multi-Source Ontology (MSO)

Jim,

Thank you for the invitation to introduce the Multi-Source Ontology (MSO)
of the WebKB-2 knowledge server (www.webkb.org) as a proposed starter
document. I also thank John Sowa for having made this proposal last week.
I first introduce WebKB-2 and its underlying approach, then its MSO
and finally give URLs for testing and further details.


WebKB-2 is a shared knowledge server: it permits Web users to search and
update a large shared knowledge base (KB) on the WebKB-2 server machine.

WebKB-2 is also a private knowledge annotation server: it can access
Web-accessible files on users' machines, and execute knowledge commands
(i.e. statements (assertions and definitions) and queries) in these files,
and optionally add their statements into the shared KB.
To ease knowledge documenting, the commands may be mixed with other 
document elements (e.g. text in HTML) as long as these commands are properly
isolated from the rest (e.g. within special tags). Indexation commands also
permit users to link any document element (e.g. any part of an HTML file) to
a statement by a "representation link". Then, such links may be exploited to
display document elements instead of statements in answer to queries.

I have been developing WebKB-2 since January 2000 (financially supported by
the DSTC, Australian W3C office) above an object-relational DBMS called
FastDB (for the main-memory version) or Gigabase (for the disk-based version). 
WebKB-2 is a partial rewriting and extension of WebKB-1 which focused on
private knowledge annotations and had no DBMS capabilities (persistence,
transactions, ...). WebKB-1 (developed from January 1997 to December 2000,
financially supported by the DSTO, Australian defence research center) was a
partial and Web-based rewriting of CGKAT, a knowledge acquisition tool that I
developed during my PhD thesis (at the INRIA Sophia Antipolis, France) above
the conceptual graph workbench Cogito and the structured document editor Thot
(the code of Thot is now used in Amaya, the prototype browser of the W3C).

WebKB-2 and WebKB-1 parse the FS language, which has various sub-languages:
- FT (For Taxonomies), a simple language for links between categories
  (subtypeOf, instanceOf, exclusion, identity, WordNet links, etc.),
- FCG, my adaptation of CGLF (Conceptual Graph Linear Format) for a more
  readable and normalized representation of expressive knowledge,
- FE (Formalized-English), a notation structurally identical to FCG but
  with syntactic sugar which makes it look like English,
- CGLF, CGIF, KIF, ... (they are sub-languages of FS but are currently
  only very partially parsed by WebKB-2;  WebKB-1 fully parses CGLF),
- FC (For Control), some basic procedural control structures,
- queries on links and graphs (various kinds of graph-matching are possible),
- commands for the various parsing/display options.
RDF/DAML+OIL is also partially parsed but is not a sublanguage of FS.
Knowledge entering/querying may also be done via HTML menus which are
translated into commands before being sent to the WebKB-2 server via the
CGI protocol.


In WebKB-2, every element (i.e. category (type or individual), link between
categories, or graph (i.e. logical statement that is not just a link)) must
have a recorded creator (user or source, represented by a short identifier
and/or a URL). Hence, the expressions "Multi-Source Ontology" or 
"Multi-Source KB". Category identifiers may also include various "names" for
the category, as long as these names are separated by at least two underscores
(unlike an identifier, a name may be shared by various categories).
Some examples of category identifiers are:
- wn#domestic_cat__cat__house_cat   (identifier for a WordNet category having
  "domestic_cat" as "key name", plus "cat" and "house_cat" as other  names)
- wn#domestic_cat   (shorter identifier for the same category)
- #domestic_cat     (idem; special shortcut for WordNet categories because
  they currently form 92% of all categories in the WebKB-2 default ontology)
- spamOnly@phmartin.info (a possible identifier for myself)
- http://www.webkb.org/doc/ (a possible identifier for the WebKB documentation)
Within graphs, names can be used instead of identifiers when there is no
ambiguity (e.g. when a name refers to only one category or the signatures
of the relations used allow all candidate categories but one to be discarded).
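
As an illustration only (this is not WebKB-2 code), the identifier convention
above can be sketched in Python. The names CategoryId and parse_category_id
are hypothetical, and the sketch handles only the "source#names" form, not
e-mail or URL identifiers:

```python
import re
from dataclasses import dataclass
from typing import List


@dataclass
class CategoryId:
    source: str             # creator/source prefix, e.g. "wn"
    key_name: str           # first name in the identifier
    other_names: List[str]  # further names, possibly empty


def parse_category_id(identifier: str) -> CategoryId:
    """Split a 'source#name__name...' identifier into its parts.

    Hypothetical helper: e-mail and URL identifiers are not handled.
    """
    source, sep, names = identifier.partition("#")
    if not sep:
        raise ValueError(f"no '#' in identifier: {identifier!r}")
    # Names are separated by at least two underscores; a single
    # underscore is part of the name itself.
    parts = [p for p in re.split(r"_{2,}", names) if p]
    if not parts:
        raise ValueError(f"no name in identifier: {identifier!r}")
    # A bare '#' prefix is the shortcut for WordNet categories.
    return CategoryId(source or "wn", parts[0], parts[1:])


if __name__ == "__main__":
    print(parse_category_id("wn#domestic_cat__cat__house_cat"))
    print(parse_category_id("#domestic_cat"))
```

The sketch does not resolve a shorter identifier such as wn#domestic_cat to
the same category as the longer one; that would need a lookup in the KB.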

A user may add a link between categories she has not created, or use them
in graphs, unless this leads to an inconsistency with an already entered
statement (if the knowledge entering is done via commands in a file, the
parsing of the file continues but no updates will be committed).
A user may remove an element only if she has created it.
For safety reasons, additions of links redundant with an already entered
link are rejected. Additions of redundant or partially redundant graphs are
rejected unless the redundancy comes from the user stating that she corrects
another user's statement via one of the following inter-graph relations: 
pm#corrective_specialization, pm#overriding_specialization, 
pm#corrective_generalization or, if none of the previous ones apply,
pm#correction. For example, assuming that oc#statement_on_bird_28 is the
identifier for the statement "birds fly" (in FCG: [any bird, agent of: 
a flight]; in FE: `any bird is agent of a flight') created by the user oc,
I can state via the following FCG that I believe that a corrective 
specialization of that statement is that "according to a study by
Foo@bird.org, in 1999, 93% of birds in good health are able to fly":
  [oc#statement_on_bird_28, corrective_specialization:
     [ [ [93% of (bird, experiencer of: a good health),
           can be agent of: a flight], time: 1999],
       source: (a study, author: Foo@bird.org)] ];
 //the creator and identifier of the graph are automatically added by WebKB-2
Corrective relations can only apply between assertions from different
users. There is no need for them between definitions: a definition
cannot be false, and whatever the concept the creator of a category has
(implicitly) in mind, that category refers to that concept, and when another
user is tempted to give another definition to that category, she actually
thinks of another concept and hence should instead define another category
(and link it to as many related categories as possible).
If the creator of a category sees that other users have misinterpreted it
(e.g. when specializing it), she should add definitions/constraints to her
category to avoid such misinterpretations. To escape the inconsistencies
that such new constraints are likely to bring to the KB, I see two strategies:
- automatic resolution by "cloning": the system keeps the old version of the 
  category (a generalization of the new version) and gives it a new generated
  identifier; actually, many generalizations and specializations of the
  category may have to be "cloned", so the general case is complex to
  implement, semantically sub-optimal and difficult for the users to
  understand (see www.webkb.org/doc/coopKBbuilding.html if you want to get
  an idea of what that approach leads to); hence, I have not adopted this
  strategy in WebKB-2;
- the creator of the category does not (cannot) make modifications that lead
  to detected inconsistencies (although she can give it a more general name,
  more adequate to the specialization that other users have given it) but
  she specializes it (and if needed, generalizes other categories by it).
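
As an illustration only (this is not WebKB-2 code), three of the update rules
described above can be sketched in Python: every link has a recorded creator,
a link redundant with an already entered one is rejected, and only the
creator of a link may remove it. The names LinkStore and RejectedUpdate are
hypothetical. The sketch treats only exact duplicates as redundant and does
not model consistency checking, graphs or corrective relations:

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# (source category, link type, destination category)
Link = Tuple[str, str, str]


class RejectedUpdate(Exception):
    """Raised when an update breaks one of the sketched rules."""


@dataclass
class LinkStore:
    # Maps each stored link to the identifier of the user who created it.
    creators: Dict[Link, str] = field(default_factory=dict)

    def add_link(self, user: str, link: Link) -> None:
        # Assumption: only an exact duplicate counts as redundant here.
        if link in self.creators:
            raise RejectedUpdate(f"redundant link: {link}")
        self.creators[link] = user

    def remove_link(self, user: str, link: Link) -> None:
        if link not in self.creators:
            raise KeyError(link)
        if self.creators[link] != user:
            raise RejectedUpdate("only the creator may remove a link")
        del self.creators[link]


if __name__ == "__main__":
    store = LinkStore()
    link = ("pm#my_cat", "subtypeOf", "wn#domestic_cat")
    store.add_link("pm", link)   # any user may add a new link
    try:
        store.remove_link("oc", link)
    except RejectedUpdate as err:
        print("rejected:", err)
    store.remove_link("pm", link)  # the creator may remove it
```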

This approach permits each user to enter or re-use as many categories as she
wants, use the names or identifiers she wants (alternative identifiers may
be introduced by connecting categories with identity links), filter out the
categories or statements she does not want to see, state her beliefs while
keeping the KB consistent (the FCG in the above example is not inconsistent
with oc#statement_on_bird_28, but states that I believe that
oc#statement_on_bird_28 is false) and keeps the knowledge elements as
connected/comparable as possible and with as few redundancies as possible.
Most importantly, the approach is asynchronous and the users do not have to
agree, meet or even discuss with each other.

For a particular application (e.g. an expert system) the categories that are
not used may be filtered out, and a selection can be done on alternative 
statements (e.g. one strategy may be to select the most specialized 
statements, or to select statements according to their creator identifiers,
types or even the graphs that describe those creators).
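
As an illustration only (this is not WebKB-2 code), two of these selection
strategies can be sketched in Python. The names Statement, select_by_creator
and most_specialized are hypothetical, as is the identifier pm#statement_1;
the "specializes" mapping is assumed to be given rather than computed by
graph matching:

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class Statement:
    identifier: str
    creator: str


def select_by_creator(statements: List[Statement],
                      trusted: Set[str]) -> List[Statement]:
    """Keep only the statements whose creator is in the trusted set."""
    return [s for s in statements if s.creator in trusted]


def most_specialized(statements: List[Statement],
                     specializes: Dict[str, Set[str]]) -> List[Statement]:
    """Drop every statement that another listed statement specializes.

    specializes maps a statement identifier to the identifiers of the
    statements it specializes (assumed to be known in advance).
    """
    ids = {s.identifier for s in statements}
    generalized: Set[str] = set()
    for sid in ids:
        generalized |= specializes.get(sid, set()) & ids
    return [s for s in statements if s.identifier not in generalized]


if __name__ == "__main__":
    stmts = [Statement("oc#statement_on_bird_28", "oc"),
             Statement("pm#statement_1", "pm")]  # hypothetical identifier
    spec = {"pm#statement_1": {"oc#statement_on_bird_28"}}
    print(most_specialized(stmts, spec))
    print(select_by_creator(stmts, {"oc"}))
```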

To contrast with the more traditional approach, here is a quote from
John Sowa's e-mail of last week:
>  > One of the problems of a registry of ontologies (as in the Ontolingua
>  > server) compared to a multi-source ontology (as in WebKB) is that it
>  > is difficult for an ontology provider to relate the new categories (by
>  > subsumption/exclusion/identity/... links) to the categories of all
>  > other ontologies in the registry, and hence these ontologies are
>  > difficult to compare and re-use: each user must select various
>  > ontologies (and choose between competing ones) then complement and
>  > inter-relate their categories which is even more difficult than it
>  > would have been for the authors of the selected ontologies.
> 
> I agree.  We certainly need something more than just a registry as
> in Ontolingua.  What you have accomplished is what I was originally
> proposing:  a selection of modules, each of which was independent, but
> each of which was related to the others by their mapping to a super-
> hierarchy of categories that included all the categories from each of them. 
> ...
> But the registry ideas should also be included:  each module by itself
> should be documented and annotated with all the information about its
> history of development, contributors, testers, and especially all 
> significant applications.
WebKB-2 supports the development and documentation of modules: the knowledge
of a module can be stored in one or several Web files (possibly mixed with
and/or indexing other document elements) and tested there until it is
considered "stable" (at which point, the instruction
"no storage;" may be removed from the file and its knowledge representations
will be committed into the KB; this is the procedure advocated by the
documentation of WebKB-2: www.webkb.org/doc/w2doc.html).



I now introduce the initial Multi-Source Ontology (MSO) of WebKB-2 (the one
currently browsable and updatable at www.webkb.org).
I consider all categories of all existing ontologies as having some value,
and being either identical or complementary. However, for time-related practical
reasons, the MSO of WebKB-2 is currently only composed of an extension and
correction of the noun-related part of WordNet plus various top-level 
ontologies (mainly, extensions of those of John Sowa in his 1984/2000 books,
DOLCE, and various categories from other sources). Its goal/rationale is
(like FCG) to support and ease the direct representation of English sentences.
Whenever possible, I also avoided breaking links from WordNet. Details about
how I re-used and corrected WordNet, and merged its top-level ontology into 
Sowa's ontologies and DOLCE, are accessible from 
www.webkb.org/doc/papers/iccs03/ (each correction is documented and the
ontology is available in FS and, for less recent versions, also in CGIF,
RDF/DAML+OIL and WordNet formats).
The top-level ontology is composed of about 150 primitive relation types
(spatial/temporal/thematic/argumentative/... relations), about 30 high-level
concept types needed for the signatures of these relations, and about 
120 other top-level concept types (not including any WordNet type) that 
I thought interesting.
Some general statements are associated with some high-level types
(including WordNet ones) to describe prototypical relations from those
types and their specializations. E.g. see www.webkb.org/kb/top/schemas.html
They permit WebKB-2 to ease and normalize knowledge entering/querying by 
generating cascading content-oriented menus for any category selected by 
the user. However, many more "general statements" need to be entered to 
make this approach really helpful.

Many extensions to that work would be interesting and I plan some of them
for the short to medium term: integrating WordNet 2.0 in full, extending 
the parsing/display of other notations (RDF+OWL, KIF, CGIF),
(beginning to) write a natural language parser for English.
The release of WebKB-2 as open source may also be a short- to medium-term goal.

Interest from the SUO group would certainly guide or prioritize my goals:
I offer to extend and develop WebKB-2 and its ontology as an ongoing project
for the SUO.

References:
WebKB-2 general doc:    www.webkb.org/doc/w2doc.html
WebKB-2 full interface: www.webkb.org/webkbShared.html
WebKB-2 example files:  www.webkb.org/kb/
WebKB-2 languages:      www.webkb.org/doc/languages/ and
                        www.webkb.org/doc/papers/iccs02/
WebKB-2 ontology:       www.webkb.org/doc/papers/iccs03/
WebKB-2 and the Semantic Web:  www.webkb.org/doc/papers/wi02/
WebKB-1 and WebKB-2 home page: www.webkb.org


Philippe MARTIN