To: cgstandard@cs.nmsu.edu
Subject: Re: cgstandard message : Quantifiers ISSUE #10 
Date: Thu, 23 Aug 2001 17:35:00 +1000
From: Philippe Martin <phmartin@meganesia.int.gu.edu.au>


> >   I have the same question.  According to documentaton [Cat: @3] is
> >   suppose to be the same as [Cat: {*}@3] which is an undefined set of
> >   3 cats.  So this would be 3 different cats like [Cat: {Yoyo, Fred,
> >   Tom}], but not more than 3 cats.  Right?  But would it be OK to have
> >   just 2 actual defined members? Like: [Cat: {Yoyo, Fred}@3]?
>
> Well, this get's us back to set designators and you've hit on one of the 
> problems.  But let's not get into that.  If we can't even handle "@3" by 
> itself, then what hope have we for the fancy stuff?


If you distinguish the various cases of collections and collection modifiers,
you greatly reduce the ambiguity and you may aim for a translation into KIF.
The graph notations accepted by WebKB make it possible to distinguish the following cases:

- bag, set (the usual set), xor-set, or-set, list (ordered set), ordered bag
- open collection, closed collection (meaning that all the elements have been given)
- distributive interpretation, collective interpretation ("together" in English)

Unless special keywords are given, the default interpretation in the graph notations
accepted by WebKB is: distributive open set.

For example, in FCG,
  [together 3 persons {#Mary, #John}, agent of: a dance]
means
  "there are 3 persons including #Mary and #John that dance together"
(Note: the 's' at the end of "persons" is automatically removed; this is possible
       in WebKB because "person" is only a category NAME; if a category IDENTIFIER
       such as #person is used, WebKB does not attempt to remove any trailing 's').

I would personally translate the CGIF  [Cat: @3]  as the FCG  [3 cats],
that is "a set of exactly 3 cats" (3 distinct cats since it is a set).

FCG has an extended range of quantifiers and has intervals, so you could also say:
  [at least 3 cats]    [2 to 3 cats]   [several cats]
  [any cat]  [most cats]   [at least 75% of cats]  [50% to 75% of cats] 
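
As a rough illustration of the numeric quantifiers above, here is a minimal
Python sketch that treats a quantifier as an interval on the number of members.
All names (Cardinality, exactly, at_least, between, consistent) are hypothetical
and are not taken from WebKB or from the CGIF standard; vague quantifiers such
as "several" or "most" and the percentage forms are not covered.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Cardinality:
    """Hypothetical interval on the number of members of a collection."""
    low: int = 1
    high: Optional[int] = None  # None means "no upper bound"

    def admits(self, n: int) -> bool:
        """True if a collection of exactly n members satisfies the interval."""
        return n >= self.low and (self.high is None or n <= self.high)


def exactly(n: int) -> Cardinality:        # [3 cats]
    return Cardinality(n, n)


def at_least(n: int) -> Cardinality:       # [at least 3 cats]
    return Cardinality(n, None)


def between(lo: int, hi: int) -> Cardinality:  # [2 to 3 cats]
    return Cardinality(lo, hi)


def consistent(card: Cardinality, named: Iterable[str], closed: bool = False) -> bool:
    """Can a collection with these named members satisfy the cardinality?

    Assumption: in an open collection further, unnamed members may exist,
    so only the upper bound can be violated by the named members.
    """
    k = len(set(named))
    if closed:
        return card.admits(k)
    return card.high is None or k <= card.high
```

Under these assumptions, consistent(exactly(3), ["Yoyo", "Fred"]) holds for the
default open set (a third, unnamed cat may exist), but fails if the collection
is declared closed.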

The grammars for representing these various kinds of collections and quantifiers
do not have to be big. WebKB grammars and examples are accessible at 
http://www.webkb.org/doc/grammars/
However, to store this additional information, more attributes are needed in
the class for a concept node. I do not see this as a problem.
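
To give an idea of what "more attributes" could look like, here is a minimal
Python sketch of a concept node class carrying the three distinctions listed
earlier (kind of collection, open/closed, distributive/collective). The class
and field names are hypothetical and do not reflect WebKB's actual data
structures; the defaults follow the "distributive open set" default described
above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Tuple


class Kind(Enum):
    SET = "set"
    BAG = "bag"
    XOR_SET = "xor-set"
    OR_SET = "or-set"
    LIST = "list"
    ORDERED_BAG = "ordered bag"


class Interpretation(Enum):
    DISTRIBUTIVE = "distributive"
    COLLECTIVE = "collective"  # "together" in English


@dataclass
class ConceptNode:
    """Hypothetical concept node with the extra collection attributes."""
    type_name: str
    members: Tuple[str, ...] = ()       # explicitly named members, if any
    count: Optional[int] = None         # e.g. 3 for [3 cats]
    kind: Kind = Kind.SET               # default: set
    closed: bool = False                # default: open collection
    interpretation: Interpretation = Interpretation.DISTRIBUTIVE


# [3 cats]: all defaults apply (distributive open set of 3 cats)
cats = ConceptNode("cat", count=3)

# [together 3 persons {#Mary, #John}, ...]: collective, two members named
dancers = ConceptNode(
    "person",
    members=("#Mary", "#John"),
    count=3,
    interpretation=Interpretation.COLLECTIVE,
)
```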


> If we can't even handle "@3"

The big question here is the meaning of "handle".
Is it the possibility of comparing any two graphs via a projection?
The above additional information can be taken into account in a projection
operation. Or it can simply be ignored. We should not expect a CG workbench 
to fully exploit such additional information for inferences or consistency detection.
Furthermore, such exploitation would often be application-dependent.
The additional information should be stored so that applications can work.
The query mechanisms implemented in a CG workbench may be used by applications
as filters for knowledge (the application then decides whether or not to exploit
the additional information).
Accepting only basic CGs forces knowledge providers to enter false information
(which leads to use and re-use problems).
Even the representation of simple natural language sentences often requires
collections and extended quantifiers.
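
The point about a projection either using or ignoring the additional
information can be sketched as follows. This is only a per-node compatibility
test, not a full graph projection: it assumes exact type-name equality (no type
hierarchy) and an "exactly n" reading of counts, and the names Node and
node_matches are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Node:
    """Hypothetical, heavily simplified concept node."""
    type_name: str
    count: Optional[int] = None  # e.g. 3 for [3 cats]; None if unspecified


def node_matches(query: Node, fact: Node, use_extras: bool = True) -> bool:
    """Can the query node be mapped onto the fact node?

    With use_extras=False the stored count is simply ignored, as a basic
    workbench might do; with use_extras=True a query that specifies a count
    only matches a fact node with the same count.
    """
    if query.type_name != fact.type_name:
        return False
    if not use_extras:
        return True
    return query.count is None or query.count == fact.count
```

For example, a query for [3 cats] does not match a stored [2 cats] when the
count is taken into account, but does match when it is ignored; in both cases
the count remains stored and available to the application.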

Many people think that CGIF should be basic and let users define their own
extensions. I have never understood why, because:
- CGIF cannot be used for representation or translation to other languages
  (e.g. DAML+OIL and even RDF) without adding ad hoc extensions, thus
  making the knowledge parsable/exploitable only by the person who made or knows of
  the extensions;
- extensions in CGIF are often supposed to be made via type definitions, rules or
  second-order graphs (though I haven't seen many details on this in the standard).
  This makes them much harder to exploit than if some keywords were reserved (as 
  in DAML+OIL). Even graph expansion has theoretical difficulties (see Michel Leclere's
  PhD thesis).

The reserved keywords in DAML+OIL are (alone) insufficient for representation of 
natural language: the general approach of KIF/CGIF is needed. My personal view is
that many reserved keywords should be added to CGIF and given a semantics (e.g. via 
a translation to KIF). Then, an implementor of a CGIF workbench could exploit more or less 
of the keywords. This is also the approach adopted in OKBC (not to be confused with ODBC;
see http://www.ai.sri.com/~okbc/). An OKBC server must state to its OKBC clients 
its level of inference (there are 3 levels according to which kinds of information 
are exploited). The important point is that the information is stored and can then 
be used in various ways.

I will soon make WebKB-2 export the ontology of its shared knowledge base
into CGIF. I will have to use my own extensions.

Philippe

__________________________________________________________________________
Dr. Philippe Martin
Research Fellow at the CRC for Enterprise Distributed Systems Technology 
                         (DSTC Pty Ltd;  DSTC is W3C's Australian Office)
Address: Griffith Uni, School of I.T., PMB 50 GCMC, QLD 9726 Australia
Email: philippe.martin@gu.edu.au;  Fax: +61 7 5594 8066
___________________________________________________________________________