DSTC Project Report

Quarter January to March 2003

Project Name: WebKB
Project Title: Knowledge Warehousing and Querying
Contact Researcher: Philippe MARTIN
E-mail: pm .@. phmartin dot info
Phone: +61 7 5552 8923

Project Team (% of time on project):
Dr Philippe MARTIN, DSTC: 100%

Reporting Criteria

1. General Description
2. Status of Project (includes highlights, new directions, risks and challenges, resource and other issues)
3. Progress in achieving milestones nominated in previous quarterly report or project proposal
4. Future activities and outcomes plan for next quarter
5. Journal, conference, book chapter or book publications
6. Summary of collaboration by project staff with other DSTC projects, DSTC participants or third parties
7. Visits by project staff to participants or third parties
8. Visitors to DSTC meeting project staff
9. Presentations given by project staff (including conferences, external and internal seminars and lectures)
10. Commercial Training Activities involving project staff
11. Standards meetings and contributions to standards involving project staff
12. Expected Travel for next quarter by project staff
13. Other publications by project staff (i.e. not published material, eg. technical reports, reports to participants)
14. Commercial Activity
15. Product/Prototype Downloads
16. Grants awarded to project staff
17. Students associated with Research Project
18. Promotions/Publicity (include media stories, exhibitions)
19. Patents applied for or granted

1. General description of the research project aims

The WebKB project focuses on precision-oriented Knowledge Representation. It was a sub-project of the KVO project from January 1997 and has been supported by DSTC since January 2000. Since January 2003, the WebKB project has been separate from the KVO project, which has refocused on its FCA-based data classification activities (FCA: Formal Concept Analysis).

The WebKB project's line of research is to enable and support (i) the representation or structuring of any kind (and amount) of information in a formal or semi-formal way, manually or semi-automatically, and (ii) the retrieval and comparison of this represented information.

Before January 2000, research was focused on the design and implementation of WebKB-1, a private knowledge annotation server, i.e. a server permitting internet/intranet users
(i) to create documents that contain knowledge representations (formal statements), possibly mixed with usual document elements (text, images), and possibly indexing sentences or images in other Web documents,
(ii) to retrieve these knowledge representations, or the document elements they index/annotate, via knowledge queries.
In other words, each user may create and document private knowledge bases (KBs) within Web files, and ask WebKB-1 to parse the statements or queries within these files. (Associating WebKB queries with hyperlinks is a simple way to create virtual/dynamic documents.)
Since 1999, the "Semantic Web" activity of the W3C has focused on designing a language (RDF) that permits the creation of such private knowledge bases. However, RDF is not very expressive, is difficult to read and use, includes neither a query language nor a command language, does not permit statements to be mixed with other document elements, and has only basic indexation capabilities. It is also too low-level to ease the sharing and re-use of knowledge between knowledge providers.

Work on WebKB-1 stopped in January 2000. Since then, WebKB research has focused on easing knowledge sharing between users. Since the W3C approach (i.e. letting users set specialisation or identity links between categories within private KBs) is difficult and extremely far from optimal (compared to what a more centralised approach can offer), the adopted approach was to design WebKB-2 (www.webkb.org), a shared knowledge annotation server, i.e. a server permitting internet/intranet users to update a shared KB on the server machine. WebKB-2 has several advantages over other current knowledge servers:
(i) it has most of the features of WebKB-1, i.e. it also supports private KBs within documents and the indexation of document elements,
(ii) its shared KB can be very large and has a default large initial ontology which is a merge, correction and extension of the WordNet natural language ontology and several top-level ontologies,
(iii) WebKB-2 proposes high-level, intuitive and expressive input/output notations, and can exploit the shared KB to generate menus that guide knowledge entry,
(iv) its category naming scheme and its update/cooperation protocols make it the only system permitting its users to update the same KB without lexical/semantic conflicts or redundancies, and without forcing the users to agree with each other,
(v) it has a procedural language of commands and various query/navigation mechanisms that permit people or automated agents to explore the KB easily and filter out the knowledge of certain users or kinds of users if necessary.

Since the implementation of WebKB-2 began in January 2000, and since, unlike WebKB-1, it has been built on top of a free object-oriented database system, its IP belongs entirely to DSTC. Although many small features still need to be implemented, WebKB-2 and its resources (e.g. notations and ontology) may be considered at the "pilot stage" for the goal of manual "knowledge storage, retrieval and comparison". Many small examples have been built (see http://www.webkb.org/kb/) and a customised application ("locating and comparing short-term accommodations on the Sunshine Coast") is in development (http://www.webkb.org/SC/). This application will be progressively extended to include other Yellow-Pages-like or auction-like services for the same area (the Sunshine Coast). Its interface is more user-friendly than the generic interface of WebKB-2 but retains much flexibility for the knowledge representation process.

Tools to (semi-)automatically extract knowledge from general documents would be very valuable since, despite WebKB-2's features, manual knowledge representation remains a slow and difficult process. However, to be used for query answering, the extracted representations must not only be of good quality (i.e. explicit and factual); they must also be converted into a normalized form that permits their comparison and hence logical inferences. Both aspects are very difficult and have never been achieved outside very limited domains. After conducting some tests on sentences from research articles, I believe the first aspect might be reached within a year, but the second aspect (in its general case) would need much more time. This work is clearly at the "idea stage".

To re-use knowledge produced by people not using WebKB-2 (e.g. knowledge in RDF) or to permit the content of the KB to be exploited by other knowledge representation tools, some import/export wrappers need to be added to WebKB-2. The implementation of two such import wrappers (one for RDF and one for KIF) has begun. They are at the "exploration stage". However, the relevance of this work is uncertain. KIF (or its future successor CommonLogic) will probably remain an academic standard for a long time. On the other hand, RDF (or its extension OWL), the probable future industry standard, is very likely to remain too low-level to support (non-basic) knowledge re-use/sharing (except in peer-to-peer ad-hoc ways).

Thus, in the short term, a good part of the work on WebKB-2 will focus on extending the above-cited application or developing other ones.
Another new direction is the application of the WebKB-2 model to extend UML for knowledge representation purposes. To that end, a cooperation with Kerry Raymond to answer an "Ontology Definition Metamodel RFP" from the OMG has just begun.

2. Status of the Project


1. A long description (35 pages) of WebKB-2, its principles, and its relevance to the "Semantic Web" was published in January as a chapter of the book "Web Intelligence" (Springer-Verlag, Eds.: N. Zhong, J. Liu, Y. Yao).

2. The rationale and methodology for correcting and extending WordNet have been accepted for publication at ICCS 2003 (the conference will be held in July). The resulting ontology has been translated into various formats (e.g. RDF) and made Web-accessible (www.webkb.org/doc/wn/). It has also been advertised on three mailing lists, and private enquiries about it have been answered.

3. An abstract has been submitted to Evolve 2003.

4. The application of WebKB-2 for "locating and comparing short-term accommodations on the Sunshine Coast", which began in mid-December, has reached a usable form. A presentation and demo were given at the Participant Forum in February. The original idea of this application (using WebKB-2 to complement the hotel brokering services of the Wotif Web site) comes from Andy Bond. A meeting with Paul Young, director of Wotif, should help decide which directions to explore further.

New directions

The cooperation with Kerry Raymond to answer an "Ontology Definition Metamodel RFP" from the OMG has just begun.

Risks and Challenges (including planned mitigation strategy and corrective actions)

The initial investigation (3 days) into the automatic, normalized representation of English sentences extracted from research papers revealed the extreme difficulty of the task. This (semi-)automatic representation work is at the heart of a proposal for a possible next "Smart Internet CRC" in which Prof. Abdul Sattar (Griffith University; leader of the "Knowledge Representation and Reasoning Unit") would like me to be involved.

Resource Issues


Other Issues


3. Progress in achieving milestones nominated in previous quarterly report or project proposal

Milestone: "Wotif application"
Progress to date: Implemented; a demo was given at the Participant Forum. Further extensions will be implemented.

4. Future activities and outcomes plan for next quarter

Milestone: Wotif application
Measurement (e.g. date): To be extended further; demonstration at Demo Day in May.

Milestone: Ontology Definition Metamodel RFP
Measurement (e.g. date): Response to be completed by September.

Comments (if applicable):

5. Journal, conference, book chapter or book publications

Author. Title. Published in. Publisher. Date.

Ph. Martin. "Knowledge Representation, Sharing and Retrieval on the Web". Chapter of the book "Web Intelligence". Springer Verlag, LNCS. January 2003.

Ph. Martin. "Correction and Extension of WordNet 1.7". ICCS 2003. Springer Verlag, LNCS. July 2003.

Comments (if applicable):

6. Summary of collaboration by project staff with other DSTC projects, DSTC participants or third parties

Project staff: Ph. Martin
Collaborator and organisation: Kerry Raymond (Elemental?)
Area of collaboration and dates: Ontology Definition Metamodel RFP; 8/04 (4 h), 16/04 (1 h)

Comments (if applicable):

9. Presentations given by Project Staff

9.1 Presentations to participants:
Project staff (organisation): Philippe MARTIN (DSTC)
Participants: Participants meeting at DSTC
Title: Knowledge warehousing with WebKB-2
Date: 4/03/2003

Comments (if applicable):

12. Expected Travel for next quarter by Project Staff

Project staff: Ph. Martin
Host: Technische Universitaet, Dresden, Germany
Purpose and dates: ICCS 2003 + recreation leave, 20/06 to 12/08 (conference: 21-25/07)
Paid by: DSTC

Comments (if applicable):