The WebKB set of tools

Data and knowledge organisation, sharing, retrieval and management

Dr. Philippe A. MARTIN

Current main installation site:

Note: this documentation is essentially about WebKB-1, a project which ended late 1999.
WebKB-2 can currently be seen as a complement of WebKB-1.
In time, we hope it will completely supersede WebKB-1.
Both tools are usable from the WebKB site.
As a result of the ending of WebKB-1,
this documentation, which was intended to be comprehensive, was never completed.
Click here if you want to access an article about WebKB-1.
Click here for the documentation of WebKB-2.
Click here to access published articles about WebKB-1 and/or WebKB-2.


The WebKB set of tools - or simply WebKB - allows its users to store, organize and retrieve knowledge or document elements (DEs) in Web-accessible files on their local disks or in our warehouse. (Note: the implementation of the warehouse has just begun).
The particularity of WebKB is that it unifies different kinds of complementary tools. It is at once: a) a private/shared annotation tool for information on the World Wide Web, b) a knowledge-based Information Retrieval tool, c) a corporate memory (or warehouse) tool, and d) a lightweight Knowledge Acquisition tool (currently there is no rule system, so knowledge cannot be executed).

WebKB is not an alternative to Web search engines since it does not automatically search, index or represent documents. The goal when answering information retrieval queries is not to retrieve documents related to the query but to retrieve directly the precise knowledge answering the query. This implies that the search is done on an organised knowledge base (or on documents storing knowledge). Users may represent knowledge (including knowledge annotating DEs) at the level of precision they desire, using either Conceptual Graphs (CGs), formalised English or structured text (currently only the handling of CGs is fully implemented).
A default ontology is provided for helping users to represent knowledge.

The rest of this document discusses the WebKB set of tools. You may want to look at the window-based interfaces while reading it.

Goals: Precision-oriented information retrieval, Web document handling
Knowledge representation/modelling/acquisition/sharing/retrieval
Data and metadata management
Methods: Semantic/structural/syntactic searches, knowledge management
Private/shared annotations of Web-accessible information sources
Warehousing, knowledge integration, asynchronous cooperation
Media: Ontology,
Conceptual Graphs, formalised english, structured documents, script language
Hypermedia and hypertext systems, World-Wide Web
Corporate memory, knowledge repository/warehouse,
Knowledge representation systems using databases.

Table of contents

  1. Goal: helping Information Retrieval (IR) and Knowledge Acquisition (KA)
  2. Approach: combining techniques for IR, KA and groupware
    1. Allowing users to represent any document element (DE) or link it to other ones
    2. Offering easy ways to represent knowledge at the desired level of precision
    3. Offering an ontology for helping users to represent or search knowledge
    4. Allowing users to store knowledge in documents or in a warehouse
    5. Offering users a language to exploit, merge and generate knowledge or documents
  3. Architecture
    1. Knowledge/Text processors
    2. Interfaces
    3. Knowledge bases
  4. Documentation for the WebKB languages   (separate document)
    1. Document loading&execution commands
    2. Knowledge handling commands
    3. Text processing commands
    4. Control structures
  5. Exploitation of ontologies for knowledge modelling and retrieval   (separate document)
    1. Definition and exploitation of a top-level ontology
    2. Exploitation of a natural language ontology
  6. Allowing different users to build a single knowledge base   (separate document)
    1. Procedures
    2. Conventions
  7. Applications and limitations
    1. Knowledge organisation and retrieval
    2. Warehouse or corporate memory
    3. Document generation
    4. Limitations
  8. Comparison with related systems
    1. Integration of distributed systems, exploitation of documents, or warehousing
    2. Exploitation of free text or poorly structured documents
    3. Languages for storing knowledge in documents
    4. Languages, tools and ontologies for representing and searching knowledge
    5. Public annotations, knowledge integration and collaboration
    6. Exploitation of knowledge
    7. Summary and comparison with closest related systems
  9. Conclusion
    1. Merge of IR and KA techniques: methods and benefits
    2. Merge of knowledge from different users: methods and benefits
  10. References

Goal: helping Information Retrieval (IR) and Knowledge Acquisition (KA)

An ideal information system would be able to comprehend, compare and merge the contents of information sources (text, images, etc.) accessible on the World Wide Web [Web], and then answer queries precisely. For example, it would be able to answer the query "Compare the characteristics of the public-domain deductive database systems able to exploit thousands of rules". The answer would classify the systems according to their respective speed, the expressiveness of their rule language and query language, their maximum number of rules and facts, etc. (Indeed, the best system would probably be the ideal information system itself!) The answer would take into account not only the information sources (articles, documentation, etc.) written by the designers of the database systems but also the comments made on these systems by their users and published on the Web. The answer would show where the displayed pieces of information come from and whether they have been "annotated" by other people.

Unfortunately, the current tools for analysing the content of free text (let alone images or speech) have restricted possibilities outside of very limited domains. They cannot sufficiently "understand" the content of a document so that it can be safely merged with the content of other documents or queried as in the example above. This is indeed a long-term project, though the Web search engine DR-LINK [Web robots] already uses some natural language parsing techniques, and though the authors of the large knowledge base CYC have claimed for a few years to be "nearing the critical mass required for the reading and assimilation of online texts (news stories, encyclopedia articles, etc)" [CYC].

Until then, Information Retrieval (IR) systems, and especially large-scale IR systems such as Web search engines, will not give precise answers to a query but documents related to it. These documents may include some or all of the precise answers the user is after, but to get them, s/he has to read the documents and possibly compare or match their contents. For instance, with the current Web search engines and for the above example query, even if the adequate precise keywords "public domain deductive database systems" are used for searching the limited domain of deductive database systems ("capable of logical inferences" instead of "deductive" would not do), the user will be presented with hundreds of old and new articles, courses or documentation related to deductive systems theory or systems.

Currently, information must be represented with a knowledge representation language, or at least structured by conceptual relations, in order to be exploited for precise information retrieval or other tasks implying logical inferences. Such information, which from now on we call "knowledge", may be stored in documents either by using a (semi-)structured natural language when writing (parts of) these documents, or by using a more formal knowledge representation language and embedding the representations inside these documents (e.g. in hidden parts). Knowledge may also be interactively built into a warehouse to ease the interconnection and reuse of knowledge from different users. In both cases, the more precise the knowledge, the more difficult or lengthy the representation task, but also the more logical inferences can be automatically done for retrieving more relevant and complete answers to queries, and the more different pieces of knowledge can be interconnected or safely merged.

Knowledge representation is important for IR. Producing detailed and consistent knowledge representations implies doing knowledge modelling, which is often the main task of Knowledge Acquisition (KA), the process of developing a knowledge-based system or simply a knowledge base (KB), e.g. an organised corporate memory.
Finally, information retrieval, via queries or navigation, is a basic task of KA, during the knowledge modelling phase but also during the testing, exploitation and maintenance phases. Here are some examples:
- searches for the instances of some concepts in the document sources of expertise or the intermediary documents;
- searches for the instances or specialisations of concepts or models in the knowledge base (e.g. for abstracting, combining or validating them);
- searches for the sources of some pieces of knowledge in the document sources of expertise or the intermediary documents (e.g. for validating the knowledge, explaining it, or generating documentation about it).

It appears that a strong structuring and interconnection of documents and knowledge can benefit both IR and KA. It is also clear that the sharing and reuse of documents and knowledge by different users improve the completeness and possibly the accuracy of IR and KA.
Our goal when developing the WebKB set of tools (or more simply "WebKB") is to ease both IR and KA by offering users several ways to set and exploit relations between document elements, between pieces of knowledge, and between document elements and knowledge. Special attention is paid to the semi-automatic interconnection of knowledge coming from different users. The Web has been chosen as our delivery platform since it is the most widespread and easiest way to access and share information, both on the Internet and on intranets.

WebKB is not an alternative to Web search engines since it does not automatically search (crawl), index or represent documents.
Nor is it a KA tool dedicated to a particular methodology (e.g. KADS-II [KADS-II] or KREST [KA tools]), though many KADS concepts and some KADS models are included in its ontology.
It is intended as a step towards the integration of different information sources and of different techniques.

Approach: combining techniques for IR, KA and groupware

In order to best help IR and KA, we chose to combine into WebKB the more generic features of IR, KA and groupware. We now discuss these features and thus compare our approach with other ones.

2.1  Allowing users to represent any document element (DE) or link it to other ones

We call "document element (DE)" any textual/HTML data, e.g. a sentence, a section, a reference to an image or to a whole document. This excludes binary data but includes knowledge. DEs may be stored in documents or in databases.

WebKB allows its users to annotate any DE in Web-accessible documents or (later) in its warehouse. For us, "annotate a DE" means "link a DE to another". The link may be named or it may be a representation link (the destination DE is then knowledge). The first case allows quick conceptual organisation while the second case allows a more precise knowledge-based organisation. The destination DE may be anywhere in Web-accessible documents (private annotations) or in the warehouse (public annotations).

To do this when a URL [URI/URL] is not sufficiently precise for referring to a particular DE, the user must specify the content of the DE (and optionally, its occurrence in the referred document) in the query part of the URL. An interface is provided for easing the creation of annotations.
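
As an illustration of this addressing scheme, the following Python sketch builds such a reference. The parameter names "de" and "occurrence" and the example URL are hypothetical: the actual query syntax used by WebKB is not given here.

```python
from urllib.parse import urlencode

def de_reference(doc_url, de_content, occurrence=None):
    """Build a URL designating one document element (DE) inside doc_url.

    The query parameter names 'de' and 'occurrence' are assumptions made
    for this sketch, not the actual WebKB syntax.
    """
    params = {"de": de_content}
    if occurrence is not None:
        # Optional: which occurrence of this content in the document is meant.
        params["occurrence"] = str(occurrence)
    return doc_url + "?" + urlencode(params)

# Refer to the second occurrence of a sentence in a (hypothetical) document.
reference = de_reference("http://example.org/report.html", "a cat is on a mat", 2)
```

The content of the DE travels in the query part of the URL, so the annotated document itself does not need to be modified or to contain anchors.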

Most current Web-based annotation tools allow only certain parts of documents to be annotated [].

When KA tools allow annotations, it is mostly between certain kinds of knowledge and sentences (outside documents). Besides, the links are only exploited via hypertext navigation, not via knowledge-based queries (e.g. for explanation or document generation purposes) [].

Most metadata languages allow their users to index only whole documents and some do allow the knowledge to be embedded inside the annotated document [].

2.2  Offering easy ways to represent knowledge at the desired level of precision

Users won't take the time to represent knowledge with a formal language just to structure some information and enhance its accessibility (this is one reason why so few hypertext systems are knowledge-based). Besides, even knowledge engineers wanting to build a knowledge-based system must first structure the knowledge sources in an informal way.

A solution we have adopted is to provide different but combinable ways of representing knowledge: the rather formal but still expressive and intuitive Conceptual Graphs [CGs], and notations that are more readable or handier to use and that are automatically converted into simple CGs. The handling of a frame-oriented CG notation and of some HTML structures has already been implemented. Other HTML structures, indented text and a formalised English will also be handled by WebKB.

Furthermore, in each of these cases and in queries, the user has the choice to use the exact names of entities defined in an ontology (structured list of concept types, relation types and instances) or just words commonly used for such entities. These words can be matched to the entities of the natural language ontology WordNet [Ontologies] and then, according to the constraints attached to these entities, some of the knowledge ambiguities can be (semi-)automatically resolved.
Thus, this last feature enables knowledge-based retrieval without asking the user to search an ontology when building knowledge or queries. This is important for the tool to be used outside of a KA context and it also permits large-scale knowledge sharing.
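
The idea of resolving word ambiguities through constraints can be sketched as follows in Python. The small type hierarchy, the lexicon and the relation signature are hypothetical stand-ins for WordNet and the WebKB ontology; only the filtering principle is illustrated.

```python
# Hypothetical mini-ontology: each type maps to its direct supertype.
SUPERTYPE = {
    "Entity": None,
    "Process": None,
    "Organization": "Entity",
    "Person": "Entity",
    "Joining_process": "Process",
}

# Hypothetical lexicon: a common word may name several types.
LEXICON = {
    "union": ["Organization", "Joining_process"],
    "person": ["Person"],
}

# Hypothetical relation signature: the type required for each argument.
SIGNATURE = {"Member": ("Organization", "Person")}

def is_a(type_name, ancestor):
    """True if type_name equals ancestor or is one of its subtypes."""
    while type_name is not None:
        if type_name == ancestor:
            return True
        type_name = SUPERTYPE[type_name]
    return False

def candidate_types(word, relation, position):
    """Keep only the senses of word allowed at this argument of relation."""
    required = SIGNATURE[relation][position]
    return [t for t in LEXICON[word] if is_a(t, required)]
```

With these assumptions, candidate_types("union", "Member", 0) keeps only the organisation sense of "union". When several candidates remain, the user would have to choose, which is why the resolution is only semi-automatic.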

However, it is also clear that the more precisely knowledge has been represented, the more logical inferences can then be done to exploit it. Thus, we think one of the languages proposed to the user must have the following properties.

CGs are one of the languages which have such properties. A "simple" CG may be translated into an existential, conjunctive and positive formula of first-order logic. More complex CGs allow the introduction of negations, contexts, sets, etc. (WebKB allows the user to represent some kinds of contexts, negations and sets, but this information is not exploited for logical inferences or search purposes). Especially interesting for WebKB is the fact that CGs have a graphic notation and a linear notation which are both relatively intuitive. Concept types and relation types must be declared or defined. Specialisation relations between them may be given or calculated according to their definitions. Then, specialisation relations between the simple CGs which use these types may also be calculated. Thus, for example, WebKB may retrieve the specialisations of a given simple CG, however general or precise. It may also do "maximal joins" (or "merges") of given CGs, and thus generate new knowledge.
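
The retrieval of specialisations can be illustrated by a minimal Python sketch. It is not the algorithm of CoGITo or Peirce: it assumes a toy encoding of simple CGs as dictionaries, a hypothetical four-type hierarchy, and relation types compared by name only (no relation type hierarchy).

```python
# Hypothetical type hierarchy: each concept type maps to its direct supertype.
SUPERTYPE = {"Something": None, "Animal": "Something",
             "Cat": "Animal", "Mat": "Something"}

def is_subtype(concept_type, ancestor):
    """True if concept_type equals ancestor or is below it in the hierarchy."""
    while concept_type is not None:
        if concept_type == ancestor:
            return True
        concept_type = SUPERTYPE[concept_type]
    return False

def is_specialisation(graph, query):
    """True if query can be projected into graph, i.e. graph specialises query."""
    query_nodes = list(query["concepts"])
    graph_relations = set(graph["relations"])

    def extend(mapping, index):
        # All query concepts are mapped: check that every relation is preserved.
        if index == len(query_nodes):
            return all((name, mapping[src], mapping[dst]) in graph_relations
                       for name, src, dst in query["relations"])
        node = query_nodes[index]
        # Try every graph concept whose type specialises the query concept type.
        for candidate, candidate_type in graph["concepts"].items():
            if is_subtype(candidate_type, query["concepts"][node]):
                mapping[node] = candidate
                if extend(mapping, index + 1):
                    return True
                del mapping[node]
        return False

    return extend({}, 0)

def spec(query, base):
    """Return the graphs of base that are specialisations of query."""
    return [graph for graph in base if is_specialisation(graph, query)]

# [Cat]->(On)->[Mat], a lone [Mat], and the query [Animal]->(On)->[Something].
cat_on_mat = {"concepts": {"c": "Cat", "m": "Mat"}, "relations": [("On", "c", "m")]}
mat_alone = {"concepts": {"m": "Mat"}, "relations": []}
animal_on_something = {"concepts": {"x": "Animal", "y": "Something"},
                       "relations": [("On", "x", "y")]}
```

Here spec(animal_on_something, [cat_on_mat, mat_alone]) keeps only cat_on_mat, while a query reduced to [Something] would retrieve both graphs. The backtracking search is exponential in the worst case, which is acceptable only for small graphs such as these.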

Most current metadata languages - i.e. languages designed for embedding knowledge representations in Web documents, e.g. about the content of other documents - are not genuine knowledge representation languages, and thus do not have the above-cited properties. For example, the Extensible Markup Language [XML] and the Resource Description Framework [RDF] are rather verbose and not logic-based. XML and RDF are designed to be very general-purpose languages and are expected to be generated much more than directly edited. A more detailed discussion of current metadata languages and tools exploiting them is given in section 8.

2.3  Offering an ontology for helping users to represent or search knowledge

Existing top-level ontologies, domain ontologies or even more complete knowledge bases in a relevant domain can be reused to guide knowledge modelling: they show what kind of information might be modelled (and thus what kind of information to collect in documents), and how to model it. They also spare a lot of design work.

WebKB provides a top-level ontology of about 200 concept types and 200 relation types. This ontology was created by merging other current top-level ontologies used in knowledge acquisition, knowledge representation, and cooperation-oriented hypertext tools. For example, the relation type ontology collects thematic, mathematical, spatial, temporal, rhetorical and argumentative relation types. A few task models from the KADS literature have been represented in CGs and associated with task types in the form of type definitions.

Constraints associated with types in our top-level ontology help to increase the consistency of knowledge representations. For example, exclusion links between types ensure that exclusive types such as Spatial-entity, Information-entity and Process never have common subtypes. We have noticed from experience with our users that such a safeguard is extremely useful, especially because the subsumption relation is misused (e.g. for defining a part or a role of a kind of entity instead of a more specialised kind) or because a user misinterprets or forgets the category to which a type belongs (e.g. a type named "union" by a user to refer to an organization could be interpreted by another user as referring to a process, a physical entity resulting from a process, a state, or even a location; our top-level ontology prevents such misinterpretations). The signatures we have associated with our relation types also prevent similar misinterpretations, both during an ontology extension and during the construction or use of CGs.
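
The effect of exclusion links can be sketched in Python as follows. The class and method names are hypothetical, as are the types other than Spatial_entity and Process; the sketch only shows how a new type placed under two exclusive types can be detected and refused.

```python
class TypeHierarchy:
    """Toy type hierarchy with exclusion links (names are assumptions)."""

    def __init__(self):
        self.supertypes = {}   # type name -> set of direct supertypes
        self.exclusions = []   # pairs of mutually exclusive types

    def ancestors(self, name):
        """Return name together with all its direct and indirect supertypes."""
        found, pending = set(), [name]
        while pending:
            current = pending.pop()
            if current not in found:
                found.add(current)
                pending.extend(self.supertypes[current])
        return found

    def exclude(self, first, second):
        """Declare that first and second may never share a subtype."""
        self.exclusions.append((first, second))

    def violations(self, supertypes):
        """List the exclusion links broken by a type with these supertypes."""
        inherited = set()
        for supertype in supertypes:
            inherited |= self.ancestors(supertype)
        return [(a, b) for a, b in self.exclusions
                if a in inherited and b in inherited]

    def add_type(self, name, supertypes=()):
        """Add a type, refusing it if it would be a common subtype of exclusive types."""
        broken = self.violations(supertypes)
        if broken:
            raise ValueError(f"{name} would specialise exclusive types: {broken}")
        self.supertypes[name] = set(supertypes)

hierarchy = TypeHierarchy()
hierarchy.add_type("Spatial_entity")
hierarchy.add_type("Process")
hierarchy.exclude("Spatial_entity", "Process")
hierarchy.add_type("Location", ["Spatial_entity"])
hierarchy.add_type("Joining", ["Process"])
```

With this setup, an attempt to declare "Union" as a subtype of both Location and Joining raises an error, which forces the user to decide which of the two meanings is intended.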

At present, user knowledge, including ontologies, can only be stored in documents. When a warehouse is used, we will initialise it with our top-level ontology, specialised by the 90,000 categories of the natural language ontology WordNet. Knowledge modelling will thus be eased since, in most domains and applications, users will find many relevant concept types to specialise and many relations with signatures defined on this vocabulary. Knowledge sharing and reuse will also be eased since knowledge from different users will derive from a common vocabulary. Knowledge retrieval will be greatly enhanced because it will be possible to use many natural language concept types (and to use different synonyms to refer to them) for accessing knowledge. We will also allow this for retrieving knowledge stored in documents, but with far less precision since the types they use are not related to the types in the warehouse (unless they are derived from our top-level ontology, in which case some good guesses can be made by exploiting the constraints on the signature relations). We have already exploited WordNet in such directions in our previous tool, CGKAT [Martin, 1995; Martin & Alpay, 1996; Martin, 1996]. We discuss our exploitation of ontologies in more detail in section 5.

2.4  Allowing users to store knowledge in documents or in a warehouse

Warehouse (centralised database) and distributed documents have complementary advantages for storing knowledge.

2.4.1  Storing knowledge in documents

Knowledge representations or queries can be stored inside a document, mixed with other DEs, by using structured English and/or by delimiting the knowledge area with special delimiters or tags. This area may or may not be visible when the document is displayed. For example, it may be hidden using HTML comments or by storing it in the content of an HTML tag attribute.

The WebKB user may include knowledge-management or file-management commands, or scripts of commands, inside a document by enclosing them with the special HTML tags "<KR>" and "</KR>" or with the strings "$(" and ")$". Furthermore, knowledge statements may be stored inside the value of the "alt" attribute of an image tag. Thus, for example, the user may see a graphic representation of a CG and WebKB may use its linear representation.
Such embeddings of commands inside documents have two advantages.

First, knowledge assertions (such as CGs) and their associated documentation may be stored in the same file. This allows the user not to duplicate knowledge into a separated knowledge base, and thus lowers maintenance costs. Knowledge and its documentation may be structured into sections and by hypertext links. Due to their structuration and their interconnection, they are both more comprehensible and manageable, especially since tools for editing, viewing and managing documents may be used to manage both knowledge and its documentation. Such facilities can ease various phases of KA.

Second, queries may be embedded into a document. Then, when the document is sent to a WebKB query processor, the processor can send back either the results of the requests or the whole document with the results in place of the requests (various parameters may be used for specifying what the output should be). Thus, for example, the WebKB query processor can filter a document and insert it into another one, or it can run tests on a knowledge base and insert their results into a document. For knowledge/document management, this method may be easier than writing a script with a lot of "print" commands for adding text to the generated document.
Another application with HTML documents is to use Javascript for associating a query with a hypertext link in such a way that it is sent to a WebKB query processor when the link is activated. Then the document resulting from the query is displayed in the same way as a normal document. In the hypertext literature, this technique is called dynamic linking and the generated document is called a dynamic document or a virtual document (Quint & Vatton, 1992). It has many applications, e.g. for adapting the content of a document to a user.
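
The two uses of embedded commands described in this section (locating the delimited areas, and returning the document with results in place of the requests) can be sketched in Python. The "<KR>" and "$(" ... ")$" delimiters come from the text above; the function names and the stand-in processor are hypothetical, and a real processor would of course interpret the commands rather than echo them.

```python
import re

# Command areas are delimited by <KR ...>...</KR> or by $( ... )$.
COMMAND_AREA = re.compile(r"<KR\b[^>]*>(.*?)</KR>|\$\((.*?)\)\$", re.DOTALL)

def _command_of(match):
    """Return the command text captured by whichever delimiter pair matched."""
    text = match.group(1) if match.group(1) is not None else match.group(2)
    return text.strip()

def extract_commands(document):
    """List the commands embedded in a document, in order of appearance."""
    return [_command_of(m) for m in COMMAND_AREA.finditer(document)]

def substitute_results(document, process):
    """Return the document with each command area replaced by process(command)."""
    return COMMAND_AREA.sub(lambda m: process(_command_of(m)), document)

page = "Intro <KR>spec [Cat];</KR> middle $(spec [Mat];)$ end"

def fake_processor(command):
    # Stand-in for a WebKB query processor: it only reports what it received.
    return "(result of: " + command + ")"

generated = substitute_results(page, fake_processor)
```

Passing a function as the replacement keeps the text outside the command areas untouched, which is what allows documentation and knowledge to live in the same file.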

2.4.2  Storing knowledge in a warehouse

A Web-accessible warehouse may be seen (and referenced via a URL) as a big document storing or referencing related DEs or knowledge in one place, but the construction of which may be cooperatively done by various users in a controlled way. Thus, if the warehouse allows the embedding of knowledge inside DEs, this option has the advantages we have seen for plain documents. Searches are quicker in a warehouse than in distributed documents since information is centralised and generally more structured. The counterpart is that updates take longer and are more constrained.

WebKB will offer a warehouse to its users by the end of 1999, based on a deductive database. The underlying ideas are discussed in section 6.

2.5  Offering users a language to exploit, merge and generate knowledge or documents

The WebKB query processor (WebKB server) is Web-accessible. It can be exploited with a Web browser from HTML pages (e.g. from the WebKB interface) or from other programs, by sending it commands. Three complementary kinds of commands are proposed.

Thus, the WebKB query processor may be exploited from and for a wide range of applications. The combination of the three kinds of commands allows the combination of data or knowledge from one or several files or information systems.

Besides, we do not need to add some special operators to our query language, such as the modal operators "few" and "most", since they are easily definable by the user with the scripting language. Here is an example extracted from a script proposed by WebKB for showing the possibilities of the language.

    spec [Something] | nbArguments | set nbCGs;
    spec [Cat] | nbArguments | set nbCGsAboutCat;
    set nbCGsdiv2 `expr $nbCGs / 2`;
    if ($nbCGsAboutCat > $nbCGsdiv2)
    { echo "Most CGs of the base are about cats"; }

An important feature of these commands is that they may be embedded inside documents. Some applications have been discussed above.

The WebKB languages are detailed in section 4.

Architecture

An important element of our work is an open architecture. Separate, combinable and Web-accessible tools permit ease of use and exchange with any other tool thought more interesting or suitable: a simple URL change allows such a substitution providing the same protocol is followed by the substituting module.

WebKB has three kinds of components: knowledge/text processors, interfaces and knowledge bases. As stressed above, all the components are Web-accessible.

3.1  Knowledge/Text processors

The WebKB processor (WebKB server) is a C/C++ program accessible via the Common Gateway Interface [CGI]. It may therefore be exploited from a Web browser via HTML forms (e.g. the forms of the WebKB interface), or from another software using the CGI protocol.

As opposed to a Java applet, this CGI server is not executed on the user machine but on the server machine, access to which via the Web might be slow. However, a Java applet is not allowed to directly load a Web-accessible file unless it is located on the server machine (the applet may work around this security restriction by calling a program on the server machine to do the loading and send back the content or a processed content; this solution is similar to using a CGI server). Besides, the WebKB processor implements many functions and includes at least one CG processor. If it were implemented as an applet, the loading of its code on the user machine would be quite long. Two additional reasons led us to implement the WebKB processor as a CGI server rather than an applet:
(i) the warehouse will also have to be implemented on the server machine, and a deductive database will be used (thus many logical inferences will be directly done on the server machine),
(ii) the two CG workbenches currently exploited are written in C++ and therefore could not have been included into an applet.

The two CG workbenches [CGs engines] currently exploited for processing commands on knowledge are CoGITo [Haemmerlé, 1995] and Peirce [Ellis, 1995]. The user may specify a preferred inference engine along with the processing request (see for example how the choice is offered in the "Knowledge-based Information Retrieval/Handling tool"). Otherwise, the inference engine CoGITo is chosen by default. CoGITo is directly included in the code of the WebKB processor while Peirce is exploited as a separate process.
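
A program other than a Web browser could prepare a request to such a CGI server along the following lines. This Python sketch only builds the request, it does not send it. The server URL and the form field names "commands" and "engine" are hypothetical: the actual field names are those of the HTML forms of the WebKB interface.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical address of a WebKB CGI server.
WEBKB_CGI_URL = "http://example.org/cgi-bin/webkb"

# The two inference engines named in the text; CoGITo is the default.
ENGINES = ("CoGITo", "Peirce")

def build_request(commands, engine="CoGITo"):
    """Prepare an HTTP POST carrying WebKB commands as HTML form data.

    The field names 'commands' and 'engine' are assumptions of this sketch.
    """
    if engine not in ENGINES:
        raise ValueError("unknown inference engine: " + engine)
    body = urlencode({"commands": commands, "engine": engine}).encode("ascii")
    return Request(
        WEBKB_CGI_URL,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )

request = build_request("spec [Cat];")
```

The prepared request could then be passed to urllib.request.urlopen, and the reply would be the document generated by the server.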

In the future, other inference engines and thus other knowledge representation languages might be exploited in WebKB. The "language" attribute in the delimiters of sequences of commands will allow this. For the "<KR>" delimiter, it looks like "<KR language='CG'>".

A list of the possible commands that may be submitted to a WebKB processor is given in a select menu in the "Knowledge-based Information Retrieval/Handling tool" and the "Classic Information Retrieval/Handling tool".

By the end of 1999, the WebKB server will be able to exploit a database management system (DBMS) for allowing users to cooperatively build a single knowledge base, thus ensuring a maximal interconnection and sharing of knowledge from the different users. To implement this tool, we will use the protocols we have described in appendix 2 of [Martin, 1996].

3.2  Interfaces

Whenever possible, we have used HTML and Javascript for implementing the interface of our tools. Thus, users may customize these interfaces and their behavior by modifying the HTML&Javascript sources. As opposed to CGI servers, Javascript programs are directly executed on client machines (and presently seem more secure than Java applets).

For the CG graphic editor [WebKB-GE], where substantial direct manipulation of graphics is required, Java has been used.

The current interface tools of WebKB are the following:

3.3  Knowledge bases

At present, knowledge, including ontologies, can only be stored in documents. The implementation of the warehouse will soon offer an alternative. As we have seen in section 2.3, WebKB presently provides a top-level ontology and a few KADS task models. The warehouse will be initialised with the natural language ontology WordNet.

Documentation for the WebKB languages

This documentation is in a separate document. Here is its table of contents.

  1. Document loading&execution commands
  2. Knowledge handling commands
    1. Knowledge assertion commands
      1. Languages based on the Conceptual Graph (CG) formalism
      2. Declaring, defining and ordering types
      3. Declaring individuals
      4. Usual CG linear notation
      5. Frame-like CG linear notation
      6. Document Elements (DEs) organisation
        1. HTML structures
        2. Indented text
      7. Formalised english
    2. Knowledge-based indexation commands
    3. Knowledge testing commands
    4. Knowledge querying commands
    5. Knowledge generation commands
    6. Knowledge deletion commands
  3. Text processing commands
  4. Control structures
  5. References

Exploitation of ontologies for knowledge modelling and retrieval

This section has yet to be written in a separate document. Meanwhile, this article presents a part of the subject, and chapter 5 of my Ph.D. thesis (PhD.html) details it in French.

Click here to see some links on current top-level ontologies.

Allowing different users to build a single knowledge base

This documentation is in a separate document. Here is its table of contents.

  1. Goal: the reuse and integration of knowledge from multiple users
  2. Approach: integration of knowledge at its insertion
  3. Procedures
    1. Introduction: the structure of the KB
    2. Visualisation
    3. Notification of updates
    4. Creation or modification
    5. Destruction
  4. Conventions
    1. Lexical normalisation: use formatted nouns, give synonyms, make aliases
    2. Structural normalisation: use formalised english and basic binary relations
    3. Semantic normalisation: reuse and precise existing knowledge
  5. References

The procedures described in this separate document allow various users to incrementally build a single type hierarchy (the procedures will be extended for the building of a whole KB). The users do not have to reach a consensus on type names or their intended meanings: apart from allowing aliases and display filters, the main ideas are the following.

Applications and limitations

This section has to be written.

7.1  Knowledge organisation and retrieval

7.2  Warehouse or corporate memory

7.3  Document generation

7.4  Limitations


Comparison with related systems

It appeared from our review of Web-based tools for KA, IR or cooperation that these tools face similar design issues which lead to the implementation of a subset of the same basic elements. These issues may be first divided into four categories:
1) where and how the information sources are stored,
2) what kind of structure or language is used to represent the information sources,
3) what kind of functionalities and interfaces are offered for representing, structuring and relating the information sources,
4) what kind of functionalities and interfaces are offered for exploiting the information sources.

In the next four sub-sections, we detail these issues and the elements to which they lead, and situate WebKB features in the different approaches. In section 8.5, we use these criteria to compare WebKB with other typical tools in each category (cf. table 1). Figure 1 shows the possible architecture elements and their interactions.

Figure 1.  Possible architecture elements of Web-based IR/KA/Groupware systems

8.1  Integration of distributed systems, exploitation of documents, or warehousing

Web-based tools may integrate distributed systems, or search and exploit the content of distributed information sources, or allow users to store and exploit information in a warehouse.

The first case aims to logically unify heterogeneous distributed information systems (e.g. databases, form-based sources), and to provide users with the means to search and update information efficiently and transparently. Examples of such systems are AlephWeb, Carnot, Infomaster, Information Manifold, Hermes, SIMS and TSIMMIS (cf. [DSI tools]). These systems use a "mediator" that translates user queries into sub-queries for the different information systems and integrates the sub-answers (see the top boxes in Figure 1). The mediator exploits "wrappers" and content descriptions of information sources. Wrappers perform the conversion between languages or protocols. This approach may only be applied to information sources which have a stable syntactical structure that a wrapper can use to extract semantic information. This is not the case for most Web documents, where knowledge must then be manually represented and either embedded into the documents or stored in other information sources referring to them.
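
The mediator/wrapper organisation described above can be sketched in Python. This is a generic illustration, not the design of any of the cited systems: the class names, the record format and the two toy sources are hypothetical.

```python
class DictSourceWrapper:
    """Wrapper for a source that already stores records as dictionaries."""

    def __init__(self, records):
        self.records = records

    def query(self, attribute, value):
        return [r for r in self.records if r.get(attribute) == value]

class TupleSourceWrapper:
    """Wrapper for a source storing (name, kind) tuples.

    It converts the tuples into the common dictionary format, which is the
    language conversion role that wrappers play for the mediator.
    """

    def __init__(self, rows):
        self.rows = rows

    def query(self, attribute, value):
        records = [{"name": name, "kind": kind} for name, kind in self.rows]
        return [r for r in records if r.get(attribute) == value]

class Mediator:
    """Send the same sub-query to every wrapper and merge the sub-answers."""

    def __init__(self, wrappers):
        self.wrappers = wrappers

    def query(self, attribute, value):
        merged = []
        for wrapper in self.wrappers:
            for record in wrapper.query(attribute, value):
                if record not in merged:   # drop duplicate answers
                    merged.append(record)
        return merged

mediator = Mediator([
    DictSourceWrapper([{"name": "SystemA", "kind": "deductive"},
                       {"name": "SystemB", "kind": "relational"}]),
    TupleSourceWrapper([("SystemC", "deductive"), ("SystemA", "deductive")]),
])
answers = mediator.query("kind", "deductive")
```

The sketch works only because both sources have a stable structure that a wrapper can rely on, which is precisely the condition that most Web documents do not meet.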

In the second case, information is searched in different Web-accessible files or databases, and possibly translated into the same language. The search may be initiated and directed by a user query [], or done for collecting data and caching them in order to efficiently answer queries later []. In this last case, the search engine and the storage system may be integrated, as in a database management system (DBMS) (cf. figure 1). Besides, the information search may be directed by a list of registered/trusted sites (as in []) or done by programs that traverse the Web automatically [Web robots].

In the third case, information is stored at a single location and checks may be performed when users enter the information. Relationships between the information of different users may also be created automatically. Note that integrating distributed systems may be seen as creating a virtual warehouse (Keller & Genesereth, 1997). Besides, each distributed system may itself be a warehouse, as for example in AlephWeb, a Web trader which federates various Web-accessible databases, each one specialised in one domain or for one application. However, since domains, sub-domains or application resources often overlap, it may be difficult for users to know where they should put a piece of information, and for the search engine to know where to search and how to combine information. A single warehouse, in which a piece of information is tightly linked to all others when the user stores it, does not lead to later combination problems, and its access is not limited to a predefined domain. However, if this warehouse has to be accessed by many users, mirrors for search purposes have to be built (and frequently and automatically updated) in order to shorten access delays.

8.2  Exploitation of free text or hypertext documents

Though the content of most Web documents cannot be automatically understood, they still contain a lot of information. In order to retrieve it, e.g. for resource discovery or classification, statistical analysis or hypertext link maintenance, there are basically three kinds of approaches.

WebKB does not have a Web robot since it is oriented towards the retrieval of knowledge, not documents.
Though it does not offer a hypertext query language as precise as those of the tools cited in the second approach, it offers classic Unix utilities (e.g. grep, awk and diff), a basic link-exploration command ("accessibleFromDoc") and a script language to combine these commands. Hence, for exploiting documents with a well-known structure, which is where precise hypertext query languages are really useful, the user may in most cases automate that task by writing scripts.
Finally, WebKB allows arbitrarily complex metadata to be put in documents or (when implemented) in a warehouse.
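To give an idea of what a basic link-exploration command computes, here is a minimal Python sketch of a breadth-first traversal of hypertext links. It is not the implementation or the syntax of "accessibleFromDoc"; the function name accessible_from, the max_depth parameter and the in-memory PAGES dictionary (a stand-in for the Web) are assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, List


class LinkCollector(HTMLParser):
    """Collects the href targets of <a> elements."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def accessible_from(start: str, fetch: Callable[[str], str], max_depth: int) -> List[str]:
    """Breadth-first list of documents reachable from `start` in at most `max_depth` link hops."""
    seen = [start]
    queue = deque([(start, 0)])
    while queue:
        doc, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not follow links beyond the requested depth
        collector = LinkCollector()
        collector.feed(fetch(doc))
        for target in collector.links:
            if target not in seen:
                seen.append(target)
                queue.append((target, depth + 1))
    return seen


# In-memory stand-in for the Web (illustrative pages only).
PAGES = {
    "index.html": '<a href="a.html">A</a> <a href="b.html">B</a>',
    "a.html": '<a href="c.html">C</a> <a href="index.html">home</a>',
    "b.html": "no links here",
    "c.html": '<a href="d.html">D</a>',
    "d.html": "",
}

reachable = accessible_from("index.html", lambda doc: PAGES.get(doc, ""), max_depth=2)
```

The resulting list could then be filtered with string-matching utilities in the spirit of grep, which is the kind of combination a script is meant to automate.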

8.3  Languages for storing knowledge in documents

We have presented in section 2.4.1 some advantages of allowing users to store knowledge in documents. The context we mainly had in mind was about the management of documents and knowledge for an application or for a community.

However, many researchers and industry practitioners currently focus on the exploitation of knowledge in Web documents, but they generally seem oriented toward larger-scale (and less precision-oriented) applications. This is why they do not talk about knowledge but more generally about "metadata", i.e. "machine-understandable information on the Web" [Metadata]. They expect it to be used for addressing many of the issues regarding managing, finding, accessing, filtering, personalizing, and aggregating Web content in a standard way. More precise examples of applications are the building of site maps, content ratings, stream channel definitions, search engine data collection, digital library collections, or distributed authoring solutions. Current efforts focus on the definition of general-purpose metadata languages, and on workbenches for these languages. The two emerging standards for metadata languages are the Extensible Markup Language [XML] and the Resource Description Framework [RDF]. Both are supported by the World Wide Web Consortium [W3C].

The WebKB languages may be seen as precision-oriented metadata languages. We now list some of these languages and discuss their features.

The WebKB approach has two kinds of advantages over the above approaches. First, descriptions of entities and ontologies may be written in a genuine and concise KR language. Second, there is no arbitrary division between knowledge about ontologies and knowledge about instances of objects defined in the ontology. Any of these descriptions may be mixed with other DEs inside a document and/or index any DE in this document or other documents. The information to describe is often about types of entities, not about a particular entity, and thus a knowledge base may essentially consist of an ontology. Two of the WebKB example documents (the documentation and representation of kinds of tasks according to the KADS methodology, and the indexation and representation of the knowledge of a road accident analysis expert from an interview transcript) show how the construction of an ontology and its documentation may be done in WebKB. Note also that neither types nor their instances have to be represented by a URL.

The WebKB approach does not prevent users from writing their documents in XML (or any other markup language similar to HTML) and thus from exploiting its facilities and the tools based on it. Moreover, since the WebKB processor ignores HTML/XML tags when processing commands or knowledge assertions, but still stores their exact form (including tags) and uses it when generating documents, any component of these commands or knowledge assertions may exploit the facilities of HTML or XML, e.g. it may be a source or a destination of XML hypertext links. However, if an image, or one of the current CG graphic viewers/editors, is used for displaying a CG, such links will not appear.
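The idea of ignoring tags for processing while keeping the exact source form for document generation can be illustrated as follows. This is a minimal Python sketch under stated assumptions, not the WebKB processor: the Assertion class, the read_assertion function and the sample CG-like linear form are hypothetical and chosen only for the example.

```python
from dataclasses import dataclass
from html.parser import HTMLParser
from typing import List


class _TextOnly(HTMLParser):
    """Keeps character data and drops every tag (character references are decoded)."""

    def __init__(self) -> None:
        super().__init__()  # convert_charrefs=True by default
        self.parts: List[str] = []

    def handle_data(self, data: str) -> None:
        self.parts.append(data)


@dataclass(frozen=True)
class Assertion:
    source: str  # exact form as written in the document, tags included
    text: str    # tag-free form, the only one a parser of the assertion would see


def read_assertion(source: str) -> Assertion:
    parser = _TextOnly()
    parser.feed(source)
    parser.close()  # flush any text still buffered by the parser
    return Assertion(source=source, text="".join(parser.parts))


# A CG-like linear form decorated with HTML (illustrative input only).
marked_up = '[<b>Cat</b>]-&gt;(On)-&gt;[<a href="mat.html">Mat</a>]'
assertion = read_assertion(marked_up)
```

Here assertion.text is what would be parsed as knowledge, while assertion.source is what would be written back when a document is generated, so the hypertext link on "Mat" survives.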

Since RDF is going to be a standard, one may regret that its language is not more logic-oriented and its notation more concise. For reasons of efficiency, large-scale search engines would treat logic relations like any other relations, while more precision-oriented engines would exploit them for doing logical inferences. Instead, logical aspects are going to be added to RDF in different ways by different application developers, or they will use different languages (they may then build wrappers to translate these languages into RDF for the wider community, but the logical aspects would also be translated in different ways). Current tools for RDF [RDF tools] seem to be parsers, viewers and workbenches that developers are expected to use for developing specific applications. Thus, they may be compared to the CG workbenches used by WebKB, except that the latter allow the search or matching of CGs to take into account the constraints and specialisation relations associated with the types in the ontology.

8.4  Languages, tools and ontologies for representing and searching knowledge

Representing knowledge in a uniform and consistent way is a long, tedious and sometimes difficult task. Adequate languages, tools and ontologies are needed to ease or guide that task, by helping the user in (i) deciding what to represent and how to represent it, (ii) building the representations, and (iii) verifying the completeness and consistency of the representations.

8.4.1  Languages

We have seen in section 2 and section 8.3 some properties that we think desirable for languages that help the user to represent and search knowledge. We have complemented the CG features and notations in order to obtain these properties. CGs belong to the inheritance-based or logic-based semantic networks, e.g. the KL-One family [KRLs]. A graph structure enables CG systems to offer graph tracing, matching or ordering procedures, notably the generalisation relation between "simple" CGs, which is a logical entailment relation. A graph-based notation is generally more intuitive or synthetic than one where concepts and relations (or nodes and links) are linked via variables. However, for more complex representations, the usual logic notations are useful.
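The generalisation relation between simple CGs can be illustrated by a brute-force sketch in Python: a graph G generalises a graph H if the concept nodes of G can be mapped onto nodes of H so that each type is specialised and each relation is preserved. This is a simplified illustration, not the algorithm of any CG workbench: the single-inheritance SUPERTYPE table, the graph encoding and the function names are assumptions, and relation types are compared by equality only.

```python
from itertools import product
from typing import Dict, Set, Tuple

# Hypothetical single-inheritance type hierarchy: child -> parent.
SUPERTYPE = {"Cat": "Animal", "Dog": "Animal", "Animal": "Entity", "Mat": "Entity"}

# A simple graph: (concept node -> concept type, set of (relation, source node, target node)).
Graph = Tuple[Dict[str, str], Set[Tuple[str, str, str]]]


def is_subtype(t: str, ancestor: str) -> bool:
    """True if `t` equals `ancestor` or is one of its specialisations."""
    while t != ancestor:
        if t not in SUPERTYPE:
            return False
        t = SUPERTYPE[t]
    return True


def generalises(general: Graph, specific: Graph) -> bool:
    """True if some mapping of `general` nodes onto `specific` nodes
    specialises every concept type and preserves every relation."""
    g_nodes, g_rels = general
    s_nodes, s_rels = specific
    g_ids = list(g_nodes)
    for image in product(s_nodes, repeat=len(g_ids)):  # try every candidate mapping
        mapping = dict(zip(g_ids, image))
        types_ok = all(is_subtype(s_nodes[mapping[n]], g_nodes[n]) for n in g_ids)
        rels_ok = all((r, mapping[a], mapping[b]) in s_rels for r, a, b in g_rels)
        if types_ok and rels_ok:
            return True
    return False


cat_on_mat: Graph = ({"c": "Cat", "m": "Mat"}, {("On", "c", "m")})
animal_on_entity: Graph = ({"x": "Animal", "y": "Entity"}, {("On", "x", "y")})
dog_on_mat: Graph = ({"x": "Dog", "y": "Mat"}, {("On", "x", "y")})
```

With these toy graphs, "an animal is on an entity" generalises "a cat is on a mat" (the latter entails the former), while "a dog is on a mat" does not. Exhaustive enumeration is exponential in the size of the general graph and is used here only for readability.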

The WebKB languages do not enable complex formal representations usable for logical inferences, e.g. about time, space, sets, modalities or negations. However, concept types and relation types relative to these notions may be defined (some are proposed in the top-level ontology) and WebKB can exploit their organisation for information retrieval or knowledge merging purposes, via simple CGs matching or merging. This is sufficient for a lot of data/knowledge organisation purposes. WebKB may be seen as a knowledge-oriented hypertext system: the languages are intended to be quick and easy to read or use but the user may also be very precise when s/he desires so.

The problem of the vocabulary to use when building knowledge or queries, may be reduced either by allowing undeclared terms or enabling the user to quickly query, browse and select the authorised terms, their inter-relations and their possible associations. WebKB proposes both solutions.

As opposed to most hypertext query languages, such as those in WebLog [IR tools] or MacWeb [MacWeb], the WebKB query language allows the retrieval of isolated graphs (i.e. graphs as they are entered by the users) but does not yet allow path tracing, i.e. searching for parts of the global semantic network of the knowledge base. This facility will be added when the warehouse is implemented. The WebKB query language does not have modal operators such as "most" or "any" either, but the control structures it offers allow the user to define most of these operators. It should however be noted that the commands provided by WebKB for querying Web documents on their hypertext structure, or on the strings they may include, are less precise and powerful than the commands offered by some other Web-based IR tools such as WebLog, WebSQL or Harvest [IR tools].

8.4.2  Tools

Editors

Helping users with the syntax of the language may be done via adapted editors or forms. Text-based editors and graphic-based editors are complementary. The former allow quick creation and updates, while the latter offer better views of the interrelations between various elements. WebKB offers both kinds of editors. The GKB-Editor (Generic Knowledge Base Editor) is a tool for graphically browsing and editing knowledge bases across multiple Frame Representation Systems (FRSs) in a uniform manner. It offers an intuitive user interface, in which objects and data items are represented as nodes in a graph.

Ontology servers

Database

8.4.3  Ontologies

To help users decide what to represent and how to represent it in a consistent way, a tool may offer a queryable and browsable structured library of concept types, relation types and models (e.g. task models), or in other words, an ontology.

In current KA tools, the ontology is not hard-coded in the tools but generally takes the form of a library of task models, browsable and editable via specialised editors. Some KA tools, such as Cue, Protégé and WebKB [KA tools], are generic enough to allow user-modified task models to be used for guiding the modelling of domain knowledge (in WebKB, this may be done by issuing queries for the specialisations of given task concepts or models; if necessary, a simple script may be written for checking the completeness of the instantiation of task models).

Web-based knowledge modelling (KM) tools, such as Ikarus and KSL's Ontolingua browser [KM tools], do not offer task ontologies but minimal top-level ontologies, as well as editors allowing various people to edit or complement these ontologies. Searches are generally limited to navigation among the pieces of knowledge in these ontologies.

WebKB offers a top-level ontology which merges or connects the components of several other current top-level ontologies used in knowledge acquisition, knowledge representation, and cooperation-oriented hypertext tools. For example, the relation type ontology collects thematic, mathematical, spatial, temporal, rhetorical and argumentative relation types. The signatures of these 200 relation types are defined on the proposed top-level concept types, themselves specialised by the 90,000 types of the WordNet ontology. CGKAT and WebKB seem to be the only current KA or KM tools to exploit a natural language ontology for easing knowledge retrieval, reuse and sharing. A few task models have also been represented in CGs, and are thus retrievable and combinable via the WebKB command language.
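What it means for a relation type to have a signature on concept types can be shown with a small Python sketch: a use of a relation is accepted only if the types of its arguments specialise the types given in the signature. The type names, the SUPERTYPE and SIGNATURE tables and the function names below are illustrative assumptions and are not taken from the WebKB ontology.

```python
from typing import Dict, Tuple

# Hypothetical concept type hierarchy: child -> parent.
SUPERTYPE: Dict[str, str] = {
    "Driving": "Process", "Person": "Entity", "Car": "Entity",
    "Process": "Thing", "Entity": "Thing",
}

# Hypothetical relation signatures: relation -> (type of source, type of target).
SIGNATURE: Dict[str, Tuple[str, str]] = {
    "Agent": ("Process", "Entity"),
    "Object": ("Process", "Entity"),
}


def is_subtype(t: str, ancestor: str) -> bool:
    """True if `t` equals `ancestor` or is one of its specialisations."""
    while t != ancestor:
        if t not in SUPERTYPE:
            return False
        t = SUPERTYPE[t]
    return True


def respects_signature(relation: str, source_type: str, target_type: str) -> bool:
    """True if both argument types specialise the types declared in the relation's signature."""
    expected_source, expected_target = SIGNATURE[relation]
    return is_subtype(source_type, expected_source) and is_subtype(target_type, expected_target)


ok = respects_signature("Agent", "Driving", "Person")        # a driving has a person as agent
swapped = respects_signature("Agent", "Person", "Driving")   # arguments in the wrong order
```

Because signatures are stated on top-level types, any more specialised type (for instance one coming from a large lexical hierarchy) is checked through the same subtype test.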

The concept type ontology proposed by WebKB (WordNet included) may be compared, in size and structure, to the type hierarchy used in CYC [CYC]. However, CYC also has a large KB of axioms or facts using these types. CYC is a private product. It is updatable concurrently by the large team of its authors. Various inference engines may exploit the knowledge which is represented in CycL, a first-order logic based language. CYC has already been used for natural language processing, database integration, and for easing KA via selections of subsets of the KB relevant to the desired domains.

On the other side, the Knowledge Systems Laboratory (KSL) at Stanford University provides ontologies which are "theories", i.e. sets of terms with their associated rules and constraints but intended to be as small as possible and independent of each other in order to let the user choose between alternative modellings of a group of concepts [Ontologies]. Thus, a term (e.g. for a type of concept or relation) may be defined in different ways in different theories.
However, we think that the various definitions of a term actually refer to different types of real-world objects, which should therefore be represented under different names in a single ontology where they may be connected (and thus interclassed) by various relations. Our procedures allowing the cooperative building of a KB by multiple users are based on this idea. Afterwards, as noted above for CYC, a subset relevant for a domain may be extracted from the ontology. But the less is retained, the fewer entry points are offered for accessing the knowledge. For example, fuzzy natural language concept types may be kept in an ontology in order to ease access to knowledge, e.g. for educational purposes, maintenance and explanations, while only the essential concepts and rules may be retained for problem-solving tasks.
It should also be noted that the KSL theories [Ontologies], though limited in size and number, have much more precise definitions and rules associated with the terms than is the case in other kinds of ontologies.

One effort of note is the attempt to construct a `standard' ontology for ANSI by merging the upper regions of SENSUS, CYC, EDR, MIKROKOSMOS, and other ontologies (in collaboration with researchers at IBM Santa Teresa, CYCorp., Stanford University, EDR Tokyo, and various others). This project is currently unfunded.

The Knowledge Sharing Effort offers ontologies but no integration system.

8.5  Public annotations, knowledge integration and collaboration

Collaboration: Annotation, Knowledge Representation, Notification, and Links as First Class Objects.

groupware, CSCW

Allowing users to store knowledge in a warehouse
Allowing users to represent or link DEs
Expressive knowledge languages and Deductive databases

FLORID is a deductive object-oriented database prototype employing F-Logic as data definition and query language.

Not objects

SHOE, OML, KA2: The address of the webpage which represents the actual object. This is a relative address if the containing collection has specified a base URL.

arbitrary DEs, Conceptual relations

8.5.1  Web private/shared annotation tools

The WebKB set of tools shares many design principles with current Web-based public annotation tools, e.g. ComMentor and HyperNews [Annotation tools], and Web-based traders, e.g. AlephWeb and AI-Trader [IR&KR tools]. However, such annotation tools are not intended to index DEs using knowledge and they cannot exploit the annotation to achieve IR outcomes. HyperNews, AlephWeb and AI-Trader only allow the user to index documents, not arbitrary parts of those documents. AI-Trader uses CGs for indexing documents, while NetRepository uses KIF [KRLs] for communication between knowledge servers. None of these tools can generate documents as answers to user queries.

8.6  Exploitation of knowledge

filter, doc. gener., Web search tools, traders and brokers


interface, ...

8.7  Summary and comparison with closest related systems

Comp. from Ontobroker team

See Table 1. The table has no rows for distributed systems integration, wrappers, synchronous cooperation, automatic knowledge integration, data mining, or graphical guiding interfaces (with an unfixed ontology), since none of the compared tools supports them (-> cooperation tools restricted to annotation tools).

CGKAT features = MacWeb features


Table 1. Comparison of WebKB with other IR/KA/Annotation tools
("+" means that the tool supports the feature, "soon" that it will support it soon, "~" that the support is limited, and "?" that not enough information has been found to decide if the tool supports the feature or not)
  ? Onto-
Cue KSL Ika-
Document search&exploitation     +    +    +    +     +     +   +         +   +     +    +
    Web documents exploitation     +      +    +     +     +   +         +   +     +    ~
    Automatic indexation of files      +                   +      
    Web robot                       +   +    
    Cache        +    +       ?         +      
Warehouse  soon     +             +   +         +    +
    Knowledge storage  soon     +             +   +        
    Maximal knowledge integr.  soon                         
Knowledge modelling tool     +    +    +    +     +     +   +   +   +   +        
    Natural language analysis      +                        
    Graphical guiding interfaces     ~    +    ~         ?   +            
    Intuitive KRL     +    ~    +    +     ~       +   ~   ~        
    Logic-based KRL     +    ~    +       +       +   +   ~        
    Executable knowledge  soon          +       +            
    Consistency verifiable     ~    ~         ~       ~   ~          
    Completeness verifiable     +    +             +   +   +        
    Ontology easily extensible     +    +    +    +       +   +   +   +   +        
    Default ontology (library)     +               +   +   +        
        Task ontology and models     +               +            
        Top-level domain ontology     +               ~   +   +        
        Natural language ontology  soon                         
    Knowledge generation     +               ?            
    Knowledge mixable with DEs     +    ~      +     +     +   +              
Indexation of some DEs     +    +    +    +     +     +   +               +     +
    Indexation of any DE     +                          
Knowledge search language     +    +    +    +     +     ~   ?   +     ~        
    Large ontology exploitation  soon                         
    Indexed DE retrieval     +    +    +         ?              
Document generation language     +    +           ?           +    



This section has to be written.

7.1  Merge of IR and KA techniques: methods and benefits

7.2  Merge of knowledge from different users: methods and benefits

The WebKB set of tools is intended to combine various technologies for helping KA, IR and CSCW, notably the Web-related technologies, the databases and the knowledge representation languages and processors. To do so, we rely on our previous experience in the development of CGKAT. The current main installation site of the WebKB set of tools is at


8.1  Languages, ontologies and tools

Annotation tools Hypertext tools that allow users to comment on or index some parts of a document. In public-annotation tools, the comments are centralised in a warehouse.
Examples of Web-based public-annotation tools:
- ComMentor:
- HyperNews:
CYC A very large, multi-contextual knowledge base and its associated inference engine. CycL is the CYC Representation Language.
CGI Common Gateway Interface.
CGKAT Conceptual Graph Knowledge Acquisition Tool.
[Martin, 1995] [Martin & Alpay, 1996] [Martin, 1996]
CGs Conceptual Graphs (CGs).
CGs engines Workbenches for Conceptual Graphs:
- CoGITo [Haemmerlé, 1995]
- Peirce [Ellis, 1995]:
- for other ones, see
DSI tools Distributed Systems Integration tools. Examples:
- AlephWeb:
- Carnot:
- Infomaster:
- Information Manifold:
- Hermes:
- SIMS:
- TSIMMIS: http://InfoLab.Stanford.EDU/tsimmis/tsimmis.html
HTML HyperText Markup Language.
IR tools Information Retrieval tools. Examples of knowledge-based IR tools are given in the next entry. Annotation tools (see above) are also IR tools.
Other examples of Web-based IR tools are:
- ht://Dig:
- Harvest:
- W3QS:
- WebLog:
- WebSQL:
IR&KR tools Information Retrieval tools exploiting knowledge representations. Metadata tools (see below) are such tools.
Examples of CG-based IR systems are:
- AI-Trader:
- RIME: see Kheirbek & Chiaramella (1995)
Java A language whose code may be loaded and executed by some Web browsers.
JavaScript A language mixable with HTML and interpretable by some Web-browsers.
KA tools Knowledge Acquisition tools. WebKB and CGKAT are KA tools. Ontology servers&editors (see below) are knowledge modelling tools and thus may be seen as restricted KA tools.
An extensive survey on old and recent KA tools (including Cue and Protégé) and their use of ontologies may be found in Heijst & al. (1996).
See also the archives of the Knowledge Acquisition Workshops:
KA2 The Knowledge Annotation Initiative of the KA Community.
KADS-II An Advanced and Comprehensive Methodology for Integrated KBS Development.
KRLs Knowledge Representation Languages. Examples:
- Knowledge Interchange Format (KIF):
- the KL-ONE family of KRLs:
- Concept Maps: and
MacWeb A knowledge-based hypertext tool. See [Nanard & Nanard, 1993].
Metadata Knowledge representations in Web-accessible documents. Here are three entry points for this theme in the W3C site.
Design issues:
Metadata tools Tools exploiting metadata, generally for information retrieval. Examples:
- Ontobroker:
- RDF tools, RDF-like tools, WebKB (see the entries below)
Ontologies Set of formal terms, generally structured by semantic relations and accompanied by constraints and formal or informal definitions.

One approach to knowledge sharing and reuse is to develop a set of ontologies which are minimal, precise and possibly alternative. See for example the ontologies developed by the Knowledge Systems Laboratory:

A more global or linguistically oriented approach is to group and interconnect various types of objects relevant to a domain into a single framework, generally around a "kind-of" hierarchy. The types may not have formal definitions or even natural language definitions. They are essentially place-holders for implicitly referred kinds of concepts or relations. Three kinds of such ontologies may be distinguished.
Web-accessible ontology editors or ontology warehouses. Examples:
- APECKS: and
- Ikarus:
- KSL Ontology Server/Editor (also called "Ontolingua browser"):
- Ontosaurus:
- WebOnto:
RDF Resource Description Framework.
Older note on the underlying rationales:
RDF tools Tools for editing or exploiting knowledge in RDF. See the section "RDF Software and Projects" in
RDF-like tools Tools exploiting a language similar to RDF. Examples:
- Ontology Markup Language (OML):
- Conceptual Knowledge Markup Language (CKML): and
SGML Standard Generalized Markup Language.
URI/URL Uniform Resource Identifiers/Locators.
W3C World Wide Web Consortium.
Warehouses Web-accessible databases where users can store and search information. Public-annotation tools use warehouses (see also the "KM tools" entry). Other examples are:
- AI-Trader:
- Aliweb:
Web World Wide Web. and
Web robots Web robots/wanderers/crawlers/spiders. Databases, FAQs and articles about Web robots may be found in:
- Altavista:
- DR-LINK:
- Infoseek:
- Lycos:
- Yahoo!:
Note: MetaCrawler and Savvy Search are tools which call in parallel most of the current Web index systems or other user-selected main information source providers (e.g. Usenet News, software directories), and can integrate their results.
WebKB The WebKB set of tools.
WebKB-GE A Visual Editor for Canonical Conceptual Graphs.
See also Pollitt & al. (1998)
XML Extensible Markup Language., and

Definitions for the general AI terms we use may be found at
while the terms related to hypertext concepts or the Web architecture are defined in the W3C glossary. Ontology-related definitions may be found at

8.2  Additional publications

B. Carbonneill & O. Haemmerlé. ROCK : Un système de Question/Réponse fondé sur le formalisme des Graphes Conceptuels. In Actes du 9eme congrès Reconnaissance des Formes et Intelligence Artificielle, pp 159-169, Paris, January 1994.  URL:

G. Ellis. Managing Complex Objects. Ph.D thesis, Queensland University (Department of Computer Sciences), Australia, 1995.  URL:

O. Haemmerlé. CoGITo: une plate-forme de développement de logiciels sur les graphes conceptuels. Ph.D thesis, Montpellier II University, France, January 1995.  URL:

G. van Heijst, A.Th. Schreiber & B.J. Wielinga. Using Explicit Ontologies in KBS Development. In the special Issue of IJHCS/HA (International Journal of Human-Computer Studies/Knowledge Acquisition) on "Using Explicit Ontologies in KBS Development", Fall 1996.  URL:

A. Kheirbek & Y. Chiaramella. Integrating Hypermedia and Information Retrieval with Conceptual Graphs. In Proceedings of HIM'95, Hypertext-Information Retrieval-Multimedia (Eds: R. Kuhlen & M. Rittberger), pp. 47-60, Konstanz, Germany, April 1995.  URL:

P. Martin. Using the WordNet Concept Catalog and a Relation Hierarchy for KA. In Proceedings of Peirce'95, 4th International Workshop on Peirce, Santa Cruz, California, August 18, 1995.  URL:

P. Martin & L. Alpay. Conceptual Structures and Structured Documents. In Proceedings of ICCS'96, 4th International Conference on Conceptual Structures (Eds: P.W. Eklund, G. Ellis and G.Mann), pp. 145-159, Springer Verlag, LNAI 1114, Sydney, Australia, August 19-22, 1996.  URL:

P. Martin. Exploitation de graphes conceptuels et de documents structurés et hypertextes pour l'acquisition de connaissances et la recherche d'informations. Ph.D thesis, University of Nice - Sophia Antipolis, France, October 14, 1996.  URL:

P. Martin. The WebKB set of tools: a common scheme for shared WWW Annotations, shared knowledge bases and information retrieval. In Proceedings of ICCS'97, 5th International Conference on Conceptual Structures (Eds: D. Lukose, H. Delugach, M. Keeler, L. Searle and J. Sowa), pp. 585-588, Springer Verlag, LNAI 1257, Seattle, August 4-8, 1997.  URL:

J. Nanard & M. Nanard. Should anchors be typed too? An experiment with MacWeb. In Proceedings of HTX'93, 5th ACM Conference on Hypertext, pp. 51-62, ACM Press, Seattle, November 1993.

S. Pollitt, A. Burrow and P. Eklund. WebKB-GE - A Graphical Editor for Canonical Conceptual Graphs. In Proceedings of ICCS'98, 6th International Conference on Conceptual Structures, (Eds: M.L. Mugnier and J.F. Sowa), Springer Verlag, LNAI 1453, pp 111-118, Montpellier, France, August 12-14, 1998.  URL:

V. Quint & I. Vatton. Combining Hypertext and Structured Documents in Grif. In Proceedings of ECHT'92 (Eds: D. Lucarella), pp. 23-32, ACM Press, Milan, December 1992.  URL:

Dr. Philippe A. MARTIN