PhD proposal

Subject: Semi-automatic and collaborative knowledge-based indexing, structuring and sharing of textual or multimedia information for learning or information retrieval within a knowledge server or a semantic grid.

Core idea. This PhD will explore how textual or multimedia information can and should be structured to ease information sharing, retrieval and comparison for various related purposes: learning, research and decision-making. Traditional solutions are based on document indexing and hence are limited to document retrieval. Knowledge-based solutions rely on machine-understandable representations of parts of the semantic content of the information sources. These representations range from very lightweight to very formal and organised, from highly distributed and independently created to those created within classic knowledge base management systems (KBMSs), and from automatically created for information retrieval purposes only to carefully crafted for problem-solving purposes. This thesis will begin by comparing the characteristics, advantages and rationales of the well-known software and techniques behind these traditional and knowledge-based solutions. To be precise and to genuinely permit this comparison, the state of the art (concepts, statements, and the semantic or argumentation relations between them) will itself be structured in a semi-formal way and made usable for information retrieval and learning via a knowledge server. This exercise will provide a starting point and the first test material for the main goal of this thesis: devising methods to enhance current approaches for representing, sharing, retrieving and comparing information. To that end, various research avenues will be explored:
1) the design and use of various more or less expressive (and hence more or less intuitive) textual or multimedia notations/interfaces for displaying, representing, indexing or querying textual or multimedia information,
2) the re-use and alignment of various existing ontologies (e.g., SUMO, DOLCE, OpenCYC),
3) the initialisation of a knowledge base (KB) about a subject via the automatic extraction of basic conceptual relations between concepts or statements from textual documents (Wikipedia, course materials, research articles; examples of basic relations: generalisationOf, subprocessof, physicalPartOf, agentOf, purposeOf, annotationOf, argumentationOf and objectionOf),
4) protocols permitting people to cooperatively build/edit the same KB (and hence annotate or correct information that they think incorrect) while (i) avoiding lexical and semantic conflicts, (ii) not forcing them to agree with each other, and (iii) encouraging knowledge re-use and structuring,
5) the adaptation of these protocols to permit people to cooperate in a semantic grid, assuming that the existing KBs (e.g., the KBs of each user or the various KBs about the same domain) feed off each other,
6) knowledge valuation mechanisms allowing any user to attribute values to certain characteristics (e.g., originality and accuracy) of any statement, and then select or filter out information based on these values (according to statistical functions possibly defined by the user), the creators of these values, and the content of the information itself,
7) user valuation mechanisms allowing any user to attribute values to certain characteristics of other users (e.g., originality and accuracy in a certain domain) based on their votes and the information they entered.
Dr. Martin has done some preliminary work on each of these points except the third, and has designed a knowledge server based on initial ideas for the first, second and fourth points (an implementation for the last two points is also in progress). This server, named WebKB-2, will therefore be used and extended during this thesis.
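
To make the sixth research avenue more concrete, here is a minimal Python sketch of a knowledge valuation mechanism. It is only an illustration under stated assumptions and does not describe how WebKB-2 works: the Statement class, the select function, the [0, 1] rating scale, the user names and the example statements are all hypothetical. Users rate a characteristic (here, accuracy) of statements, and a reader then filters the KB with an aggregation function of their choice, optionally restricted to raters they trust.

```python
from dataclasses import dataclass, field
from statistics import mean
from typing import Callable, Dict, Iterable, List, Optional, Set


@dataclass
class Statement:
    """A relation between two concepts, entered by one user (hypothetical model)."""
    source: str
    relation: str
    target: str
    author: str
    # votes[characteristic][voter] = value in [0, 1] (assumed scale)
    votes: Dict[str, Dict[str, float]] = field(default_factory=dict)

    def rate(self, voter: str, characteristic: str, value: float) -> None:
        """Record (or overwrite) one user's rating of one characteristic."""
        self.votes.setdefault(characteristic, {})[voter] = value


def select(
    statements: Iterable[Statement],
    characteristic: str,
    threshold: float,
    aggregate: Callable[[List[float]], float] = mean,
    trusted_voters: Optional[Set[str]] = None,
) -> List[Statement]:
    """Keep statements whose aggregated rating reaches the threshold.

    `aggregate` is the user-chosen statistical function; `trusted_voters`
    restricts which raters are taken into account (None means all raters).
    Statements without any usable rating are filtered out.
    """
    kept = []
    for s in statements:
        values = [
            value
            for voter, value in s.votes.get(characteristic, {}).items()
            if trusted_voters is None or voter in trusted_voters
        ]
        if values and aggregate(values) >= threshold:
            kept.append(s)
    return kept


# Two example statements: the first is sound, the second is disputed.
kb = [
    Statement("cat", "generalisationOf", "siamese_cat", author="alice"),
    Statement("whale", "generalisationOf", "shark", author="bob"),
]
kb[0].rate("alice", "accuracy", 0.9)
kb[0].rate("carol", "accuracy", 0.8)
kb[1].rate("alice", "accuracy", 0.1)
kb[1].rate("bob", "accuracy", 1.0)

# A lenient reader averages all ratings; a strict one takes the minimum.
lenient = select(kb, "accuracy", 0.5)
strict = select(kb, "accuracy", 0.5, aggregate=min)
```

In this sketch the disputed statement is never deleted: it stays in the KB and is simply hidden from readers whose filter rejects it, so users are not required to agree with each other. A user valuation mechanism (seventh avenue) could, under the same assumptions, be layered on top by deriving the set of trusted raters from the ratings received by each user's own statements.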

The details and references are accessible from
-   (in French)
-   (each of the files)