Thesis of Rami Harrathi


Subject:
Retrieval of structured documents by semantic content.

Start date:

Advisor: Sylvie Calabretto

Summary:

A structured document is a collection of typed elements organized by a set of logical relations that define a logical structure. This structure facilitates the presentation of such documents, as well as their interpretation and exploitation in various contexts. It has therefore become important to find efficient methods to query and retrieve structured documents by both content and structure. In this context, and especially since the advent of XML as the de facto standard for representing and exchanging structured documents over the Web, several XML retrieval approaches have been proposed. Most of these approaches rely on keyword indexing: keyword lists are used to describe both the content of structured documents and the queries. However, a keyword list says nothing about the semantic relationships between its keywords, so such descriptions are generally incomplete and imprecise. One way to improve precision is semantic content-based structured information retrieval, in which queries as well as document contents are represented by expressions in a knowledge representation language (e.g., conceptual graphs), and semantic resources (ontologies, thesauri) are used to index and retrieve structured documents. The aim of this thesis is to propose a method for querying and retrieving structured documents by semantic content and structure.