Thesis of Jennie Andersen


Subject:
From transparency of knowledge graphs to a general framework for defining assessment measures

Start date: 01/02/2021
End date (estimated): 01/02/2024

Advisor: Philippe Lamarre
Co-advisor: Sylvie Cazalens

Summary:

Many Knowledge Graphs (KGs) are available on the Web, and it may be difficult to decide which one to work with. Various criteria may influence this choice: beyond the relevance of the domain and the content, factors such as the use of standards or the identification of the creators also matter. Indeed, the opening up of more and more data, encouraged by governments' open data policies and by the growing importance of data in today's society, comes with additional requirements in terms of quality and transparency.

To help users choose one KG over another, we aim to provide an estimate of the transparency of a given knowledge graph. When thinking about this notion, several questions naturally come to mind. Do we know who created the KG? From what source? How? For what purpose? This kind of information is essential to build trust in the data and to increase their reuse. In addition, provenance information makes the data reproducible and verifiable. However, the notion of transparency does not have well-defined boundaries. To better understand it, we first explore the notion of transparency and its related concepts (openness, accessibility, verifiability, etc.). Then, given the lack of precise requirements for transparency as a whole, we focus on one of its closely related concepts and propose a measure of the accountability of KGs. We use this measure to evaluate hundreds of KGs available through SPARQL endpoints. While most of them do not provide any accountability information within their data, our measure makes it possible to discriminate among those that do. Finally, we compare our measure with other measures that assess the data quality or the FAIRness of KGs.

This comparison highlights that each measure has its own particularities but also shares similarities with many other existing measures. As a result, choosing the proper measure to evaluate KGs for a given task is not easy, since these measures are described in many different ways and places. Given that many of them rely on a hierarchical structure, we propose a formal basis for describing measures in a common framework. This framework aims to facilitate their understanding, reuse, comparison, and sharing, and it defines operators to manipulate them, either to build new measures or to compare existing ones. We also propose a web application for designing and comparing measures defined in this way.
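To illustrate what a hierarchical measure and operators over it might look like, the following Python sketch models a measure as a tree whose leaves are elementary criteria and whose inner nodes aggregate their children. It is a minimal sketch under stated assumptions, not the formal framework or the web application of the thesis: the class `Measure`, the operators `combine` and `shared_criteria`, the weighted-mean aggregation, and all criteria names and weights are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Measure:
    """Hypothetical node of a hierarchical measure: a leaf criterion or a weighted group."""
    name: str
    weight: float = 1.0
    children: List["Measure"] = field(default_factory=list)

    def score(self, values: Dict[str, float]) -> float:
        # A leaf reads its value in [0, 1] from the evaluation of one KG
        # (missing criteria count as 0).
        if not self.children:
            return values.get(self.name, 0.0)
        # An inner node aggregates its children with a weighted mean
        # (an assumed aggregation; other operators are possible).
        total = sum(c.weight for c in self.children)
        if total == 0:
            return 0.0
        return sum(c.weight * c.score(values) for c in self.children) / total

    def criteria(self) -> Set[str]:
        # Names of all leaf criteria below this node.
        if not self.children:
            return {self.name}
        return set().union(*(c.criteria() for c in self.children))


def combine(name: str, *parts: Measure) -> Measure:
    """Hypothetical operator: build a new measure from existing ones."""
    return Measure(name, children=list(parts))


def shared_criteria(m1: Measure, m2: Measure) -> Set[str]:
    """Hypothetical comparison operator: leaf criteria two measures have in common."""
    return m1.criteria() & m2.criteria()


# Illustrative measures; criteria names and weights are invented for the example.
accountability = Measure("accountability", children=[
    Measure("creator", weight=2.0), Measure("publisher"), Measure("license")])
quality = Measure("quality", children=[Measure("license"), Measure("availability")])
values = {"creator": 1.0, "publisher": 0.0, "license": 1.0}

print(accountability.score(values))              # 0.75
print(shared_criteria(accountability, quality))  # {'license'}
print(combine("both", accountability, quality).score(values))  # 0.625
```

Representing every measure with the same tree type is what makes the operators generic here: building a new measure is just grafting existing trees under a new root, and a basic comparison reduces to set operations on their leaves.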