Thesis of Armelle Ndjafa


Subject:
Specialized and secured data spaces

Start date: 01/12/2008
End date (estimated): 01/12/2011

Advisor: Frédérique Laforest
Coadvisor: Lionel Brunie
Codirection: Harald Kosch
Cotutelle: Harald Kosch

Summary:

Application to information retrieval and aggregation from patient databases
All European countries have agreed on the principle of setting up shared patient files. By enforcing confidentiality constraints, a shared patient file enables the secure sharing of multi-pathological medical information among all health care providers involved in the treatment of a patient. In concrete terms, the point is to enable fast and secure access to all information sources that concern the patient, regardless of whether they are hosted by hospitals, clinics, health care networks, general practitioners or specialists.
In the Rhône-Alpes region, such a project, the DPPR (Dossier Patient Partagé et Réparti, shared and distributed patient file), is coordinated by SISRA (Système d’Information de Santé Rhône-Alpes, Rhône-Alpes health information system). Based on tools that were previously developed within the framework of the ONCORA network (Rhône-Alpes cancer research network), the Rhône-Alpes DPPR relies on a shared index structure that lists, for each data element related to a patient, its identifier and a link to the storage site where it is available.
This data structure is especially well suited to “patient-oriented” queries (such as “Find the latest X-ray of Patient X’s lungs”) and to queries that study a specific theme in a patient’s file (for example, following the development of a pathology). In contrast, it is very ill-suited to “similarity” diagnostic queries such as: “Find all patients with the same disease as Patient X whose blood count results are similar to those of Patient X”. Indeed, to process this query, all blood test results of all patients with the same pathology must be retrieved from the sites that store them. This is a very costly operation, and the cost grows further if similarity between images must be computed. For the same reason, the data structure makes it difficult to process epidemiological queries, which are based on the analysis of the data of a set of patients.
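The contrast between the two kinds of queries can be illustrated with a minimal sketch. It is not the DPPR implementation: the class names, the `kind` tag and the site names are hypothetical, and the list of patients with the same disease is assumed to be known already. The index stores only identifiers and storage links, as described above, so a patient-oriented query is a single lookup, whereas a similarity query must fetch every candidate element from its storage site before any comparison can take place.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexEntry:
    element_id: str  # identifier of the data element
    kind: str        # hypothetical type tag, e.g. "blood_count"
    location: str    # link to the storage site holding the element


class SharedIndex:
    """Toy shared index: patient id -> entries (identifier + storage link)."""

    def __init__(self):
        self._by_patient = defaultdict(list)

    def register(self, patient_id, entry):
        self._by_patient[patient_id].append(entry)

    def entries_for(self, patient_id, kind=None):
        """Patient-oriented query: one lookup in the index."""
        entries = self._by_patient.get(patient_id, [])
        return [e for e in entries if kind is None or e.kind == kind]

    def remote_fetches_for_similarity(self, patient_ids, kind):
        """Similarity query: the index holds no values, so every matching
        element of every candidate patient has to be fetched remotely."""
        return [e for p in patient_ids for e in self.entries_for(p, kind)]


# Illustrative data only; patients and sites are made up.
index = SharedIndex()
index.register("patient-x", IndexEntry("e1", "blood_count", "hospital-a"))
index.register("patient-x", IndexEntry("e2", "lung_xray", "clinic-b"))
index.register("patient-y", IndexEntry("e3", "blood_count", "hospital-c"))
index.register("patient-y", IndexEntry("e4", "blood_count", "gp-d"))

xrays = index.entries_for("patient-x", "lung_xray")
fetches = index.remote_fetches_for_similarity(
    ["patient-x", "patient-y"], "blood_count"
)
```

In this toy setting the similarity query already touches three storage sites for two patients; the number of remote fetches grows with the number of candidate patients, which is the cost problem described above.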
In this context, the goal of this thesis is to propose mechanisms and protocols for managing partially indexed distributed data that are suited to the processing of similarity-based and data aggregation queries. The experimental setting will be provided by the Rhône-Alpes DPPR and by the processing of epidemiological and diagnostic queries.
At a more theoretical level, the work focuses in particular on the following points:
• The notion of summary of information elements (data elements). Going beyond the traditional basic data descriptors (metadata), the summary must make it possible to bring the information concerned up to the “shared” level;
• The notion of personalized data space. A data space is defined as a dynamic and integrated view of a set of data sources. Primarily used for integrating data collected by several sensors, data spaces have recently been adapted to the management of heterogeneous personal data (e-mails, files, personal schedule data, …) distributed over multiple devices (laptop, PDA, mail servers, …). The goal here is to propose mechanisms for defining personalized/specialized views that encompass all storage devices of the network (e.g. the data space of patients with type 1 diabetes). With respect to data spaces, the following points are studied in particular: how to specify them (What data must be extracted from the files and made available at the global level? What aggregated data should be dynamically computed? ...), how to dynamically manage their data, how to merge them, and how to secure them (notion of specialized/personalized and secured data space).
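To make the two notions above more concrete, here is a minimal sketch under stated assumptions; it is not the mechanism proposed in the thesis. It assumes that each source publishes a small summary record at the shared level, and that a specialized data space is a predicate evaluated over these summaries, with a simple role check standing in for the security aspect. All names (`Summary`, `DataSpace`, the `hemoglobin_g_dl` field, the roles) are hypothetical.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Summary:
    """Hypothetical summary a source publishes at the shared level."""
    patient_id: str
    diagnosis: str
    hemoglobin_g_dl: float
    source: str


class DataSpace:
    """Specialized view: a predicate evaluated over the shared summaries."""

    def __init__(self, name, predicate, authorized_roles):
        self.name = name
        self.predicate = predicate
        self.authorized_roles = set(authorized_roles)

    def members(self, summaries, role):
        # Assumed access rule: only listed roles may read the data space.
        if role not in self.authorized_roles:
            raise PermissionError(f"role {role!r} may not read {self.name!r}")
        # Evaluated on each call, so the view follows changes in the sources.
        return [s for s in summaries if self.predicate(s)]

    def mean_hemoglobin(self, summaries, role):
        """Aggregate computed from summaries, without fetching full files."""
        return mean(s.hemoglobin_g_dl for s in self.members(summaries, role))


# Illustrative data only.
summaries = [
    Summary("p1", "type 1 diabetes", 13.0, "hospital-a"),
    Summary("p2", "asthma", 14.5, "clinic-b"),
    Summary("p3", "type 1 diabetes", 12.0, "gp-c"),
]
space = DataSpace(
    "type-1-diabetes",
    lambda s: s.diagnosis == "type 1 diabetes",
    authorized_roles={"physician"},
)
member_ids = [s.patient_id for s in space.members(summaries, "physician")]
average = space.mean_hemoglobin(summaries, "physician")
```

The design choice illustrated here is that aggregation and membership are answered from the summaries alone; which data to put in a summary, and how to keep it consistent with the sources, are exactly the open questions listed above.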
This thesis is carried out in collaboration with the ONCORA network (http://oncoranet.lyon.fnclcc.fr/) and the above-mentioned SISRA (http://www.sante-ra.fr/).