# Multistructured Documents for Digital Humanities

We work on the representation and construction of multistructured documents. A document is multistructured when the diversity of its uses implies the coexistence of several annotation layers over the same content. Such documents often appear in the Digital Humanities, when scholars are involved in the critical edition of a work. As a result, we often work in a multidisciplinary context.

# Challenges and Contributions

We proposed the *Multi-Structured Document Model (MSDM)*[[1]](#p1) for representing and querying multistructured documents. MSDM optimally factors the document fragments shared by multiple structures, while still allowing individual structures to share the same content. We also proposed the MultiX formalism to represent an MSDM document in XML. Unlike other solutions, which often require deep modifications of the core XML model, MultiX relies only on XQuery functions.

We also studied the problem of emerging structures during the construction of multistructured documents, and proposed a methodology for controlling this emergence[[2]](#p2)[[3]](#p3). The methodology is radically innovative in that it uses occurrences of overlap between tree (arborescent) structures to let new structures emerge through the dynamic refactoring of annotation vocabularies. Previously, overlaps were instead regarded as the principal cause of the representation problem of multistructured documents.

This approach is embodied in Dinah[[4]](#p4), a philological platform meant to help scholars in their critical work on manuscripts. Dinah has been tested in several digital edition projects ([Jean-Toussaint Desanti archives](http://institutdesanti.ens-lyon.fr/), ["Bouvard et Pécuchet" project](http://www.dossiers-flaubert.fr/), and a new international project for the edition of the "Encyclopédie" by Diderot and d'Alembert).

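
As a rough illustration of two ideas mentioned above (factoring the fragments shared by several structures, and spotting overlaps between structures), the following Python sketch cuts a text at every annotation boundary and lists the pairs of spans that intersect without nesting. It is not the MSDM, MultiX, or Dinah implementation. The function names `fragments` and `overlaps`, the `(label, start, end)` span encoding, and the sample text and structures are all assumptions made for this example.

```python
from itertools import combinations


def fragments(text_len, structures):
    """Cut [0, text_len) at every span boundary of every structure.

    Each resulting (start, end) fragment lies entirely inside or entirely
    outside any given span, so it can be shared by all structures.
    """
    cuts = {0, text_len}
    for spans in structures.values():
        for _label, start, end in spans:
            cuts.update((start, end))
    bounds = sorted(cuts)
    return list(zip(bounds, bounds[1:]))


def overlaps(structures):
    """List pairs of spans from different structures that intersect
    without one containing the other."""
    found = []
    for (name_a, spans_a), (name_b, spans_b) in combinations(structures.items(), 2):
        for label_a, s_a, e_a in spans_a:
            for label_b, s_b, e_b in spans_b:
                if s_a < s_b < e_a < e_b or s_b < s_a < e_b < e_a:
                    found.append(((name_a, label_a), (name_b, label_b)))
    return found


# Hypothetical data: a physical layer (manuscript lines) and a
# syntactic layer annotating the same 27-character text.
text = "Bouvard et Pecuchet copient"
structures = {
    "physical": [("line", 0, 14), ("line", 14, 27)],
    "syntactic": [("np", 0, 19), ("verb", 20, 27)],
}

frags = fragments(len(text), structures)
found = overlaps(structures)

for start, end in frags:
    print((start, end), repr(text[start:end]))
print(found)
```

In this toy data, the second manuscript line begins inside the noun phrase and ends after it, so the pair is reported as an overlap. In the methodology described above, such an occurrence would be treated as a cue for refactoring the annotation vocabulary, not as an encoding error.
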

We are now taking on the challenge of designing an editing tool for collaborative data structuring, such that the data structures are updatable, guarantee the consistency of the collective editorial project, and reflect the different editors' needs in terms of expressivity. This work is based on a bidirectionalization of the Annotation Graph model[[5]](#p5).

# Contributors

* Pierre-Edouard Portier
* Sylvie Calabretto
* Vincent Barrellon
* Elöd Egyed-Zsigmond
* Jean-Marie Pinon
* Noureddine Chatti

# Grants

* Regional Grant from ARC5 for the thesis of Vincent Barrellon (2013-2016)

  ARCs (Academic Research Communities) are meant to structure research in Rhône-Alpes around societal issues, and to promote interdisciplinary and transversal research. ARC5 (Cultures, Sciences, Societies, and Mediations) is about cultural contents and the social practices that give rise to them. The 2nd axis of ARC5 (Numeric Cultures) is about interdisciplinary research on innovative uses in the information society. The grant for Vincent Barrellon's thesis comes from this 2nd axis.

* Regional Grant from Cluster 13 for the thesis of Pierre-Edouard Portier (2007-2010)

  Clusters are the predecessors of the ARCs and played a similar role.

## Selected publications

<a name="p1"></a>[1] Pierre-Edouard Portier, Noureddine Chatti, Sylvie Calabretto, Elöd Egyed-Zsigmond, and Jean-Marie Pinon. Modeling, encoding and querying multi-structured documents. *Inf. Process. Manage.*, 48(5):931–955, 2012. [(PDF)](http://liris.cnrs.fr/Documents/Liris-5448.pdf)

<a name="p2"></a>[2] Pierre-Edouard Portier and Sylvie Calabretto. Introduction of a dynamic assistance to the creative process of adding dimensions to multistructured documents. In Matthew R. B. Hardy and Frank Wm. Tompa, editors, *ACM Symposium on Document Engineering*, pages 167–170. ACM, 2011. [(PDF)](http://liris.cnrs.fr/Documents/Liris-5309.pdf)

<a name="p3"></a>[3] Pierre-Edouard Portier and Sylvie Calabretto. Creation and maintenance of multi-structured documents. In Uwe M. Borghoff and Boris Chidlovskii, editors, *ACM Symposium on Document Engineering*, pages 181–184. ACM, 2009. [(PDF)](http://liris.cnrs.fr/Documents/Liris-4286.pdf)

<a name="p4"></a>[4] Pierre-Edouard Portier and Sylvie Calabretto. Dinah, a philological platform for the construction of multi-structured documents. In Mounia Lalmas, Joemon M. Jose, Andreas Rauber, Fabrizio Sebastiani, and Ingo Frommholz, editors, *ECDL*, volume 6273 of *Lecture Notes in Computer Science*, pages 364–375. Springer, 2010. [(PDF)](http://liris.cnrs.fr/Documents/Liris-4722.pdf)

<a name="p5"></a>[5] Vincent Barrellon. Collaborative Construction of Updatable Digital Critical Editions: A Generic Approach. Technical Report (http://liris.cnrs.fr/publis/?id=6855) published in *the Doctoral Symposium of DL 2014*. [(PDF)](http://liris.cnrs.fr/Documents/Liris-6855.pdf)

## Software

* Dinah (https://github.com/peportier/dinah)