Thesis of Beatrix-Emőke Fülöp-Balogh

Multi-View Acquisition of Animated Scenes

Defense date: 13/12/2021

Advisor: Julie Digne
Co-advisor: Nicolas Bonneel

Recent technological breakthroughs have led to an abundance of consumer-friendly video recording devices. New smartphone models, for instance, are now equipped not only with multiple cameras but also with depth sensors. This means that any event can easily be captured by several different devices and technologies at the same time, which raises the question of how to process the data in order to render a meaningful 3D scene. Most current solutions focus on static scenes only: LiDAR scanners produce extremely accurate depth maps, and multi-view stereo algorithms can reconstruct a scene in 3D from a handful of images. However, these ideas are not directly applicable to dynamic scenes. Depth sensors trade accuracy for speed, or vice versa, and color-image-based methods suffer from temporal inconsistencies or are too computationally demanding.
In this thesis, we aim to provide consumer-friendly solutions that fuse multiple, possibly heterogeneous, technologies to reconstruct and render dynamic 3D scenes.
First, we introduce an algorithm that corrects distortions produced by small motions in time-of-flight acquisitions and outputs a corrected animated sequence. We do so by combining a slow but high-resolution time-of-flight LiDAR system and a fast but low-resolution consumer depth sensor. We cast the problem as a curve-to-volume registration, viewing the LiDAR point cloud as a curve in 4-dimensional spacetime and the captured low-resolution depth video as a 4-dimensional spacetime volume. We then advect the details of the high-resolution point cloud to the depth video using its optical flow.
Second, we tackle the reconstruction and rendering of dynamic scenes captured by multiple RGB cameras. In casual settings, the two problems are hard to merge: structure from motion (SfM) produces spatio-temporally unstable and sparse point clouds, while the rendering algorithms that rely on the reconstruction need to produce temporally consistent videos. To ease the challenge, we consider the two steps together. For SfM, we first recover stable camera poses, then defer the requirement for temporally consistent points across the scene and reconstruct only a sparse point cloud per timestep that is noisy in space-time. For rendering, we present a variational diffusion formulation on depths and colors that lets us robustly cope with the noise by enforcing spatio-temporal consistency via per-pixel reprojection weights derived from the input views.
Overall, our work contributes to the understanding of the acquisition and rendering of casually captured dynamic scenes.   

James Tompkin, Associate Professor, Brown University, Invited member
Céline Loscos, Professor, Université de Reims, Reviewer
Adrien Bousseau, Research Director, INRIA Sophia Antipolis, Reviewer
Edmond Boyer, Research Director, INRIA Grenoble, President
Raphaëlle Chaine, Professor, Université de Lyon, Examiner
Nicolas Bonneel, Researcher, CNRS - Lyon, Co-advisor
Julie Digne, Researcher, CNRS - Lyon, Thesis advisor