Semantic Macro-Segmentation of Multimedia Sequences - Archive ouverte HAL
Preprint, Working Paper. Year: 2006

Semantic Macro-Segmentation of Multimedia Sequences

Macro-segmentation sémantique des séquences multimédia

Abstract

This thesis deals with the semantic structuring of video sequences, also known as macro-segmentation. Two approaches have been studied: a deterministic approach and a stochastic one. We propose a deterministic framework for automatic video segmentation, a kind of finite automaton whose states correspond to content units. The framework allows for multilevel hierarchical content structures, which are generated recursively, beginning at the highest semantic level. At each semantic level the parsing automaton is governed by specific templates that trigger state transitions in accordance with grammar restrictions. These templates are defined as combinations of intermediate semantic features, or short-term events, connected by certain temporal relationships. They make it possible to express prior knowledge about particular characteristics of semantic segments, usually relying on specific production rules that video producers typically employ to convey semantic information to the viewer. The advantages of the proposed segmentation framework are its expressiveness and its low computational complexity. As it is based solely on prior knowledge, it requires no preliminary learning and can be employed at once, without tedious manual annotation of training data. We apply and experimentally evaluate this framework on the task of tennis video segmentation, where the output content is naturally represented hierarchically: an input tennis match is first divided into sets and the pauses between sets, or breaks; each set is then further decomposed into games and breaks, and so on. Automatically recognized score boards and tennis court views are used as intermediate events in this task. We also propose a video segmentation framework based on a stochastic automaton that is trained to deal properly with time constraints on semantic segment durations and with ambiguity in observable features.
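The deterministic idea described above can be illustrated with a minimal single-level sketch. It is not the thesis's implementation: the event labels, the two states, and the `TEMPLATES` table are hypothetical, chosen only to show how templates over intermediate events (score boards, court views) can drive state transitions of a finite automaton.

```python
from typing import List, Tuple

# Hypothetical event labels; the abstract only states that score boards and
# court views serve as intermediate events.
COURT, SCOREBOARD, OTHER = "court", "scoreboard", "other"

# Hypothetical transition templates: (current state, event) -> next state.
TEMPLATES = {
    ("break", COURT): "game",       # assumed: play resumes with a court view
    ("game", SCOREBOARD): "break",  # assumed: a score board closes a game
}


def segment(events: List[Tuple[float, str]],
            start_state: str = "break") -> List[Tuple[str, float, float]]:
    """Return (state, start_time, end_time) segments for time-ordered events."""
    if not events:
        return []
    segments = []
    state, seg_start = start_state, events[0][0]
    last_time = seg_start
    for time, event in events:
        next_state = TEMPLATES.get((state, event))
        if next_state is not None:
            # Close the current segment (skip zero-length ones) and switch state.
            if time > seg_start:
                segments.append((state, seg_start, time))
            state, seg_start = next_state, time
        last_time = time
    # Close the segment that is still open at the last event.
    if last_time > seg_start:
        segments.append((state, seg_start, last_time))
    return segments


demo_events = [(0.0, OTHER), (5.0, COURT), (40.0, SCOREBOARD),
               (55.0, COURT), (90.0, OTHER)]
demo_segments = segment(demo_events)
```

A hierarchical version would, under the same assumptions, run such an automaton at the set level first and then apply a game-level automaton inside each detected set.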
The probabilistic nature of this stochastic framework allows multi-modal audio-visual evidence to be fused in a symmetrical, consistent and scalable manner. Instead of applying the definitive rules of the deterministic approach, the posterior probability of semantic segment transitions is estimated first. Segment boundaries are then positioned so as to maximize the total posterior probability. It is shown that such a decision rule maximizes recall and precision, which are commonly used measures of segmentation performance. A computationally tractable algorithm for the corresponding constrained maximization task is proposed. The posterior probability of segment boundaries is estimated using, in particular, a variable-duration hidden Markov model, which has proven to be a powerful means of modeling time sequences. As an alternative to maximizing the total posterior probability over the whole input video, a one-pass version of the segmentation framework is proposed, which selects each subsequent segment boundary as the most probable one, assuming that the previous boundary is known with certainty. This modification is particularly useful for real-time applications, where segmentation must be performed before the end of the video is reached. The proposed stochastic framework is applied and experimentally evaluated on the task of segmenting narrative films into scenes. The test results show that segmentation performance improves when multiple audio-visual cues for segment boundaries are fused and time constraints are taken into account. The resulting performance was higher than that of deterministic rule-based fusion techniques. Higher segmentation performance was also observed when our segmentation criterion, which maximizes the total posterior probability of segment boundaries, was applied instead of the Viterbi segmentation algorithm commonly used with hidden Markov models.
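The constrained maximization mentioned above can be sketched with dynamic programming. The abstract does not give the exact objective or constraints, so the following makes loud assumptions: `post[i]` is an already estimated boundary posterior at candidate point `i`, the objective is the plain sum of posteriors over chosen interior boundaries, the first and last points are fixed boundaries, and durations are bounded by hypothetical `min_gap` and `max_gap` parameters.

```python
import math
from typing import List, Sequence


def place_boundaries(post: Sequence[float], min_gap: int,
                     max_gap: int) -> List[int]:
    """Choose interior boundary indices maximizing the summed posterior.

    Assumptions (not from the source): indices 0 and len(post) - 1 are fixed
    boundaries, and consecutive boundaries must be between min_gap and
    max_gap candidate points apart.
    """
    n = len(post)
    if n < 2:
        raise ValueError("need at least two candidate points")
    best = [-math.inf] * n  # best[i]: best total with a boundary at i
    prev = [-1] * n         # back-pointer to the preceding boundary
    best[0] = 0.0
    for i in range(1, n):
        # The fixed end boundary contributes nothing to the objective.
        gain = post[i] if i < n - 1 else 0.0
        for j in range(max(0, i - max_gap), i - min_gap + 1):
            if best[j] + gain > best[i]:
                best[i] = best[j] + gain
                prev[i] = j
    if best[n - 1] == -math.inf:
        raise ValueError("no segmentation satisfies the gap constraints")
    # Backtrack from the end boundary to the start boundary.
    path, i = [], n - 1
    while i > 0:
        path.append(i)
        i = prev[i]
    return sorted(path)[:-1]  # drop the fixed end boundary


demo_post = [0.0, 0.9, 0.8, 0.1, 0.7, 0.2, 0.0]
demo_boundaries = place_boundaries(demo_post, min_gap=2, max_gap=4)
```

In this toy input the high posterior at index 1 is skipped because it would violate the minimum duration after the start, which illustrates how time constraints can override a purely local decision. The one-pass variant would instead pick, after each confirmed boundary, the most probable next boundary within the allowed window.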
In this work we also address the problem of video summarization, that is, the compact representation of the original video. A video summary can stand on its own, aimed at quickly acquainting a viewer with the content of a video, or it can be generated for each semantic segment of a content table, forming a so-called digest. Pictorial digests provide a convenient interface for navigating content tables, where each unit is visually represented by one or just a few key frames. We propose a versatile approach that can be used to create summaries customizable to a specific user's preferences and to different types of video. The versatility of the approach rests on a unified importance score for video segments, which fuses multiple features extracted from both the audio and video streams. This measure makes it possible to highlight specific moments in a video and, at the same time, to select the most representative video shots. Because the approach is computationally fast, the coefficients of the measure can be tuned interactively.
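The importance score described above can be illustrated with a small sketch. The abstract does not specify the form of the measure, so a weighted linear combination is assumed here, and the feature names (`motion`, `audio_energy`) and helper functions are hypothetical.

```python
from typing import Dict, List


def importance(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Assumed linear fusion: weighted sum of per-shot feature values."""
    return sum(weights.get(name, 0.0) * value
               for name, value in features.items())


def summarize(shots: List[Dict[str, float]], weights: Dict[str, float],
              k: int) -> List[int]:
    """Return the indices of the k highest-scoring shots in temporal order."""
    ranked = sorted(range(len(shots)),
                    key=lambda i: importance(shots[i], weights),
                    reverse=True)
    return sorted(ranked[:k])


demo_shots = [
    {"motion": 0.2, "audio_energy": 0.9},
    {"motion": 0.8, "audio_energy": 0.1},
    {"motion": 0.5, "audio_energy": 0.5},
]
# Changing the weights re-ranks the shots without re-extracting features,
# which is what makes interactive tuning cheap in this sketch.
motion_summary = summarize(demo_shots, {"motion": 1.0, "audio_energy": 0.0}, k=2)
audio_summary = summarize(demo_shots, {"motion": 0.0, "audio_energy": 1.0}, k=2)
```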
Segmenting video into temporal semantic units provides cues that are important for organizing efficient content-based browsing and navigation. In this work we address the problem of macro-segmentation, which aims to produce tables of contents for videos automatically. We propose a deterministic method, a kind of finite automaton, that makes it possible to formulate segmentation rules based on prior knowledge of video production principles. The method was adapted to and evaluated on tennis video. We also propose a statistical approach in which the segmentation rules are chosen so that the performance measures, precision and recall, are maximized. This approach is applied to the task of segmenting films into semantic scenes. In this work we also address the problem of automatically creating video summaries.
No file deposited

Dates and versions

hal-01458931, version 1 (07-02-2017)

Identifiers

  • HAL Id: hal-01458931, version 1

Cite

Viachaslau Parshyn. Semantic Macro-Segmentation of Multimedia Sequences. 2006. ⟨hal-01458931⟩
65 views
0 downloads
