A new mesh optimization framework for 3D triangular surface meshes is presented, which formulates the task as an energy minimization problem in the same spirit as Hoppe et al. [1]. The desired mesh properties are controlled through a global energy function including data-attached terms measuring the fidelity to the original mesh, shape potentials favoring high-quality triangles and connectivity, and budget terms controlling the sampling density. The optimization algorithm modifies mesh connectivity as well as the vertex positions. Solutions for the vertex repositioning step are obtained by a discrete graph cut algorithm examining global combinations of local candidates. Results on various 3D meshes compare favorably to recent state-of-the-art algorithms. Applications include optimizing and simplifying triangular meshes while maintaining high mesh quality. Targeted areas are the accuracy of numerical simulations, the convergence of numerical schemes, mesh rendering (normal field smoothness) and the geometric prediction in mesh compression techniques.
We present a new method for blind document bleed-through removal based on separate Markov Random Field (MRF) regularization for the recto and for the verso side, where separate priors are derived from the full graph. The segmentation algorithm is based on Bayesian Maximum a Posteriori (MAP) estimation. The advantages of this separate approach are the adaptation of the prior to the content creation process (e.g. superimposing two handwritten pages), and the improvement of the estimation of the recto pixels through an estimation of the verso pixels covered by recto pixels. Moreover, the formulation as a binary labeling problem with two hidden labels per pixel naturally leads to an efficient optimization method based on the minimum cut/maximum flow in a graph. The proposed method is evaluated on scanned document images from the 18th century, showing an improvement of character recognition results compared to other restoration methods.
We introduce a new causal hierarchical belief network for image segmentation. Contrary to classical tree-structured (or pyramidal) models, the factor graph of the network contains cycles. Each level of the hierarchical structure features the same number of sites as the base level, and each site on a given level has several neighbors on the parent level. Compared to tree-structured models, the (spatial) random process on the base level of the model is stationary, which avoids known drawbacks, namely visual artifacts in the segmented image. We propose different parameterizations of the conditional probability distributions governing the transitions between the image levels. A parametric distribution depending on a single parameter allows the design of a fast inference algorithm based on graph cuts, whereas for arbitrary distributions, we propose inference with loopy belief propagation. The method is evaluated on scanned documents, showing an improvement of character recognition results compared to other methods.
Evaluation of object detection algorithms is a non-trivial task: a detection result is usually evaluated by comparing the bounding box of the detected object with the bounding box of the ground truth object. The commonly used precision and recall measures are computed from the overlap area of these two rectangles. However, these measures have several drawbacks: they do not give intuitive information about the proportion of correctly detected objects and the number of false alarms, and they cannot be accumulated across multiple images without creating ambiguity in their interpretation. Furthermore, quantitative and qualitative evaluation are often mixed, resulting in ambiguous measures.
In this paper we propose a new approach which tackles these problems. The performance of a detection algorithm is illustrated intuitively by performance graphs which present object level precision and recall depending on constraints on detection quality. In order to compare different detection algorithms, a representative single performance value is computed from the graphs. The influence of the test database on the detection performance is illustrated by performance/generality graphs. The evaluation method can be applied to different types of object detection algorithms. It has been tested on different text detection algorithms, among which are the participants of the ICDAR 2003 text detection competition.
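The idea of object-level precision and recall under constraints on detection quality can be illustrated with a minimal sketch. It is not the authors' exact matching strategy: it assumes axis-aligned rectangles, considers only one-to-one matches found greedily, and the names `t_r` and `t_p` (thresholds on area recall and area precision) are chosen here for illustration.

```python
from typing import List, Tuple

# Rectangle as (x_min, y_min, x_max, y_max); this layout is an assumption.
Rect = Tuple[float, float, float, float]


def area(r: Rect) -> float:
    """Area of a rectangle (zero if it is degenerate)."""
    return max(0.0, r[2] - r[0]) * max(0.0, r[3] - r[1])


def overlap_area(a: Rect, b: Rect) -> float:
    """Area of the intersection of two rectangles."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def object_level_scores(
    ground_truth: List[Rect],
    detections: List[Rect],
    t_r: float,
    t_p: float,
) -> Tuple[float, float]:
    """Return (object recall, object precision).

    A ground truth box and a detection count as a match when the
    overlap covers at least t_r of the ground truth area and at
    least t_p of the detection area. Only greedy one-to-one matches
    are considered (a simplification made for this sketch).
    """
    matched_gt = set()
    matched_det = set()
    for i, g in enumerate(ground_truth):
        for j, d in enumerate(detections):
            if j in matched_det or area(g) == 0 or area(d) == 0:
                continue
            inter = overlap_area(g, d)
            if inter / area(g) >= t_r and inter / area(d) >= t_p:
                matched_gt.add(i)
                matched_det.add(j)
                break
    recall = len(matched_gt) / len(ground_truth) if ground_truth else 0.0
    precision = len(matched_det) / len(detections) if detections else 0.0
    return recall, precision


def recall_precision_curve(ground_truth, detections, t_p, steps=10):
    """Sweep t_r from 0.1 to 1.0 to obtain the points of a performance graph."""
    return [
        (k / steps, *object_level_scores(ground_truth, detections, k / steps, t_p))
        for k in range(1, steps + 1)
    ]
```

Sweeping one threshold while fixing the other yields a curve of object counts as a function of the quality constraint; a single performance value could then be derived from such a curve, for instance by averaging over the sweep.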
@Article{WolfIJDAR2006,
Author = {C. Wolf and J.-M. Jolion},
Title = {Object count/Area Graphs for the Evaluation of Object Detection and Segmentation Algorithms},
Journal = {International Journal on Document Analysis and Recognition},
year = {2006},
volume = {8},
number = {4},
pages = {280-296}
}
@Article{WolfPAA03,
Author = {C. Wolf and J.-M. Jolion},
Title = {Extraction and {R}ecognition of {A}rtificial {T}ext in {M}ultimedia {D}ocuments},
Journal = {Pattern {A}nalysis and {A}pplications},
year = {2003},
volume = {6},
number = {4},
pages = {309-326}
}
@InProceedings{WolfICPR2002V,
Author = {C. Wolf and J.-M. Jolion and F. Chassaing},
Title = {Text {L}ocalization, {E}nhancement and {B}inarization in {M}ultimedia {D}ocuments},
BookTitle = {Proceedings of the {I}nternational {C}onference on {P}attern {R}ecognition},
Volume = {2},
Pages = {1037-1040},
year = 2002,
}
@InProceedings{WolfICPR2002M,
Author = {C. Wolf and D. Doermann},
Title = {Binarization of {L}ow {Q}uality {T}ext using a {M}arkov {R}andom {F}ield {M}odel},
BookTitle = {Proceedings of the {I}nternational {C}onference on {P}attern {R}ecognition},
Volume = {3},
Pages = {160-163},
year = 2002,
}
The content-based indexing and retrieval systems currently available work without knowledge (pre-attentive systems). Unfortunately, the queries that can be constructed do not always correspond to the results obtained by a human who interprets the content of the document. The text present in videos is a feature that is both rich in information and simple, which makes it possible to complement classical queries with keywords.
In this article we present a project aimed at the detection and recognition of text present in images or video sequences. We propose a detection scheme based on the measure of the accumulated directional gradient. In the case of video sequences, we introduce a process that makes the detections more reliable and enhances the detected text through tracking and temporal integration.
Interest point detectors are used in computer vision to detect image points with special properties, which can be geometric (corners) or non-geometric (contrast etc.). Gabor functions and Gabor filters are regarded as excellent tools for feature extraction and texture segmentation. This article presents ways to combine these techniques for content-based image retrieval and to generate a textural description of images. Special emphasis is devoted to distance measures for texture descriptions. Experimental results of a query system are given.
This work was supported in part by the Austrian Science Foundation (FWF) under grant S-7002-MAT.
@InProceedings{WolfICPR2000,
Author = {C. Wolf and J.-M. Jolion and W. Kropatsch and H. Bischof},
Title = {Content {B}ased {I}mage {R}etrieval using {I}nterest {P}oints and {T}exture {F}eatures},
BookTitle = {Proceedings of the {I}nternational {C}onference on {P}attern {R}ecognition},
Volume = {4},
Pages = {234-237},
year = 2000,
}
In this paper we address the problem of binarizing "text boxes", i.e. sub-images containing artificial text extracted from videos. We show that the specific nature of video content leads to a new approach to this binarization step, compared with the usual and well-known techniques of image processing in general and of written document analysis in particular.
Evaluation of object detection algorithms is a non-trivial task: a detection result is usually evaluated by comparing the bounding box of the detected object with the bounding box of the ground truth object. The commonly used precision and recall measures are computed from the overlap area of these two rectangles. However, these measures have several drawbacks: they do not give intuitive information about the proportion of correctly detected objects and the number of false alarms, and they cannot be accumulated across multiple images without creating ambiguity in their interpretation. Furthermore, quantitative and qualitative evaluation are often mixed, resulting in ambiguous measures.
In this paper we propose an approach to evaluation which tackles these problems. The performance of a detection algorithm is illustrated intuitively by performance graphs which present object level precision and recall depending on constraints on detection quality. In order to compare different detection algorithms, a representative single performance value is computed from the graphs. The evaluation method can be applied to different types of object detection algorithms. It has been tested on different text detection algorithms, among which are the participants of the Image Eval text detection competition.
Our team from the University of Maryland and INSA de Lyon participated in the feature extraction evaluation with overlay text features and in the search evaluation with a query retrieval and browsing system. For search we developed a weighted query mechanism by integrating 1) text (OCR and speech recognition) content using full text and n-grams through the MG system, 2) color correlogram indexing of image and video shots reported last year in TREC, and 3) ranked versions of the extracted binary features. A command line version of the interface allows users to formulate simple queries, store them and use weighted combinations of the simple queries to generate compound queries.
One novel component of our interactive approach is the ability for users to formulate dynamic queries, a technique previously developed for database applications at Maryland. The interactive interface treats each video clip as a visual object in a multi-dimensional space, and each "feature" of that clip is mapped to one dimension. The user can visualize any two dimensions by placing any two features on the horizontal and vertical axes, with additional dimensions visualized by adding attributes to each object.
This work situates itself within the framework of image and video indexing. The systems currently available for content-based image and video retrieval work without semantic knowledge, i.e. they use image processing methods to extract low-level features of the data. The similarity obtained by these approaches does not always correspond to the similarity a human user would expect. A way to include more semantic knowledge in the indexing process is to use the text included in the images and video sequences. It is rich in information but easy to use.
Existing methods for text detection are simple: most of them are based on texture estimation or edge detection followed by an accumulation of these characteristics. Geometrical constraints are enforced by most of the methods, but only in a morphological post-processing step. A weak detection is very difficult, if not impossible, to correct in a post-processing step. We propose to take the geometrical constraints into account directly in the detection phase. Unfortunately, this is a chicken-and-egg problem: in order to estimate geometrical constraints, we first need to detect text. Consequently, we suggest a two-step algorithm: a first coarse detection calculates a text "probability" image. Afterwards, for each pixel we calculate geometrical properties of the potential surrounding text rectangle. These features are added to the features of the first step and fed into a support vector machine classifier.
For the application to video sequences, we propose an algorithm which detects text on a frame-by-frame basis, tracking the detected text rectangles across multiple frames. For each text appearance, a single enhanced image is robustly created by multiple frame integration.
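As a rough illustration of multiple frame integration, the sketch below combines several aligned crops of the same tracked text rectangle into one image. The abstract does not say which robust estimator is used; the pixel-wise temporal median here is an assumption made for this example, as are the function name `integrate_frames` and the list-of-lists image layout.

```python
from statistics import median
from typing import List

# A grayscale image as a list of rows of pixel values (assumed layout).
Image = List[List[float]]


def integrate_frames(frames: List[Image]) -> Image:
    """Combine aligned crops of one text box into a single image.

    Each output pixel is the median of that pixel over all frames,
    which suppresses a moving background behind static text better
    than a plain mean would. The median is an assumption of this
    sketch, not necessarily the estimator used in the thesis.
    """
    if not frames:
        raise ValueError("at least one frame is required")
    height = len(frames[0])
    width = len(frames[0][0]) if height else 0
    for frame in frames:
        if len(frame) != height or any(len(row) != width for row in frame):
            raise ValueError("all frames must have the same size")
    return [
        [median(frame[y][x] for frame in frames) for x in range(width)]
        for y in range(height)
    ]
```

In this sketch the crops are assumed to be already registered by the tracking step, so that the same pixel coordinates refer to the same part of the text in every frame.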
We tackle the character segmentation problem and suggest two different methods: the first algorithm maximizes a criterion based on the local contrast in the image. The second approach exploits a priori knowledge of the spatial distribution of the text and non-text pixels in the image in order to enhance the segmentation decisions. The a priori knowledge is learned from training images and stored in a statistical Markov random field model. This model is integrated into a Bayesian estimation framework in order to obtain an estimate of the original binary image.
We address the video indexing challenge with a method integrating several features extracted from the video. Text extracted with the method mentioned above is one of the information sources for the indexing algorithm.
@PhdThesis{WolfPhD2003,
author = {C. Wolf},
title = {Text {D}etection in {I}mages taken from {V}ideos {S}equences for {S}emantic {I}ndexing},
school = {INSA de Lyon},
year = {2003},
address = {20, rue Albert Einstein, 69621 Villeurbanne Cedex, France},
}
Graphs and hyper-graphs are frequently used to recognize complex and often non-rigid patterns in computer vision, either through graph matching or point-set matching with graphs. Most formulations resort to the minimization of a difficult energy function containing geometric or structural terms, frequently coupled with data-attached terms involving appearance information. Traditional methods solve the minimization problem approximately, for instance with spectral techniques. In this paper we deal with data embedded in a 3D "space-time", for instance in action recognition applications. We show that, in this context, we can take advantage of special properties of the time domain, in particular causality and the linear order of time. We show that the complexity of the exact matching problem is far lower than the complexity of the general problem, and we derive an algorithm calculating the exact solution. As a second contribution, we propose a new graphical structure which is elongated in time. We argue that, instead of approximately solving the original problem, a better solution can be obtained by exactly solving an approximated problem. An exact minimization algorithm is derived for this structure and successfully applied to action recognition in videos.
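To see why the linear order of time makes exact minimization tractable, consider a deliberately simplified case, which is not the graphical structure proposed in the paper: the model is a chain of nodes ordered in time, each node is assigned to one scene point, and consecutive assignments must be strictly increasing in time. Under these assumptions the energy can be minimized exactly by dynamic programming. The names `match_chain`, `unary`, `pairwise` and `times` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

INF = float("inf")


def match_chain(
    unary: Sequence[Sequence[float]],
    pairwise: Callable[[int, int, int], float],
    times: Sequence[float],
) -> Tuple[float, List[int]]:
    """Exactly minimize a chain energy under a temporal order constraint.

    unary[i][k]        cost of assigning model node i to scene point k
    pairwise(i, p, k)  cost of assigning node i-1 to p and node i to k
    times[k]           time stamp of scene point k

    Runs in O(n * m^2) for n model nodes and m scene points.
    """
    n, m = len(unary), len(times)
    cost = [list(unary[0])]   # cost[i][k]: best energy of nodes 0..i with node i at k
    back: List[List[int]] = []  # back[i-1][k]: best predecessor of k for node i
    for i in range(1, n):
        row, brow = [INF] * m, [-1] * m
        for k in range(m):
            for p in range(m):
                if times[p] >= times[k]:
                    continue  # causality: the predecessor must be earlier in time
                c = cost[i - 1][p] + pairwise(i, p, k)
                if c < row[k]:
                    row[k], brow[k] = c, p
            row[k] += unary[i][k]
        cost.append(row)
        back.append(brow)
    k = min(range(m), key=lambda j: cost[-1][j])
    best = cost[-1][k]
    if best == INF:
        raise ValueError("no assignment respects the temporal order")
    assignment = [k]
    for brow in reversed(back):  # follow the back-pointers from the last node
        k = brow[k]
        assignment.append(k)
    assignment.reverse()
    return best, assignment
```

Without the ordering constraint and the chain structure, the same assignment problem would require searching over all combinations of scene points; the restriction to temporally ordered assignments is what permits the exact recursion here.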
We present a new machine learning-based algorithm capable of classifying individual human activities from very short sequences. Our method is based on a "deep" multi-stage architecture where each layer is learned independently of the other layers. Low-level shape features are extracted from short sequences of binary shapes and fed to a sequential probabilistic model (a conditional deep belief network), which learns the evolution of the low-level features over time through interactions with binary latent variables. No appearance model is needed. Actions are classified using an SVM trained on the posterior probabilities of the latent features extracted by the motion model. The method is capable of not only recognizing actions but also localizing them in space and time. We evaluated the algorithm on two different databases, the well-known Weizmann dataset and our own, more challenging, dataset.
This paper presents a global mesh optimization framework for 3D triangular meshes of arbitrary topology. The mesh optimization task is formulated as an energy minimization problem including data attached terms measuring the fidelity to the original mesh as well as a shape potential favoring high quality triangles. Since the best solution for vertex relocation is strongly related to the mesh connectivity, our approach iteratively modifies this connectivity (edge and vertex addition/removal) as well as the vertex positions. Good solutions for the energy function minimization are obtained by a discrete graph cut algorithm examining global combinations of local candidates. Results on various 3D meshes compare favorably to recent state-of-the-art algorithms regarding the trade-off between triangle shape improvement and surface fidelity. Applications of this work mainly consist in regularizing meshes for numerical simulations, for improving mesh rendering or for improving the geometric prediction in mesh compression techniques.
We introduce a new causal hierarchical belief network for image segmentation. Contrary to classical tree-structured (or pyramidal) models, the factor graph of the network contains cycles. Each level of the hierarchical structure features the same number of sites as the base level, and each site on a given level has several neighbors on the parent level. Compared to tree-structured models, the (spatial) random process on the base level of the model is stationary, which avoids known drawbacks, namely visual artifacts in the segmented image. We propose different parameterizations of the conditional probability distributions governing the transitions between the image levels. A parametric distribution depending on a single parameter allows the design of a fast inference algorithm based on graph cuts, whereas for arbitrary distributions, we propose inference with loopy belief propagation. The method is evaluated on scanned document images from the 18th century, showing an improvement of character recognition results compared to other methods.
In a previous publication we presented a double MRF model capable of separately regularizing the recto and verso side of a document suffering from ink bleed-through. In this paper we show that this model naturally leads to an efficient optimization method based on the minimum cut/maximum flow in a graph. The proposed method is evaluated on scanned document images from the 18th century, showing an improvement of character recognition results compared to other restoration methods.
We present a new method for blind document bleed-through removal based on separate Markov Random Field (MRF) regularization for the recto and for the verso side. The segmentation algorithm is based on Bayesian Maximum a Posteriori (MAP) estimation, where the prior model is made of two conditionally independent MRFs with a single observation field. The advantages of this separate approach are the adaptation of the prior to the content creation process (e.g. superimposing two handwritten pages), and the improvement of the estimation of the recto pixels through an estimation of the verso pixels covered by recto pixels. Optimization is carried out with the simulated annealing algorithm. The labels of the initial recto and verso clusters are recognized without using any color or gray value information. The proposed method is evaluated on synthetic images as well as scanned document images. The results on real scanned data have been assessed through a statistical analysis of an empirical test performed by 16 people.
Evaluation of object detection algorithms is a non-trivial task: a detection result is usually evaluated by comparing the bounding box of the detected object with the bounding box of the ground truth object. The commonly used precision and recall measures are computed from the overlap area of these two rectangles. However, these measures have several drawbacks: they do not give intuitive information about the proportion of correctly detected objects and the number of false alarms, and they cannot be accumulated across multiple images without creating ambiguity in their interpretation. Furthermore, quantitative and qualitative evaluation are often mixed, resulting in ambiguous measures.
In this paper we propose a new approach which tackles these problems. The performance of a detection algorithm is illustrated intuitively by performance graphs which present object level precision and recall depending on constraints on detection quality. In order to compare different detection algorithms, a representative single performance value is computed from the graphs. The influence of the test database on the detection performance is illustrated by performance/generality graphs. The evaluation method can be applied to different types of object detection algorithms. It has been tested on different text detection algorithms, among which are the participants of the ICDAR 2003 text detection competition.