1 hr

02 - Semi-supervised learning for image classification - Cordelia SCHMID & Jakob VERBEEK SAMOS - Colloquium "Statistiques pour le traitement de l'image" (Conferences, 2009)


In the first part we are interested in finding images of people on the web, and more specifically within large databases of captioned news images. It has recently been shown that visual analysis of the faces in images returned by a text-based query over captions can significantly improve search results. The underlying idea is that, although the initial text-based result is imperfect, the queried person will still appear relatively frequently in it compared to other people, so we can search for a large group of highly similar faces. The performance of such methods depends strongly on this assumption: for people whose face appears in less than about 40% of the initial text-based result, the performance may be very poor.

I will present a method to improve search results by exploiting faces of other people that co-occur frequently with the queried person. We refer to this process as 'query expansion'. In the face analysis we use the query expansion to provide a query-specific, relevant set of 'negative' examples, which should be separated from the potentially positive examples in the text-based result set. We apply this idea to a recently proposed method that filters the initial result set using a Gaussian mixture model, and we apply the same idea using a logistic discriminant model. We evaluate the methods on a database of captioned news stories from Yahoo!News. The results show that (i) query expansion improves both methods, (ii) our discriminative models outperform the generative ones, and (iii) our best results surpass the state-of-the-art results by 10% precision on average.

In the second part we are interested in Conditional Random Fields (CRFs), which are an effective tool for a variety of data segmentation and labelling tasks. These include visual scene interpretation, which seeks to partition images into their constituent semantic-level regions and assign an appropriate class label to each region.
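The query-expansion filtering of the first part can be sketched on toy data as follows. This is only a minimal illustration under stated assumptions, not the method presented in the talk: faces are stood in for by synthetic 2-D feature vectors, the logistic discriminant is a plain L2-regularised logistic regression fitted by gradient ascent, and the names `fit_logistic` and `rank_with_query_expansion` are hypothetical. Faces from the text-based result set are treated as noisy positives, and faces gathered for frequently co-occurring people serve as the query-specific negatives.

```python
import numpy as np

def fit_logistic(X, y, lr=0.1, steps=500, l2=1e-3):
    """Fit weights w and bias b of a logistic model by batch gradient ascent."""
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(steps):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))   # predicted P(positive)
        w += lr * (X.T @ (y - p) / n - l2 * w)   # log-likelihood gradient minus L2 penalty
        b += lr * np.mean(y - p)
    return w, b

def rank_with_query_expansion(result_faces, expansion_faces):
    """Score each face of the text-based result set.

    result_faces:    faces returned for the queried name (noisy positives)
    expansion_faces: faces of co-occurring people (query-specific negatives)
    """
    X = np.vstack([result_faces, expansion_faces])
    y = np.concatenate([np.ones(len(result_faces)), np.zeros(len(expansion_faces))])
    w, b = fit_logistic(X, y)
    return result_faces @ w + b                  # higher score = more likely the queried person

# Toy data (assumption): two well-separated clusters stand in for face descriptors.
rng = np.random.default_rng(0)
target = rng.normal(loc=[2.0, 0.0], scale=0.5, size=(30, 2))            # queried person
friend = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(20, 2))           # co-occurring person
result_faces = np.vstack([target, friend])                              # imperfect text-based result
expansion_faces = rng.normal(loc=[-2.0, 0.0], scale=0.5, size=(60, 2))  # friend's faces from the expansion

scores = rank_with_query_expansion(result_faces, expansion_faces)
ranking = np.argsort(-scores)                    # indices of result faces, best first
```

In this sketch the expansion faces pull the decision boundary away from the co-occurring person, so that person's faces inside the result set receive lower scores than those of the queried person.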
First, for accurate labelling it is important to capture the global context of the image as well as local information. We introduce a CRF-based scene labelling model that incorporates both local features and features aggregated over the whole image or large sections of it.

Second, traditional CRF learning requires fully labelled datasets, and complete labellings are typically costly and troublesome to produce. We introduce an algorithm that allows CRF models to be learned from datasets where a substantial fraction of the nodes are unlabeled. It works by marginalizing out the unknown labels so that the log-likelihood of the known ones can be maximized by gradient ascent. Loopy Belief Propagation is used to approximate the marginals needed for the gradient and log-likelihood calculations, and the Bethe free-energy approximation to the log-likelihood is monitored to control the step size.

Our experimental results show that incorporating top-down aggregate features significantly improves the segmentations and that effective models can be learned from fragmentary labellings. The resulting methods give scene segmentation results comparable to the state-of-the-art on three different image databases.

References:
T. Mensink & J. Verbeek, Improving People Search Using Query Expansions: How Friends Help To Find People, European Conference on Computer Vision, 2008.
J. Verbeek & B. Triggs, Scene Segmentation with CRFs Learned from Partially Labeled Images, Advances in Neural Information Processing Systems, 2007.

Cordelia Schmid & Jakob Verbeek. INRIA Rhône-Alpes.

You can listen to the talk while viewing the PowerPoint slides by following this link: http://epn.univ-paris1.fr/modules/ufr27statim/UFR27STATIM-20090123-Verbeek/UFR27STATIM-20090123-Verbeek.html

Listen to the talk: audio track available in mp3 format. Duration: 1 h 01 min.
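The learning rule of the second part (marginalize out the unknown labels, then ascend the gradient of the log-likelihood of the known ones) can be illustrated on a toy problem. The sketch below rests on several assumptions that are not from the talk: a 4-node chain with 2 labels instead of an image graph, exact enumeration of all labellings instead of Loopy Belief Propagation, a fixed step size instead of monitoring the Bethe free energy, and hypothetical names throughout. It relies on the standard identity that the gradient equals the expected feature counts with the known labels clamped minus the expected counts with all labels free.

```python
import itertools
import numpy as np

def expected_features(U, P, observed):
    """Sum over all chain labellings that agree with `observed` (dict node -> label).

    U[i, k] scores label k at node i; P[a, b] scores neighbouring labels (a, b).
    Returns the log of the summed exp-scores and the expected unary and
    pairwise feature counts under the restricted distribution.
    """
    n, K = U.shape
    scores, unary_counts, pair_counts = [], [], []
    for y in itertools.product(range(K), repeat=n):
        if any(y[i] != k for i, k in observed.items()):
            continue                                  # contradicts a known label
        fu = np.zeros_like(U)
        fu[np.arange(n), list(y)] = 1.0               # indicator of (node, label)
        fp = np.zeros_like(P)
        for i in range(n - 1):
            fp[y[i], y[i + 1]] += 1.0                 # count of neighbouring label pairs
        scores.append((U * fu).sum() + (P * fp).sum())
        unary_counts.append(fu)
        pair_counts.append(fp)
    scores = np.array(scores)
    m = scores.max()
    w = np.exp(scores - m)                            # stabilised exponentials
    log_z = m + np.log(w.sum())
    w /= w.sum()
    return (log_z,
            np.tensordot(w, np.array(unary_counts), axes=1),
            np.tensordot(w, np.array(pair_counts), axes=1))

def partial_loglik_and_grad(U, P, observed):
    """Log-likelihood of the known labels and its gradient w.r.t. U and P."""
    log_z_clamped, eu_c, ep_c = expected_features(U, P, observed)
    log_z_free, eu_f, ep_f = expected_features(U, P, {})
    return log_z_clamped - log_z_free, eu_c - eu_f, ep_c - ep_f

# Toy setting (assumption): nodes 0 and 3 are labelled, nodes 1 and 2 are not.
n_nodes, n_labels = 4, 2
U = np.zeros((n_nodes, n_labels))
P = np.zeros((n_labels, n_labels))
observed = {0: 1, 3: 0}

ll_start, _, _ = partial_loglik_and_grad(U, P, observed)
for _ in range(200):                                  # plain gradient ascent, fixed step
    _, grad_U, grad_P = partial_loglik_and_grad(U, P, observed)
    U += 0.1 * grad_U
    P += 0.1 * grad_P
ll_end, _, _ = partial_loglik_and_grad(U, P, observed)
```

Enumeration is only feasible for such tiny graphs; on image-sized graphs the two sets of expectations have to be approximated, which is the role the abstract assigns to Loopy Belief Propagation.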