12 episodes

Statistical learning now plays a growing role in many scientific fields and must therefore face new problems. It is thus important to propose statistical learning methods suited to the modern problems raised by the various fields of application. Beyond being accurate, the proposed methods should also provide a better understanding of the observed phenomena. To foster contacts between the different communities and thereby spark new ideas, an international colloquium (held in English) on the theme "Challenging problems in Statistical Learning" was organized at Université Paris 1 on 28 and 29 January 2010.
The colloquium was organized by C. Bouveyron (SAMM laboratory, Paris 1) and G. Celeux (Select, INRIA Saclay) with the support of the SFdS. Recordings of the talks given at the colloquium are available below.
Recommended for: students in the field, researchers - Category: podcast course - Year produced: 2012

StatLearn 2010 - Workshop on "Challenging problems in Statistical Learning"
Université Paris 1 Panthéon-Sorbonne

    • video
    1.1 Ultrametric wavelet regression of multivariate time series: application to Colombian conflict analysis (Fionn Murtagh)

    We first pursue the study of how hierarchy provides a well-adapted tool for the analysis of change. Then, using a time sequence-constrained hierarchical clustering, we develop the practical aspects of a new approach to wavelet regression. This provides a new way to link hierarchical relationships in a multivariate time series data set with external signals. Violence data from the Colombian conflict in the years 1990 to 2004 are used throughout. We conclude with some proposals for further study on the relationship between social violence and market forces, viz. between the Colombian conflict and the US narcotics market.

    • 47 min
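The following is a minimal sketch of the time-sequence constraint under a toy setup. The clustering is agglomerative, but it may only merge clusters that are adjacent in time. At each merge it records Haar-style vectors: the smooth vector is half the sum of the two children's smooth vectors, and the detail vector is half their difference. This corresponds to an unweighted Haar transform on the dendrogram. The merge criterion (closest smooth vectors), the `Node` class and `constrained_haar` are hypothetical choices made for illustration. The regression on external signals described in the talk is not reproduced.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Node:
    start: int                            # first time index covered by the cluster
    end: int                              # last time index covered (inclusive)
    smooth: List[float]                   # Haar "smooth" vector at this node
    detail: Optional[List[float]] = None  # Haar "detail" vector (None for leaves)


def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def constrained_haar(series):
    """Agglomerate only time-adjacent clusters; return merge nodes in merge order."""
    clusters = [Node(t, t, list(v)) for t, v in enumerate(series)]
    merges = []
    while len(clusters) > 1:
        # Sequence constraint: only neighbours in time order are merge candidates.
        # Hypothetical criterion: the adjacent pair with the closest smooth vectors.
        i = min(range(len(clusters) - 1),
                key=lambda k: sq_dist(clusters[k].smooth, clusters[k + 1].smooth))
        left, right = clusters[i], clusters[i + 1]
        smooth = [(a + b) / 2 for a, b in zip(left.smooth, right.smooth)]
        detail = [(a - b) / 2 for a, b in zip(left.smooth, right.smooth)]
        node = Node(left.start, right.end, smooth, detail)
        merges.append(node)
        clusters[i:i + 2] = [node]  # replace the two children by their parent
    return merges
```

Each time point can be recovered from the root's smooth vector by descending the tree. Add the node's detail vector when moving to a left child, and subtract it when moving to a right child. This invertibility is what lets detail coefficients serve as features in a regression.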
    • video
    1.2 On the regularization of Sliced Inverse Regression (Stéphane Girard)

    Sliced Inverse Regression (SIR) is an effective method for dimension reduction in high-dimensional regression problems. The original method, however, requires inverting the covariance matrix of the predictors. When the predictors are collinear, or when the sample size is small compared with the dimension, this inversion is not possible and a regularization technique has to be used. Our approach is based on an interpretation of SIR axes as solutions of an inverse regression problem. A prior distribution is then introduced on the unknown parameters of the inverse regression problem in order to regularize their estimation. We show that some existing SIR regularizations fit into this framework, which provides a unified understanding of these methods. Three new priors are proposed, leading to new regularizations of the SIR method, and are compared on simulated data. An application to the estimation of physical properties of the Martian surface from hyperspectral images is provided.

    • 49 min
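The sketch below makes the regularization issue concrete by applying a ridge-type fix to SIR. The predictor covariance Σ is replaced by Σ + τI, and the leading direction of (Σ + τI)⁻¹Γ is then computed. Here Γ is the between-slice covariance of the slice means. Ridge is one of the simplest regularizations of this kind and serves only as an assumed example; it is not claimed to be one of the three priors proposed in the talk. The function names, the equal-size slicing and the defaults (`n_slices=5`, `tau=0.1`) are illustrative. The code uses pure Python to stay dependency-free.

```python
import random


def mean_vector(rows):
    n = len(rows)
    return [sum(r[j] for r in rows) / n for j in range(len(rows[0]))]


def covariance(rows):
    mu = mean_vector(rows)
    n, d = len(rows), len(mu)
    return [[sum((r[a] - mu[a]) * (r[b] - mu[b]) for r in rows) / n
             for b in range(d)] for a in range(d)]


def solve(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [rhs[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x


def ridge_sir_direction(X, y, n_slices=5, tau=0.1, n_iter=100):
    """Leading eigenvector of (Sigma + tau*I)^-1 Gamma (ridge-type SIR sketch)."""
    n, d = len(X), len(X[0])
    mu = mean_vector(X)
    sigma = covariance(X)
    sigma_reg = [[sigma[a][b] + (tau if a == b else 0.0) for b in range(d)]
                 for a in range(d)]
    # Gamma = sum_h p_h (m_h - mu)(m_h - mu)^T over slices of sorted y values.
    order = sorted(range(n), key=lambda i: y[i])
    gamma = [[0.0] * d for _ in range(d)]
    for h in range(n_slices):
        idx = order[h * n // n_slices:(h + 1) * n // n_slices]
        weight = len(idx) / n
        m = [sum(X[i][a] for i in idx) / len(idx) - mu[a] for a in range(d)]
        for a in range(d):
            for b in range(d):
                gamma[a][b] += weight * m[a] * m[b]
    # Power iteration: repeatedly apply (Sigma + tau*I)^-1 Gamma and normalize.
    v = [1.0] * d
    for _ in range(n_iter):
        w = solve(sigma_reg, [sum(gamma[a][b] * v[b] for b in range(d)) for a in range(d)])
        norm = sum(t * t for t in w) ** 0.5
        v = [t / norm for t in w]
    return v


if __name__ == "__main__":
    rng = random.Random(0)
    X = [[rng.gauss(0, 1) for _ in range(3)] for _ in range(500)]
    y = [x[0] + 0.1 * rng.gauss(0, 1) for x in X]  # y depends on X only through x[0]
    print(ridge_sir_direction(X, y))  # should point roughly along the first axis
```

Power iteration suffices here because a single-index model gives a between-slice covariance of nearly rank one. With several dimension-reduction directions, a full eigendecomposition would be needed instead.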
    • video
    1.3 Simultaneous Gaussian Model-Based Clustering for Samples of Multiple Origins (Christophe Biernacki)

    Mixture model-based clustering usually assumes that the data arise from a mixture population, in order to estimate a hypothetical underlying partition of the dataset. In this work, we are interested in the case where several samples have to be clustered at the same time, that is, when the data arise not from one but possibly from several mixtures. In the multinormal context, we establish a linear stochastic link between the components of the mixtures. This link allows their parameters to be estimated jointly (here by maximum likelihood) and the diverse samples to be classified simultaneously. We propose several useful constrained models for this stochastic link and give their parameter estimators. The interest of these models is highlighted in a biological context where birds belonging to several species have to be classified according to their sex. We first show that our simultaneous clustering method improves on the partition obtained by clustering each sample independently. We then show that the method is also effective for assessing the number of clusters when it is unknown. Finally, additional experiments show that the simultaneous clustering method is robust to the relaxation of one of its main assumptions.

    • 1 hr
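The linear stochastic link can be illustrated in one dimension with a toy assumption: sample 2 is distributed like sample 1 after the map x ↦ x + b. The same b applies to every component, which gives the most constrained link (unit scale, common shift). Under this link, component k of sample 2 has mean μ_k + b and the same variance σ_k². An ECM variant of EM can then estimate μ, σ², b and per-sample mixing proportions jointly. This univariate, two-component setting and all the names are assumptions made for illustration; they are not the multinormal models of the talk.

```python
import math
import random


def normal_pdf(x, mu, var):
    return math.exp(-(x - mu) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def responsibilities(xs, pis, mus, vars_, shift):
    """E-step: posterior component probabilities, with component means shifted by `shift`."""
    out = []
    for x in xs:
        w = [p * normal_pdf(x, m + shift, v) for p, m, v in zip(pis, mus, vars_)]
        total = sum(w)
        out.append([wk / total for wk in w])
    return out


def linked_em(x1, x2, n_iter=50, n_inner=5):
    """ECM for two 2-component samples linked by x2 ~ x1 + b (common shift b)."""
    K = 2
    mus = [min(x1), max(x1)]           # crude initialisation from sample 1
    vars_ = [1.0, 1.0]
    pi1, pi2 = [0.5, 0.5], [0.5, 0.5]  # each sample keeps its own proportions
    b = sum(x2) / len(x2) - sum(x1) / len(x1)
    for _ in range(n_iter):
        t1 = responsibilities(x1, pi1, mus, vars_, 0.0)
        t2 = responsibilities(x2, pi2, mus, vars_, b)
        n1 = [sum(r[k] for r in t1) for k in range(K)]
        n2 = [sum(r[k] for r in t2) for k in range(K)]
        pi1 = [nk / len(x1) for nk in n1]
        pi2 = [nk / len(x2) for nk in n2]
        # Conditional maximisation: alternate closed-form updates of mus and b.
        for _ in range(n_inner):
            mus = [(sum(r[k] * x for r, x in zip(t1, x1))
                    + sum(r[k] * (x - b) for r, x in zip(t2, x2))) / (n1[k] + n2[k])
                   for k in range(K)]
            b = (sum(r[k] * (x - mus[k]) / vars_[k]
                     for r, x in zip(t2, x2) for k in range(K))
                 / sum(n2[k] / vars_[k] for k in range(K)))
        # Variances are shared between the two samples under this link.
        vars_ = [(sum(r[k] * (x - mus[k]) ** 2 for r, x in zip(t1, x1))
                  + sum(r[k] * (x - mus[k] - b) ** 2 for r, x in zip(t2, x2)))
                 / (n1[k] + n2[k])
                 for k in range(K)]
    return mus, vars_, b, pi1, pi2


if __name__ == "__main__":
    rng = random.Random(1)
    x1 = [rng.gauss(0, 1) if rng.random() < 0.4 else rng.gauss(5, 1) for _ in range(300)]
    x2 = [rng.gauss(3, 1) if rng.random() < 0.6 else rng.gauss(8, 1) for _ in range(300)]
    mus, vars_, b, pi1, pi2 = linked_em(x1, x2)
    print(mus, b)  # means should be near 0 and 5, shift near 3
```

Because the shift is shared, sample 2 helps estimate the component means of sample 1 and vice versa. Each sample still keeps its own mixing proportions.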
    • video
    2.1 Mixed-Membership Stochastic Block-Models for Transactional Data (Hugh Chipman)

    Transactional network data arise in many fields. Although social network models have been applied to transactional data, these models typically assume binary relations between pairs of nodes. We develop a latent mixed membership model capable of modelling richer forms of transactional data. Estimation and inference are accomplished via a variational EM algorithm. Simulations indicate that the learning algorithm can recover the correct generative model. We further present results on a subset of the Enron email dataset. This is joint work with Mahdi Shafiei.

    • 55 min
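The sketch below shows what "mixed membership" can mean for count-valued relations, assuming a generic MMSB-style generative process:

- each node draws a membership vector from a Dirichlet distribution;
- each ordered pair of nodes draws one latent role per node;
- the number of transactions is Poisson with a block-specific rate.

The Poisson choice, the function names and the parameters are assumptions. Neither the talk's exact model nor its variational EM fitting is reproduced. Simulating from a known process like this is one way to check whether a fitting algorithm recovers the generative model.

```python
import math
import random


def sample_dirichlet(alpha, rng):
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    return [d / total for d in draws]


def sample_categorical(probs, rng):
    u, acc = rng.random(), 0.0
    for k, p in enumerate(probs):
        acc += p
        if u < acc:
            return k
    return len(probs) - 1  # guard against rounding in the cumulative sum


def sample_poisson(lam, rng):
    # Knuth's multiplication method; adequate for small rates.
    limit, k, prod = math.exp(-lam), 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def simulate_counts(n_nodes, alpha, rates, seed=0):
    """Hypothetical MMSB-style simulator: Dirichlet memberships, Poisson counts."""
    rng = random.Random(seed)
    memberships = [sample_dirichlet(alpha, rng) for _ in range(n_nodes)]
    counts = [[0] * n_nodes for _ in range(n_nodes)]
    for i in range(n_nodes):
        for j in range(n_nodes):
            if i == j:
                continue  # no self-transactions
            g = sample_categorical(memberships[i], rng)  # role of i towards j
            h = sample_categorical(memberships[j], rng)  # role of j towards i
            counts[i][j] = sample_poisson(rates[g][h], rng)
    return memberships, counts
```

For example, `simulate_counts(5, [0.5, 0.5], [[3.0, 0.2], [0.2, 3.0]])` produces mostly within-group activity. Nodes whose memberships are spread across both groups interact with everyone.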
    • video
    2.2 Visualization of graphs by organized clustering : application to social and biological networks (Nathalie Villa-Vialaneix)

    A growing number of application fields generate data consisting of pairwise relations between the objects under study rather than attributes associated with each object. Examples include social networks (relations between people), biology (interactions between genes or proteins), the web (relations between websites or blogs) and marketing (relations between customers and services). To help understand and interpret such data, tools from classical multivariate data analysis (visualization, clustering, classification) have been extended to this setting. This talk deals with an exploratory methodology: a common way to help understand a graph is to cluster its vertices into relevant groups and then to represent the (simplified) graph of clusters. As will be explained, these two objectives (clustering and representation) can be somewhat contradictory. Two approaches related to self-organizing maps will be presented and compared on real-world data to address this issue. This is joint work with Fabrice Rossi (LTCI, Télécom ParisTech).

    • 57 min
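The representation step of this pipeline, drawing the simplified graph of clusters, can be sketched as collapsing the graph onto a given vertex partition. The result records cluster sizes, the edges inside each cluster and the edge counts between clusters. The partition is taken as given here (the talk obtains it with methods related to self-organizing maps). `cluster_graph` is a hypothetical helper, and cluster labels are assumed to be sortable.

```python
from collections import defaultdict


def cluster_graph(edges, assignment):
    """Collapse an undirected graph onto the clusters of a vertex partition."""
    sizes = defaultdict(int)     # number of vertices per cluster
    internal = defaultdict(int)  # edges with both ends in the same cluster
    between = defaultdict(int)   # edges joining two different clusters
    for vertex, label in assignment.items():
        sizes[label] += 1
    for u, v in edges:
        cu, cv = assignment[u], assignment[v]
        if cu == cv:
            internal[cu] += 1
        else:
            # Undirected graph: store each cluster pair under a sorted key.
            between[tuple(sorted((cu, cv)))] += 1
    return dict(sizes), dict(internal), dict(between)
```

For instance, two triangles joined by one edge collapse to two clusters of size 3, each with 3 internal edges, linked by a single edge. In a drawing, the sizes and the between-cluster counts can set node and edge widths.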
    • video
    2.3 A Mixture of Experts Latent Position Cluster Model for Social Network Data (Claire Gormley)

    Social network data represent the interactions between a group of social actors. Interactions between colleagues and friendship networks are typical examples of such data. The latent space model for social network data locates each actor of a network in a latent (social) space and models the probability of an interaction between two actors as a function of their locations. The latent position cluster model extends the latent space model to network data in which clusters of actors exist: actor locations are drawn from a finite mixture model, each component of which represents a cluster of actors. A mixture of experts model builds on the structure of a mixture model by taking account of both observations and associated covariates when modeling a heterogeneous population. Herein, a mixture of experts extension of the latent position cluster model is developed. The mixture of experts framework allows covariates to enter the latent position cluster model in a number of ways, yielding different model interpretations. Estimates of the model parameters are derived in a Bayesian framework using a Markov chain Monte Carlo algorithm. Because the algorithm is generally computationally expensive, surrogate proposal distributions that shadow the target distributions are derived to reduce the computational burden. The methodology is demonstrated through an illustrative example detailing relations between a group of lawyers in the USA.

    • 49 min
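The latent position cluster model is often written with logit P(y_ij = 1) = β₀ − ‖z_i − z_j‖. The sketch below covers two places where covariates could enter a mixture of experts extension:

- the link probability, with optional pair covariates added to the linear predictor;
- a multinomial-logit "gating" function that lets actor covariates drive cluster membership probabilities.

The exact parameterization and all names are assumptions for illustration. The cluster-specific Gaussian distribution of the positions and the MCMC sampler are omitted.

```python
import math


def sigmoid(t):
    return 1.0 / (1.0 + math.exp(-t))


def link_prob(z_i, z_j, beta0, beta=(), x_ij=()):
    """P(y_ij = 1) with logit = beta0 + beta . x_ij - ||z_i - z_j|| (assumed form)."""
    dist = math.dist(z_i, z_j)
    linear = beta0 + sum(b * x for b, x in zip(beta, x_ij))
    return sigmoid(linear - dist)


def gating_weights(w_i, gammas):
    """Covariate-dependent cluster probabilities via a multinomial logit.

    w_i should include a leading 1.0 if an intercept is wanted.
    """
    scores = [sum(g * w for g, w in zip(gamma, w_i)) for gamma in gammas]
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def network_loglik(y, z, beta0):
    """Log-likelihood of a binary directed adjacency matrix y given positions z."""
    ll = 0.0
    n = len(y)
    for i in range(n):
        for j in range(n):
            if i != j:
                p = link_prob(z[i], z[j], beta0)
                ll += math.log(p) if y[i][j] else math.log(1.0 - p)
    return ll
```

Actors placed close together in the latent space get a higher link probability. The gating function shifts which cluster an actor's position is likely to be drawn from.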
