StatLearn 2012 - Workshop on "Challenging problems in Statistical Learning"

Statlearn2012

Statistical learning nowadays plays a growing role in many scientific fields and must therefore confront new problems. It is consequently important to propose statistical learning methods adapted to the modern problems raised by the various fields of application. Beyond the accuracy of the proposed methods, they should also provide a better understanding of the observed phenomena. To facilitate contacts between the different communities and thereby generate new ideas, an international colloquium (held in English) on the theme "Challenging problems in Statistical Learning" was organized at Université Paris 1 on 5 and 6 April 2012. Below you will find the recordings of the talks given at this colloquium. The colloquium was organized by C. Bouveyron, Christophe Biernacki, Alain Célisse, Serge Iovleff & Julien Jacques (Laboratoire SAMM, Paris 1, Laboratoire Paul Painlevé, Université Lille 1, CNRS & Modal, INRIA), with the support of the SFdS. Recommended for: students in the field, researchers - Category: course podcast - Year of production: 2012

Episodes

  1. 03/12/2014 · Video

    1.1 Dimension reduction based on finite mixture modeling of inverse regression (Luca Scrucca)

    Consider the usual regression problem in which we want to study the conditional distribution of a response Y given a set of predictors X. Sufficient dimension reduction (SDR) methods aim at replacing the high-dimensional vector of predictors by a lower-dimensional function R(X) with no loss of information about the dependence of the response variable on the predictors. Almost all SDR methods restrict attention to the class of linear reductions, which can be represented in terms of the projection of X onto a dimension-reduction subspace (DRS). Several methods have been proposed to estimate the basis of the DRS, such as sliced inverse regression (SIR; Li, 1991), principal Hessian directions (PHD; Li, 1992), sliced average variance estimation (SAVE; Cook and Weisberg, 1991), directional regression (DR; Li et al., 2005) and inverse regression estimation (IRE; Cook and Ni, 2005). A novel SDR method, called MSIR, based on finite mixtures of Gaussians has been recently proposed (Scrucca, 2011) as an extension to SIR. The talk will present the MSIR methodology and some recent advances. In particular, a BIC criterion for selecting the dimensionality of the DRS will be introduced, along with its extension to variable selection. Finally, the application of MSIR in classification problems, both supervised and semi-supervised, will be discussed.

    1 hr
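    The classical SIR procedure that MSIR extends can be sketched in a few lines: standardize the predictors, slice the response, and take the leading eigenvectors of the between-slice covariance of the slice means. The sketch below is a minimal illustration of that idea (function and parameter names are ours, not from the talk):

    ```python
    import numpy as np

    def sir_directions(X, y, n_slices=5, n_directions=1):
        """Estimate dimension-reduction directions via sliced inverse
        regression (SIR, Li 1991): whiten the predictors, slice the
        response, and take the top eigenvectors of the between-slice
        covariance of the within-slice means."""
        n, p = X.shape
        mu = X.mean(axis=0)
        cov = np.cov(X, rowvar=False)
        # Whitening via the inverse symmetric square root of the covariance
        evals, evecs = np.linalg.eigh(cov)
        inv_sqrt = evecs @ np.diag(evals ** -0.5) @ evecs.T
        Z = (X - mu) @ inv_sqrt
        # Slice the response into roughly equal-size slices
        order = np.argsort(y)
        slices = np.array_split(order, n_slices)
        # Between-slice covariance of the slice means of Z
        M = np.zeros((p, p))
        for idx in slices:
            m = Z[idx].mean(axis=0)
            M += (len(idx) / n) * np.outer(m, m)
        # Leading eigenvectors (mapped back) span the estimated DRS basis
        w, v = np.linalg.eigh(M)
        beta = inv_sqrt @ v[:, ::-1][:, :n_directions]
        return beta / np.linalg.norm(beta, axis=0)
    ```

    MSIR replaces the per-slice sample mean with a finite Gaussian mixture fitted within each slice, which makes the inverse-regression curve estimate more flexible; the eigendecomposition step is analogous.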
  2. 03/12/2014 · Video

    3.3 Complexity control in overlapping stochastic block models (Pierre Latouche)

    Networks are widely used to represent complex systems as sets of interactions between units of interest. For instance, regulatory networks can describe the regulation of genes with transcriptional factors while metabolic networks focus on representing pathways of biochemical reactions. In social sciences, networks are commonly used to represent relational ties between actors. Numerous graph clustering algorithms have been proposed since the early work of Moreno [2]. Most of them partition the vertices into disjoint clusters depending on their connection profiles. However, recent studies showed that these techniques were too restrictive since most existing networks contained overlapping clusters. To tackle this issue, we proposed the Overlapping Stochastic Block Model (OSBM) in [1]. This approach allows the vertices of a network to belong to multiple classes and can be seen as a generalization of the stochastic block model [3]. In [1], we developed a variational method to cluster the vertices of networks and showed that the algorithm had good clustering performance on both simulated and real data. However, no criterion was proposed to estimate the number of classes from the data, which is a major issue in practice. Here, we address this limitation using a Bayesian framework. Thus, we introduce some priors over the model parameters and consider variational Bayes methods to approximate the full posterior distribution. We show how a model selection criterion can be obtained in order to estimate the number of (overlapping) clusters in a network. On both simulated and real data, we compare our work with other approaches.

    54 min
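    The generative idea behind overlapping block models can be illustrated with a small simulation: each vertex draws independent Bernoulli memberships in each of Q classes (so it may belong to several, or none), and edge probabilities depend on the memberships of both endpoints. The sketch below is a simplified, hedged version of this idea, not the exact OSBM parameterization of [1]:

    ```python
    import numpy as np

    def sample_osbm(n, alpha, W, bias, rng=None):
        """Draw a directed graph from a simplified overlapping
        stochastic block model: vertex i belongs to class q with
        probability alpha[q] (independently across classes), and an
        edge i -> j appears with probability sigmoid(Z_i' W Z_j + bias)."""
        rng = np.random.default_rng(rng)
        Q = len(alpha)
        # Latent overlapping memberships: Z[i, q] in {0, 1}
        Z = (rng.random((n, Q)) < alpha).astype(float)
        logits = Z @ W @ Z.T + bias
        P = 1.0 / (1.0 + np.exp(-logits))
        np.fill_diagonal(P, 0.0)  # no self-loops
        A = (rng.random((n, n)) < P).astype(int)
        return A, Z
    ```

    With a W that rewards shared memberships (large diagonal) and penalizes mixed ones (negative off-diagonal), vertices belonging to the same class connect much more densely, which is the structure the variational Bayes procedure tries to recover.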
  3. 03/12/2014 · Video

    4.3 Transfer to an Unlabeled Task using kernel marginal predictors (Gilles Blanchard)

    We consider a classification problem: the goal is to assign class labels to an unlabeled test data set, given several labeled training data sets drawn from different but similar distributions. In essence, the goal is to predict labels from (an estimate of) the marginal distribution (of the unlabeled data) by learning the trends present in related classification tasks that are already known. In this sense, this problem belongs to the category of so-called "transfer learning" in machine learning. The probabilistic model used is that the different training and test distributions are themselves i.i.d. realizations from a distribution on distributions. Conceptually, this setting can be related to traditional random effects models in statistics, although here the approach is nonparametric and distribution-free. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distribution-free, kernel-based approach to the problem. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on flow cytometry data are presented.

    52 min
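    A key ingredient in this kernel-based approach is comparing whole data sets (i.e. estimates of marginal distributions) through kernel mean embeddings: the inner product of two embeddings is the average of the base kernel over all cross pairs. The sketch below illustrates that construction with a Gaussian base kernel and a linear outer kernel; the function names and the exact combination are our simplification, not the precise kernel used in the talk:

    ```python
    import numpy as np

    def gaussian_kernel(A, B, gamma=1.0):
        """Gaussian (RBF) kernel matrix between rows of A and rows of B."""
        d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
        return np.exp(-gamma * d2)

    def task_kernel(X1, X2, gamma=1.0):
        """Similarity between two samples via their kernel mean
        embeddings: <mu_1, mu_2> estimated by averaging k(x, x')
        over all cross pairs."""
        return gaussian_kernel(X1, X2, gamma).mean()

    def marginal_product_kernel(x1, X1, x2, X2, gamma=1.0):
        """Kernel on (distribution, point) pairs: a product of the
        distribution-level similarity and the point-level kernel."""
        k_dist = task_kernel(X1, X2, gamma)
        k_point = gaussian_kernel(x1[None, :], x2[None, :], gamma)[0, 0]
        return k_dist * k_point
    ```

    Regularized empirical risk minimization over the RKHS induced by such a product kernel lets the learned predictor depend on both the test point and (an estimate of) the marginal distribution it was drawn from, which is what enables transfer to an unlabeled task.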


More from Université Paris 1 Panthéon-Sorbonne