StatLearn 2010 - Workshop on "Challenging problems in Statistical Learning"


Statistical learning now plays a growing role in many scientific fields and must therefore face new problems. It is consequently important to propose statistical learning methods suited to the modern problems raised by the various fields of application. Beyond being accurate, the proposed methods should also bring a better understanding of the observed phenomena. To ease contacts between the different communities and thereby spark new ideas, an international colloquium (held in English) on the theme "Challenging problems in Statistical Learning" was organized at Université Paris 1 on 28 and 29 January 2010. The colloquium was organized by C. Bouveyron (Laboratoire SAMM, Paris 1) and G. Celeux (Select, INRIA Saclay) with the support of the SFdS. The recordings of the talks given at the colloquium are listed below. Recommended for: students of the discipline, researchers - Category: course podcast - Year of production: 2012

Episodes

  1. 04.12.2014

    2.3 A Mixture of Experts Latent Position Cluster Model for Social Network Data (Claire Gormley)

    Social network data represent the interactions between a group of social actors. Interactions between colleagues and friendship networks are typical examples of such data. The latent space model for social network data locates each actor in a network in a latent (social) space and models the probability of an interaction between two actors as a function of their locations. The latent position cluster model extends the latent space model to deal with network data in which clusters of actors exist: actor locations are drawn from a finite mixture model, each component of which represents a cluster of actors. A mixture of experts model builds on the structure of a mixture model by taking account of both observations and associated covariates when modeling a heterogeneous population. Herein, a mixture of experts extension of the latent position cluster model is developed. The mixture of experts framework allows covariates to enter the latent position cluster model in a number of ways, yielding different model interpretations. Estimates of the model parameters are derived in a Bayesian framework using a Markov chain Monte Carlo algorithm. The algorithm is generally computationally expensive; surrogate proposal distributions which shadow the target distributions are derived, reducing the computational burden. The methodology is demonstrated through an illustrative example detailing relations between a group of lawyers in the USA.

    50 min.
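As a rough illustration of the latent space idea in the abstract above, the sketch below maps the distance between two actors' latent positions to a tie probability. The logistic link, the intercept `beta`, the function name `tie_probability` and the toy positions are all assumptions made for this example; the abstract only says that the probability is a function of the locations.

```python
import math


def tie_probability(z_i, z_j, beta=1.0):
    """Tie probability under an assumed logistic link:
    logit(p) = beta - ||z_i - z_j|| (Euclidean distance)."""
    distance = math.dist(z_i, z_j)
    return 1.0 / (1.0 + math.exp(-(beta - distance)))


# Hypothetical two-dimensional latent positions for three actors.
positions = {"a": (0.0, 0.0), "b": (0.5, 0.0), "c": (4.0, 3.0)}

p_near = tie_probability(positions["a"], positions["b"])
p_far = tie_probability(positions["a"], positions["c"])
print(f"near pair: {p_near:.3f}, far pair: {p_far:.3f}")
```

Under this assumed link, actors that sit close together in the latent space are more likely to interact than actors that sit far apart. In the cluster model, the positions themselves would be drawn from a finite mixture.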
  2. 04.12.2014

    3.2 Regularization Methods for Categorical Predictors (Gerhard Tutz)

    Most regularization methods in regression analysis have been designed for metric predictors and cannot be used for categorical predictors. A rare exception is the group lasso, which allows for categorical predictors or factors. We will consider alternative approaches based on penalized likelihood and boosting techniques. Typically the operating model will be a generalized linear model. We will start with ordered categorical predictors, which unfortunately are often treated as metric variables because software is available. It is shown how difference penalties on adjacent dummy coefficients can be used to obtain smooth effect curves that can be estimated even in cases where simple maximum likelihood methods fail. The difference penalty turns out to be highly competitive when compared to methods often seen in practice, namely simple linear regression on the group labels and pure dummy coding. In a second step, L1-penalty-based methods that enforce variable selection and clustering of categories are presented and investigated. A distinction is made between ordered predictors, where clustering refers to the fusion of adjacent categories, and nominal predictors, for which arbitrary categories can be fused. The methods make it possible to identify which categories actually differ with respect to the dependent variable. Finally, interaction effects are modeled within the framework of varying-coefficient models. Properties of the estimators are investigated for the proposed methods. Methods are illustrated and compared in simulation studies and applied to real-world data.

    54 min.
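The difference penalty mentioned in the abstract above can be sketched for the simplest case. The sketch assumes a Gaussian response, a single ordered predictor, no intercept and one coefficient per level, none of which is specified in the abstract. It minimizes the residual sum of squares plus `lam` times the sum of squared differences between adjacent level coefficients. The names `fit_smooth_dummies` and `solve_tridiagonal` and the toy data are hypothetical.

```python
def solve_tridiagonal(lower, diag, upper, rhs):
    """Thomas algorithm for a tridiagonal system.
    lower[0] and upper[-1] are unused placeholders."""
    n = len(diag)
    cp, dp = [0.0] * n, [0.0] * n
    cp[0] = upper[0] / diag[0]
    dp[0] = rhs[0] / diag[0]
    for i in range(1, n):
        denom = diag[i] - lower[i] * cp[i - 1]
        cp[i] = upper[i] / denom
        dp[i] = (rhs[i] - lower[i] * dp[i - 1]) / denom
    x = [0.0] * n
    x[-1] = dp[-1]
    for i in range(n - 2, -1, -1):
        x[i] = dp[i] - cp[i] * x[i + 1]
    return x


def fit_smooth_dummies(levels, y, n_levels, lam):
    """Minimize sum_i (y_i - b[level_i])^2 + lam * sum_j (b[j] - b[j-1])^2."""
    counts = [0] * n_levels
    sums = [0.0] * n_levels
    for level, value in zip(levels, y):
        counts[level] += 1
        sums[level] += value
    # Normal equations: (diag(counts) + lam * D'D) b = sums,
    # where D takes first differences of adjacent coefficients.
    diag = [counts[j] + lam * ((j > 0) + (j < n_levels - 1))
            for j in range(n_levels)]
    lower = [0.0] + [-lam] * (n_levels - 1)
    upper = [-lam] * (n_levels - 1) + [0.0]
    return solve_tridiagonal(lower, diag, upper, sums)


# Toy data: five ordered levels, level 2 has no observations at all.
levels = [0, 0, 1, 3, 3, 4]
y = [1.0, 1.2, 2.1, 3.9, 4.1, 5.0]
coefs = fit_smooth_dummies(levels, y, n_levels=5, lam=1.0)
print([round(c, 3) for c in coefs])
```

In this toy setup the unpenalized fit has no estimate for the empty level, whereas the penalized fit places that coefficient midway between its two neighbours. This is one way to read the abstract's remark that the smooth effect curve can be estimated where simple maximum likelihood fails.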
  3. 04.12.2014

    4.2 Statistical analysis of bio-molecular data and combinatorial difficulties : two examples (Stéphane Robin)

    Combinatorial issues are often raised by statistical model inference and selection, in particular when dealing with high-dimensional data. In such cases, asymptotic approximations or Monte Carlo type methods are often used to approximate the quantities of interest. In this talk, we will present two examples dealing with bio-molecular data. In both of them, exact results can be obtained based on specific combinatorial and algorithmic developments. We will first consider the typical multiple testing issue that is faced when dealing with high-throughput data. In this framework, most multiple testing procedures require a precise estimation of the proportion of true null hypotheses. This estimation problem can be rephrased as a histogram selection problem, which can be solved via leave-p-out (LpO) cross-validation. We will present explicit results that allow us to manage this model selection problem, avoiding the computational burden inherent to LpO. We will then consider a segmentation problem encountered when looking for chromosomal aberrations based on microarray data. The detection of breakpoints and the estimation of their number is an old statistical problem. As for the precision of their localisation, only asymptotic results are available. We will present a dynamic programming type algorithm that allows us to explore the whole segmentation space. It provides information on the localisation precision. It furthermore provides a new model selection criterion for the number of breakpoints.

    52 min.
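To give a flavour of dynamic programming over segmentations, here is a sketch of the classic least-squares version: it finds the split of a signal into a fixed number of contiguous segments that minimizes the within-segment sum of squares. This is a simpler, well-known relative of the algorithm described in the abstract above, not the speaker's method, and it returns only the best segmentation rather than information about localisation precision. The function name `best_segmentation` and the toy signal are hypothetical.

```python
def best_segmentation(signal, n_segments):
    """Return (breakpoints, cost) of the least-squares segmentation
    into n_segments contiguous pieces; breakpoints are start indices
    of segments 2..n_segments. Requires len(signal) >= n_segments."""
    n = len(signal)
    # Prefix sums give each segment's squared error in O(1).
    s1, s2 = [0.0], [0.0]
    for x in signal:
        s1.append(s1[-1] + x)
        s2.append(s2[-1] + x * x)

    def cost(i, j):
        """Sum of squared deviations from the mean of signal[i:j]."""
        total = s1[j] - s1[i]
        return (s2[j] - s2[i]) - total * total / (j - i)

    inf = float("inf")
    best = [[inf] * (n + 1) for _ in range(n_segments + 1)]
    back = [[0] * (n + 1) for _ in range(n_segments + 1)]
    best[0][0] = 0.0
    for k in range(1, n_segments + 1):
        for j in range(k, n + 1):
            for i in range(k - 1, j):
                candidate = best[k - 1][i] + cost(i, j)
                if candidate < best[k][j]:
                    best[k][j] = candidate
                    back[k][j] = i
    # Walk the back-pointers from the end of the signal.
    starts, j = [], n
    for k in range(n_segments, 0, -1):
        j = back[k][j]
        starts.append(j)
    starts.reverse()
    return starts[1:], best[n_segments][n]


# Toy signal with a raised plateau at positions 4 to 6.
signal = [0.0, 0.1, -0.1, 0.0, 2.0, 2.1, 1.9, 0.0, 0.1]
breakpoints, total_cost = best_segmentation(signal, n_segments=3)
print(breakpoints, round(total_cost, 3))
```

The triple loop costs on the order of `n_segments * n**2` operations, which is far less than enumerating every possible placement of the breakpoints.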

