
Statistics and Information Modelling Seminars
Ismaël Castillo, (Universities Paris VI & VII), September 14, 2010 Bayesian semiparametrics using Gaussian process priors In this talk, estimation in a class of semiparametric models
using a Bayesian method is considered. The focus is on models parametrized by a
pair $(\theta,f)$, where $\theta$ is a finite-dimensional parameter of interest
and $f$ an unknown nuisance function. Marios Pavlides (Frederick
University, Nicosia, Cyprus), April 19, 2010 Two Statistical Vignettes: Simpson's Paradox and Shaved Dice 1. Simpson's Paradox occurs for events A, B, and C if A and B are positively correlated given C, positively correlated given not-C, but negatively correlated in the aggregate. If a 2x2x2 table is chosen "at random", what is the probability that it will exhibit Simpson's Paradox? 2. Persi Diaconis has fascinated audiences at all levels with the following question: If one face of a standard gaming die is shaved uniformly by a specified fraction s, express the new face probabilities as a function of s. This apparently simple problem appears to be intractable. However, it leads to an interesting statistical question: if the shaved dice are thrown in pairs, as is typical in the game of craps, what is the most efficient die design for accurate estimation of the new face probabilities?
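The first vignette can be explored numerically. The following is a minimal Monte Carlo sketch, not the speaker's method: it assumes that "at random" means the eight cell probabilities are drawn uniformly from the probability simplex, and it checks only the direction of the paradox described in the abstract (positive association in both strata of C, negative in the aggregate). The function names `is_simpson`, `random_table` and `estimate_probability` are hypothetical.

```python
import random

def is_simpson(p):
    """p[a][b][c] is the weight of cell (A=a, B=b, C=c); 1 means the event occurs."""
    def cross_product_sign(t):
        # For a 2x2 table t[a][b], the sign of t11*t00 - t10*t01
        # equals the sign of the association between A and B.
        return t[1][1] * t[0][0] - t[1][0] * t[0][1]

    given_c = [[p[a][b][1] for b in range(2)] for a in range(2)]
    given_not_c = [[p[a][b][0] for b in range(2)] for a in range(2)]
    aggregate = [[p[a][b][0] + p[a][b][1] for b in range(2)] for a in range(2)]
    return (cross_product_sign(given_c) > 0
            and cross_product_sign(given_not_c) > 0
            and cross_product_sign(aggregate) < 0)

def random_table(rng):
    # Assumption: uniform distribution on the simplex, sampled as
    # normalised i.i.d. exponential variables.
    w = [rng.expovariate(1.0) for _ in range(8)]
    total = sum(w)
    it = iter(x / total for x in w)
    return [[[next(it) for _ in range(2)] for _ in range(2)] for _ in range(2)]

def estimate_probability(n_tables=20000, seed=0):
    """Fraction of simulated tables that exhibit the paradox."""
    rng = random.Random(seed)
    hits = sum(is_simpson(random_table(rng)) for _ in range(n_tables))
    return hits / n_tables

if __name__ == "__main__":
    print(estimate_probability())
```

Because only the signs of cross-products matter, `is_simpson` also accepts raw counts instead of probabilities. A different notion of "at random" would only require replacing `random_table`.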
Gerhard Winkler (Helmholtz-Zentrum München), July 9, 2008 Complexity penalised M-estimators for time-series and image data We sketch Bayesian image analysis and argue that it may be unreliable for the (micro) biological data we are concerned with. Then we introduce a variational approach based on a very simple functional. The focus is on the extraction of primitive morphological features from time-series. Some brief remarks address two-dimensional data. Behind these considerations is the general paradigm of parsimony, a topic which has recently been revived in view of the new challenges that statistics presently faces.
Rui M. Castro (Dept. of Electrical and Computer Engineering, Madison, USA), April 11, 2008 Learning to Discover: Adaptive Data Selection for Classification and Estimation Science is arguably the pinnacle of human intellectual achievement, yet the scientific discovery process itself remains an art. Human intuition and experience are still the driving force of the high-level discovery process: we determine which hypotheses and theories to entertain, which experiments to conduct, how data should be interpreted, when hypotheses should be abandoned, and so on. Meanwhile machines are limited to low-level tasks such as gathering and processing data. A grand challenge for scientific discovery in the 21st century is to devise machines that directly participate in the high-level discovery process. The work presented in this talk is a first step towards this goal. Common statistical inference and learning theories often assume that all data are collected prior to analysis. Alternatively, one can envision sequential, adaptive data collection procedures that use information gleaned from previous samples to guide the selection of future samples. 
This is extremely important for many pattern classification applications where the task of collecting/labeling data is often painstaking and costly, and therefore one would like only to collect the data that provides the most relevant information. We refer to such feedback-driven processes as active learning methods. In this talk I present a characterization of the achievable performance limits in active learning. Using minimax analysis techniques I describe the behavior of the classification error as the number of samples increases for broad classes of distributions, characterized by decision boundary regularity and noise conditions. The results clearly indicate situations under which one can achieve dramatic improvements, in terms of rates of error convergence, through active learning. I will also briefly discuss applications of active learning arising in sensing, networking and systems biology. Shota Gugushvili (Korteweg-de Vries Institute for Mathematics, Universiteit van Amsterdam), December 12, 2007 Decompounding under Gaussian noise Assuming that
a stochastic process
Ildar Ibragimov (Russian Academy of Sciences, Steklov Institute of Mathematics), September 19, 2007 On the Estimation of Analytic Functions Most of the talk will
be devoted to the following problem. We consider a Gaussian stationary process
with an entire analytic spectral density
Ruilin Li (Sun Yat-Sen University, China), January 25, 2007 The application of Markov model in health insurance actuary Abstract We use stochastic processes, statistics and actuarial science to explore some problems related to health insurance. The background to applying Markov models in health insurance is introduced in the first chapter. In the second chapter, a migration-illness-death process is proposed, which yields formulas for the expectations of the sub-populations in various states; these are stable if the transition intensities are constant. In chapter 3, models for three types of health insurance and the related formulas are worked out based on Markov processes. We derive the actuarial formulas for long term health insurance on the basis of Cordeiro’s multi-state model of the insurance. We also set up a multi-state model for critical illness insurance and derive the corresponding actuarial formulas. In chapter 4, an example on cancer insurance is presented using a multi-state Markov model. Key words: Markov model, multi-states model, health insurance actuary
Marlos Viana (University of Illinois at Chicago), December 13, 2006 Data Analytic Aspects of the Canonical Decomposition Theorem for Finite Groups Abstract: We will discuss the synthetic relation among symmetry arguments in experimental designs, the canonical decomposition theorem and the identification, interpretation and statistical inference of experimental hypotheses consistent with and derived from those arguments. The algebraic tools are those of group algebras, representations and Fourier analysis over finite groups. The classes of experimental applications include those of data indexed by finite sets and groups, within which particular examples will be discussed, including symmetry studies for voting preferences, chart designs in visual testing, handedness (chirality) of elementary planar patterns and refraction profiles in linear optics.
Florence
d'Alché-Buc (CNRS, Genopole & Université d'Evry, France) Kernelizing output tree-based methods: application to biological network completion We extend tree-based methods to the prediction of structured outputs using a
kernel in the output feature space. Ambedkar Dukkipati (Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore), October 25, 2006 Some results on Generalized Measures of Information and corresponding Maximum and Minimum Entropy Prescriptions In light of their well-known axiomatic and operational justifications, I
present some results pertaining to the mathematical significance of `generalized'
measures of information in the sense of `Renyi' and `nonextensive' or Tsallis (while
Renyi entropy is additive like Shannon entropy, Tsallis entropy is not).
Pan, Guangming (Department of Applied Mathematics, National Sun Yat Sen University, Taiwan), October 25, 2006 Asymptotics of eigenvectors of large sample covariance matrices The eigenvectors of sample covariance matrices play an important role in principal component analysis, factor analysis and some other fields. However, compared to the eigenvalues, relatively little work has been done on the asymptotic behavior of the eigenvectors of large dimensional sample covariance matrices. In this talk, we define a new form of empirical spectral distribution, which involves both the eigenvectors and the eigenvalues. Surprisingly, it is shown that this empirical spectral distribution and the classical empirical spectral distribution converge to the same limiting spectral distribution. Based on this new empirical spectral distribution, a central limit theorem for linear spectral statistics involving the eigenvectors and eigenvalues is also established. Finally, we demonstrate how to apply large sample covariance matrix theory to wireless communications.
Ms. Efang Kong (Department of Statistics and Applied Probability, National University of Singapore), October 2, 2006 Variable selection for the single-index model We consider variable selection
in the single-index model. We prove that the popular leave-
Estate Khmaladze (Victoria University of Wellington), February 3, 2006 Distribution free method of testing exponentiality, with application to curious historic data At first glance it may look as if statistical theory offers plenty of tests of whether a sample follows an exponential distribution. Unfortunately, this is not correct. We will present a version of the empirical process which, under the hypothesis of exponentiality, converges to a very convenient limiting process: the standard Brownian motion. Therefore, its distribution depends neither on the exponential family itself nor on the way we estimate the parameter of this family. It is the only process we know of with these properties, apart from the processes of Koul (1978) and Angus (1982). Along the way we clarify some misunderstandings accumulated over time in the literature. In the second half we consider
the durations of the reigns of Roman emperors and also of Chinese emperors. Humans die,
mostly, as the result of aging. If a reign stops as a result of accumulated
tensions of political, social, economic or
Estate Khmaladze (Victoria University of Wellington), February 1, 2006 Differentiation of sets and applications to probability and statistics We know that in statistical problems involving a finite-dimensional parameter, the local properties of the likelihood ratio with respect to this parameter, and especially its differentiability in this parameter, play a key role in all asymptotic analysis. There is, however, a wide class of spatial statistical problems where the parameter of interest is a set. For sets, similar analysis is much more complicated and we do not seem to have appropriate tools. In the talk we will present a new notion of differentiability of set-valued functions, analogous to directional derivatives of functions, and useful for statistical applications. As one particular application we consider local point processes in the neighbourhood of a given set. Using the notion of differentiability we construct the limiting process for it as the neighbourhood shrinks while the intensity of the point process increases. We hope, at the same time, that the concept of "differentiation of sets" can prove important in a wider range of statistical and probabilistic problems. To give one example of the statements proved, consider $A_t$, which is a Borel subset of $\mathbb{R}^d$ for each $t > 0$. If it is differentiable in $t$ and if $P$ is an absolutely continuous measure on $\mathbb{R}^d$, then there exists another measure $Q$, explicitly defined, such that $dP(A_t)/dt = Q(dA_t/dt)$. The left hand side may be simple, but the right hand side is non-trivial.
Tijl De Bie (K.U.Leuven), December 20, 2005 Optimal experiment design for kernel ridge regression, and the minimum volume covering ellipsoid Optimal experiment design (OED) and ellipsoid estimation are issues of primary importance in areas such as statistics, system control and identification, visual/video tracking, sensor management, active learning, data mining and novelty detection. 
In this talk I will present a new approach to OED for ridge regression, as well as for its kernel version. This allows one to optimally design experiments for nonparametric nonlinear regression, whereas in the past generally parametric techniques with a fixed set of nonlinear basis functions had to be used. The resulting optimal design is sparse, in the sense that measurements should be taken at only a limited set of points. Interestingly, the optimization problem that is dual to the OED for ridge regression corresponds to finding the minimum volume covering ellipsoid in a kernel induced feature space with an additional regularization term. I will show how this result is of use for novelty detection applications.
Santiago Vidal Puig (Technical University of Valencia (ES)), November 8, 2005 Benefits of Using the MEGA Statistical Process Control Mega Statistical Process Control: why we need it and how to use it for fault diagnosis The talk is divided into two parts. In the first part we will introduce and describe different approaches for monitoring multivariate industrial processes. We present the advantages of Mega SPC (Megavariate Statistical Process Control) compared with univariate SPC charts (USPC) and standard multivariate SPC based on Hotelling's $T^2$. In the second part we will discuss fault diagnosis, which is an essential step in monitoring the process. Once a fault has been detected we need to know which of the original measured variables are responsible for it. This is even more important for Mega SPC, which uses latent variables that differ from the original ones. In the last 15 years several strategic approaches have been proposed, from strategies based on the space of the original variables, such as those of Doganaksoy, Hawkins, Mason et al., to strategies based on the latent space, such as contribution plots, fault reconstruction or fault signatures. 
In this second part I will give a short review of the most important methods and present some illustrative examples of the different methods in action.
Monika Meise (Universität Duisburg-Essen), June 22, 2005 Approximating Data with Splines Regression: Many regression methods depend on the appropriate choice of locally defined smoothing parameters. The approach taken here is to base the choice of local smoothing parameters on a multiresolution analysis of the residuals. This will be illustrated using weighted splines and compared to other locally adaptive methods such as wavelets. Densities: Given data $y_1,\dots,y_n$ we look for an approximating model of the form $Y_i = X_i + Z_i$, $i=1,\dots,n$ (1), where the $(X_i)_1^n$ and the $(Z_i)_1^n$ are each i.i.d. random variables and the distribution of the $(Z_i)_1^n$ is given. The problem is to decide whether there exists an approximation of the form (1) and, if so, to specify a distribution of the $(X_i)_1^n$. We do this by choosing the distribution of $X$ so as to minimize the Kolmogorov distance $d_{ko}(F_n,F^Y)$, where $F_n$ is the empirical distribution of the data and $F^Y$ the distribution of the random variables $Y$. Variations include minimizing the total variation of the first and second derivative of the density $f^X$ of $X$ and the use of higher order Kuiper metrics.
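The Kolmogorov distance used as the fitting criterion above can be computed exactly from the order statistics. The sketch below is illustrative only: it takes the candidate distribution function as an argument and does not show how $F^Y$ would be obtained from the distributions of $X$ and $Z$. The names `kolmogorov_distance` and `normal_cdf` are hypothetical, and a continuous candidate CDF is assumed.

```python
import math

def kolmogorov_distance(sample, cdf):
    """Return sup_x |F_n(x) - F(x)|, with F_n the empirical CDF of `sample`.

    Assumes `cdf` is a continuous distribution function.
    """
    xs = sorted(sample)
    n = len(xs)
    d = 0.0
    for i, x in enumerate(xs, start=1):
        f = cdf(x)
        # F_n jumps from (i-1)/n to i/n at x, so the supremum is attained
        # on one side of a jump; check both sides.
        d = max(d, i / n - f, f - (i - 1) / n)
    return d

def normal_cdf(x, mean=0.0, sd=1.0):
    """Normal distribution function, used here only as an example candidate."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))

if __name__ == "__main__":
    data = [-1.2, -0.4, 0.1, 0.3, 0.9, 1.7]  # made-up numbers for illustration
    print(kolmogorov_distance(data, normal_cdf))
```

A fitting procedure in the spirit of the abstract would evaluate this distance for each candidate distribution of $X$ and keep the minimizer.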
Marie Husková, (Charles University, Prague), June 14, 2005 Control Charts Based on Alternative Hypotheses We present statistical models in terms of hypothesis testing for practical out-of-control situations in Statistical Process Control that extend the traditional mean shift or linear trend situations. Based on these explicit alternative hypotheses, we derive likelihood ratio tests. Simulations are used to obtain critical values and to study the performance (in terms of both mean and standard deviation of detection delays) of our procedures. We compare our control charts with a control chart proposed by Chang and Fricker. It turns out that smaller mean delays are not always preferable.
L. Birgé (Laboratoire de Probabilités Université Paris VI), May 23, 2005 Lucien Birgé has been awarded the 2005 Brouwer memorial medal by the Dutch Royal Mathematical Society (Koninklijk Wiskundig Genootschap). The Brouwer medal is granted once every three years and is a very prestigious prize in mathematics. This time, the Society chose the field of mathematical statistics. Professor Birgé will receive the medal and give a lecture at a joint BeNeLuxFra mathematical congress in Gent (Belgium). As the winner of the 2005 Brouwer medal, Professor Lucien Birgé will give several talks in The Netherlands including Eindhoven, VU Amsterdam, and Delft. Professor Birgé has accepted EURANDOM's invitation and he will give a lecture on model selection targeted to a broad audience of statisticians, probabilists and mathematicians. A general approach to model selection via testing We want to present a general approach to model selection for statistical estimation based on penalized M-estimators on some countable sets and their generalizations. 
The method applies to various stochastic situations (independent observations, Gaussian vectors or sequences, some regression frameworks with fixed or random design, ...) and aims at estimating an unknown parameter $s$ which characterizes the distribution $P_s$ of the observations and belongs to some given metric space $(M,d)$.
Nadia Lalam (EURANDOM), May 3, 2005 Statistical modelling of gene expression data from confocal scans of Drosophila embryos Confocal laser scanning microscopy is a powerful tool for the imaging of gene expression in a developing embryo. We model the experimental gene expression data obtained by this methodology when considering the particular case of the formation of segment patterns in the early development of the Drosophila embryo. Segmentation and more generally developmental processes result from the interaction of genes in a regulatory network. Reinitz and Sharp (1995) proposed to model the genetic regulatory network responsible for the segmentation mechanism by a set of nonlinear ordinary differential equations satisfied by the gene product concentrations. Relying on this modelling and the quantitative gene expression data from the scans of the embryos, we propose a new statistical approach to construct efficient estimators of the parameters arising in this model of differential equations. Our estimators should provide a better summary of the information contained in the gene expression data than the currently used least squares estimators.
Jelle Goeman (Department of Medical Statistics and Bioinformatics - Leiden University), April 21, 2005 Testing against a high-dimensional alternative As the dimensionality of the alternative increases, the power of classical tests tends to diminish quite rapidly. This is especially true for high-dimensional data in which there are more parameters than observations. 
In this paper we discuss a score test in an empirical Bayesian model as an alternative to these classical tests. It gives a general test statistic which can be used to test a point null hypothesis against a possibly high-dimensional alternative, even when the number of parameters exceeds the number of samples. This test will be shown to have optimal slope of the power function on average in all directions from the null, which makes it a proper generalization of the locally most powerful test to multiple dimensions. To illustrate the locally most powerful test we investigate the case of testing the global null hypothesis in a linear regression model in more detail. The empirical Bayes score test is shown to have significantly more power than the F-test when under the alternative the large-variance principal components of the design matrix explain significantly more of the variance of the outcome than the low-variance principal components. The locally most powerful test is also useful for detecting sparse alternatives in truly high-dimensional data, where its power is comparable to the test based on the maximum absolute t-statistic. This is joint work with Hans van Houwelingen.
Monia Lupparelli (Department of Statistics - University of Florence, Italy), April 7, 2005 Bi-directional graph models for contingency tables Bi-directional graph models, also called covariance graph models, are used to encode marginal independence (Richardson and Spirtes, 2002). The parametrization for discrete distributions is in general still an open problem. Log-linear and logit models are widely used in graphical modelling, but they do not fit in the framework of bi-directed graphs because they do not easily allow the modelling of marginal distributions and joint response variables. Recently Drton and Richardson (2005) introduced a method for binary variables based on a Möbius parametrization. 
The aim of our work is to show how marginal log-linear models (Bergsma and Rudas, 2002) can be used to parametrize bi-directional graph models for general categorical variables. We illustrate with some examples that it is always possible to find a hierarchical marginal log-linear parametrization that fulfills the measure of independence imposed by the connected set Markov property in any bi-directed graph. Moulinath Banerjee ( Department of Statistics - University of Michigan, USA), April 5, 2005 Inference for conditionally parametric response models Conditionally parametric response models provide flexible (and consequently useful) strategies for nonparametric modelling. Formally, consider a sequence of i.i.d. observations from the distribution of $(X,Z)$ where $X$ is a response variable and $Z$ a covariate with some unknown distribution and the conditional distribution of $X$ given $Z = z$ is $p(.,\psi(z))$, where $p(.,\theta)$ is a regular parametric family of densities. We are interested in making inference on the unknown "dependence function" $\psi$. I will talk about the utility of such models in applications and present inference problems regarding $\psi$. Key themes will be (a) Inference for $\psi$ under shape restrictions and (b) threshold estimation for $\psi$. I will present several unified theorems and, time permitting, talk about extensions of the ideas to more complex (semiparametric) models. Edwin van der Heuvel (Statistics Dept., NV Organon, Oss, the Netherlands), March 10, 2005 Evaluation of an Affymetrix High-density Oligonucleotide Microarray Platform as a Measurement System An Affymetrix High-density Oligonucleotide microarray platform is used for routine experimentation in search of genes that may explain biological differences between medical treatments and/or diseases. Before the abundance level of thousands of genes in one biological sample can be measured simultaneously, several processing steps are involved. 
An experimental design was set-up to investigate the contribution of these specific variation sources on the measurement error or technological variation. A mixed effects analysis of variance model was applied to estimate these contributions in terms of variance components. From these variance components the microarray platform is evaluated as a measurement system for the purpose of gene selection in future microarray studies. Furthermore, the statistical model is evaluated for its goodness-of-fit to describe such microarray data. Wicher P. Bergsma (EURANDOM), February, 2005 On a new type of correlation, its orthogonal decomposition and associated tests of independence For some applications a possible drawback of the ordinary correlation coefficient $\rho$ between two real random variables $X$ and $Y$ is that $\rho=0$ does not imply independence. Hence, a test of independence based on the correlation has only power against narrow alternatives. In this talk, an alternative coefficient is introduced, which is closely related to the correlation but which equals zero if and only if the two variables are independent. It is shown that the new coefficient can be written as an infinite sum of squared correlations, and details of these component correlations are given. The asymptotic distribution of the U and V statistic estimators of the coefficient, which is a mixture of chi-squares, is derived. It is shown that as a special case, a generalization of the Cramer-von Mises test is obtained to the case of $K$ ordered samples. Richard Gill (Utrecht/EURANDOM), February 8, 2005 Missing data and biased sampling versus quantum non-locality I'll discuss some new results concerning the optimal design of Bell experiments; these are experiments which are supposed to establish "quantum non-locality": a code-word for "classically impossible correlations between distant parts of a physical system". 
According to the classical picture, correlations are explained by what the physicists call hidden variables. In the language of statistics these are just "missing data", and I'll explain how statistical methods for dealing with missing data can be used to construct maximally powerful experiments. I'll describe many open problems and surprising findings. Actual experiments are plagued by all kinds of difficulties. The best known is called the "detection loophole" but it is no more and no less than biased sampling, or in the language of Monte Carlo simulation, "distributed rejection sampling". It turns out that an even more severe form of biased sampling afflicts most experiments done to date, which we have christened the "coincidence loophole". It seems we may have to wait a long time before anyone does a conclusive experiment.
Fabio Rigat (EURANDOM), January 20, 2005 Binary Neuronal Networks A statistical framework is proposed to model a network of binary random variables. Both the network structure and the strengths of the existing pairwise connections are jointly estimated from the data within a fully Bayesian approach by employing the stochastic search variable selection method (George and McCulloch [1993]) and the Metropolis-Hastings algorithm (Hastings [1970]). Predictions for future outcome states are obtained through the posterior predictive success probabilities for each node. The framework is employed to model complex interactions arising within networks of spiking neurons. Examples will be provided illustrating the model performance in fitting and predicting both simulated and real data.
Farida Enikeeva, (EURANDOM), December 7, 2004 Empirical Bayesian Test of the Smoothness Parameter In adaptive nonparametric curve estimation, one commonly estimates a function from a nested family of functional classes that are parameterized by a smoothness-like quantity. It has already been realized by many that estimating the smoothness parameter is not sensible. 
What can then be inferred about the smoothness? We attempt to answer this question. Implications for the relevant hypothesis testing are presented: due to the nested model structure, a consistent test can be constructed only for the one-sided hypothesis. The test statistic is based on the marginalized maximum likelihood estimator of the smoothness for an appropriate prior distribution on the unknown signal. This is joint work with Eduard Belitser (Utrecht University).
Juri Lember, (Institute of Mathematical Statistics), October 5, 2004 On fluctuation of the length of the longest common subsequence We consider two finite sequences (words) over a finite alphabet (DNA sequences, for example). The aim is to measure their similarity. A common measure is the length of the longest common subsequence (LCS). To understand whether the similarity is caused by chance, one is interested in the asymptotic behaviour of the length of the LCS of two independent i.i.d. sequences. Let $X_1,\dots,X_n$ and $Y_1,\dots,Y_n$ be two independent i.i.d. Bernoulli sequences, and let the random variable $L_n$ be the length of their LCS. It is well known that $L_n/n$ tends to a constant a.s.; the question of the behaviour of the variance of $L_n$ then arises. We consider the special case when the sequence $Y_1,\dots,Y_n$ is non-random and periodic, and we show that there exist constants $0<k<K$ such that $kn < \mathrm{Var}\,L_n < Kn$. Hence $L_n - EL_n$ is typically of order $\sqrt{n}$, as conjectured by Waterman.
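The quantity $L_n$ in this abstract is easy to simulate. The sketch below is only an illustration of the setting, not a reproduction of the talk's results: it computes the LCS length with the standard dynamic programme and estimates the mean and variance of $L_n$ when $X$ is an i.i.d. fair-coin sequence and $Y$ is a fixed periodic sequence. The function names, the period `(0, 1)` and the success probability 1/2 are assumptions chosen for the example.

```python
import random
import statistics

def lcs_length(x, y):
    """Length of the longest common subsequence of two sequences."""
    # Standard dynamic programme; only the previous row is kept in memory.
    prev = [0] * (len(y) + 1)
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]

def simulate_lcs(n, period=(0, 1), reps=200, seed=0):
    """Sample mean and variance of L_n for random X against periodic Y."""
    rng = random.Random(seed)
    # Non-random periodic sequence Y_1, ..., Y_n.
    y = [period[i % len(period)] for i in range(n)]
    lengths = []
    for _ in range(reps):
        x = [rng.randint(0, 1) for _ in range(n)]  # i.i.d. Bernoulli(1/2)
        lengths.append(lcs_length(x, y))
    return statistics.mean(lengths), statistics.variance(lengths)

if __name__ == "__main__":
    for n in (50, 100, 200):
        mean, var = simulate_lcs(n)
        print(n, mean, var, var / n)
```

Printing `var / n` for increasing `n` gives a rough view of whether the variance grows linearly, as the bounds in the abstract state.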
T. Rudas, (Eotvos Lorand University) May 28, 2004 Log-linear models for multidimensional contingency tables: interpretation and estimation The talk discusses log-linear (including graphical) models for multiway contingency tables and considers various interpretations, i.e. characteristic properties of these models, including their relationship with conditional odds ratios and a canonical representation. In this canonical representation, every log-linear model is represented as the intersection of several simpler log-linear models, each of them belonging to either one of two types of such models. Log-linear models are exponential families and maximum likelihood estimates are usually computed using the iterative proportional scaling algorithm. In fact, this algorithm computes the minimum discrimination information estimate in the dual linear family. This is obtained as the result of iterated projections into simpler linear families, the intersection of which is the dual to the log-linear model. A dual algorithm is discussed, that iteratively computes maximum likelihood estimates utilizing that the log-linear model is the intersection of several simpler log-linear models. Jordan Stoyanov ( University of Newcastle (UK), June 4, 2004 Moment Analysis of Distributions We study distributions with finite moments and such that the classical problem of moments for them has a non-unique solution. We start with brief comments on frequently used criteria for uniqueness or for non-uniqueness of distributions in terms of their moments (Stieltjes, Carleman, Hamburger, Hausdorff, Cramer, Krein). Then we concentrate on some recent developments. We describe a method for constructing Stieltjes Classes = families of distributions all with the same moments. For some Stieltjes classes we give the value of the Index of Dissimilarity. 
The illustrations include functional transformations of random data involving popular distributions such as normal, inverse Gaussian, lognormal, generalized gamma, logistic. Results about the distributions of some stochastic processes will also be presented. If time permits, some open questions will be outlined. The speaker will address his talk not only to professionals in Probability/Statistics, but also to PhD students in this area.
William Rey (Philips Research Laboratories), May 26, 2004 Karhunen-Loève, Principal components and SVD Karhunen-Loève transforms (KL) and Principal component analysis (PCA) are two techniques that are closely related although, conceptually, their objectives have little in common. KL builds up low-cost optimal approximations of curves; the objective is to approximate. Telecommunication engineers are typical users of KL. PCA is a method to visualize what takes place in a high dimensional space with the help of optimal projections; the objective is visualisation. PCA is used to explore data sets. Seen from the algorithmic side, KL and PCA are closely connected by Singular Value Decomposition (SVD); except for minor details, two linear algebraic dual spaces are of concern and, whether you place the accent on one or on the other, you think in terms of KLE or PCA. The talk is introductory and covers KL, PCA and SVD in their principles but also with respect to some very applied aspects.
Iryna Snihir, (EURANDOM), April 16, 2004 Life testing: can we predict the battery life? With the aim of achieving reliable battery management, our goal is to work out a proper battery model. Our research work concentrates on the development and refining of the mathematical methods for battery modelling and estimation of the parameters of the model. 
The aim of my talk is to focus on the "black box" approach to the battery model, and more precisely, on several exercises based on statistical methods, which were applied to life test data where cells have been subjected to repeated identical cycles. Analysing life test data, we developed a method for predicting the maximal internal gas pressure (P) based on measurements of the battery's voltage (V) and temperature (T), as well as on an evaluation of the empirical relations among V, T and P. This technique can help to underpin "pressure control" charging algorithms and to improve the safety, lifetime and cycle life performance of the batteries. The next step is modelling the separate battery cycles and forecasting the next cycles, once a few have been observed, with the help of a regression model and the first principal components. This was done in the context of life test data collected under laboratory conditions. That contributes to a mastery of battery management.
Andreas Christmann, (University of Dortmund), June 11, 2004 On a combination of convex risk minimization methods to analyze data from insurance companies The goals of the talk are twofold: we describe common features in data sets from motor vehicle insurance companies and we investigate a general strategy which exploits the knowledge of such features to detect and to model hidden information. The results of the strategy are a basis to develop insurance tariffs. The strategy is applied to a data set from 15 motor vehicle insurance companies containing information from more than 4 million customers. We use a nonparametric approach based on a combination of kernel logistic regression and $\varepsilon$-support vector regression. Both methods belong to the class of statistical machine learning methods based on convex risk minimization. Some recent results on robustness properties of such methods are also given.
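The two methods named in this abstract minimise a regularised empirical risk built from a convex loss. As a small illustration (not the speaker's implementation), the sketch below writes out the two standard loss functions usually associated with them: the logistic loss for labels in $\{-1,+1\}$ and the $\varepsilon$-insensitive loss for regression. The function names and the default `eps=0.1` are assumptions made for the example.

```python
import math

def logistic_loss(y, f):
    """Logistic loss log(1 + exp(-y*f)) for a label y in {-1, +1} and score f."""
    z = -y * f
    # Numerically stable form of log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))

def eps_insensitive_loss(y, f, eps=0.1):
    """Epsilon-insensitive loss: errors smaller than eps cost nothing."""
    return max(0.0, abs(y - f) - eps)

def empirical_risk(loss, ys, fs):
    """Average loss over paired observations and predictions."""
    return sum(loss(y, f) for y, f in zip(ys, fs)) / len(ys)

if __name__ == "__main__":
    print(empirical_risk(logistic_loss, [1, -1, 1], [0.8, -0.3, -0.2]))
    print(empirical_risk(eps_insensitive_loss, [1.0, 2.0], [1.05, 2.6]))
```

Both losses are convex in the prediction `f`, which is the property that places the two methods in the class of convex risk minimization methods mentioned above.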
Cristina Butucea (Université Paris X, Nanterre and Paris VI), February 24, 2004
Quadratic functional estimation in the convolution model
We consider [...] We assume that the unknown density is Sobolev-smooth with regularity [...] If the underlying density [...]

Leila Mohammadi (Leiden University), February 13, 2004
On the statistical theory of classification
This lecture concerns an approach to statistical learning problems in the nonparametric setting. Suppose we are given n i.i.d. copies of a random variable (X,Y), where X is an instance and Y is a label, -1 or 1. We define a classifier h as a function with values -1 and 1 and we denote a class of classifiers by H. For the case that X is one-dimensional and for some parametric cases of H, such as the classifiers with K thresholds, we estimate the parameters by the minimizer of the classification error in the sample, and we derive the asymptotic distribution and the rate of convergence of the empirical risk minimizer, which is cube root n. If one of the thresholds is on the boundary of the space of X, then the asymptotic result is different and convergence is quicker. We also consider the case that X is multidimensional and show that similar results hold when the classifiers are 1 on halfspaces. In a simple case, we show that the rate of convergence of the empirical risk minimizer is optimal. We also propose an algorithm to find the empirical risk minimizers in the one-dimensional case. For a reference see Mohammadi and van de Geer (2003).
References: Mohammadi, Leila and van de Geer, Sara A. (2003). On threshold-based classification rules. Institute of Mathematical Statistics, Lecture Notes Monograph Series, Mathematical Statistics and Applications: Festschrift for Constance van Eeden. V. 42. p. 261–280.

Jamy Robins
Towards A Unified Theory of First and Higher Order Statistics
Modern semiparametric root-n theory has its foundations in likelihood. Non-root-n function and functional estimation does not. Is there a common story? That is, is there one likelihood theory for all, resulting in the use of higher order influence functions to do near-optimal non-root-n estimation (in near-exact analogy to the root-n case)? We describe the building blocks of the theory.

Jüri Lember, (University of Tartu), January 13, 2004
Empirical Measures in Adjusted Viterbi Training
We investigate the so-called Viterbi training for Hidden Markov Model (HMM) parameter estimation. This training is based on (Viterbi) alignment for a finite string of observations. We show that the alignment can be naturally generalized for almost every (infinite) realization of the HMM. Such an infinite alignment gives an encoded process - the alignment process. We study the properties of the alignment process; we show that the limiting frequencies of that process exist, an

Peter van de Ven, (EURANDOM), December 12, 2003
On the Equivalence of Algorithms for Computing Effects in Factorial Designs
The Yates algorithm, Good interaction algorithm, symbolic algorithm and least squares estimation are different algorithms for computing effects in factorial designs. It is folklore that these algorithms all give the same results. Rigorous proofs of the equivalence of these algorithms do not seem to be available in the literature except for the Yates and Good algorithms. We present a rigorous proof of the equivalence of all algorithms, including precise definitions of the notions involved. We will pay attention to inconsistent definitions of effects and to the interpretation and importance of different ways of coding factor levels.

Dmitry Danilov, (EURANDOM), December 5, 2003
Modeling the Li-ion rechargeable batteries
A mathematical model describing the behavior of rechargeable Li-ion batteries is developed. The model simulates the behavior of the battery in a single charge-discharge cycle but also explains the complex process of battery degradation over a long time span (hundreds of charge-discharge cycles). The core of the model consists of a system of coupled partial and ordinary differential equations related to the main storage reaction and basic side reactions, such as the Solid Electrolyte Interface formation and the decomposition of the active electrode material.
The model is tested on available data and appears to provide an adequate fit.

Richard Gill (University of Utrecht / EURANDOM), November 21, 2003
Problems in Quantum Statistical Information
I will give an introduction to "quantum statistics" - statistical problems for experiments involving quantum data - and discuss open problems in the field.

Daniel Herrmann (The Bosch Group), November 14, 2003
Kernel Based Algorithms and Statistical Learning Theory
In this talk we give a short introduction to statistical learning theory and explain the idea of kernel-based algorithms like support vector machines (SVM), which have become very successful in real-world applications. The advantage of this type of learning algorithm is that it yields a convex optimization problem and its generalization ability can be measured by the capacity of the function class which the learning algorithm can implement. We explain how to measure the capacity by the concept of VC (Vapnik-Chervonenkis) dimension and explain the idea of structural risk minimization. The success of SVM can be attributed to the joint use of a robust classification procedure (large margin hyperplane) and of a convenient and versatile way of (nonlinear) preprocessing the data (kernels). It turns out that with such a decomposition of the learning process into preprocessing and linear classification, the performance depends highly on the preprocessing and much less on the linear classification algorithm used. It is thus of high importance to have a criterion for choosing a suitable kernel for a given problem. Ideally, this choice should be dictated by the data itself and the kernel should be 'learned' from the data. We propose to use gradient-based procedures for optimizing the kernel coefficients and give theoretical bounds on the corresponding generalization error for different classes of kernels.

Yves Rozen, (Paris Jussieu), November 7, 2003
Testing nullity in regression framework
We introduce a new testing procedure based on symmetrization to construct a test of "f=0" against "f≠0" in the regression model. This procedure works for errors that are not necessarily Gaussian. We prove that our adaptive multi-test is optimal over Hölder classes for Gaussian errors and retains a good rate in the non-Gaussian case.

Peter Grünwald, (CWI Amsterdam), November 7, 2003
Updating Probabilities
As examples such as the Monty Hall and the 3-prisoners puzzle show, applying conditioning to update a probability distribution on a "naive space", which does not take into account the protocol used, can often lead to counterintuitive results. We give a detailed explanation of this phenomenon. A criterion known as CAR ("coarsening at random") in the statistical literature characterizes when "naive" conditioning in a naive space works. We provide two new characterizations of CAR. First we show that in many situations, CAR essentially *cannot* hold, so that naive conditioning must give the wrong answer. Second, we provide a procedural characterization of CAR, giving a randomized algorithm that generates all and only distributions for which CAR holds. Both results complement earlier work by Gill, van der Laan and Robins. We also consider more generalized notions of update such as Jeffrey conditioning and minimizing relative entropy (MRE). We give a generalization of the CAR condition that characterizes when Jeffrey conditioning leads to appropriate answers, and show that there exist some very simple settings in which MRE essentially never gives the right results. This generalizes and interconnects previous results obtained in the literature on CAR and MRE.

Alexei Koloidenko, (EURANDOM), October 10, 2003
Adjusted Viterbi Training
I will be talking about our joint work with Juri Lember, a former Eurandom postdoc.
Motivated by the broad use of Hidden Markov Models (HMM) in speech processing and recognition, natural language modelling, image analysis, and bioinformatics, we consider the problem of estimating parameters of the emission distribution. It is well known that the EM algorithm computes a Maximum Likelihood Estimator (MLE) for HMM parameters, and in many such situations MLEs are consistent. However, computational considerations often lead to less intensive alternatives. The Viterbi Training (VT) algorithm is widely used instead of EM despite the inconsistency of its estimators. Our work aims to "interpolate" between EM and VT: we propose a principled approach to alleviating the bias-related drawbacks of VT at a minimal increase in computation. Our work relies on the concept of infinite Viterbi alignment and on a limiting probability distribution associated with this alignment. We explain why in general this latter distribution cannot be computed exactly, and we also discuss appropriate approximations. Moreover, we show that in the case of mixture models, an important special case of HMM, this distribution can be computed exactly. The experimental part of the paper focuses on the mixture of two univariate normal distributions with unit variance and unknown means. This example illustrates that the adjusted algorithms are still computationally less intensive than EM and, in contrast with VT, enjoy the property of asymptotically fixing the true parameters. We therefore suggest replacing VT by our adjusted procedures in applications that can afford the additional computations.

Alexei Koloidenko, (EURANDOM), May 20, 2003
Algebraic Aspects of Statistical Modeling
Motivated by probability models for distributions on small square subimages of digitized photographs, we will consider a somewhat more general situation, in which the state space is a real vector space $\mathbb{R}^{n^2}$ (e.g.
representing gray scale intensities in $n \times n$ subimages) and the measures of interest possess special types of invariance: namely, they are invariant under an action of a finite group $G$ that admits a linear (matrix) representation on $\mathbb{R}^{n^2}$. In our central example, the group is the full symmetry group of the square-based parallelepiped (embedded in $\mathbb{R}^{n^2}$ with its center mapped to the origin). The theory of algebraic invariants tells us that there exists a finite set of polynomials invariant under the same action with the following property: any polynomial in $n^2$ indeterminates with the same type of invariance can be written as a polynomial in terms of these special polynomials (called, for their special role, fundamental invariants or fundamental generators). In this talk, I will present two results that are basically specializations and extensions of the general "problem of moments". The first one is the "Extended Carleman theorem for $G$-invariant moments", which gives a sufficient condition under which a $G$-invariant measure is uniquely determined by the expected values of "mixed fundamental generators". The second result is about convergence of $G$-invariant constrained maximum entropy estimators, and we will also discuss its practical significance. Some of this material is still a work in progress, hence your comments and suggestions will be especially appreciated.

Gabriele Brondino, May 20, 2003
"Zero-Point" in the Evaluation of Martens Hardness Uncertainty
Hardness measurements have a significant role in mechanical metrology, as they are frequently used to characterise material properties relevant to industrial processes. A recently introduced method, called Martens Hardness, is based on force and indentation records obtained during a test cycle; the Force/Depth Curve, which describes the indentation pattern, is typically formed by two parts having a zero-point in common.
A segmented regression model is proposed in this paper, based on the introduction of a threshold parameter in order to estimate the unknown zero-point. The problem is not trivial, since the relationship between observed force and indentation depth is structural and, moreover, the number of nuisance parameters grows with the number of measured data. The asymptotic likelihood theory leads to an estimate of the unknown parameters of the model. Monte Carlo simulations are used to analyse the properties of the estimators under different hypotheses about measurement errors, and to establish the applicability conditions of the proposed method.

Wicher Bergsma, (Universiteit van Tilburg), May 19, 2003
Testing conditional independence with a continuous control variable
A common statistical problem is the testing of independence of two (response) variables conditionally on a third (control) variable. It is shown that, when the control variable is continuous, the methods that have been proposed in the literature either depend on strong distributional assumptions or suffer from low power. In the first part of this talk, the theoretical difficulties involved in testing conditional independence with a continuous control variable are made precise. In particular, the concept of testability of degree r is introduced, and it is shown that without assumptions, independence is testable of degree 2, while conditional independence with a continuous control variable is not testable of any degree. However, we proceed to show that, if appropriate assumptions about the marginal (conditional) response distributions are made, both hypotheses are testable of degree 1. This shows the fundamental difficulty of testing conditional independence with a continuous control variable: assumptions about the conditional marginal responses must be made. In the second part of this talk, a practically feasible solution to the testing problem is given.
The concept of partial copula is introduced, which is a certain average of bivariate conditional copulas and is thus based on marginal ranks. It is shown that by estimating the partial copula, a general and practically feasible class of tests can be obtained for the testing of conditional independence.

Professor J.K. Lindsey, (University of Liege, Belgium), April 29, 2003
What does pharmacokinetics model?
Pharmacokinetics studies the flow of some substance through the body using compartment models. At a first level, these model the movement of molecules in the organism using the assumptions of Markov chains. At a second level, differences among organisms and perturbations over time must be taken into account. Here, nonlinear random effects and autoregression can be useful.

Madalin Guta, (EURANDOM), March 25, 2003
An invitation to quantum tomography
We describe quantum tomography as an inverse statistical problem and show how entropy methods can be used to study the behaviour of sieved maximum likelihood estimators.

Fabio Rigat (Institute of Statistics and Decision Sciences at Duke University, USA), February 23, 2003
Bayesian CART modelling
I present a new modelling approach for Bayesian CART models. First I will introduce a tree definition focussed on the statistical models generated by the tree rather than on the tree structure. Second I will derive a likelihood function which explicitly takes into account the tree structure and the leaf distributions. Third I will propose a hierarchical tree prior which includes in a coherent framework all the essential features of the model. Then I will describe a Markov chain Monte Carlo technique to explore the tree space. Efficient exploration of the posterior is made possible by devising appropriate moves through the tree space in order to cope with the inherent multimodality of the posterior. Simulated tempering is employed in order to enrich the set of possible moves in the tree space and improve mixing.
Finally, the model averaging framework is adopted in order to derive robust predictions. Throughout the talk I will mainly focus on Weibull survival trees. I will present data analysis examples for both simulated datasets and right-censored cancer survival times.

P.O. Box 513, 5600 MB Eindhoven, The Netherlands