European Institute for Statistics, Probability, Stochastic Operations Research and its Applications


Statistics and Information Modelling Seminars

Informal Meetings Statisticians & Probabilists


2010

 

Ismaël Castillo (Universities Paris VI & VII), September 14, 2010

Bayesian semiparametrics using Gaussian process priors

In this talk, estimation in a class of semiparametric models using a Bayesian method is considered. The focus is on models parametrized by a pair $(\theta,f)$, where $\theta$ is a finite-dimensional parameter of interest and $f$ an unknown nuisance function.
One puts a Gaussian process prior distribution on $f$. We derive the limiting distribution for the posterior marginal in the parameter of interest, obtaining the so-called Bernstein-von Mises theorem, under some conditions.
One of the conditions involves simultaneous approximation of the unknown $f$ and "the least favorable direction" of the model by elements of the RKHS of the Gaussian prior. Such a condition appears to be necessary in general, as we illustrate on a few examples.


Marios Pavlides (Frederick University, Nicosia, Cyprus), April 19, 2010
Joint work with Michael Perlman

Two Statistical Vignettes: Simpson's Paradox and Shaved Dice

1. Simpson's Paradox occurs for events A, B, and C if A and B are positively correlated given C, positively correlated given not-C, but negatively correlated in the aggregate. If a 2x2x2 table is chosen "at random", what is the probability that it will exhibit Simpson's Paradox?

2. Persi Diaconis has fascinated audiences at all levels with the following question: If one face of a standard gaming die is shaved uniformly by a specified fraction s, express the new face probabilities as a function of s. This apparently simple problem appears to be intractable. However, this leads to an interesting statistical question: if the shaved dice are thrown in pairs, as typical in the game of craps, what is the most efficient die design for accurate estimation of the new face probabilities?
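The first question can be explored by simulation. The sketch below is only an illustration, not the method of the talk: it assumes that "at random" means the eight cell probabilities are uniformly distributed on the simplex (one possible reading), and it counts a table as paradoxical when A and B are positively associated within each level of C but negatively associated after collapsing over C. The function names are hypothetical.

```python
import random


def exhibits_simpson(p):
    """p[a][b][c] is the (possibly unnormalised) weight of cell (A=a, B=b, C=c)."""
    # A and B positively associated within each level of C
    # (cross-product of each conditional 2x2 table is positive).
    positive_given_c = all(
        p[1][1][c] * p[0][0][c] > p[1][0][c] * p[0][1][c] for c in (0, 1)
    )
    # Collapse over C to obtain the aggregate 2x2 table of A and B.
    m = [[p[a][b][0] + p[a][b][1] for b in (0, 1)] for a in (0, 1)]
    negative_overall = m[1][1] * m[0][0] < m[1][0] * m[0][1]
    return positive_given_c and negative_overall


def estimate_simpson_probability(n_tables=100_000, seed=0):
    """Monte Carlo estimate under the assumed uniform distribution on the simplex."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_tables):
        # Independent Exp(1) weights, once normalised, are uniform on the simplex.
        # The inequalities above are scale-free, so normalisation is skipped.
        p = [[[rng.expovariate(1.0) for _ in (0, 1)] for _ in (0, 1)] for _ in (0, 1)]
        hits += exhibits_simpson(p)
    return hits / n_tables


if __name__ == "__main__":
    print(estimate_simpson_probability())
```

Other notions of a "random" table (for example other Dirichlet priors on the cells) would only change the sampling line.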

 


2008

Gerhard Winkler (Helmholtz-Zentrum München), July 9, 2008

Complexity penalised M-estimators for time-series and image data

We sketch Bayesian image analysis and argue that it may be unreliable for the (micro)biological data we are concerned with. Then we introduce a variational approach based on a very simple functional. The focus is on the extraction of primitive morphological features from time-series. Some brief remarks address two-dimensional data. Behind these considerations is the general paradigm of parsimony, a topic which has recently been revived in view of the new challenges that statistics presently faces.


Rui M. Castro (Dept. of Electrical and Computer Engineering, Madison, USA), April 11, 2008

Learning to Discover: Adaptive Data Selection for Classification and Estimation

Science is arguably the pinnacle of human intellectual achievement, yet the scientific discovery process itself remains an art. Human intuition and experience are still the driving force of the high-level discovery process: we determine which hypotheses and theories to entertain, which experiments to conduct, how data should be interpreted, when hypotheses should be abandoned, and so on. Meanwhile machines are limited to low-level tasks such as gathering and processing data. A grand challenge for scientific discovery in the 21st century is to devise machines that directly participate in the high-level discovery process. The work presented in this talk is a first step towards this goal. Common statistical inference and learning theories often assume that all data are collected prior to analysis. Alternatively, one can envision sequential, adaptive data collection procedures that use information gleaned from previous samples to guide the selection of future samples. This is extremely important for many pattern classification applications where the task of collecting/labeling data is often painstaking and costly, and therefore one would like to collect only the data that provide the most relevant information. We refer to such feedback-driven processes as active learning methods. In this talk I present a characterization of the achievable performance limits in active learning. Using minimax analysis techniques I describe the behavior of the classification error as the number of samples increases for broad classes of distributions, characterized by decision boundary regularity and noise conditions. The results clearly indicate situations under which one can achieve dramatic improvements, in terms of rates of error convergence, through active learning. I will also briefly discuss applications of active learning arising in sensing, networking and systems biology.
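A textbook toy case gives a feel for the kind of improvement meant here. The sketch below is not taken from the talk: it assumes a noiseless one-dimensional threshold classifier on [0, 1], where the label of x is 1 exactly when x >= theta, and compares adaptive (bisection) label queries with labels collected at uniformly random locations. All function names are hypothetical.

```python
import random


def active_threshold_error(theta, n_queries):
    """Locate theta by bisection: each query point depends on earlier labels."""
    lo, hi = 0.0, 1.0
    for _ in range(n_queries):
        mid = (lo + hi) / 2
        if mid >= theta:      # observed label is 1, threshold lies at or below mid
            hi = mid
        else:                 # observed label is 0, threshold lies above mid
            lo = mid
    return abs((lo + hi) / 2 - theta)


def passive_threshold_error(theta, n_samples, rng):
    """Locate theta from labels at uniformly random, non-adaptive locations."""
    xs = [rng.random() for _ in range(n_samples)]
    lo = max((x for x in xs if x < theta), default=0.0)   # largest point labelled 0
    hi = min((x for x in xs if x >= theta), default=1.0)  # smallest point labelled 1
    return abs((lo + hi) / 2 - theta)


if __name__ == "__main__":
    rng = random.Random(0)
    theta, n = 0.3141, 20
    passive = sum(passive_threshold_error(theta, n, rng) for _ in range(500)) / 500
    print("active :", active_threshold_error(theta, n))
    print("passive:", passive)
```

In this idealised setting the bisection error halves with every query, while the passive error shrinks only in proportion to the typical spacing between random sample points. With label noise or higher-dimensional decision boundaries the comparison is far more delicate, which is the regime the abstract addresses.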


2007

Shota Gugushvili (Korteweg-de Vries Institute for Mathematics, Universiteit van Amsterdam), December 12, 2007

Decompounding under Gaussian noise

Assuming that a stochastic process is the sum of a compound Poisson process with known intensity and unknown jump size density and an independent Brownian motion, we consider the problem of nonparametric estimation of the jump size density from low-frequency observations of the process. The estimator of the density is constructed via Fourier inversion and kernel smoothing. Our main result deals with asymptotic normality of the proposed estimator at a fixed point. We will also briefly discuss its implementation in practice.


Ildar Ibragimov (Russian Academy of Sciences, Steklov Institute of Mathematics), September 19, 2007

On the Estimation of Analytic Functions

Most of the talk will be devoted to the following problem. We consider a Gaussian stationary process with an entire analytic spectral density  and we study the problem of its estimation. The process  is not observable. Instead, we observe a linear transformation  of  with a transfer function  if  belongs to an interval , and  if . We study how far from  consistent estimation of  is possible as .


Ruilin Li (Sun Yat-Sen University, China), January 25, 2007

The application of Markov model in health insurance actuary

Abstract: We use stochastic processes, statistics and actuarial science to explore some problems related to health insurance. The background to applying Markov models in health insurance is introduced in the first chapter. In the second chapter, a migration-illness-death process is proposed, which leads to formulas for the expectations of the sub-populations in the various states; these are stable if the transition intensities are constant. In chapter 3, models for three types of health insurance and the related formulas are worked out based on Markov processes. We obtain the actuarial formulas for long-term health insurance on the basis of Cordeiro's multi-state model. We also set up a multi-state model for critical illness insurance and obtain the corresponding actuarial formulas. In chapter 4, an example on cancer insurance is presented using a multi-state Markov model. Key words: Markov model, multi-state model, health insurance actuary


2006


Marlos Viana (University of Illinois at Chicago), December 13, 2006

Data Analytic Aspects of the Canonical Decomposition Theorem for Finite Groups

Abstract: We will discuss the synthetic relation among symmetry arguments in experimental designs, the canonical decomposition theorem and the identification, interpretation and statistical inference of experimental hypotheses consistent with and derived from those arguments. The algebraic tools are those of group algebras, representations and Fourier analysis over finite groups. The classes of experimental applications include those of data indexed by finite sets and groups, within which particular examples will be discussed, including symmetry studies for voting preferences, chart designs in visual testing, handedness (chirality) of elementary planar patterns and refraction profiles in linear optics.


Florence d'Alché-Buc (CNRS, Genopole & Université d'Evry, France)
Joint work with Pierre Geurts (IBISC and Université de Liège, Belgium)

Kernelizing output tree-based methods: application to biological network completion

We extend tree-based methods to the prediction of structured outputs using a kernel in the output feature space.
The resulting algorithm, called OK3 (output kernel tree), generalizes classification and regression trees as well as ensemble methods in a principled way. It opens the door to prediction in structured output spaces where possible outputs can be linked by complex relations. Moreover, we show that, when using only the Gram matrix over the outputs of the training data, OK3 is able to learn the output kernel as a function of inputs.
Finally, we show how this new family of algorithms behaves on an image reconstruction task and on two biological network completion tasks.


Ambedkar Dukkipati (Dept. of Computer Science and Automation, Indian Institute of Science, Bangalore), October 25, 2006

Some results on Generalized Measures of Information and corresponding Maximum and Minimum Entropy Prescriptions

In light of their well-known axiomatic and operational justifications, I present some results pertaining to the mathematical significance of 'generalized' measures of information in the sense of Renyi and of 'nonextensive' or Tsallis entropy (while Renyi entropy is additive like Shannon entropy, Tsallis entropy is not).

I discuss measure-theoretic formulations for generalized information measures and extend the Gelfand-Yaglom-Perez (GYP) theorem for Kullback-Leibler relative entropy to the generalized case. The GYP theorem for KL entropy is a fundamental theorem which plays an important role in extending discrete-case definitions of various classical information measures to the measure-theoretic case.

The other results I present in this talk are related to maximum entropy prescriptions of nonextensive entropy. Though relative entropy is not a metric, in cases involving distributions resulting from relative-entropy minimization, one can bring forth certain geometrical formulations. These are reminiscent of squared Euclidean distance and satisfy an analogue of Pythagoras' theorem, which plays a fundamental role in geometrical approaches to statistical estimation theory such as information geometry. In this talk I present the equivalent of Pythagoras' theorem in the nonextensive formalism.
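The additivity remark in the first paragraph of this abstract can be checked numerically. The following sketch uses the standard discrete definitions of Shannon, Renyi and Tsallis entropy (natural logarithms); the example distributions and function names are arbitrary choices made for illustration and are not taken from the talk.

```python
import math


def shannon(p):
    return -sum(x * math.log(x) for x in p if x > 0)


def renyi(p, alpha):
    # Renyi entropy of order alpha (alpha > 0, alpha != 1).
    return math.log(sum(x ** alpha for x in p)) / (1 - alpha)


def tsallis(p, q):
    # Tsallis (nonextensive) entropy of index q (q != 1).
    return (1 - sum(x ** q for x in p)) / (q - 1)


def product(p, r):
    # Joint distribution of two independent discrete variables.
    return [x * y for x in p for y in r]


if __name__ == "__main__":
    p, r, order = [0.2, 0.8], [0.5, 0.3, 0.2], 2.0
    joint = product(p, r)
    print("Shannon:", shannon(joint), "vs", shannon(p) + shannon(r))
    print("Renyi  :", renyi(joint, order), "vs", renyi(p, order) + renyi(r, order))
    print("Tsallis:", tsallis(joint, order), "vs", tsallis(p, order) + tsallis(r, order))
```

For independent variables the Shannon and Renyi values add up exactly, whereas the Tsallis entropy of the product picks up the cross term (1 - q) S_q(p) S_q(r).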


Pan, Guangming (Department of Applied Mathematics, National Sun Yat Sen University, Taiwan), October 25, 2006

Asymptotics of eigenvectors of large sample covariance matrices

The eigenvectors of sample covariance matrices play an important role in principal component analysis, factor analysis and some other fields. However, in research on large-dimensional sample covariance matrices, relatively little work has been done on the asymptotic behavior of the eigenvectors compared to that of the eigenvalues. In this talk, we define a new form of empirical spectral distribution, which involves both the eigenvectors and the eigenvalues. Surprisingly, it is shown that this empirical spectral distribution and the classical empirical spectral distribution converge to the same limiting spectral distribution. Based on this new empirical spectral distribution, a central limit theorem for linear spectral statistics involving the eigenvectors and eigenvalues is also established. Finally, we demonstrate how to apply large sample covariance matrix theory to wireless communications.


Ms. Efang Kong (Department of Statistics and Applied Probability, National University of Singapore), October 2, 2006

Variable selection for the single-index model

We consider variable selection in the single-index model. We prove that the popular leave-$m$-out cross validation method has different behavior in the single-index model from that in linear regression models or nonparametric regression models. A new consistent variable selection method, called separated cross validation, is proposed. Further analysis suggests that the method has better finite sample performance and is computationally easier than leave-$m$-out cross validation. Separated cross validation, applied to the Swiss banknotes data and the ozone concentration data, leads to single-index models with selected variables that have better prediction capability than models based on all the covariates.


Estate Khamaladze (Victoria University of Wellington), February 3, 2006

Distribution free method of testing exponentiality, with application to curious historic data

At first glance it may look as if statistical theory offers plenty of tests of whether a sample follows an exponential distribution. Unfortunately, this is not correct. We will present a version of the empirical process which, under the hypothesis of exponentiality, converges to a very convenient limiting process: standard Brownian motion. Its distribution therefore depends neither on the exponential family itself nor on the way we estimate the parameter of this family. It is the only process we know of with these properties, apart from the process of Koul (1978), Angus (1982). Along the way we clarify some misunderstandings that have accumulated in the literature over time.

In the second half we consider the durations of the reigns of Roman Emperors and of Chinese Emperors. Humans die, mostly, as a result of aging. If a reign ends as a result of accumulated tensions of a political, social, economic or personal nature, then the duration of the reign cannot follow the exponential distribution. But in both cases we consider, the durations do! A relatively delicate analysis is necessary, and we show how we did it.


Estate Khamaladze (Victoria University of Wellington), February 1, 2006

Differentiation of sets and applications to probability and statistics

We know that in statistical problems involving a finite-dimensional parameter, the local properties of the likelihood ratio with respect to this parameter and, especially, differentiability in this parameter play a key role in all asymptotic analysis.

There is, however, a wide class of spatial statistical problems where the parameter of interest is a set. For sets, a similar analysis is much more complicated and we do not seem to have appropriate tools.

In the talk we will present a new notion of differentiability of set-valued functions, analogous to directional derivatives of functions and useful for statistical applications.

As a particular such application we consider local point processes in the neighbourhood of a given set. Using the notion of differentiability we construct the limiting process for it as the neighbourhood shrinks while the intensity of the point process increases.

We hope, at the same time, that the concept of "differentiation of sets" can prove important in a wider range of statistical and probabilistic problems.

To give one example of the statements proved, consider A_t, which is a Borel subset of R^d for each t > 0. If it is differentiable in t and if P is an absolutely continuous measure in R^d, then there exists another measure Q, explicitly defined and such that

dP(A_t)/dt = Q(dA_t/dt)

The left hand side may be simple, but the right hand side is non-trivial.


2005


Tijl De Bie (K.U.Leuven), December 20, 2005
Joint work with Alexander Dolia, Chris Harris, John Shawe-Taylor, and Michael Titterington

Optimal experiment design for kernel ridge regression, and the minimum volume covering ellipsoid

Optimal experiment design (OED) and ellipsoid estimation are issues of primary importance in areas such as statistics, system control and identification, visual/video tracking, sensor management, active learning, data mining and novelty detection.

In this talk I will present a new approach to OED for ridge regression, as well as for its kernel version. This allows one to optimally design experiments for nonparametric nonlinear regression, whereas in the past generally parametric techniques with a fixed set of nonlinear basis functions had to be used. The resulting optimal design is sparse, in the sense that measurements should be taken at only a limited set of points.

Interestingly, the optimization problem that is dual to the OED for ridge regression corresponds to finding the minimum volume covering ellipsoid in a kernel induced feature space with an additional regularization term. I will show how this result is of use for novelty detection applications.


Santiago Vidal Puig (Technical University of Valencia, ES), November 8, 2005

Benefits of Using the MEGA Statistical Process Control 

Mega Statistical Process Control: why we need it and how to use it for fault diagnosis 

The talk is divided into two parts:

In the first part we will introduce and describe different approaches for monitoring multivariate industrial processes. We present the advantages of Mega SPC (Megavariate Statistical Process Control) compared with univariate chart SPC (USPC) and standard multivariate SPC based on Hotelling's T2.

In the second part we will discuss fault diagnosis, which is an essential step in process monitoring. Once a fault has been detected, we need to know which of the original measured variables are responsible for it. This is even more important for MegaSPC, which uses latent variables that differ from the original ones. In the last 15 years several strategies have been proposed, from those based on the space of the original variables (Doganaksoy, Hawkins, Mason et al.) to those based on the latent space, such as contribution plots, fault reconstruction or fault signatures. In this second part I will briefly review the most important methods and present some illustrative examples of the different methods in action.


Monika Meise (Universität Duisburg-Essen), June 22, 2005

Approximating Data with Splines

Regression: Many regression methods depend on the appropriate choice of locally defined smoothing parameters. The approach taken here is to base the choice of local smoothing parameters on a multiresolution analysis of the residuals. This will be illustrated using weighted splines and compared to other locally adaptive methods such as wavelets.

Densities: Given data y_1,..., y_n we look for an approximating model of the form Y_i=X_i+Z_i, i=1,..., n (1) where the (X_i)_1^n and (Z_i)_1^n are respectively i.i.d. random variables and the distribution of the (Z_i)_1^n is given. The problem is to decide whether there exists an approximation of the form (1) and, if so, to specify a distribution of the (X_i)_1^n. We do this by choosing the distribution of X so as to minimize the Kolmogorov distance d_{ko}(F_n,F^Y) where F_n is the empirical distribution of the data and F^Y the distribution of the random variables Y. Variations include minimizing the total variation of the first and second derivative of the density f^X of X and the use of higher order Kuiper metrics.

 


Marie Husková (Charles University, Prague), June 14, 2005

Control Charts Based on Alternative Hypotheses

We present statistical models in terms of hypothesis testing for practical out-of-control situations in Statistical Process Control that extend the traditional mean shift or linear trend situations. Based on these explicit alternative hypotheses, we derive likelihood ratio tests. Simulations are used to obtain critical values and to study the performance (in terms of both mean and standard deviation of detection delays) of our procedures. We compare our control charts with a control chart proposed by Chang and Fricker. It turns out that smaller mean delays are not always preferable.


L. Birgé (Laboratoire de Probabilités Université Paris VI), May 23, 2005

Lucien Birgé has been awarded the 2005 Brouwer memorial medal by the Dutch Royal Mathematical Society (Koninklijk Wiskundig Genootschap). The Brouwer medal is granted once every three years and is a very prestigious prize in mathematics. This time, the Society chose the field of mathematical statistics. Professor Birgé will receive the medal and give a lecture at a joint BeNeLuxFra mathematical congress in Gent (Belgium). As the winner of the 2005 Brouwer medal, Professor Lucien Birgé will give several talks in The Netherlands, including in Eindhoven, at VU Amsterdam, and in Delft. Professor Birgé has accepted EURANDOM's invitation and will give a lecture on model selection targeted at a broad audience of statisticians, probabilists and mathematicians.

A general approach to model selection via testing

We want to present a general approach to model selection for statistical estimation based on penalized M-estimators on some countable sets and their generalizations. The method applies to various stochastic situations (independent observations, Gaussian vectors or sequences, some regression frameworks with fixed or random design, ...) and aims at estimating an unknown parameter $s$ which characterizes the distribution $P_s$ of the observations and belongs to some given metric space $(M,d)$.


Nadia Lalam (EURANDOM), May 3, 2005

Statistical modelling of gene expression data from confocal scans of Drosophila embryos

Confocal laser scanning microscopy is a powerful tool for the imaging of gene expression in a developing embryo. We model the experimental gene expression data obtained by this methodology when considering the particular case of the formation of segment patterns in the early development of the Drosophila embryo. Segmentation and more generally developmental processes result from the interaction of genes in a regulatory network. Reinitz and Sharp (1995) proposed to model the genetic regulatory network responsible for the segmentation mechanism by a set of nonlinear ordinary differential equations satisfied by the gene product concentrations. Relying on this modelling and the quantitative gene expression data from the scans of the embryos, we propose a new statistical approach to construct efficient estimators of the parameters arising in this model of differential equations. Our estimators should entail a better summary of the information contained in the gene expression data than the currently used least squares estimators.


Jelle Goeman (Department of Medical Statistics and Bioinformatics - Leiden University), April 21, 2005

Testing against a high-dimensional alternative

As the dimensionality of the alternative increases, the power of classical tests tends to diminish quite rapidly. This is especially true for high-dimensional data in which there are more parameters than observations. In this paper we discuss a score test in an empirical Bayesian model as an alternative to these classical tests. It gives a general test statistic which can be used to test a point null hypothesis against a possibly high-dimensional alternative, even when the number of parameters exceeds the number of samples. This test will be shown to have optimal slope of the power function on average in all directions from the null, which makes it a proper generalization of the locally most powerful test to multiple dimensions. To illustrate the locally most powerful test we investigate the case of testing the global null hypothesis in a linear regression model in more detail. The empirical Bayes score test is shown to have significantly more power than the F-test when under the alternative the large-variance principal components of the design matrix explain significantly more of the variance of the outcome than the low-variance principal components. The locally most powerful test is also useful for detecting sparse alternatives in truly high-dimensional data, where its power is comparable to the test based on the maximum absolute t-statistic. This is joint work with Hans van Houwelingen.


Monia Lupparelli (Department of Statistics - University of Florence, Italy), April 7, 2005

Bi-directional graph models for contingency tables

Bi-directional graph models, also called covariance graph models, are used to encode marginal independence (Richardson and Spirtes, 2002). The parametrization for discrete distributions is in general still an open problem. Log-linear and logit models are widely used in graphical modelling, but they do not fit in the framework of bi-directed graphs because they do not easily allow one to model marginal distributions and joint response variables. Recently Drton and Richardson (2005) introduced a method for binary variables based on a Möbius parametrization.

The aim of our work is to show how marginal log-linear models (Bergsma and Rudas, 2002) can be used to parametrize bi-directional graph models for general categorical variables. We illustrate with some examples that it is always possible to find a hierarchical marginal log-linear parametrization that fulfills the measure of independence imposed by the connected set Markov property in any bi-directed graph.


Moulinath Banerjee (Department of Statistics - University of Michigan, USA), April 5, 2005

Inference for conditionally parametric response models

Conditionally parametric response models provide flexible (and consequently useful) strategies for nonparametric modelling. Formally, consider a sequence of i.i.d. observations from the distribution of $(X,Z)$ where $X$ is a response variable and $Z$ a covariate with some unknown distribution and the conditional distribution of $X$ given $Z = z$ is $p(.,\psi(z))$, where $p(.,\theta)$ is a regular parametric family of densities. We are interested in making inference on the unknown "dependence function" $\psi$. I will talk about the utility of such models in applications and present inference problems regarding $\psi$. Key themes will be (a) Inference for $\psi$ under shape restrictions and (b) threshold estimation for $\psi$. I will present several unified theorems and, time permitting, talk about extensions of the ideas to more complex (semiparametric) models.


Edwin van der Heuvel (Statistics Dept., NV Organon, Oss, the Netherlands), March 10, 2005

Evaluation of an Affymetrix High-density Oligonucleotide Microarray Platform as a Measurement System 

An Affymetrix High-density Oligonucleotide microarray platform is used for routine experimentation in search of genes that may explain biological differences between medical treatments and/or diseases. Before the abundance level of thousands of genes in one biological sample can be measured simultaneously, several processing steps are involved. An experimental design was set up to investigate the contribution of these specific variation sources to the measurement error or technological variation. A mixed effects analysis of variance model was applied to estimate these contributions in terms of variance components. From these variance components the microarray platform is evaluated as a measurement system for the purpose of gene selection in future microarray studies. Furthermore, the statistical model is evaluated for its goodness-of-fit to describe such microarray data.


Wicher P. Bergsma (EURANDOM), February, 2005

On a new type of correlation, its orthogonal decomposition and associated tests of independence 

For some applications a possible drawback of the ordinary correlation coefficient $\rho$ between two real random variables $X$ and $Y$ is that $\rho=0$ does not imply independence. Hence, a test of independence based on the correlation has only power against narrow alternatives. In this talk, an alternative coefficient is introduced, which is closely related to the correlation but which equals zero if and only if the two variables are independent. It is shown that the new coefficient can be written as an infinite sum of squared correlations, and details of these component correlations are given. The asymptotic distribution of the U and V statistic estimators of the coefficient, which is a mixture of chi-squares, is derived. It is shown that as a special case, a generalization of the Cramer-von Mises test is obtained to the case of $K$ ordered samples.
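The drawback mentioned in the first sentence is easy to see on a standard textbook example, sketched here with exact rational arithmetic. The example (X uniform on {-1, 0, 1} and Y = X^2) is an illustration chosen for this note and is not taken from the talk; it does not compute the new coefficient.

```python
from fractions import Fraction

# X is uniform on {-1, 0, 1}; Y = X**2 is a deterministic function of X.
support = [-1, 0, 1]
prob = Fraction(1, 3)


def expect(g):
    """Exact expectation of g(X)."""
    return sum(prob * g(x) for x in support)


e_x = expect(lambda x: x)
e_y = expect(lambda x: x * x)
e_xy = expect(lambda x: x * x * x)      # X * Y = X**3
covariance = e_xy - e_x * e_y           # numerator of the correlation coefficient

# Dependence check: compare P(X = 0, Y = 0) with P(X = 0) * P(Y = 0).
p_joint = prob                          # Y = 0 happens exactly when X = 0
p_product = prob * prob

if __name__ == "__main__":
    print("covariance:", covariance)
    print("P(X=0, Y=0):", p_joint, "versus P(X=0) P(Y=0):", p_product)
```

The covariance is exactly zero although Y is completely determined by X, so a test based on the ordinary correlation has no power against this alternative.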


Richard Gill (Utrecht/EURANDOM), February 8, 2005

Missing data and biased sampling versus quantum non-locality 

I'll discuss some new results concerning the optimal design of Bell experiments; these are experiments which are supposed to establish "quantum non-locality": a code-word for "classically impossible correlations between distant parts of a physical system". According to the classical picture, correlations are explained by what the physicists call hidden variables. In the language of statistics these are just "missing data", and I'll explain how statistical methods for dealing with missing data can be used to construct maximally powerful experiments. I'll describe many open problems and surprising findings. Actual experiments are plagued by all kinds of difficulties. The best known is called the "detection loophole" but it is no more and no less than biased sampling, or in the language of Monte Carlo simulation, "distributed rejection sampling". It turns out that an even more severe form of biased sampling afflicts most experiments done to date, which we have christened the "coincidence loophole". It seems we may have to wait a long time before anyone does a conclusive experiment.


Fabio Rigat (EURANDOM), January 20, 2005

Binary Neuronal Networks

A statistical framework is proposed to model a network of binary random variables. Both the network structure and the strengths of the existing pairwise connections are jointly estimated from the data within a fully Bayesian approach by employing the stochastic search variable selection method (George and McCulloch [1993]) and the Metropolis-Hastings algorithm (Hastings [1970]). Predictions for future outcome states are obtained through the posterior predictive success probabilities for each node. The framework is employed to model complex interactions arising within networks of spiking neurons. Examples will be provided illustrating the model performance in fitting and predicting both simulated and real data.


2004


Farida Enikeeva (EURANDOM), December 7, 2004

Empirical Bayesian Test of the Smoothness Parameter

In adaptive nonparametric curve estimation, one commonly estimates a function from a nested family of functional classes that are parameterized by a smoothness-like quantity. It has already been realized by many that estimating the smoothness parameter is not sensible. What can then be inferred about the smoothness? We attempt to answer this question. Implications for the relevant hypothesis testing are presented: due to the nested model structure, a consistent test can be constructed only for the one-sided hypothesis. The test statistic is based on the marginalized maximum likelihood estimator of the smoothness for an appropriate prior distribution on the unknown signal. This is joint work with Eduard Belitser (Utrecht University).


 

Juri Lember (Institute of Mathematical Statistics), October 5, 2004

On fluctuation of the length of the longest common subsequence

We consider two finite sequences (words) over a finite alphabet (DNA sequences, for example). The aim is to measure their similarity. A common measure is the length of the longest common subsequence (LCS). To understand whether the similarity is caused by chance, one is interested in the asymptotic behaviour of the length of the LCS of two independent i.i.d. sequences. Let X_1,...,X_n and Y_1,...,Y_n be two independent i.i.d. Bernoulli sequences, and let the random variable L_n be the length of their LCS. It is well known that L_n/n tends to a constant a.s., so the question of the behaviour of the variance of L_n arises. We consider the special case when the sequence Y_1,...,Y_n is non-random and periodic, and we show that there exist constants 0<k<K such that kn<VAR L_n<Kn. Hence L_n-EL_n is typically of order sqrt(n), as conjectured by Waterman.
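For readers who want to experiment with L_n, the length of the longest common subsequence can be computed with the usual dynamic programme. The sketch below is only an illustration: the choice of a period-2 sequence for Y, the fair-coin X and the function names are assumptions made here, not details from the talk.

```python
import random


def lcs_length(x, y):
    """Length of the longest common subsequence of two sequences (O(len(x) * len(y)))."""
    prev = [0] * (len(y) + 1)          # DP row for the previous prefix of x
    for xi in x:
        cur = [0]
        for j, yj in enumerate(y, start=1):
            if xi == yj:
                cur.append(prev[j - 1] + 1)
            else:
                cur.append(max(prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def sample_lcs_lengths(n, n_rep, seed=0):
    """Draw n_rep values of L_n for a random binary X and a fixed periodic Y."""
    rng = random.Random(seed)
    y = [j % 2 for j in range(n)]      # non-random sequence 0, 1, 0, 1, ...
    return [
        lcs_length([rng.randint(0, 1) for _ in range(n)], y)
        for _ in range(n_rep)
    ]


if __name__ == "__main__":
    lengths = sample_lcs_lengths(n=200, n_rep=100)
    mean = sum(lengths) / len(lengths)
    var = sum((v - mean) ** 2 for v in lengths) / (len(lengths) - 1)
    print("mean of L_n / n:", mean / 200, "sample variance of L_n:", var)
```

Repeating the experiment for several values of n gives an empirical impression of how the variance grows with n.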


 

T. Rudas, (Eotvos Lorand University), May 28, 2004

Log-linear models for multidimensional contingency tables: interpretation and estimation

The talk discusses log-linear (including graphical) models for multiway contingency tables and considers various interpretations, i.e. characteristic properties of these models, including their relationship with conditional odds ratios and a canonical representation. In this canonical representation, every log-linear model is represented as the intersection of several simpler log-linear models, each belonging to one of two types of such models. Log-linear models are exponential families and maximum likelihood estimates are usually computed using the iterative proportional scaling algorithm. In fact, this algorithm computes the minimum discrimination information estimate in the dual linear family. This is obtained as the result of iterated projections onto simpler linear families, the intersection of which is the dual to the log-linear model. A dual algorithm is discussed that iteratively computes maximum likelihood estimates, exploiting the fact that the log-linear model is the intersection of several simpler log-linear models.
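
The iterative proportional scaling step mentioned above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions: the function name `ipf`, its arguments and the stopping rule are hypothetical, all observed margins are assumed to be strictly positive, and the sketch shows the classical algorithm only, not the dual algorithm discussed in the talk.

```python
import numpy as np


def ipf(table, margins, n_iter=500, tol=1e-12):
    """Iterative proportional scaling for a hierarchical log-linear model.

    table   : array of observed counts (assumed to have strictly positive margins)
    margins : list of tuples of axes; each tuple names a margin whose fitted
              totals are rescaled to match the observed totals
    """
    table = np.asarray(table, dtype=float)
    # Start from the uniform table with the same grand total.
    fitted = np.full(table.shape, table.sum() / table.size)
    all_axes = tuple(range(table.ndim))
    for _ in range(n_iter):
        max_change = 0.0
        for keep in margins:
            drop = tuple(a for a in all_axes if a not in keep)
            observed = table.sum(axis=drop, keepdims=True)
            current = fitted.sum(axis=drop, keepdims=True)
            updated = fitted * observed / current  # rescale to match this margin
            max_change = max(max_change, float(np.abs(updated - fitted).max()))
            fitted = updated
        if max_change < tol:
            break
    return fitted
```

For a three-way table, `ipf(table, [(0, 1), (0, 2), (1, 2)])` fits the model with all two-factor interactions and no three-factor interaction; for a two-way table, `ipf(table, [(0,), (1,)])` reproduces the usual independence fit (row total times column total divided by the grand total).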


Jordan Stoyanov (University of Newcastle, UK), June 4, 2004

Moment Analysis of Distributions

We study distributions with finite moments and such that the classical problem of moments for them has a non-unique solution. We start with brief comments on frequently used criteria for uniqueness or for non-uniqueness of distributions in terms of their moments (Stieltjes, Carleman, Hamburger, Hausdorff, Cramer, Krein).

Then we concentrate on some recent developments. We describe a method for constructing Stieltjes Classes = families of distributions all with the same moments. For some Stieltjes classes we give the value of the Index of Dissimilarity.

The illustrations include functional transformations of random data involving popular distributions such as normal, inverse Gaussian, lognormal, generalized gamma, logistic. Results about the distributions of some stochastic processes will also be presented.

If time permits, some open questions will be outlined.

The speaker will address his talk not only to professionals in Probability/Statistics, but also to PhD students in this area.


William Rey (Philips Research Laboratories), May 26, 2004

Karhunen-Loève, Principal components and SVD

Karhunen-Loève transforms (KL) and Principal component analysis (PCA) are two techniques that are closely related although, conceptually, their objectives have little in common.

KL builds up low-cost optimal approximations of curves; the objective is to approximate. Telecommunication engineers are typical users of KL.

PCA is a method to visualize what takes place in a high-dimensional space with the help of optimal projections; the objective is visualisation. PCA is used to explore data sets.

Seen from the algorithmic side, KL and PCA are closely connected by Singular Value Decomposition (SVD); except for minor details, two linear algebraic dual spaces are of concern and, whether you place the accent on one or on the other, you think in terms of KL or PCA. The talk is introductory and covers the principles of KL, PCA and SVD as well as some very applied aspects.
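
The connection through the SVD can be made concrete with a short Python sketch. It is only an illustration in the finite-sample, discrete setting: the function name `pca_via_svd` and its return values are assumptions made for this example, with rows of the data matrix taken as observations and columns as variables.

```python
import numpy as np


def pca_via_svd(data, k):
    """PCA of a data matrix computed from the SVD of the centered data.

    Returns the first k principal directions, the corresponding scores,
    the variances along those directions, and the rank-k approximation.
    """
    data = np.asarray(data, dtype=float)
    mean = data.mean(axis=0)
    centered = data - mean
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    components = vt[:k]                     # principal directions (PCA side)
    scores = u[:, :k] * s[:k]               # coordinates of the observations
    explained_variance = s[:k] ** 2 / (len(data) - 1)
    approximation = scores @ components + mean  # truncated expansion (approximation side)
    return components, scores, explained_variance, approximation
```

The scores are what one would plot to explore the data, while the truncated reconstruction is the best rank-k least-squares approximation of the centered data; both come from the same decomposition, which is the duality the abstract alludes to.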


Iryna Snihir, (EURANDOM), April 16, 2004

Life testing: can we predict the battery life?

To achieve reliable battery management, we need to work out a proper battery model. Our research concentrates on developing and refining mathematical methods for battery modelling and for estimating the parameters of the model. The aim of my talk is to focus on the "black box" approach to the battery model and, more precisely, on several exercises based on statistical methods, which were applied to life test data where cells were subjected to repeated identical cycles. Analysing life test data, we developed a method for predicting the maximal internal gas pressure (P) based on measurements of the battery's voltage (V) and temperature (T), as well as on an evaluation of the empirical relations among V, T and P. This technique can help to underpin "pressure control" charging algorithms and to improve the safety, lifetime and cycle life performance of batteries. The next step is modelling the separate battery cycles and forecasting the next cycles, once a few have been observed, with the help of a regression model and the first principal components. This was done in the context of life test data obtained under laboratory conditions. This contributes to a better command of battery management.


Andreas Christmann, (University of Dortmund), June 11, 2004

On a combination of convex risk minimization methods to analyze data from insurance companies

The goals of the talk are twofold: we describe common features in data sets from motor vehicle insurance companies and we investigate a general strategy which exploits the knowledge of such features to detect and to model hidden information. The results of the strategy are a basis to develop insurance tariffs. The strategy is applied to a data set from 15 motor vehicle insurance companies containing information from more than 4 million customers. We use a nonparametric approach based on a combination of kernel logistic regression and $\varepsilon$-support vector regression. Both methods belong to the class of statistical machine learning methods based on convex risk minimization. Some recent results of robustness properties of such methods are also given.


Cristina Butucea (Université Paris X, Nanterre and Paris VI), February 24, 2004

Quadratic functional estimation in the convolution model

We consider i.i.d. random variables having a common probability density. We observe noisy data, that is, each variable plus a noise term, where the noise variables are i.i.d., independent of the variables of interest, and have an entirely known smooth distribution.

We assume that the unknown density is Sobolev-smooth and that the noise is ordinary smooth. We estimate a quadratic functional of the density from the noisy observations.

If the underlying density is sufficiently smoother than the noise density, parametric rates can be attained; otherwise, slower nonparametric rates are obtained and they are proven to be optimal in the minimax sense. As an application, we study nonparametric goodness-of-fit tests from noisy data in the minimax approach.


Leila Mohammadi (Leiden University), February 13, 2004

On the statistical theory of classification

This lecture concerns an approach to statistical learning problems in the nonparametric setting. Suppose we are given n i.i.d. copies of a random variable (X,Y), where X is an instance and Y is a label, -1 or 1. We define a classifier h as a function with values -1 and 1 and we denote a class of classifiers by H. In the case where X is one-dimensional, and for some parametric choices of H such as the classifiers with K thresholds, we estimate the parameters by the minimizer of the classification error in the sample, and we derive the asymptotic distribution and the rate of convergence of the empirical risk minimizer, which is cube root n. If one of the thresholds is on the boundary of the space of X, then the asymptotic result is different and convergence is quicker. We also consider the case where X is multidimensional and show that similar results hold when the classifiers are 1 on halfspaces. In a simple case, we show that the rate of convergence of the empirical risk minimizer is optimal. We also propose an algorithm to find the empirical risk minimizers in the one-dimensional case. For a reference see Mohammadi and van de Geer (2003).
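
To make the idea of empirical risk minimization over threshold classifiers concrete, here is a minimal Python sketch for the simplest case of a single threshold (K = 1). It is not the algorithm proposed in the talk; the function name `erm_threshold` and the convention h(x) = sign if x > threshold, and -sign otherwise, are assumptions made for this illustration.

```python
import numpy as np


def erm_threshold(x, y):
    """Empirical risk minimizer over one-threshold classifiers.

    Classifiers have the form h(x) = sign if x > threshold else -sign,
    with labels y in {-1, 1}. Returns (threshold, sign, empirical_error).
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=int)
    order = np.argsort(x)
    xs, ys = x[order], y[order]
    n = len(xs)
    # pos_left[i], neg_left[i]: number of +1 / -1 labels among the i smallest points.
    pos_left = np.concatenate(([0], np.cumsum(ys == 1)))
    neg_left = np.concatenate(([0], np.cumsum(ys == -1)))
    # sign = +1 predicts -1 for the i smallest points and +1 for the rest.
    err_plus = pos_left + (neg_left[-1] - neg_left)
    err_minus = n - err_plus  # sign = -1 makes exactly the complementary errors
    # Candidate thresholds: below all points, between neighbours, above all points.
    cuts = np.concatenate(([xs[0] - 1.0], (xs[:-1] + xs[1:]) / 2, [xs[-1] + 1.0]))
    # A cut between two tied x values cannot separate them, so it is excluded.
    valid = np.concatenate(([True], xs[:-1] < xs[1:], [True]))
    errors = np.where(valid, np.minimum(err_plus, err_minus), n + 1)
    i = int(np.argmin(errors))
    sign = 1 if err_plus[i] <= err_minus[i] else -1
    return float(cuts[i]), sign, errors[i] / n
```

After sorting, all n + 1 candidate splits are scored with cumulative sums, so the cost is dominated by the sort. The minimizer is generally not unique, since any threshold between the same two neighbouring points gives the same empirical error; the midpoint is an arbitrary choice.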

References

Mohammadi, Leila and van de Geer, Sara A. (2003). On threshold-based classification rules. Institute of Mathematical Statistics, Lecture Notes Monograph Series, Mathematical Statistics and Applications: Festschrift for Constance van Eeden. V. 42. p. 261–280.


Jamy Robins

Towards A Unified Theory of First and Higher Order Statistics

Modern semiparametric root-n theory has its foundations in likelihood. Non-root-n function and functional estimation does not. Is there a common story? That is, is there one likelihood theory for all that results in the use of higher order influence functions to do near optimal non-root-n estimation (in near exact analogy to the root-n case)? We describe the building blocks of the theory.


Jüri Lember, (University of Tartu), January 13, 2004

Empirical Measures in Adjusted Viterbi Training

We investigate the so-called Viterbi training for Hidden Markov Model (HMM) parameter estimation. This training is based on (Viterbi) alignment for a finite string of observations. We show that the alignment can be naturally generalized for almost every (infinite) realization of the HMM. Such an infinite alignment gives an encoded process - the alignment process. We study the properties of the alignment process; we show that the limiting frequencies of that process exist, an


2003


Peter van de Ven, (EURANDOM), December 12, 2003

On the Equivalence of Algorithms for Computing Effects in Factorial Designs

The Yates algorithm, Good interaction algorithm, symbolic algorithm and least squares estimation are different algorithms for computing effects in factorial designs. It is folklore that these algorithms all give the same results. Rigorous proofs of the equivalence of these algorithms do not seem to be available in the literature except for the Yates and Good algorithms. We present a rigorous proof for the equivalence of all algorithms, including precise definitions of the notions involved. We will pay attention to inconsistent definitions of effects and the interpretation and importance of different ways of coding factor levels.
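
For a two-level full factorial design, the agreement between the Yates algorithm and least squares can be checked numerically. The Python sketch below covers only these two of the four algorithms and is a numerical illustration, not a proof; the function names, the standard (Yates) run order with the first factor varying fastest, and the -1/+1 coding of factor levels are assumptions made for this example.

```python
import numpy as np


def yates(responses):
    """Yates algorithm for a 2^k design with responses in standard order.

    Returns [total, contrast A, contrast B, contrast AB, contrast C, ...].
    """
    y = np.asarray(responses, dtype=float)
    n = len(y)
    k = n.bit_length() - 1
    if 2 ** k != n:
        raise ValueError("number of responses must be a power of two")
    for _ in range(k):
        pairs = y.reshape(-1, 2)
        # First half: sums of consecutive pairs; second half: differences.
        y = np.concatenate((pairs[:, 0] + pairs[:, 1], pairs[:, 1] - pairs[:, 0]))
    return y


def design_matrix(k):
    """Model matrix of the full 2^k model with -1/+1 coding, columns in standard order."""
    n = 2 ** k
    levels = np.array([[1 if (run >> j) & 1 else -1 for j in range(k)] for run in range(n)])
    columns = []
    for effect in range(n):  # bit j of `effect` says whether factor j is in the term
        col = np.ones(n)
        for j in range(k):
            if (effect >> j) & 1:
                col = col * levels[:, j]
        columns.append(col)
    return np.column_stack(columns)
```

With this coding, the least squares coefficients equal the Yates contrasts divided by 2^k, whereas the effects as commonly defined (average response at the high level minus average at the low level) equal the contrasts divided by 2^(k-1). This factor of two is one example of how the coding of factor levels changes what is called an "effect".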


Dmitry Danilov, (EURANDOM), December 5, 2003

Modeling the Li-ion rechargeable batteries

A mathematical model describing the behavior of rechargeable Li-ion batteries is developed. The model simulates the behavior of the battery in a single charge-discharge cycle and also explains the complex process of battery degradation over a long time span (hundreds of charge-discharge cycles). The core of the model consists of a system of coupled partial and ordinary differential equations related to the main storage reaction and basic side reactions, such as the Solid Electrolyte Interface formation and the decomposition of the active electrode material. The model is tested on available data and appears to provide an adequate fit.


Richard Gill (University of Utrecht / EURANDOM), November 21, 2003

Problems in Quantum Statistical Information

I will give an introduction to "quantum statistics" - statistical problems for experiments involving quantum data - and discuss open problems in the field.


Daniel Herrmann (The Bosch Group), November 14, 2003

Kernel Based Algorithms and Statistical Learning Theory

In this talk we give a short introduction to statistical learning theory and explain the idea of kernel based algorithms like support vector machines (SVM), which have become very successful in real world applications. The advantage of this type of learning algorithm is that it yields a convex optimization problem and its generalization ability can be measured by the capacity of the function class which the learning algorithm can implement. We explain how to measure the capacity by the concept of VC dimension (Vapnik-Chervonenkis) and explain the idea of structural risk minimization. The success of SVM can be attributed to the joint use of a robust classification procedure (large margin hyperplane) and of a convenient and versatile way of (nonlinear) preprocessing the data (kernels). It turns out that with such a decomposition of the learning process into preprocessing and linear classification, the performance highly depends on the preprocessing and much less on the linear classification algorithm to be used. It is thus of high importance to have a criterion to choose the suitable kernel for a given problem. Ideally, this choice should be dictated by the data itself and the kernel should be 'learned' from the data. We propose to use a gradient based procedure for optimizing the kernel coefficients and give theoretical bounds on the corresponding generalization error for different classes of kernels.


Yves Rozen, (Paris Jussieu), November 7, 2003

Testing nullity in regression framework

We introduce a new testing procedure based on symmetrization to construct a test of "f=0" against "f≠0" in the regression model. This procedure works for errors that are not necessarily Gaussian. We prove that our adaptive multi-test is optimal over Hölder classes for Gaussian errors and retains a good rate in the non-Gaussian case.


Peter Grünwald, (CWI Amsterdam), November 7, 2003

Updating Probabilities

As examples such as the Monty Hall and the 3-prisoners puzzle show, applying conditioning to update a probability distribution on a "naive space", which does not take into account the protocol used, can often lead to counterintuitive results. We give a detailed explanation of this phenomenon. A criterion known as CAR ("coarsening at random") in the statistical literature characterizes when "naive" conditioning in a naive space works. We provide two new characterizations of CAR. First we show that in many situations, CAR essentially *cannot* hold, so that naive conditioning must give the wrong answer. Second, we provide a procedural characterization of CAR, giving a randomized algorithm that generates all and only distributions for which CAR holds. Both results complement earlier work by Gill, van der Laan and Robins. We also consider more generalized notions of update such as Jeffrey conditioning and minimizing relative entropy (MRE). We give a generalization of the CAR condition that characterizes when Jeffrey conditioning leads to appropriate answers, and show that there exist some very simple settings in which MRE essentially never gives the right results. This generalizes and interconnects previous results obtained in the literature on CAR and MRE.
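
The role of the protocol in the Monty Hall puzzle can be seen in a small simulation. The Python sketch below is only an illustration; the function name `switch_win_rate` and the two host protocols it compares are assumptions made for this example and are not taken from the talk.

```python
import random


def switch_win_rate(host_knows, trials=100_000, seed=0):
    """Estimate the probability that switching wins, given that a goat was revealed.

    host_knows=True : the host always opens a goat door among the two unpicked doors.
    host_knows=False: the host opens one of the two unpicked doors at random, and
                      rounds in which the car is revealed are discarded.
    """
    rng = random.Random(seed)
    wins = games = 0
    for _ in range(trials):
        car = rng.randrange(3)
        pick = rng.randrange(3)
        others = [d for d in range(3) if d != pick]
        if host_knows:
            opened = rng.choice([d for d in others if d != car])
        else:
            opened = rng.choice(others)
            if opened == car:
                continue  # the observation "a goat was revealed" did not occur
        games += 1
        switched = next(d for d in others if d != opened)
        wins += switched == car
    return wins / games
```

Naive conditioning on "the opened door hides a goat" suggests a probability of 1/2 for switching. Under the first protocol the simulated rate should be close to 2/3, so the naive answer is wrong; under the second protocol it should be close to 1/2. The observation is the same in both cases and only the protocol differs.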


Alexei Koloidenko, (EURANDOM), October 10, 2003

Adjusted Viterbi Training

I will be talking about our joint work with Juri Lember, a former Eurandom postdoc. Motivated by the broad use of Hidden Markov Models (HMM) in speech processing and recognition, natural language modelling, image analysis, and bioinformatics, we consider the problem of estimating parameters of the emission distribution. It is well-known that the EM algorithm computes a Maximum Likelihood Estimator (MLE) for HMM parameters, and in many such situations MLEs are consistent. However, computational considerations often lead to less intensive alternatives. The Viterbi Training (VT) algorithm is widely used instead of EM despite the inconsistency of its estimators. Our work aims to "interpolate" between EM and VT: We propose a principled approach to alleviating the bias-related drawbacks of VT at a minimal increase in computation. Our work relies on the concept of infinite Viterbi alignment and on a limiting probability distribution associated with this alignment. We explain why in general this latter distribution cannot be computed exactly, and we also discuss appropriate approximations. Moreover, we show that in the case of mixture models, an important special case of HMM, this distribution can be computed exactly. The experimental part of the paper focuses on the mixture of two univariate normal distributions with unit variance and unknown means. This example illustrates that the adjusted algorithms are still computationally less intensive than EM, and in contrast with VT, enjoy the property of asymptotically fixing the true parameters. We therefore suggest replacing VT by our adjusted procedures in applications that can afford the additional computations.
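
The bias of Viterbi Training relative to EM can be illustrated on the two-component normal mixture mentioned above. The Python sketch below is a rough illustration, not the adjusted algorithm from the talk: it assumes known equal mixture weights, unit variances and true means -1 and 1, and all function names are hypothetical.

```python
import numpy as np


def simulate_mixture(n, means=(-1.0, 1.0), seed=0):
    """Draw n points from an equal-weight mixture of two unit-variance normals."""
    rng = np.random.default_rng(seed)
    z = rng.integers(0, 2, size=n)
    return rng.normal(loc=np.where(z == 1, means[1], means[0]), scale=1.0)


def em_means(x, mu, n_iter=200):
    """EM updates for the two means (weights fixed at 1/2, variances fixed at 1)."""
    mu = np.array(mu, dtype=float)
    for _ in range(n_iter):
        log_w = -0.5 * (x[:, None] - mu[None, :]) ** 2
        w = np.exp(log_w - log_w.max(axis=1, keepdims=True))
        w /= w.sum(axis=1, keepdims=True)  # posterior component probabilities
        mu = (w * x[:, None]).sum(axis=0) / w.sum(axis=0)
    return mu


def viterbi_training_means(x, mu, n_iter=200):
    """Viterbi Training: hard-assign each point to its most likely component, then re-average."""
    mu = np.array(mu, dtype=float)
    for _ in range(n_iter):
        labels = np.argmin((x[:, None] - mu[None, :]) ** 2, axis=1)
        mu = np.array([x[labels == j].mean() for j in range(2)])
    return mu
```

With a large sample, for example `x = simulate_mixture(20000)` and starting values `(-0.5, 0.5)`, the EM estimates should settle near the true means, while the Viterbi Training estimates are pushed apart, because each hard-assigned group is a truncated sample. This is the kind of bias the adjusted procedures are designed to correct.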


Alexei Koloidenko, (EURANDOM), May 20, 2003

Algebraic Aspects of Statistical Modeling

Motivated by probability models for distributions on small square subimages of digitized photographs, we will consider a somewhat more general situation, in which the state space is a real vector space $\mathbb{R}^{n^2}$ (e.g. representing gray scale intensities in $n\times n$ subimages) and the measures of interest possess special types of invariance: Namely, they are invariant under an action of a finite group $G$ that admits a linear (matrix) representation on $\mathbb{R}^{n^2}$. In our central example, the group is the full symmetry group of the square-based parallelepiped (embedded in $\mathbb{R}^{n^2}$ with its center mapped to the origin). The theory of algebraic invariants tells us that there exists a finite set of polynomials invariant under the same action with the following property: Any polynomial in $n^2$ indeterminates with the same type of invariance can be written as a polynomial in terms of these special polynomials (called, for their special role, \emph{fundamental invariants} or \emph{fundamental generators}). In this talk, I will present two results that are basically specializations and extensions of the general "problem of moments". The first one is the "Extended Carleman theorem for $G$-invariant moments" and it gives us a sufficient condition under which a $G$-invariant measure can be uniquely determined by the expected values of "mixed fundamental generators". The second result is about convergence of $G$-invariant constrained maximum entropy estimators, and we will also discuss its practical significance. Some of this material is still a work in progress, hence your comments and suggestions will be especially appreciated.


Gabriele Brondino, May 20, 2003

"Zero-Point" in the Evaluation of Martens Hardness Uncertainty

Hardness measurements have a significant role in mechanical metrology, as they are frequently used to characterise material properties relevant to industrial processes. A recently introduced method, called Martens Hardness, is based on force and indentation records obtained during a test cycle; the Force/Depth Curve, which describes the indentation pattern, is typically formed by two parts having a zero-point in common. A segmented regression model is proposed in this paper, based on the introduction of a threshold parameter in order to estimate the unknown zero-point. The problem is not trivial, since the relationship between observed force and indentation depth is structural and, moreover, the number of nuisance parameters grows with the number of measured data. The asymptotic likelihood theory leads to an estimate of the unknown parameters of the model. Monte Carlo simulations are used to analyse the properties of the estimators under different hypotheses about measurement errors, and to establish the conditions under which the proposed method is applicable.


Wicher Bergsma, (Universiteit van Tilburg), May 19, 2003

Testing conditional independence with a continuous control variable

A common statistical problem is the testing of independence of two (response) variables conditionally on a third (control) variable. It is shown that, when the control variable is continuous, the methods that have been proposed in the literature either depend on strong distributional assumptions or suffer from low power. In the first part of this talk, the theoretical difficulties involved in testing conditional independence with a continuous control variable are made precise. In particular, the concept of testability of degree r is introduced, and it is shown that without assumptions, independence is testable of degree 2, while conditional independence with a continuous control variable is not testable of any degree. However, we proceed to show that, if appropriate assumptions about the marginal (conditional) response distributions are made, both hypotheses are testable of degree 1. This shows the fundamental difficulty of testing conditional independence with a continuous control variable: assumptions about the conditional marginal responses must be made. In the second part of this talk, a practically feasible solution to the testing problem is given. The concept of partial copula is introduced, which is a certain average of bivariate conditional copulas and is thus based on marginal ranks. It is shown that by estimating the partial copula, a general and practically feasible class of tests can be obtained for the testing of conditional independence.


Professor J.K. Lindsey, (University of Liege, Belgium), April 29, 2003

What does pharmacokinetics model?

Pharmacokinetics studies the flow of some substance through the body using compartment models. At a first level, these model the movement of molecules in the organism using the assumptions of Markov chains. At a second level, differences among organisms and perturbations over time must be taken into account. Here, nonlinear random effects and autoregression can be useful.


Madalin Guta, (EURANDOM), March 25, 2003

An invitation to quantum tomography

We describe quantum tomography as an inverse statistical problem and show how entropy methods can be used to study the behaviour of sieved maximum likelihood estimators.


Fabio Rigat (Institute of Statistics and Decision Sciences at Duke University, USA), February 23, 2003

Bayesian CART modelling 

I present a new modelling approach for Bayesian CART models. First I will introduce a tree definition focussed on the statistical models generated by the tree rather than on the tree structure. Second I will derive a likelihood function which explicitly takes into account the tree structure and the leaf distributions. Third I will propose a hierarchical tree prior which includes in a coherent framework all the essential features of the model. Then I will describe a Markov chain Monte Carlo technique to explore the tree space. Efficient exploration of the posterior is made possible by devising appropriate moves to wander in the space in order to cope with the inherent multimodality of the posterior. Simulated tempering is employed in order to enrich the set of possible moves in the tree space and improve mixing. Finally the model averaging framework is adopted in order to derive robust predictions. Throughout the talk I will mainly focus on Weibull survival trees. I will produce data analysis examples for both simulated datasets and for right censored cancer survival times.

 

    P.O. Box 513, 5600 MB  Eindhoven, The Netherlands
tel. +31 40 2478100  fax +31 40 2478190  
  e-mail: office@eurandom.tue.nl