
European Institute for Statistics, Probability, Stochastic Operations Research
and their Applications



From September 2011, the Informal Meetings will be integrated in the new STO(chastics) Seminar Series.
Please follow the link for further details.


Informal Meetings Statisticians and Probabilists

2010-2011

 

Contact A. Di Bucchianico, P. de Andrade Serra & R. Castro

Name and Affiliation, Date, Title

Kees van Hee (TU/e)
June 14, 2011
Discovering Characteristics of Stochastic Collections of Process Models
Rui Castro (TU/e)
May 31, 2011
Sudoku and Sinkhorn Balancing
Ludwina Hobma, DAF Trucks, Eindhoven
May 24, 2011
Test Time Reduction of DAF Engines Using Six Sigma
Alessandro Di Bucchianico (TU/e)
May 17, 2011
An estimation problem in software reliability
Francesca Nardi, Eindhoven University of Technology
April 12, 2011
Metastability for Kawasaki dynamics at low temperature with two types of particles
Carlo Lancia (Universita Tor Vergata (Rome))
April 5, 2011
Entropy-driven cutoff phenomena
Remco van der Hofstad (TU/e - EURANDOM)
March 1, 2011
Update on State Space Estimation Problems Posed by Jan Friso Groote
Botond Szabo (TU/e - EURANDOM)
February 22, 2011
Empirical Bayes Method
Johan Lukkien (TU/e, Computer Science, System Architecture and Networks)
February 8, 2011
Vehicle to vehicle communication
Bart Janssen (TU/e)
February 1, 2011
Fitting linear regression models with Zernike polynomials
Paulo de Andrade Serra (TU/e - EURANDOM)
January 18, 2011
M-Estimation of the Period of a Cyclical Non-homogeneous Poisson Process
   
2010  
   
Subhasis Ghoshal (NCSU)
December 21, 2010
Reference Prior for Large Parameter Spaces
Bart Janssen and Alessandro Di Bucchianico (TU/e LIME)
Overview of various industrial projects carried out or being carried out by LIME, with special emphasis on connections to mathematical research
Rui Castro (TU/e)
Signal processing, learning theory and statistics
Kees van Hee and Natalia Sidorova (TU/e, Computer Science)
September 21, 2010
How large should a log file be to discover its process model?
Kees van Hee (TU/e)
September 7, 2010
Statistical and probabilistic challenges arising from the analysis of process workflows
Alessandro Di Bucchianico (TU/e, LIME)
July 8, 2010
Update of a discussion on the use of classification methods
Yvo Pokern, University College London
June 29, 2010
Nonparametric Drift Estimation for Stochastic Differential Equations
(joint work with O. Papaspiliopoulos, G. O. Roberts and A. M. Stuart)
 
Marios Pavlides, Frederick University, Nicosia, Cyprus
June 29, 2010
Two Statistical Vignettes: Simpson's Paradox and Shaved Dice
Marie-Colette van Lieshout, CWI - TU/e
June 15, 2010
Image Segmentation by Polygonal Markov Fields
Alessandro Di Bucchianico, TU/e
June 8, 2010
A problem on performing ROC (Receiver Operating Characteristic) analyses for three instead of two outcomes
Birgit Witte, Delft University of Technology
April 20, 2010
Consistent Estimators in the Current Status Continuous Mark Model
Arnoud den Boer, CWI
April 7, 2010
Simultaneously Learning and Optimizing
Daan Crommelin, CWI
March 21, 2010
 
Marc Aoun, Philips
March 24, 2010
Real-time Sensor Networking under overload Conditions
Jan Draisma, TU/e, Discrete Mathematics Group
March 3, 2010
Trek separation for Gaussian graphical models
Olaf Wittich, TU/e
January 27, 2010
Insect vision, signal detection and percolation
Rui Castro, Columbia University, NY
January 6, 2010
Active Learning and Selective Sensing: closing the loop between data analysis and acquisition

2011

Kees van Hee (TU/e)

Discovering Characteristics of Stochastic Collections of Process Models

Process models in organizational collections are typically modelled by the same team, using the same conventions. As such, these models share many characteristic features, such as size range and the type and frequency of errors. In most cases only small samples of these collections are available, for instance because of the sensitive information they contain. Because of their size, these samples may not accurately represent the characteristics of the originating collection. This paper deals with the problem of constructing collections of process models, in the form of Petri nets, from small samples of a collection, in order to estimate the characteristics of this collection accurately. Given a small sample of process models drawn from a real-life collection, we mine a set of generation parameters that we use to generate arbitrarily large collections with the same characteristics as the original collection. In this way we can estimate the characteristics of the original collection from the generated collections. We extensively evaluate the quality of our technique on various sample datasets drawn from both research and industry.
This is joint work with Natalia Sidorova and Zheng Liu.


Rui Castro (TU/e)

Sudoku and Sinkhorn Balancing

In the informal spirit of the meetings I am going to give a (rather informal) talk on Sudoku puzzles: most people are quite familiar with these popular puzzles, which are similar to the classical Latin squares problem. There are many computational methods to find the solution of these puzzles (many involve combinatorial searches). In this talk I'll describe a very simple method that is based on relaxation of the discrete constraints, and uses a simple adaptation of Sinkhorn balancing to find a solution. Sinkhorn balancing is an iterative technique for transforming a matrix with positive entries into a doubly stochastic matrix (one whose rows and columns sum to one). In this talk I will discuss some properties of Sinkhorn balancing and show how it can be used to solve (almost) any Sudoku puzzle in a very simple way.
(This is not my own work, but rather an idea set forth by T. K. Moon and co-authors, published in the IEEE Transactions on Information Theory, Vol. 55, No. 4, April 2009.)
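
As an illustration of the balancing step described in the abstract, here is a minimal sketch of plain Sinkhorn balancing: rows and columns of a matrix with positive entries are rescaled in turn until both sum to one. It does not include the Sudoku adaptation of Moon and co-authors; the function name `sinkhorn_balance`, the fixed iteration count and the example matrix are assumptions made for this sketch.

```python
def sinkhorn_balance(matrix, iterations=200):
    """Alternately normalise the rows and columns of a positive square matrix."""
    a = [row[:] for row in matrix]  # work on a copy
    n = len(a)
    for _ in range(iterations):
        # Scale every row so that it sums to one.
        for i in range(n):
            s = sum(a[i])
            a[i] = [x / s for x in a[i]]
        # Scale every column so that it sums to one.
        for j in range(n):
            s = sum(a[i][j] for i in range(n))
            for i in range(n):
                a[i][j] /= s
    return a


# Hypothetical example matrix with strictly positive entries.
balanced = sinkhorn_balance([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 10.0]])
row_sums = [sum(row) for row in balanced]
col_sums = [sum(col) for col in zip(*balanced)]
```

After enough iterations both `row_sums` and `col_sums` are numerically equal to one, so `balanced` is (approximately) doubly stochastic.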


Ludwina Hobma (DAF Trucks)

Test Time Reduction of DAF Engines Using Six Sigma

After a short introduction of Six Sigma as a project management methodology, the challenging problem statement will be introduced. It will be shown how mathematical statistics helps a production company achieve results. Furthermore, it will be shown where the statistician's theory falls short in daily practice, which involves instability and outliers as well as the need to convince management and to ensure implementability. Please support me with new insights to meet the challenge!


Alessandro Di Bucchianico (TU/e)

An estimation problem in software reliability

Stochastic models are being used in software testing to support decision making, e.g. by predicting the required additional testing effort to achieve a certain quality level. Most models used in practice can be described as an order statistics process or a nonhomogeneous Poisson process.

In this talk we will give a brief introduction to these models and point out some issues in using them. In particular, we will state an estimation problem that arises from the practical interpretation of applying these models to software testing. This problem is either ignored or handled with an ad-hoc solution. The audience is invited to contribute to solving this estimation problem.
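
To make the two model classes concrete, the following sketch simulates failure times under each. It assumes exponential detection times (the Jelinski-Moranda form of an order statistics model) and, for the nonhomogeneous Poisson process, the mean function a(1 - exp(-bt)) of the Goel-Okumoto model, which can be simulated as an order statistics model with a Poisson number of faults. These specific models, the parameter values and all function names are assumptions for illustration; the abstract does not say which models the talk uses.

```python
import random


def simulate_order_statistics(n_faults, rate, rng):
    """Failure times when each of n_faults is found after an Exp(rate) delay."""
    return sorted(rng.expovariate(rate) for _ in range(n_faults))


def sample_poisson(mean, rng):
    """Draw a Poisson variate by counting unit-rate arrivals before time `mean`."""
    count, total = 0, rng.expovariate(1.0)
    while total < mean:
        count += 1
        total += rng.expovariate(1.0)
    return count


def simulate_nhpp(expected_faults, rate, rng):
    """NHPP with mean function a*(1 - exp(-b*t)): Poisson fault count, then order statistics."""
    n_faults = sample_poisson(expected_faults, rng)
    return simulate_order_statistics(n_faults, rate, rng)


rng = random.Random(1)                        # fixed seed for reproducibility
gos_times = simulate_order_statistics(30, 0.1, rng)
nhpp_times = simulate_nhpp(30.0, 0.1, rng)    # a = 30, b = 0.1 (hypothetical values)
```

In the first model the total number of faults is a fixed unknown parameter, while in the second it is itself random; in practice only the failure times observed up to the current testing time would be available for estimation.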


Francesca Nardi (TU/e)

Metastability for Kawasaki dynamics at low temperature with two types of particles

We study a two-dimensional lattice gas consisting of two types of particles subject to Kawasaki dynamics at low temperature in a large finite box with an open boundary. Each pair of particles occupying neighboring sites has a negative binding energy provided their types are different, while each particle has a positive activation energy that depends on its type. There is no binding energy between neighboring particles of the same type. We start the dynamics from the empty box and compute the transition time to the full box. This transition is triggered by a critical droplet appearing somewhere in the box.
We identify the region of parameters for which the system is metastable. For this region, in the limit as the temperature tends to zero, we show that the first entrance distribution on the set of critical droplets is uniform, compute the expected transition time up to and including a multiplicative factor of order one, and prove that the transition time divided by its expectation is exponentially distributed. These results are derived for a certain subregion of the metastable region.
The proof involves three model-dependent quantities: the energy, the shape and the number of the critical droplets.
The main motivation is to understand metastability of multi-type particle systems. It turns out that for two types of particles the geometry of subcritical and critical droplets is more complex than for one type of particle.
Consequently, it is a somewhat delicate matter to capture the proper mechanisms behind the growing and the shrinking of subcritical droplets until a critical droplet is formed.


Carlo Lancia (Universita Tor Vergata (Rome))

Entropy-driven cutoff phenomena

Birth-and-death Markov chains exhibit a sharp cutoff in their convergence to equilibrium if suitable drift conditions are imposed on the transition rates. The cutoff behavior appears to be closely related to the fact that the stationary distribution is mostly concentrated on a region A whose diameter is much smaller than the size of the state space. The cutoff time is then understood to be the effective amount of time necessary to reach A. The aim of this work is to extend this picture to the apparently dissimilar case of Markov chains with a highly symmetric state space, for which the equilibrium measure is uniform. As a matter of fact, if it is possible to project the state space onto equivalence classes such that the entropy of the system is highly concentrated on a few of them, the behavior of the lumped chain will be analogous to that of a birth-and-death process, with the role of the stationary distribution played by the entropy. I will review some applications of this result.
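
One standard example of such a projection, chosen here for illustration and not necessarily one treated in the talk, is the lazy random walk on the hypercube {0,1}^n. Its equilibrium measure is uniform, and lumping states by their number of ones gives a birth-and-death chain whose stationary law is binomial, that is, proportional to the size of each class. The sketch below computes the total variation distance to equilibrium of this lumped chain started from the all-zeros state; the function names and the choice n = 50 are assumptions.

```python
from math import comb, log


def lumped_step(dist, n):
    """One lazy step of the Hamming-weight chain of the hypercube walk."""
    new = [0.0] * (n + 1)
    for k, p in enumerate(dist):
        new[k] += 0.5 * p                          # lazy move: stay put
        if k > 0:
            new[k - 1] += 0.5 * p * k / n          # a one is flipped to zero
        if k < n:
            new[k + 1] += 0.5 * p * (n - k) / n    # a zero is flipped to one
    return new


def tv_profile(n, steps):
    """Total variation distance to the binomial law after 0, 1, ..., steps moves."""
    stationary = [comb(n, k) / 2 ** n for k in range(n + 1)]
    dist = [1.0] + [0.0] * n                       # start with weight zero
    profile = []
    for _ in range(steps + 1):
        profile.append(0.5 * sum(abs(p - q) for p, q in zip(dist, stationary)))
        dist = lumped_step(dist, n)
    return profile


n = 50
mixing_scale = 0.5 * n * log(n)                    # about 97.8 steps for n = 50
profile = tv_profile(n, int(3 * mixing_scale))
```

Plotting `profile` against the step number shows the distance staying close to one and then dropping to near zero around `mixing_scale`; the drop becomes sharper, relative to `mixing_scale`, as n grows.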


Botond Szabo (TU/e - EURANDOM)

Empirical Bayes Method

The Bernstein-von Mises Theorem says that in parametric models, under some regularity conditions, the posterior mass contracts around the true parameter θ0 at the optimal frequentist rate, independently of the choice of the prior distribution. In the nonparametric case it has been shown that, for a bad choice of the prior distribution, the posterior distribution may not contract at all, or, even if it contracts around the true θ0, the contraction rate may be slower than the optimal frequentist rate. An attempt to solve this problem is to work with a family of prior distributions instead of a single one. The question then arises how to choose the optimal prior distribution from this family. One solution is to put a hyperprior on the family of prior distributions and work with this two-level, hierarchical prior distribution. A more practical approach is to choose the optimal prior distribution with an empirical method; this is called the empirical Bayes method.
We work with the well-known white noise model under some regularity assumptions on the unknown, infinite-dimensional parameter θ0. In the Bayes approach we put a family of infinite-dimensional Gaussian priors on the parameter set Θ, and we show that we can separate two regions according to the smoothness of the parameter θ0: in the first one the empirical Bayes method gives the optimal contraction rate, while in the second one it gives a contraction rate slower than the optimal one.
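
The empirical Bayes step can be sketched in the sequence form of the white noise model, X_i = θ_i + Z_i / sqrt(n) with independent standard normal Z_i. The sketch assumes the prior family θ_i ~ N(0, i^(-1-2α)) indexed by a smoothness parameter α, a common choice that the abstract does not confirm, and picks α by maximising the marginal likelihood over a grid. The truncation at 500 coordinates, the grid, the "true" parameter and all function names are hypothetical.

```python
import math
import random


def log_marginal_likelihood(alpha, x, n):
    """Log marginal likelihood (up to a constant): X_i ~ N(0, i^(-1-2*alpha) + 1/n)."""
    total = 0.0
    for i, xi in enumerate(x, start=1):
        var = i ** (-1.0 - 2.0 * alpha) + 1.0 / n
        total -= 0.5 * (math.log(var) + xi * xi / var)
    return total


def empirical_bayes_alpha(x, n, grid):
    """Grid value of alpha that maximises the marginal likelihood."""
    return max(grid, key=lambda a: log_marginal_likelihood(a, x, n))


def posterior_mean(alpha, x, n):
    """Coordinate-wise posterior mean under the N(0, i^(-1-2*alpha)) prior."""
    return [xi * n / (i ** (1.0 + 2.0 * alpha) + n) for i, xi in enumerate(x, start=1)]


rng = random.Random(0)
n = 10_000                                                   # noise level 1/sqrt(n)
theta = [i ** -1.5 * math.sin(i) for i in range(1, 501)]     # hypothetical truth
x = [t + rng.gauss(0.0, 1.0) / math.sqrt(n) for t in theta]
grid = [0.1 * k for k in range(1, 51)]
alpha_hat = empirical_bayes_alpha(x, n, grid)
estimate = posterior_mean(alpha_hat, x, n)
```

The selected `alpha_hat` is then plugged into the posterior as if it had been fixed in advance, which is what distinguishes this approach from the hierarchical one with a hyperprior.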


Johan Lukkien (TU/e, Computer Science, System Architecture and Networks)

Vehicle to vehicle communication

We will present work we did in the area of vehicle to vehicle communication for the purpose of early warnings. This communication comprises periodic broadcasting using the WAVE Short Message Protocol as part of the IEEE 802.11p wireless communication standard. This standard is based on CSMA/CA. Simulations show unbalanced loss behavior when vehicle density increases.


Bart Janssen (TU/e)

Fitting linear regression models with Zernike polynomials

Zernike polynomials are an important tool in optics. Recently they appeared in various industrial projects carried out by Bart and colleagues. Bart will briefly mention the industrial projects in which Zernike polynomials appeared, discuss the background of Zernike polynomials and show R packages developed by him to fit linear regression models with Zernike polynomials.
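
For readers unfamiliar with Zernike polynomials, the sketch below evaluates them on the unit disk from the standard explicit formula for the radial part, using a cosine factor for m >= 0 and a sine factor for m < 0. It is written in Python for illustration and is unrelated to the R packages mentioned in the abstract; the function names and the omission of any normalisation constant are assumptions.

```python
from math import cos, factorial, sin


def zernike_radial(n, m, rho):
    """Radial Zernike polynomial R_n^|m|(rho); zero unless n - |m| is even and |m| <= n."""
    m = abs(m)
    if m > n or (n - m) % 2:
        return 0.0
    total = 0.0
    for k in range((n - m) // 2 + 1):
        coeff = (-1) ** k * factorial(n - k) / (
            factorial(k)
            * factorial((n + m) // 2 - k)
            * factorial((n - m) // 2 - k)
        )
        total += coeff * rho ** (n - 2 * k)
    return total


def zernike(n, m, rho, phi):
    """Unnormalised Zernike polynomial Z_n^m at polar coordinates (rho, phi)."""
    angular = cos(m * phi) if m >= 0 else sin(-m * phi)
    return zernike_radial(n, m, rho) * angular
```

In a linear regression, each pair (n, m) would contribute one column of the design matrix, obtained by evaluating `zernike(n, m, rho, phi)` at the measurement locations.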


 

Paulo de Andrade Serra (TU/e - EURANDOM), January 18, 2011

M-Estimation of the Period of a Cyclical Non-homogeneous Poisson Process

We present the construction of a (semi-parametric) M-estimator for the period of a non-homogeneous Poisson process with a periodic intensity function. We address the issues of identifiability of this parameter, consistency of the estimator and rate of convergence, and discuss the severity of the conditions under which these results hold, making some connections with ergodic theory. Some simulations will be shown to exemplify the workings of the estimator. We also make a short comparison with estimators proposed in the past for the period. Further, we present a quick look at ongoing work to develop an iterative procedure for improving the rate of convergence of the estimator based on estimating large multiples of the period; if successful, this procedure will allow us to obtain convergence rates arbitrarily close to the optimal rate, which is known to be n^{3/2}.


2010

Subhasis Ghoshal (NCSU), December 21, 2010

Reference Prior for Large Parameter Spaces

The idea of a reference prior is a key concept in objective Bayesian analysis, originally introduced by Bernardo and further developed by Berger, Bernardo and many others. The reference prior is the result of an asymptotic maximization of the expected relative entropy distance between the posterior and the prior. In the absence of nuisance parameters, the procedure leads to Jeffreys' prior, but other priors emerge if nuisance parameters are present. Posterior asymptotic normality plays a key role in the asymptotic expansion of the expected relative entropy. In this talk, we investigate to what extent the asymptotic expansion of relative entropy remains valid when the dimension of the parameter space in an exponential family increases to infinity with the sample size. We quantify the allowable rate of growth of the dimension in terms of certain characteristics of the model and the prior. We specifically discuss three examples --- independent normal location model, multinomial model and Dirichlet model. We find explicit growth rates in each model. We further explore ideas for extending the notion of reference prior beyond parametric models. A popular approach is to consider a finite series approximation of a function of interest and induce a prior through the coefficients. Our results can potentially be applied in this setting. We shall discuss some partial results for density estimation using a spline basis.
 


Rui Castro (TU/e)

Signal processing, learning theory and statistics

My research interests are on the borderline of signal processing, learning theory and statistics. One of my major research focuses is active learning techniques, also known as sequential experimental design. These include learning/sampling procedures that are able to use information gleaned from previous samples to adapt the sampling procedure. Applications include, among others, network monitoring and measurement and effective spectrum analysis methods for opportunistic transmission in cognitive radio.

Kees van Hee and Natalia Sidorova (TU/e, Computer Science), September 21, 2010

How large should a log file be to discover its process model?

Petri nets are a formalism for describing processes with concurrency and synchronization. They are used in workflow management systems to control business processes. These systems store all events in a so-called log file. Now suppose we have a log file; the question is then whether we can reconstruct the Petri net that produced it. This is a hot topic in computer science called process mining, a special branch of data mining. Actually this can be seen as a statistical estimation problem where the log is the set of observations and the Petri net the parameter to be estimated. In particular it is interesting to know how large the log file (i.e. the set of observations) should be in order to limit the probability of making a wrong decision. The existing process mining techniques do not cover these statistical aspects yet.
In the talk we first sketch the necessary background of Petri nets and process mining. Then we give a precise description of the problems and answer the question for a particular class of Petri nets.


Yvo Pokern (University College London), June 29, 2010

Nonparametric Drift Estimation for Stochastic Differential Equations
(joint work with O. Papaspiliopoulos, G. O. Roberts and A. M. Stuart)

For scalar stochastic differential equations on the circle and the real line, a Bayesian estimator for the drift function based on observing a sample path over a finite time interval is constructed using Gaussian priors. We specify the Gaussian priors through their precision operators which are assumed to be given by differential operators.
Local time is essentially a sufficient statistic and we show that the posterior enjoys robustness against small deviations of the local time. We obtain error-control for a fixed random sample all the way from high-frequency discrete observations to the numerical computation of the posterior mean and covariance, via standard partial differential equation methodology.
An empirical Bayes procedure is suggested which allows automatic selection of the smoothness of the prior in a given family.
An application to molecular dynamics simulations is presented as well as some numerical results on asymptotic consistency.
 


Marios Pavlides (Frederick University, Nicosia, Cyprus), June 29, 2010

Two Statistical Vignettes: Simpson's Paradox and Shaved Dice


Marie-Colette van Lieshout (CWI - TU/e), June 15, 2010

Image Segmentation by Polygonal Markov Fields
(includes joint work with R. Kluszczynski and T. Schreiber)

We discuss the use of polygonal Markov fields for model-based image segmentation. The formal construction of consistent multi-coloured polygonal Markov fields by Arak-Clifford-Surgailis and its dynamic representation are recalled and adapted. We then formulate image segmentation as a statistical estimation problem for a Gibbsian modification of an underlying polygonal Markov field, and discuss the choice of Hamiltonian. Monte Carlo techniques for estimating the model parameters and for finding the optimal partition of the image are developed. We shall also discuss a class of Markov random fields that can be understood as discrete versions of polygonal fields. The analogy with continuum polygonal Markov fields is exploited to define Hamiltonians that are such that desirable properties of these processes can be carried over to the discrete context. Moreover, the analogy gives rise to new attractive sampling schemes complementing the usual local Gibbs and Metropolis methods employed for Gibbs fields on finite graphs.


Alessandro Di Bucchianico (TU/e), June 8, 2010

A problem on performing ROC (Receiver Operating Characteristic) analyses for three instead of two outcomes


Birgit Witte (Delft University of Technology), April 20, 2010

Consistent Estimators in the Current Status Continuous Mark Model

We consider the problem of estimating the joint distribution function of the event time and a continuous mark variable. However, the event time is not observed directly but subject to interval censoring case 1 and the continuous mark variable is only observed in case the event occurred before time of inspection. A natural estimator for the distribution function is the nonparametric maximum likelihood estimator (MLE).
Maathuis and Wellner (2008) study the MLE in this so-called current status continuous mark model, and prove that it is inconsistent. We study two alternative estimators, the maximum smoothed likelihood estimator and the smooth plug-in inverse estimator. The first estimator is a likelihood based estimator, maximizing a smoothed log-likelihood.
The second estimator is based on the explicit (inverse) expression of the distribution function of interest in terms of the density of the observable vector. We consider the asymptotic behavior of both estimators, in particular showing their consistency.

References:
Eggermont, P. P. B. and LaRiccia, V. N. (2001), Maximum Penalized Likelihood Estimation, New York: Springer-Verlag.
Maathuis, M. H. and Wellner, J. A. (2008), Inconsistency of the MLE for the joint distribution of interval censored survival times and continuous marks, Scandinavian Journal of Statistics, 35: 83-103.

 

 

Last updated 14-10-11
Maintained by
PK

 

 

  P.O. Box 513, 5600 MB  Eindhoven, The Netherlands
tel. +31 40 2478100  fax +31 40 2478190  
  e-mail: 
info@eurandom.tue.nl