Clustering in a random graph model with a latent space
Antoine Channarond, Jean-Jacques Daudin, Stéphane Robin
One way to add heterogeneity to networks consists in assigning each node an unobserved position $Z$ in a latent space. In our approach the edge between nodes $i$ and $j$ appears with probability $f_n(d(Z_i,Z_j))$, where $n$ is the graph size, $d$ is a distance on the latent space and $f_n$ is a decreasing probability function. Thus an edge is all the more likely to appear as the nodes are close in the latent space.
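The model above can be simulated directly. The sketch below is a minimal illustration, assuming uniform latent positions on $[0,1]^2$, the Euclidean distance, and an exponential link function $f_n(d)=\exp(-d/r_n)$; these specific choices are illustrative assumptions, not the paper's.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_graph(Z, f):
    """Sample an undirected graph where edge (i, j), i != j,
    appears independently with probability f(d(Z_i, Z_j))."""
    n = len(Z)
    # Pairwise Euclidean distances between latent positions
    D = np.linalg.norm(Z[:, None, :] - Z[None, :, :], axis=2)
    P = f(D)  # f is decreasing: close nodes are more likely to be linked
    A = (rng.random((n, n)) < P).astype(int)
    A = np.triu(A, 1)
    return A + A.T  # symmetrize, no self-loops

# Illustrative choices (assumptions): uniform positions, exponential f_n
n, r_n = 200, 0.3
Z = rng.uniform(0.0, 1.0, size=(n, 2))
A = sample_graph(Z, lambda d: np.exp(-d / r_n))
```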
Hoff, Raftery and Handcock (2002) provide methods for unsupervised node classification and parameter estimation. While they assume the positions to be drawn from a finite Gaussian mixture, we are interested in a non-parametric framework, in which no predefined constraint is placed on the shape of the density or on its clustering structure. In this setting, the clusters are defined as the connected components of a level set of the density (Hartigan, 1975). The question is to test whether the density is clustered and, more generally, what can be inferred from the observed graph about the clustering structure of the position density in the latent space.
We propose a simple algorithm for estimating the number of such clusters, which generalizes the procedure of Biau, Cadre and Pelletier (2007). It extracts a subgraph induced by the observed graph, which is proved to have as many connected components as the density has clusters, with probability tending to one as $n$ goes to infinity. The algorithm runs in time linear in the number of edges and can hence process large graphs. A simulation study illustrates the behavior of the algorithm as a function of $n$, and theoretical arguments establishing its consistency are provided.
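The following sketch conveys the flavor of such a procedure, not the paper's exact algorithm: nodes whose degree (a proxy for the local density) falls below a threshold `tau` are discarded, and the connected components of the induced subgraph are counted by breadth-first search, in time linear in the number of edges. Both the degree thresholding rule and the name `count_clusters` are assumptions made for illustration.

```python
from collections import deque
import numpy as np

def count_clusters(A, tau):
    """Keep nodes whose degree exceeds tau (degree acts as a crude
    local-density estimate), then count the connected components of
    the subgraph they induce, via BFS in O(number of edges)."""
    deg = A.sum(axis=1)
    keep = np.flatnonzero(deg > tau)
    keep_set = set(keep.tolist())
    seen, components = set(), 0
    for s in keep:
        if s in seen:
            continue
        components += 1  # s starts a new, unexplored component
        queue = deque([s])
        seen.add(s)
        while queue:
            u = queue.popleft()
            for v in np.flatnonzero(A[u]):
                if v in keep_set and v not in seen:
                    seen.add(v)
                    queue.append(v)
    return components
```

On a graph made of two dense groups bridged only by low-degree nodes, the thresholding removes the bridges and the count is 2, matching the two underlying clusters.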