During the Workshop on Statistical Inference for Stochastic Process Models in Weather and Climate Science at the Lorentz Center in Leiden, the question of (multivariate Gaussian) mixture modelling came up, and I was asked to produce several references the participants might find useful. Here they are. Note that none of these methods is a fully automatic black box: appropriate tuning is required.
-
Gershman and Blei (2012). Remarks: This is a brief but nice tutorial on Bayesian mixture modelling and clustering via the Dirichlet process (and its relatives). A large number of references are given. In Table B.1 the authors list several R and Matlab packages one can use for data analysis. A more recent R package I found is dirichletprocess.
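To give a rough feel for how a Dirichlet process prior lets the number of clusters grow with the data, here is a minimal sketch of sampling a partition from the Chinese restaurant process, the clustering representation commonly associated with the Dirichlet process. The function name `sample_crp` and the parameter name `alpha` (the concentration parameter) are illustrative choices, not taken from the tutorial or from any package. The sketch only draws from the prior; it does no posterior inference.

```python
import random


def sample_crp(n, alpha, rng):
    """Draw a partition of n items from a Chinese restaurant process.

    Item i (0-based) joins existing cluster j with probability
    counts[j] / (i + alpha) and opens a new cluster with probability
    alpha / (i + alpha).
    """
    assignments = []  # cluster index of each item
    counts = []       # current size of each cluster
    for i in range(n):
        u = rng.random() * (i + alpha)  # uniform on [0, i + alpha)
        cluster = len(counts)           # default: open a new cluster
        cumulative = 0.0
        for j, size in enumerate(counts):
            cumulative += size
            if u < cumulative:
                cluster = j
                break
        if cluster == len(counts):
            counts.append(0)
        counts[cluster] += 1
        assignments.append(cluster)
    return assignments, counts


rng = random.Random(0)
assignments, counts = sample_crp(500, alpha=1.0, rng=rng)
print("number of clusters:", len(counts))
print("largest clusters:", sorted(counts, reverse=True)[:5])
```

Larger values of `alpha` tend to produce more clusters. In a Dirichlet process mixture model, each cluster would additionally carry its own Gaussian parameters.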
-
Dellaportas and Papageorgiou (2006). Remarks: Another Bayesian paper. It deals with finite mixtures of multivariate Gaussians with an unknown number of components. Computationally, the paper uses the reversible jump algorithm to sample from the posterior. The original paper that introduced this method, Green (1995), is not very accessible to a casual reader; a better starting point is, e.g., Hastie and Green (2012), and I would also recommend Godsill (2001) for an interesting perspective. Note that sampling-based Bayesian approaches to finite mixture modelling suffer from the so-called label switching problem: the likelihood is invariant under permutations of the component labels, so the labels can swap during the MCMC run, which makes component-specific posterior summaries unreliable. In their examples, Dellaportas and Papageorgiou (2006) use a simple post-hoc correction for that. I also found an R package, label.switching, that can be used to process the MCMC output and account for label switching (the package is described in Papastamoulis (2016)). A general discussion of label switching is given in Jasra et al. (2006).
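The effect of label switching on posterior summaries can be illustrated with a small sketch. The "draws" below are a synthetic stand-in for MCMC output, not the output of a real sampler, and the correction shown (reordering the components of every draw so that the means are increasing) is just one simple identifiability constraint. It is not claimed to be the correction used by Dellaportas and Papageorgiou (2006), and constraints of this kind are known not to work well in every problem. All function names are illustrative.

```python
import random


def relabel_by_ordered_means(draws):
    """Permute the components of each draw so that the means are increasing."""
    relabelled = []
    for means, weights in draws:
        order = sorted(range(len(means)), key=lambda j: means[j])
        relabelled.append(([means[j] for j in order],
                           [weights[j] for j in order]))
    return relabelled


def average_means(draws):
    """Average the component means over draws, label by label."""
    k = len(draws[0][0])
    return [sum(means[j] for means, _ in draws) / len(draws) for j in range(k)]


# Synthetic stand-in for MCMC output: a two-component mixture whose
# labels are swapped in roughly half of the draws.
rng = random.Random(1)
draws = []
for _ in range(2000):
    means = [rng.gauss(-3.0, 0.1), rng.gauss(3.0, 0.1)]
    weights = [0.3, 0.7]
    if rng.random() < 0.5:  # simulate a label switch
        means.reverse()
        weights.reverse()
    draws.append((means, weights))

naive = average_means(draws)                            # both averages near 0
fixed = average_means(relabel_by_ordered_means(draws))  # near -3 and 3
print("naive averages:     ", naive)
print("relabelled averages:", fixed)
```

Without relabelling, the averages for both labels collapse towards the overall centre and say nothing about the individual components. For multivariate components there is no natural ordering of the means, which is one reason more elaborate relabelling methods exist.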
-
Scrucca et al. (2016). Remarks: Describes the popular R package mclust. Models are fitted by maximum likelihood (via the EM algorithm), and model selection is done using an information criterion such as BIC.
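The general workflow of maximum likelihood fitting by EM followed by model selection with an information criterion can be sketched as follows. This is a deliberately simplified, univariate illustration in plain Python, not a reimplementation of mclust, which handles multivariate data and a family of covariance structures. The function names, the quantile-based initialisation and the variance floor are assumptions made for this sketch. BIC is computed here as -2 log-likelihood + (number of parameters) x log(n), so smaller values are better under this convention.

```python
import math
import random


def normal_pdf(x, mean, var):
    return math.exp(-0.5 * (x - mean) ** 2 / var) / math.sqrt(2.0 * math.pi * var)


def mixture_density(x, weights, means, variances):
    return sum(w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances))


def fit_gmm_em(data, k, n_iter=100, var_floor=1e-6):
    """Fit a k-component univariate Gaussian mixture by EM.

    Returns (weights, means, variances, log-likelihood).
    """
    n = len(data)
    xs = sorted(data)
    # Initialisation (an assumption of this sketch): means at evenly spaced
    # sample quantiles, common variance, equal weights.
    means = [xs[int((j + 0.5) * n / k)] for j in range(k)]
    centre = sum(data) / n
    variances = [sum((x - centre) ** 2 for x in data) / n] * k
    weights = [1.0 / k] * k
    for _ in range(n_iter):
        # E-step: posterior probability of each component for each point.
        resp = []
        for x in data:
            dens = [w * normal_pdf(x, m, v) for w, m, v in zip(weights, means, variances)]
            total = sum(dens) or 1e-300  # crude guard against underflow
            resp.append([d / total for d in dens])
        # M-step: weighted updates of weights, means and variances.
        for j in range(k):
            nj = max(sum(r[j] for r in resp), 1e-300)
            weights[j] = nj / n
            means[j] = sum(r[j] * x for r, x in zip(resp, data)) / nj
            var_j = sum(r[j] * (x - means[j]) ** 2 for r, x in zip(resp, data)) / nj
            variances[j] = max(var_j, var_floor)
    loglik = sum(math.log(max(mixture_density(x, weights, means, variances), 1e-300))
                 for x in data)
    return weights, means, variances, loglik


def bic(loglik, k, n):
    n_params = 3 * k - 1  # k means, k variances, k - 1 free weights
    return -2.0 * loglik + n_params * math.log(n)


# Simulated data from two well-separated Gaussians.
rng = random.Random(0)
data = [rng.gauss(-3.0, 1.0) for _ in range(200)] + [rng.gauss(3.0, 1.0) for _ in range(200)]

bic_by_k = {k: bic(fit_gmm_em(data, k)[3], k, len(data)) for k in range(1, 5)}
best_k = min(bic_by_k, key=bic_by_k.get)
print("BIC by number of components:", bic_by_k)
print("selected number of components:", best_k)
```

EM only finds a local maximum of the likelihood, so in practice the result depends on the initialisation, and several starting points are usually worth trying.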