Mixture of Gaussians (MOG)
Classes and functions for modelling multivariate data as a Mixture of Gaussians.


class  itpp::MOG_diag
 Diagonal Mixture of Gaussians (MOG) class.
class  itpp::MOG_generic
 Generic Mixture of Gaussians (MOG) class. Used as a base for other MOG classes.


void itpp::MOG_diag_ML (MOG_diag &model_in, Array< vec > &X_in, int max_iter_in, double var_floor_in, double weight_floor_in, bool verbose_in)
void itpp::MOG_diag_kmeans (MOG_diag &model_in, Array< vec > &X_in, int max_iter_in, double trust_in, bool normalise_in, bool verbose_in)

Detailed Description

Classes and functions for modelling multivariate data as a Mixture of Gaussians.

Author: Conrad Sanderson
The following example shows how to model data:
    Array<vec> X;
    // ... fill X with vectors ...
    int K = 3;     // specify the number of Gaussians
    int D = 10;    // specify the dimensionality of vectors
    MOG_diag model(K,D);
    MOG_diag_kmeans(model, X, 10, 0.5, true, true); // initial optimisation using 10 iterations of k-means
    MOG_diag_ML(model, X, 10, 0.0, 0.0, true);      // final optimisation using 10 iterations of ML version of EM
    double avg = model.avg_log_lhood(X);            // find the average log likelihood of X

See also the tutorial section for a more elaborate example.

Function Documentation

void itpp::MOG_diag_kmeans ( MOG_diag &  model_in,
Array< vec > &  X_in,
int  max_iter_in = 10,
double  trust_in = 0.5,
bool  normalise_in = true,
bool  verbose_in = false
)

Author: Conrad Sanderson
K-means based optimisation (training) of the parameters of an instance of the MOG_diag class. The obtained parameters are typically used as a seed by MOG_diag_ML().

model_in The model to optimise
X_in The training data
max_iter_in Maximum number of iterations. Default is 10.
trust_in The trust factor, where 0 <= trust_in <= 1. Default is 0.5.
normalise_in Use normalised distance measure (in effect). Default is true.
verbose_in Whether to print progress. Default is false.
The higher the trust factor, the more we trust the estimates of covariance matrices and weights. Set this to 1.0 only if you have plenty of training data. One rule of thumb is to have 10*D vectors per Gaussian, where D is the dimensionality of the vectors. For smaller amounts of data, a lower trust factor helps prevent (but does not completely stop) the EM algorithm (used in MOG_diag_ML()) from getting stuck in a poor local optimum.
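
The rule of thumb above can be turned into a quick check before training. The following is only a sketch: rule_of_thumb_min_vectors() and suggest_trust() are hypothetical helpers, not part of IT++, and the choice of falling back to the default 0.5 when data is scarce is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>

// Rule of thumb from the text: about 10*D training vectors per Gaussian,
// i.e. 10*D*K vectors in total for a model with K Gaussians.
std::size_t rule_of_thumb_min_vectors(std::size_t K, std::size_t D) {
    return 10 * D * K;
}

// Hypothetical helper (not an IT++ function): use full trust only when the
// rule of thumb is met, otherwise fall back to the documented default of 0.5.
// The fallback value is an illustrative assumption, not a recommendation
// taken from the library.
double suggest_trust(std::size_t N, std::size_t K, std::size_t D) {
    assert(K > 0 && D > 0);
    return (N >= rule_of_thumb_min_vectors(K, D)) ? 1.0 : 0.5;
}
```

With K = 3 and D = 10 as in the earlier example, the rule of thumb asks for 300 vectors; the returned value would then be passed as trust_in to MOG_diag_kmeans().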

Setting normalise_in to true causes the training data to be normalised to zero mean and unit variance prior to running the k-means algorithm. The data is unnormalised before returning. The normalisation helps clustering when the range of values varies greatly between dimensions. For example, dimension 1 may have values in the [-1,+1] interval, while dimension 2 may have values in the [-100,+100] interval. Without normalisation, the distance between vectors is dominated by dimension 2.
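
To illustrate why normalisation matters, here is a minimal standalone sketch of per-dimension normalisation and its inverse. It uses std::vector rather than IT++ types, and the names Scaling, normalise(), unnormalise() and sq_dist() are hypothetical; it is not the code used internally by MOG_diag_kmeans(), which may handle the details differently.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

using Vec = std::vector<double>;

// Per-dimension statistics needed to undo the normalisation later.
struct Scaling { Vec mean; Vec stddev; };

// Shift and scale every dimension of X to zero mean and unit variance.
Scaling normalise(std::vector<Vec>& X) {
    assert(!X.empty());
    const std::size_t N = X.size(), D = X[0].size();
    Scaling s{Vec(D, 0.0), Vec(D, 0.0)};
    for (const Vec& x : X)
        for (std::size_t d = 0; d < D; ++d) s.mean[d] += x[d] / N;
    for (const Vec& x : X)
        for (std::size_t d = 0; d < D; ++d) {
            const double diff = x[d] - s.mean[d];
            s.stddev[d] += diff * diff / N;   // accumulates the variance
        }
    for (std::size_t d = 0; d < D; ++d) {
        s.stddev[d] = std::sqrt(s.stddev[d]);
        if (s.stddev[d] == 0.0) s.stddev[d] = 1.0;  // constant dimension
    }
    for (Vec& x : X)
        for (std::size_t d = 0; d < D; ++d)
            x[d] = (x[d] - s.mean[d]) / s.stddev[d];
    return s;
}

// Restore the original scale of the data.
void unnormalise(std::vector<Vec>& X, const Scaling& s) {
    for (Vec& x : X)
        for (std::size_t d = 0; d < x.size(); ++d)
            x[d] = x[d] * s.stddev[d] + s.mean[d];
}

// Squared Euclidean distance, the usual k-means distance measure.
double sq_dist(const Vec& a, const Vec& b) {
    double sum = 0.0;
    for (std::size_t d = 0; d < a.size(); ++d)
        sum += (a[d] - b[d]) * (a[d] - b[d]);
    return sum;
}
```

For the two vectors (-1, -100) and (+1, +100), the squared distance before normalisation is 4 + 40000, almost entirely due to dimension 2. After normalise() both dimensions contribute equally (4 each).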

void itpp::MOG_diag_ML ( MOG_diag &  model_in,
Array< vec > &  X_in,
int  max_iter_in = 10,
double  var_floor_in = 0.0,
double  weight_floor_in = 0.0,
bool  verbose_in = false
)

Author: Conrad Sanderson
Maximum Likelihood Expectation Maximisation based optimisation of the parameters of an instance of the MOG_diag class. The seed values (starting points) are typically first obtained via MOG_diag_kmeans(). See [CSB06] and the references therein for detailed mathematical descriptions.

model_in The model to optimise (MOG_diag)
X_in The training data (array of vectors)
max_iter_in Maximum number of iterations. Default is 10.
var_floor_in Variance floor (lowest allowable variance). Default is 0.0 (but see the note below)
weight_floor_in Weight floor (lowest allowable weight). Default is 0.0 (but see the note below)
verbose_in Whether progress is printed. Default is false.
The variance and weight floors are set to std::numeric_limits<double>::min() if they are below that value. As such, they are machine dependent. The largest allowable weight floor is 1/K, where K is the number of Gaussians.
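
The effect of the note on the floor values can be sketched as follows. The helper names effective_var_floor() and effective_weight_floor() are hypothetical and not part of IT++. The lower bound follows the note; clamping an over-large weight floor down to 1/K is an assumption made here for illustration, since the text only states that 1/K is the largest allowable value and does not say how a larger value is handled.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

// Raise the variance floor to the smallest positive normalised double
// if the requested value is below it (this covers the default of 0.0).
double effective_var_floor(double var_floor_in) {
    return std::max(var_floor_in, std::numeric_limits<double>::min());
}

// Same lower bound for the weight floor. The upper clamp to 1/K is an
// assumption for this sketch: the documentation only says that 1/K is
// the largest allowable weight floor.
double effective_weight_floor(double weight_floor_in, int K) {
    assert(K > 0);
    const double lo = std::numeric_limits<double>::min();
    const double hi = 1.0 / K;
    return std::min(std::max(weight_floor_in, lo), hi);
}
```

With the defaults (0.0 for both floors), the effective floors are therefore tiny but strictly positive, which keeps variances and weights from collapsing to exactly zero.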

