GPmix.UniGaussianMixtureEnsemble

class GPmix.UniGaussianMixtureEnsemble(n_clusters, init_method='kmeans', n_init=10, mom_epsilon=0.05)[source]

Bases: object

Consensus clustering using an ensemble of univariate Gaussian Mixture Models (GMMs).

This class fits univariate GMMs to multiple one-dimensional projections of the data, computes base clusterings, and combines them into a consensus clustering using spectral clustering on an affinity matrix built from binary membership matrices. Base clusterings are weighted by an estimate of their total misclassification probability.

Parameters:
  • n_clusters (int) – Number of mixture components (clusters) to fit in each GMM and in the consensus clustering.

  • init_method ({"kmeans", "k-means++", "random", "random_from_data", "mom"}, optional) – Initialization method for GMM parameters. Default is "kmeans". The "mom" option uses method-of-moments initialization.

  • n_init (int, optional) – Number of initializations to perform for each GMM fit. The best result is kept. Default is 10.

  • mom_epsilon (float, optional) – Lower bound for GMM weights (and related constraints) when using init_method="mom". Ignored otherwise. Default is 0.05.

n_projs

Number of projections (base clusterings).

Type:

int

data_size

Number of samples in the data.

Type:

int

gmms

Fitted univariate GMMs for each projection.

Type:

tuple of sklearn.mixture.GaussianMixture

MoM_res

If init_method == 'mom', the method-of-moments solver results for each projection.

Type:

tuple

clustering_weights_

Weights assigned to each base clustering.

Type:

ndarray of shape (n_projs,)

labels_

Cluster labels assigned by the consensus clustering.

Type:

ndarray of shape (n_samples,)

max_cca_labels_

Permutation of predicted labels that yields the highest classification accuracy when compared to ground truth.

Type:

tuple

Notes

The affinity matrix is constructed as a weighted sum of outer products of binary membership matrices (one per projection), where the weights are proportional to the inverse of each GMM’s estimated total misclassification probability.
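The construction described in this note can be sketched with NumPy as follows. This is a minimal illustration, not the library's implementation: the helper name weighted_affinity, the array layout (n_projs, n_samples, n_clusters), the example misclassification values, and the normalisation of the weights to sum to 1 are all assumptions made for the example.

```python
import numpy as np

def weighted_affinity(binary_memberships, weights):
    """Weighted sum of outer products of binary membership matrices.

    binary_memberships: shape (n_projs, n_samples, n_clusters), one-hot rows.
    weights: shape (n_projs,), one weight per base clustering.
    (Hypothetical helper, not part of GPmix.)
    """
    B = np.asarray(binary_memberships, dtype=float)
    w = np.asarray(weights, dtype=float)
    # B[p] @ B[p].T is 1 where two samples share a cluster in projection p.
    return np.einsum("p,pik,pjk->ij", w, B, B)

# Two projections, four samples, two clusters.
B = np.array([
    [[1, 0], [1, 0], [0, 1], [0, 1]],
    [[1, 0], [0, 1], [0, 1], [0, 1]],
])
total_omega = np.array([0.1, 0.4])  # made-up misclassification estimates
weights = 1.0 / total_omega         # inverse-probability weighting
weights /= weights.sum()            # assumed normalisation: weights sum to 1
A = weighted_affinity(B, weights)
```

Pairs of samples that share a cluster in the more reliable projection receive a larger affinity than pairs that only agree in the less reliable one.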

adjusted_mutual_info_score(true_labels)[source]
Return type:

float

adjusted_rand_score(true_labels)[source]
Return type:

float

binary_membership_matrix()[source]

Construct binary membership indicator matrices from the cluster membership matrices.

Return type:

ndarray
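One common way to turn soft memberships into binary indicators is to one-hot encode the most probable cluster of each sample. The sketch below assumes that convention; the function name to_binary_membership and the (n_samples, n_clusters) layout are hypothetical and may differ from what this method does internally.

```python
import numpy as np

def to_binary_membership(fuzzy):
    """One-hot encode the most probable cluster of every sample.

    fuzzy: shape (n_samples, n_clusters), rows of membership probabilities.
    (Hypothetical helper, not part of GPmix.)
    """
    fuzzy = np.asarray(fuzzy, dtype=float)
    hard = fuzzy.argmax(axis=1)  # ties resolve to the lowest cluster index
    binary = np.zeros_like(fuzzy, dtype=int)
    binary[np.arange(fuzzy.shape[0]), hard] = 1
    return binary

fuzzy = np.array([[0.7, 0.3], [0.2, 0.8], [0.5, 0.5]])
binary = to_binary_membership(fuzzy)
```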

correct_classification_accuracy(true_labels)[source]
Return type:

float
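Cluster labels are only defined up to a permutation, so an accuracy against ground truth needs the best matching between predicted and true labels. The sketch below finds that matching with the Hungarian algorithm from SciPy. It is an assumption about how such a score can be computed, not a description of this method's internals, and the helper name best_permutation_accuracy is hypothetical.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def best_permutation_accuracy(true_labels, pred_labels):
    """Accuracy under the label matching that maximises agreement.

    Assumes integer labels. (Hypothetical helper, not part of GPmix.)
    """
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    true_ids, true_idx = np.unique(true_labels, return_inverse=True)
    pred_ids, pred_idx = np.unique(pred_labels, return_inverse=True)
    # contingency[i, j] counts samples with predicted id i and true id j.
    contingency = np.zeros((pred_ids.size, true_ids.size), dtype=int)
    np.add.at(contingency, (pred_idx, true_idx), 1)
    # Assign each predicted label to a true label so total agreement is maximal.
    rows, cols = linear_sum_assignment(contingency, maximize=True)
    mapping = {int(pred_ids[r]): int(true_ids[c]) for r, c in zip(rows, cols)}
    accuracy = contingency[rows, cols].sum() / true_labels.size
    return float(accuracy), mapping

true_labels = [0, 0, 0, 1, 1, 2, 2]
pred_labels = [2, 2, 1, 1, 1, 0, 0]
accuracy, mapping = best_permutation_accuracy(true_labels, pred_labels)
```

Here six of the seven samples agree after relabelling, so the accuracy is 6/7.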

davies_bouldin_score(fdata)[source]
Parameters:

fdata (FDataGrid)

fit_gmms(projs_coeffs, n_jobs=-1, **kwargs)[source]

Fit univariate Gaussian mixture models to the projection coefficients.

Parameters:
  • projs_coeffs (array-like of shape (n_projs, n_samples)) – Array of projection coefficients; one univariate GMM is fitted to each row.

  • kwargs – Keyword arguments passed to joblib's Parallel.
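The per-projection fitting can be illustrated with scikit-learn and joblib directly. The sketch below is an assumption about the general pattern (one GaussianMixture per row, dispatched through Parallel); the helper names fit_one and fit_all, the fixed random seed, and the synthetic data are hypothetical.

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.mixture import GaussianMixture

def fit_one(coeffs, n_clusters, n_init=10, seed=0):
    """Fit one univariate GMM to a 1-D array of projection coefficients."""
    gmm = GaussianMixture(n_components=n_clusters, n_init=n_init, random_state=seed)
    # scikit-learn expects a 2-D array of shape (n_samples, n_features).
    return gmm.fit(np.asarray(coeffs, dtype=float).reshape(-1, 1))

def fit_all(projs_coeffs, n_clusters, n_jobs=-1, **parallel_kwargs):
    """Fit one GMM per row of projs_coeffs, in parallel (hypothetical helper)."""
    fitted = Parallel(n_jobs=n_jobs, **parallel_kwargs)(
        delayed(fit_one)(row, n_clusters) for row in projs_coeffs
    )
    return tuple(fitted)

# Synthetic coefficients: two well-separated groups, seen in two projections.
rng = np.random.default_rng(0)
row = np.concatenate([rng.normal(-3, 0.5, 100), rng.normal(3, 0.5, 100)])
projs_coeffs = np.stack([row, -row])  # shape (n_projs, n_samples)
gmms = fit_all(projs_coeffs, n_clusters=2, n_jobs=1)
```

Setting n_jobs=1 keeps the example in a single process; n_jobs=-1 would use all available cores.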

fuzzy_membership_matrix()[source]

Construct the cluster membership matrices from GMM fits.

Return type:

ndarray
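If the membership matrices hold the posterior component probabilities of each sample, they can be obtained from fitted scikit-learn GMMs with predict_proba. The sketch below works under that assumption; the stacked layout (n_projs, n_samples, n_clusters) and the synthetic data are hypothetical.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(1)
coeffs = np.concatenate([rng.normal(-2, 0.4, 50), rng.normal(2, 0.4, 50)])
projs_coeffs = np.stack([coeffs, 0.5 * coeffs])  # shape (n_projs, n_samples)

memberships = []
for row in projs_coeffs:
    X = row.reshape(-1, 1)
    gmm = GaussianMixture(n_components=2, random_state=0).fit(X)
    # Posterior probability of each component for each sample.
    memberships.append(gmm.predict_proba(X))

fuzzy = np.stack(memberships)  # shape (n_projs, n_samples, n_clusters)
```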

get_affinity_matrix(weighted_sum, precompute_gmms=None)[source]

Construct the affinity matrix from the binary membership matrices and the clustering weights.

Parameters:

weighted_sum (bool)

Return type:

ndarray

get_clustering(weighted_sum=True, precompute_gmms=None, **kwargs)[source]

Obtain the consensus clustering by spectral clustering of the affinity matrix.

Parameters:

weighted_sum (bool)

Return type:

ndarray
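Spectral clustering on a precomputed affinity matrix can be illustrated with scikit-learn's SpectralClustering. The sketch below is an assumption about the general technique rather than this method's exact code path; the toy affinity matrix and the fixed random_state are made up for the example.

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Toy affinity: two blocks of five samples with weak cross-block similarity.
A = np.full((10, 10), 0.05)
A[:5, :5] = 1.0
A[5:, 5:] = 1.0

# affinity="precomputed" tells scikit-learn that A is already a similarity matrix.
model = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = model.fit_predict(A)
```

The returned labels separate the two blocks; which block is called 0 and which is called 1 is arbitrary.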

get_clustering_weights(weighted_sum, precompute_gmms=None)[source]

Compute the weights of the base clusterings.

Return type:

ndarray

get_omega_map(weights, means, vars)[source]

Construct the matrix of pairwise misclassification probabilities.

Return type:

ndarray

get_omega_prob(dist_a, dist_b)[source]

Compute the misclassification probability omega_{b|a} for a pair of components of a univariate GMM.

Parameters:

dist_* (array-like of shape (3,)) – The parameters of mixture component *: [weight, mean, variance].

Return type:

float
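A common definition of omega_{b|a} is the probability that a point drawn from component a has a larger weighted density under component b. The page does not state the definition used here, so the Monte Carlo sketch below rests on that assumption and is not the library's computation. The helper names, the number of draws, and the seed are hypothetical.

```python
import math
import numpy as np

def log_weighted_density(x, weight, mean, var):
    """Log of weight * N(x; mean, var) for a univariate Gaussian."""
    return np.log(weight) - 0.5 * np.log(2 * np.pi * var) - (x - mean) ** 2 / (2 * var)

def omega_prob_mc(dist_a, dist_b, n_draws=200_000, seed=0):
    """Monte Carlo estimate of omega_{b|a} (assumed definition, see lead-in).

    dist_a, dist_b: [weight, mean, variance] of the two components.
    """
    w_a, m_a, v_a = dist_a
    w_b, m_b, v_b = dist_b
    rng = np.random.default_rng(seed)
    x = rng.normal(m_a, math.sqrt(v_a), size=n_draws)  # samples from component a
    # A draw is misassigned when b's weighted density exceeds a's.
    misassigned = (
        log_weighted_density(x, w_b, m_b, v_b) > log_weighted_density(x, w_a, m_a, v_a)
    )
    return float(misassigned.mean())

dist_a = (0.5, 0.0, 1.0)
dist_b = (0.5, 2.0, 1.0)
estimate = omega_prob_mc(dist_a, dist_b)
# With equal weights and variances the boundary is the midpoint x = 1,
# so the exact value is Phi(-1), written here with erfc.
exact = 0.5 * math.erfc(1.0 / math.sqrt(2))
```

For equal weights and variances the estimate can be checked against the closed form; in the general case the decision region is bounded by the roots of a quadratic in x.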

get_total_omega(weights, means, vars, weighted_sum)[source]

Compute total misclassification probability for univariate GMM

Return type:

float

gmm_with_MoM_inits(data)[source]

Fit GMMs with initialization from method-of-moments estimation.

Parameters:

data (ndarray)

plot_clustering(fdata)[source]
Parameters:

fdata (FDataGrid)

plot_gmms(ncols=4, fontsize=12, fig_kws={}, **kwargs)[source]

Visualize GMM fits

Parameters:
  • ncols (int, optional) – Number of columns in the plot grid. Default is 4.

  • fontsize (int, optional) – Font size for axis labels. Default is 12.

  • fig_kws (dict, optional) – Additional keyword arguments for figure creation. Default is an empty dict.

  • kwargs (dict, optional) – Additional keyword arguments for seaborn’s histplot function. Default is an empty dict.

silhouette_score(fdata)[source]
Parameters:

fdata (FDataGrid)