GPmix.UniGaussianMixtureEnsemble

class GPmix.UniGaussianMixtureEnsemble(n_clusters, init_method='kmeans', n_init=10, mom_epsilon=0.05)

Bases: object

Consensus clustering using an ensemble of univariate Gaussian Mixture Models (GMMs).

This class fits univariate GMMs to multiple one-dimensional projections of the data, computes base clusterings, and combines them into a consensus clustering using spectral clustering on an affinity matrix built from binary membership matrices. Base clusterings are weighted by an estimate of their total misclassification probability.

Parameters:

n_clusters (int) – Number of mixture components (clusters) to fit in each GMM and in the consensus clustering.
init_method ({"kmeans", "k-means++", "random", "random_from_data", "mom"}, optional) – Initialization method for GMM parameters. Default is "kmeans". The "mom" option uses method-of-moments initialization.
n_init (int, optional) – Number of initializations to perform for each GMM fit. The best result is kept. Default is 10.
mom_epsilon (float, optional) – Lower bound for GMM weights (and related constraints) when using init_method="mom". Ignored otherwise. Default is 5e-2.

Attributes:

n_projs

Number of projections (base clusterings).

Type:: int

data_size

Number of samples in the data.

Type:: int

gmms

Fitted univariate GMMs for each projection.

Type:: tuple of sklearn.mixture.GaussianMixture

MoM_res

Method-of-moments solver results for each projection; populated only when init_method == 'mom'.

Type:: tuple

clustering_weights_

Weights assigned to each base clustering.

Type:: ndarray of shape (n_projs,)

labels_

Cluster labels assigned by the consensus clustering.

Type:: ndarray of shape (n_samples,)

max_cca_labels_

Permutation of predicted labels that yields the highest classification accuracy when compared to ground truth.

Type:: tuple

Notes

The affinity matrix is constructed as a weighted sum of outer products of binary membership matrices (one per projection), where the weights are proportional to the inverse of each GMM’s estimated total misclassification probability.
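The construction in the note above can be sketched with NumPy: for a binary membership matrix B of shape (n_samples, n_clusters), the product B Bᵀ has entry 1 exactly when two samples share a cluster. The helper names and the weights below are hypothetical, and normalizing the weights to sum to 1 is an assumption.

```python
import numpy as np

def weighted_affinity(memberships, weights):
    """Weighted sum of co-membership matrices B @ B.T (hypothetical helper)."""
    n_samples = memberships[0].shape[0]
    affinity = np.zeros((n_samples, n_samples))
    for B, w in zip(memberships, weights):
        # (B @ B.T)[i, j] is 1 if samples i and j share a cluster in this base clustering.
        affinity += w * (B @ B.T)
    return affinity

def one_hot(labels, n_clusters):
    # Row k of the identity matrix is the indicator vector of cluster k.
    return np.eye(n_clusters)[labels]

# Two hypothetical base clusterings of four samples; the second disagrees on sample 2.
B1 = one_hot(np.array([0, 0, 1, 1]), 2)
B2 = one_hot(np.array([1, 1, 1, 0]), 2)

# Hypothetical weights, e.g. normalized inverse misclassification probabilities.
weights = np.array([0.8, 0.2])

A = weighted_affinity([B1, B2], weights)
print(A.round(2))
```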

adjusted_mutual_info_score(true_labels)

Return type:: float

adjusted_rand_score(true_labels)

Return type:: float
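Neither score is described on this page. Presumably (an assumption) they compare labels_ with true_labels using the scikit-learn metrics of the same names, which are invariant to how clusters are numbered:

```python
import numpy as np
from sklearn.metrics import adjusted_mutual_info_score, adjusted_rand_score

true_labels = np.array([0, 0, 0, 1, 1, 1, 2, 2, 2])
# Same partition with cluster ids permuted: both scores ignore label names.
pred_labels = np.array([2, 2, 2, 0, 0, 0, 1, 1, 1])
# A partition that moves one sample into the wrong cluster.
noisy_labels = np.array([2, 2, 0, 0, 0, 0, 1, 1, 1])

ari_perfect = adjusted_rand_score(true_labels, pred_labels)
ami_perfect = adjusted_mutual_info_score(true_labels, pred_labels)
ari_noisy = adjusted_rand_score(true_labels, noisy_labels)
print(ari_perfect, ami_perfect, ari_noisy)
```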

binary_membership_matrix()

Construct binary membership indicator matrices from the cluster membership matrices.

Return type:: ndarray
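One natural way to obtain a binary indicator matrix from a fuzzy one is to assign each sample to its most probable component. This argmax rule is an assumption, not something stated above; the sketch works for a single (n_samples, n_clusters) matrix or a stack of them:

```python
import numpy as np

def binarize_membership(fuzzy):
    """Hard-assign each sample to its most probable cluster (hypothetical helper).

    Works on shape (n_samples, n_clusters) or (n_projs, n_samples, n_clusters).
    """
    fuzzy = np.asarray(fuzzy)
    n_clusters = fuzzy.shape[-1]
    # Indexing the identity matrix with the argmax labels yields one-hot rows.
    return np.eye(n_clusters, dtype=int)[np.argmax(fuzzy, axis=-1)]

fuzzy = np.array([[0.9, 0.1], [0.4, 0.6], [0.2, 0.8]])
binary = binarize_membership(fuzzy)
print(binary)
```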

correct_classification_accuracy(true_labels)

Return type:: float
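The method name and the max_cca_labels_ attribute suggest accuracy maximized over relabelings of the predicted clusters. A brute-force sketch under that reading (the helper name is hypothetical); for many clusters, scipy.optimize.linear_sum_assignment on the confusion matrix avoids the factorial number of permutations:

```python
from itertools import permutations

import numpy as np

def best_permutation_accuracy(true_labels, pred_labels, n_clusters):
    """Highest accuracy over all relabelings of the predicted clusters (hypothetical helper)."""
    true_labels = np.asarray(true_labels)
    pred_labels = np.asarray(pred_labels)
    best_acc, best_labels = -1.0, None
    for perm in permutations(range(n_clusters)):
        # perm[k] is the new name given to predicted cluster k.
        relabeled = np.asarray(perm)[pred_labels]
        acc = np.mean(relabeled == true_labels)
        if acc > best_acc:
            best_acc, best_labels = acc, relabeled
    return best_acc, best_labels

acc, labels = best_permutation_accuracy([0, 0, 1, 1, 2, 2], [2, 2, 0, 0, 1, 0], 3)
print(acc, labels)
```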

davies_bouldin_score(fdata)

Parameters:: fdata (FDataGrid)
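Assuming each curve is treated as the vector of its values on a common grid (for a scikit-fda FDataGrid those values are held in data_matrix), the score reduces to scikit-learn's davies_bouldin_score. Which labels GPmix scores (presumably labels_) is also an assumption. A sketch on synthetic curves:

```python
import numpy as np
from sklearn.metrics import davies_bouldin_score

rng = np.random.default_rng(1)
grid = np.linspace(0.0, 1.0, 50)
# Hypothetical curves on a shared grid: two groups with different shapes plus noise.
curves = np.vstack([
    np.sin(2 * np.pi * grid) + 0.1 * rng.standard_normal((20, grid.size)),
    np.cos(2 * np.pi * grid) + 0.1 * rng.standard_normal((20, grid.size)),
])
labels = np.repeat([0, 1], 20)

# Each row is one discretized curve; lower scores indicate better-separated clusters.
db = davies_bouldin_score(curves, labels)
print(round(float(db), 3))
```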

fit_gmms(projs_coeffs, n_jobs=-1, **kwargs)

Fit univariate Gaussian mixture models to the projection coefficients.

Parameters:

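projs_coeffs (array-like of shape (n_projs, n_samples)) – Projection coefficients; each row is fitted with its own univariate GMM.
n_jobs (int, optional) – Number of parallel jobs for joblib. Default is -1 (use all available processors).
kwargs – Additional keyword arguments passed to joblib.Parallel.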
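A minimal sketch of fitting one GMM per projection in parallel with joblib, assuming each row of projs_coeffs is fitted independently by scikit-learn's GaussianMixture. The helper names and the per-projection random_state are hypothetical choices; extra keyword arguments are passed to joblib.Parallel, mirroring the kwargs description above:

```python
import numpy as np
from joblib import Parallel, delayed
from sklearn.mixture import GaussianMixture

def fit_one(coeffs, n_clusters, n_init, random_state):
    """Fit a univariate GMM to one row of projection coefficients."""
    gmm = GaussianMixture(n_components=n_clusters, n_init=n_init, random_state=random_state)
    return gmm.fit(np.asarray(coeffs).reshape(-1, 1))

def fit_gmms_sketch(projs_coeffs, n_clusters, n_init=10, n_jobs=-1, **parallel_kwargs):
    # One independent job per projection; extra keyword arguments go to joblib.Parallel.
    jobs = (delayed(fit_one)(row, n_clusters, n_init, i) for i, row in enumerate(projs_coeffs))
    return tuple(Parallel(n_jobs=n_jobs, **parallel_kwargs)(jobs))

rng = np.random.default_rng(2)
group = np.repeat([0, 1], 100)
# Hypothetical coefficients with shape (n_projs, n_samples) = (3, 200).
projs_coeffs = np.stack(
    [np.where(group == 0, -2.0, 2.0) + rng.standard_normal(200) for _ in range(3)]
)
gmms = fit_gmms_sketch(projs_coeffs, n_clusters=2, n_jobs=2, prefer="threads")
print(len(gmms), [np.sort(g.means_.ravel()).round(1) for g in gmms])
```

Here prefer="threads" is an ordinary joblib.Parallel option, used only to show keyword forwarding.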

fuzzy_membership_matrix()

Construct the cluster membership matrices from GMM fits.

Return type:: ndarray
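A fuzzy membership matrix per projection can be sketched with GaussianMixture.predict_proba, which returns posterior component probabilities of shape (n_samples, n_clusters). Stacking them into one (n_projs, n_samples, n_clusters) array is an assumption about the layout:

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Hypothetical coefficients: 2 projections x 120 samples, two groups of 60.
projs_coeffs = rng.standard_normal((2, 120)) + np.repeat([-3.0, 3.0], 60)

gmms = [GaussianMixture(n_components=2, random_state=0).fit(row.reshape(-1, 1))
        for row in projs_coeffs]

# One posterior-probability matrix per projection, stacked along the first axis.
fuzzy = np.stack([g.predict_proba(row.reshape(-1, 1)) for g, row in zip(gmms, projs_coeffs)])
print(fuzzy.shape)  # (2, 120, 2)
```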

get_affinity_matrix(weighted_sum, precompute_gmms=None)

Construct the affinity matrix from the binary membership matrices and the clustering weights.

Parameters:: weighted_sum (bool)
Return type:: ndarray

get_clustering(weighted_sum=True, precompute_gmms=None, **kwargs)

Obtain the consensus clustering via spectral clustering of the affinity matrix.

Parameters:: weighted_sum (bool)
Return type:: ndarray
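Spectral clustering of a ready-made affinity matrix can be sketched with scikit-learn's SpectralClustering and affinity="precomputed". Whether GPmix uses this estimator, or forwards **kwargs to it, is an assumption. The block-structured affinity below is hypothetical:

```python
import numpy as np
from sklearn.cluster import SpectralClustering

# Hypothetical affinity for 8 samples: two blocks of strong co-membership, weak links between.
A = np.full((8, 8), 0.05)
A[:4, :4] = 1.0
A[4:, 4:] = 1.0

# affinity="precomputed" tells scikit-learn to use A directly as the similarity graph.
sc = SpectralClustering(n_clusters=2, affinity="precomputed", random_state=0)
labels = sc.fit_predict(A)
print(labels)
```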

get_clustering_weights(weighted_sum, precompute_gmms=None)

Compute the weights of the base clusterings.

Return type:: ndarray
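The Notes state that weights are proportional to the inverse of each GMM's total misclassification probability. A sketch of that rule; normalizing to sum 1 and the eps guard against a zero total are assumptions, and the effect of weighted_sum is not modeled:

```python
import numpy as np

def clustering_weights(total_omegas, eps=1e-12):
    """Weights proportional to 1 / total misclassification probability (hypothetical helper)."""
    # eps avoids division by zero for a perfectly separated fit.
    inv = 1.0 / np.maximum(np.asarray(total_omegas, dtype=float), eps)
    return inv / inv.sum()

w = clustering_weights([0.02, 0.10, 0.50])
print(w.round(4))
```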

get_omega_map(weights, means, vars)

Construct the matrix of pairwise misclassification probabilities between the mixture components.

Return type:: ndarray

get_omega_prob(dist_a, dist_b)

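Compute the misclassification probability omega_{b|a} between two components of a univariate GMM, i.e. the probability that an observation generated by component a is assigned to component b.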

Parameters:: dist_* (array-like of shape (3,)) – The parameters of mixture component *: [weight, mean, variance].
Return type:: float

get_total_omega(weights, means, vars, weighted_sum)

Compute the total misclassification probability of a univariate GMM.

Return type:: float
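A self-contained sketch of the three omega methods, assuming the Maitra and Melnykov definition of overlap: omega_{b|a} = P[w_b f_b(X) > w_a f_a(X)] for X drawn from component a. In one dimension the decision boundary is quadratic in x, so the probability follows exactly from the normal CDF. Summing the off-diagonal entries for the total is an assumption; how weighted_sum changes the aggregation is not documented here. All function names are hypothetical:

```python
import numpy as np
from scipy.stats import norm

def omega_prob(dist_a, dist_b):
    """P[w_b f_b(X) > w_a f_a(X)] with X ~ N(mean_a, var_a); dist_* = [weight, mean, variance]."""
    wa, ma, va = dist_a
    wb, mb, vb = dist_b
    sa = np.sqrt(va)
    # log(w_b f_b(x)) - log(w_a f_a(x)) = c2 x^2 + c1 x + c0
    c2 = 1.0 / (2 * va) - 1.0 / (2 * vb)
    c1 = mb / vb - ma / va
    c0 = ma**2 / (2 * va) - mb**2 / (2 * vb) + np.log(wb / wa) - 0.5 * np.log(vb / va)

    def cdf(x):
        return norm.cdf((x - ma) / sa)

    if np.isclose(c2, 0.0):  # equal variances: linear boundary
        if np.isclose(c1, 0.0):
            return 1.0 if c0 > 0 else 0.0
        t = -c0 / c1
        return float(1.0 - cdf(t)) if c1 > 0 else float(cdf(t))
    disc = c1**2 - 4 * c2 * c0
    if disc <= 0:  # the quadratic never changes sign
        return 1.0 if c2 > 0 else 0.0
    r1, r2 = sorted([(-c1 - np.sqrt(disc)) / (2 * c2), (-c1 + np.sqrt(disc)) / (2 * c2)])
    inside = cdf(r2) - cdf(r1)
    # c2 > 0: positive outside the roots; c2 < 0: positive between them.
    return float(1.0 - inside) if c2 > 0 else float(inside)

def omega_map(weights, means, variances):
    """Matrix with entry [a, b] = omega_{b|a}; the diagonal is zero."""
    k = len(weights)
    omega = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            if a != b:
                omega[a, b] = omega_prob((weights[a], means[a], variances[a]),
                                         (weights[b], means[b], variances[b]))
    return omega

def total_omega(weights, means, variances):
    # Assumption: total = sum of all off-diagonal pairwise misclassification probabilities.
    return float(omega_map(weights, means, variances).sum())

p = omega_prob((0.5, 0.0, 1.0), (0.5, 2.0, 1.0))
print(round(p, 4))  # 0.1587, i.e. 1 - Phi(1)
```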

gmm_with_MoM_inits(data)

Fit GMMs initialized from method-of-moments estimates.

Parameters:: data (ndarray)
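The method-of-moments solver itself is not documented on this page, so the sketch below only shows how given starting estimates could seed scikit-learn's GaussianMixture through weights_init, means_init, and precisions_init. The starting values are placeholders, and flooring weights at eps then renormalizing is an assumed reading of mom_epsilon (after renormalization a floored weight can end up slightly below eps):

```python
import numpy as np
from sklearn.mixture import GaussianMixture

def gmm_from_initial_params(x, weights, means, variances, eps=0.05):
    """Fit a univariate GMM starting from given parameter estimates (hypothetical helper)."""
    # Floor weights at eps, then renormalize so they sum to 1 as scikit-learn requires.
    w = np.maximum(np.asarray(weights, dtype=float), eps)
    w = w / w.sum()
    k = len(w)
    gmm = GaussianMixture(
        n_components=k,
        weights_init=w,
        means_init=np.asarray(means, dtype=float).reshape(k, 1),
        # With covariance_type="full" in one dimension, precisions have shape (k, 1, 1).
        precisions_init=(1.0 / np.asarray(variances, dtype=float)).reshape(k, 1, 1),
    )
    return gmm.fit(np.asarray(x).reshape(-1, 1))

rng = np.random.default_rng(4)
x = np.concatenate([rng.normal(-2.0, 1.0, 300), rng.normal(3.0, 0.5, 100)])
# Placeholder starting values; a real method-of-moments solver would derive them from x.
gmm = gmm_from_initial_params(x, weights=[0.02, 0.98], means=[-1.5, 2.5], variances=[1.0, 1.0])
print(np.sort(gmm.means_.ravel()).round(2))
```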

plot_clustering(fdata)

Parameters:: fdata (FDataGrid)

plot_gmms(ncols=4, fontsize=12, fig_kws={}, **kwargs)

Visualize the GMM fit for each projection.

Parameters:

ncols (int, optional) – Number of columns in the plot grid. Default is 4.
fontsize (int, optional) – Font size for axis labels. Default is 12.
fig_kws (dict, optional) – Additional keyword arguments for figure creation. Default is an empty dict.
kwargs (dict, optional) – Additional keyword arguments for seaborn’s histplot function. Default is an empty dict.
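A sketch of a comparable grid of panels, one per projection, each overlaying the fitted mixture density on a histogram of the coefficients. It swaps seaborn's histplot for matplotlib's hist to keep dependencies small, and the helper name is hypothetical:

```python
import math

import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
from sklearn.mixture import GaussianMixture

def plot_gmm_grid(projs_coeffs, gmms, ncols=4, fontsize=12, fig_kws=None):
    """One panel per projection: coefficient histogram plus fitted mixture density."""
    fig_kws = {} if fig_kws is None else fig_kws
    nrows = math.ceil(len(gmms) / ncols)
    fig, axes = plt.subplots(nrows, ncols, squeeze=False, **fig_kws)
    for ax, coeffs, gmm in zip(axes.flat, projs_coeffs, gmms):
        ax.hist(coeffs, bins=30, density=True, alpha=0.5)
        grid = np.linspace(coeffs.min(), coeffs.max(), 200)
        # Mixture density: sum of weighted normal pdfs (covariances_ is (k, 1, 1) here).
        density = sum(
            w * norm.pdf(grid, m, np.sqrt(v))
            for w, m, v in zip(gmm.weights_, gmm.means_.ravel(), gmm.covariances_.ravel())
        )
        ax.plot(grid, density)
        ax.set_xlabel("projection coefficient", fontsize=fontsize)
    for ax in list(axes.flat)[len(gmms):]:
        ax.set_visible(False)  # hide unused panels in the last row
    return fig

rng = np.random.default_rng(5)
projs_coeffs = rng.standard_normal((5, 200)) + np.repeat([-2.0, 2.0], 100)
gmms = [GaussianMixture(2, random_state=0).fit(r.reshape(-1, 1)) for r in projs_coeffs]
fig = plot_gmm_grid(projs_coeffs, gmms, ncols=3, fig_kws={"figsize": (9, 5)})
```

Using fig_kws=None with an in-function default avoids the shared mutable-default pitfall of fig_kws={}.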

silhouette_score(fdata)

Parameters:: fdata (FDataGrid)
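As with the Davies-Bouldin score, a plausible reading (an assumption) is that curves are compared as vectors of grid values with Euclidean distance, which reduces to scikit-learn's silhouette_score. A sketch on synthetic curves:

```python
import numpy as np
from sklearn.metrics import silhouette_score

rng = np.random.default_rng(6)
grid = np.linspace(0.0, 1.0, 40)
# Hypothetical curves on a shared grid: a linear group and a shifted quadratic group.
curves = np.vstack([
    grid + 0.05 * rng.standard_normal((15, grid.size)),
    grid**2 + 1.0 + 0.05 * rng.standard_normal((15, grid.size)),
])
labels = np.repeat([0, 1], 15)

# Silhouette lies in [-1, 1]; values near 1 mean curves sit well inside their own cluster.
score = silhouette_score(curves, labels)
print(round(float(score), 3))
```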