GPmix.misc

Functions

`davies_bouldin_score`(fd, y)	Compute the Davies-Bouldin score for clustering results on functional data.
`estimate_nclusters`(fdata[, ncluster_grid])	Search a grid of candidate cluster counts for the number of clusters that minimizes the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).
`gmms_fit_plot_`(weights, means, stdev[, ax])	Plot Gaussian mixture model (GMM) density curves.
`hybrid_representative_selection`(data, p, p1)	Select representative samples from a large dataset using a hybrid approach combining random sampling and KMeans clustering.
`match_labels`(cluster_labels, ...)	Permute cluster labels to match specified true class labels.
`silhouette_score`(fd, y, **kwargs)	Compute the silhouette score for clustering results on functional data.

GPmix.misc.davies_bouldin_score(fd, y)

Compute the Davies-Bouldin score for clustering results on functional data.

Parameters:

fd (skfda.FDataGrid) – Functional data object containing the data matrix.
y (array-like) – Cluster labels for each sample.

Returns:

Davies-Bouldin score.

Return type:

float
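
For orientation, the index can be sketched in plain Python. This is an illustration of the standard Davies-Bouldin definition, not GPmix's implementation. It assumes each curve is given as a list of values sampled on a common grid (one row of the data matrix) and uses Euclidean distance. The name `davies_bouldin` is hypothetical.

```python
import math

def davies_bouldin(curves, labels):
    """Davies-Bouldin index for curves given as equal-length lists of values."""
    # Group the curves by cluster label.
    clusters = {}
    for curve, label in zip(curves, labels):
        clusters.setdefault(label, []).append(curve)

    # Centroid of each cluster: the pointwise mean curve.
    centroids = {
        k: [sum(col) / len(members) for col in zip(*members)]
        for k, members in clusters.items()
    }
    # Scatter of each cluster: mean distance from its members to the centroid.
    scatter = {
        k: sum(math.dist(c, centroids[k]) for c in members) / len(members)
        for k, members in clusters.items()
    }
    # For each cluster, take the worst (largest) similarity ratio to any other.
    keys = list(clusters)
    worst = [
        max(
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in keys
            if j != i
        )
        for i in keys
    ]
    return sum(worst) / len(keys)
```

Lower values indicate more compact, better separated clusters. The sketch needs at least two clusters with distinct centroids.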

GPmix.misc.estimate_nclusters(fdata, ncluster_grid=None)

Search a grid of candidate cluster counts for the number of clusters that minimizes the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).

Parameters:

fdata (skfda.FDataGrid) – The functional dataset for which the number of clusters is to be estimated.
ncluster_grid (array-like, optional) – Candidate numbers of clusters to evaluate. Defaults to range(2, 15), that is, 2 through 14.

Returns:

The estimated number of clusters in the functional dataset.

Return type:

int
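
The selection rule can be sketched as follows. The AIC and BIC formulas are the standard ones. How GPmix fits the candidate models, counts their parameters, or chooses between the two criteria is not stated here, so `pick_n_clusters`, its `fits` argument, and the numbers in `toy_fits` are hypothetical and purely illustrative.

```python
import math

def aic(log_likelihood, n_params):
    """Akaike Information Criterion: 2p - 2 ln L."""
    return 2 * n_params - 2 * log_likelihood

def bic(log_likelihood, n_params, n_samples):
    """Bayesian Information Criterion: p ln n - 2 ln L."""
    return n_params * math.log(n_samples) - 2 * log_likelihood

def pick_n_clusters(fits, n_samples, criterion="bic"):
    """Return the candidate k whose fitted model has the lowest criterion value.

    `fits` maps each candidate k to (log_likelihood, n_params).
    """
    def score(k):
        log_likelihood, n_params = fits[k]
        if criterion == "bic":
            return bic(log_likelihood, n_params, n_samples)
        return aic(log_likelihood, n_params)

    return min(fits, key=score)

# Made-up log-likelihoods and parameter counts for k = 2, 3, 4.
toy_fits = {2: (-120.0, 5), 3: (-100.0, 8), 4: (-96.5, 11)}
k_aic = pick_n_clusters(toy_fits, n_samples=100, criterion="aic")
k_bic = pick_n_clusters(toy_fits, n_samples=100, criterion="bic")
```

With these toy values the two criteria disagree: BIC penalizes extra parameters more heavily and settles on fewer clusters than AIC.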

GPmix.misc.gmms_fit_plot_(weights, means, stdev, ax=None, **kwargs)

Plot Gaussian mixture model (GMM) density curves.

Parameters:

weights (array-like) – Weights of each Gaussian component.
means (array-like) – Means of each Gaussian component.
stdev (array-like) – Standard deviations of each Gaussian component.
ax (matplotlib.axes.Axes, optional) – Axis to plot on. If None, uses the current axis.
**kwargs (dict) – Additional keyword arguments passed to matplotlib plot.

Return type:

None
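
The curve being drawn is a weighted sum of Gaussian densities. The sketch below evaluates such a curve on a grid without plotting. It assumes a univariate mixture (one mean and one standard deviation per component), and `gmm_density` is a hypothetical helper rather than part of GPmix.

```python
import math

def gmm_density(x, weights, means, stdev):
    """Evaluate a 1-D Gaussian mixture density at the point x."""
    return sum(
        w * math.exp(-0.5 * ((x - m) / s) ** 2) / (s * math.sqrt(2 * math.pi))
        for w, m, s in zip(weights, means, stdev)
    )

# Evaluate a two-component mixture on an evenly spaced grid over [-10, 10].
step = 0.01
xs = [-10 + step * i for i in range(2001)]
ys = [gmm_density(x, [0.3, 0.7], [-2.0, 1.0], [0.5, 1.0]) for x in xs]

# With weights summing to 1, the area under the curve is approximately 1.
area = sum(ys) * step
```

The resulting `xs` and `ys` are the kind of coordinates one would hand to a matplotlib `plot` call.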

GPmix.misc.hybrid_representative_selection(data, p, p1)

Select representative samples from a large dataset using a hybrid approach combining random sampling and KMeans clustering.

This method first draws a random sample containing a proportion p of the data, then applies KMeans clustering to that sample to reduce it to a smaller set of representatives, sized as a proportion p1 of the data.

Parameters:

data (skfda.FDataGrid or np.ndarray) – Functional dataset or array to sample from.
p (float) – Proportion of data to randomly sample (0 < p < 1).
p1 (float) – Proportion of data to use as representative samples (0 < p1 < p).

Returns:

Representative samples selected by KMeans clustering.

Return type:

skfda.FDataGrid or np.ndarray

Raises:

AssertionError – If p1 is not less than p.
ValueError – If data is not of type skfda.FDataGrid or np.ndarray.
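
The two-stage idea can be sketched in plain Python. Several things here are assumptions: the representatives are taken to be the cluster centers, both p and p1 are taken as proportions of the full dataset, and a small Lloyd's iteration stands in for a library KMeans. `kmeans_centers` and `hybrid_select` are hypothetical names, not GPmix functions.

```python
import math
import random

def kmeans_centers(points, k, n_iter=20, rng=None):
    """Basic Lloyd's algorithm; returns k cluster centers."""
    rng = rng or random.Random(0)
    centers = rng.sample(points, k)  # initialize from k distinct points
    for _ in range(n_iter):
        # Assignment step: attach every point to its nearest center.
        groups = [[] for _ in range(k)]
        for pt in points:
            nearest = min(range(k), key=lambda j: math.dist(pt, centers[j]))
            groups[nearest].append(pt)
        # Update step: move each center to the mean of its group.
        # An empty group keeps its previous center.
        centers = [
            [sum(col) / len(g) for col in zip(*g)] if g else centers[j]
            for j, g in enumerate(groups)
        ]
    return centers

def hybrid_select(data, p, p1, seed=0):
    """Randomly sample a proportion p, then reduce to a proportion p1 by clustering."""
    assert 0 < p1 < p < 1
    rng = random.Random(seed)
    n = len(data)
    n_candidates = max(1, round(p * n))
    n_reps = max(1, round(p1 * n))
    candidates = rng.sample(data, n_candidates)  # stage 1: random subset
    return kmeans_centers(candidates, n_reps, rng=rng)  # stage 2: KMeans

# 200 toy two-dimensional points reduced to 20 representatives.
toy_data = [[float(i), float(i % 7)] for i in range(200)]
representatives = hybrid_select(toy_data, p=0.5, p1=0.1)
```

Clustering only the random subset keeps the KMeans step cheap on large datasets, which is the motivation for combining the two stages.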

GPmix.misc.match_labels(cluster_labels, true_class_labels, cluster_class_labels_perm)

Permute cluster labels to match specified true class labels.

Parameters:

cluster_labels (array-like) – Cluster labels assigned by a clustering algorithm.
true_class_labels (array-like) – True class labels to match.
cluster_class_labels_perm (array-like) – Permutation of cluster labels to match true labels.

Returns:

Array of matched labels.

Return type:

np.ndarray
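
The encoding expected for `cluster_class_labels_perm` is not specified here, so the sketch below illustrates only the general idea of label matching: find the assignment of cluster labels to class labels that maximizes agreement, then relabel. `best_mapping` and `relabel` are hypothetical helpers, and the brute-force search assumes a small number of clusters, no larger than the number of classes.

```python
from itertools import permutations

def best_mapping(cluster_labels, true_labels):
    """Brute-force search for the cluster-to-class mapping with most agreements."""
    clusters = sorted(set(cluster_labels))
    classes = sorted(set(true_labels))
    best_map, best_hits = None, -1
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[c] == t for c, t in zip(cluster_labels, true_labels))
        if hits > best_hits:
            best_map, best_hits = mapping, hits
    return best_map

def relabel(cluster_labels, mapping):
    """Apply a cluster-to-class mapping to every label."""
    return [mapping[c] for c in cluster_labels]

# The clustering is correct up to a renaming of the labels.
toy_clusters = [2, 2, 0, 0, 1, 1]
toy_truth = [0, 0, 1, 1, 2, 2]
toy_map = best_mapping(toy_clusters, toy_truth)
matched = relabel(toy_clusters, toy_map)
```

Matching is needed because clustering algorithms assign arbitrary label values, so raw cluster labels cannot be compared with class labels directly.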

GPmix.misc.silhouette_score(fd, y, **kwargs)

Compute the silhouette score for clustering results on functional data.

Parameters:

fd (skfda.FDataGrid) – Functional data object containing the data matrix.
y (array-like) – Cluster labels for each sample.
**kwargs (dict) – Additional keyword arguments passed to sklearn.metrics.silhouette_score.

Returns:

Silhouette score.

Return type:

float
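
A plain-Python sketch of the mean silhouette coefficient follows, for illustration only. It assumes each curve is given as a list of values sampled on a common grid and uses Euclidean distance. The name `silhouette` is hypothetical, and the real function delegates to sklearn.metrics.silhouette_score.

```python
import math

def silhouette(curves, labels):
    """Mean silhouette coefficient over all curves."""
    n = len(curves)
    total = 0.0
    for i in range(n):
        # Collect distances from curve i to every other curve, by cluster.
        dists = {}
        for j in range(n):
            if j != i:
                d = math.dist(curves[i], curves[j])
                dists.setdefault(labels[j], []).append(d)
        own = dists.pop(labels[i], None)
        if not own:
            continue  # a singleton cluster contributes 0
        a = sum(own) / len(own)  # mean distance within the own cluster
        b = min(sum(d) / len(d) for d in dists.values())  # nearest other cluster
        total += (b - a) / max(a, b)
    return total / n
```

Values lie between -1 and 1, and higher values indicate that curves sit closer to their own cluster than to the nearest other one. The sketch needs at least two clusters.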