GPmix.misc

Functions

davies_bouldin_score(fd, y)

Compute the Davies-Bouldin score for clustering results on functional data.

estimate_nclusters(fdata[, ncluster_grid])

Search systematically for the number of clusters that minimizes the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).

gmms_fit_plot_(weights, means, stdev[, ax])

Plot Gaussian mixture model (GMM) density curves.

hybrid_representative_selection(data, p, p1)

Select representative samples from a large dataset using a hybrid approach combining random sampling and KMeans clustering.

match_labels(cluster_labels, ...)

Permute cluster labels to match specified true class labels.

silhouette_score(fd, y, **kwargs)

Compute the silhouette score for clustering results on functional data.

GPmix.misc.davies_bouldin_score(fd, y)[source]

Compute the Davies-Bouldin score for clustering results on functional data.

Parameters:
  • fd (skfda.FDataGrid) – Functional data object containing the data matrix.

  • y (array-like) – Cluster labels for each sample.

Returns:

Davies-Bouldin score.

Return type:

float
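
For orientation, the Davies-Bouldin index averages, over clusters, the worst-case ratio of within-cluster scatter to between-centroid distance, so lower values indicate tighter, better-separated clusters. The sketch below computes it in plain Python on the rows of a discretised data matrix. It is not GPmix's implementation: the helper name davies_bouldin and the toy curves are hypothetical, and how the library extracts the matrix from fd is not stated here.

```python
import math

def davies_bouldin(rows, labels):
    """Davies-Bouldin index for rows (lists of floats), one label per row."""
    clusters = {}
    for row, lab in zip(rows, labels):
        clusters.setdefault(lab, []).append(row)

    # Centroid of each cluster: pointwise mean of its curves.
    centroids = {
        lab: [sum(col) / len(members) for col in zip(*members)]
        for lab, members in clusters.items()
    }
    # Scatter of each cluster: mean Euclidean distance to its centroid.
    scatter = {
        lab: sum(math.dist(m, centroids[lab]) for m in members) / len(members)
        for lab, members in clusters.items()
    }

    labs = list(clusters)
    worst = []
    for i in labs:
        # Largest similarity ratio between cluster i and any other cluster.
        ratios = [
            (scatter[i] + scatter[j]) / math.dist(centroids[i], centroids[j])
            for j in labs if j != i
        ]
        worst.append(max(ratios))
    return sum(worst) / len(labs)

# Toy data (hypothetical): four curves on a three-point grid, two clusters.
rows = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0],
        [10.0, 10.0, 10.0], [10.0, 12.0, 10.0]]
labels = [0, 0, 1, 1]
score = davies_bouldin(rows, labels)
```

The sketch needs at least two clusters with distinct centroids; otherwise the ratio is undefined.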

GPmix.misc.estimate_nclusters(fdata, ncluster_grid=None)[source]

Search systematically for the number of clusters that minimizes the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).

Parameters:
  • fdata (skfda.FDataGrid) – The functional dataset for which the number of clusters is to be estimated.

  • ncluster_grid (array-like, optional) – List or array of candidate numbers of clusters to search over. Defaults to range(2, 15), i.e. 2 through 14.

Returns:

The estimated number of clusters in the functional dataset.

Return type:

int
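
The selection step can be illustrated without any fitting code. The sketch below assumes a log-likelihood is already available for each candidate number of clusters and returns the candidate with the smallest criterion value. The helper names, the parameter count for a 1-D Gaussian mixture, and the log-likelihood values are all made up for illustration; how GPmix fits its models and how it chooses between AIC and BIC is not described here.

```python
import math

def aic(log_lik, n_params):
    """Akaike Information Criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * log_lik

def bic(log_lik, n_params, n_samples):
    """Bayesian Information Criterion: k ln n - 2 ln L."""
    return n_params * math.log(n_samples) - 2 * log_lik

def pick_nclusters(grid, criterion):
    """Return the candidate in grid with the smallest criterion value."""
    return min(grid, key=criterion)

# Made-up log-likelihoods, keyed by number of clusters, for 100 samples.
log_liks = {2: -250.0, 3: -205.0, 4: -203.0, 5: -202.0}
n_samples = 100

def n_params(k):
    # 1-D Gaussian mixture: k means + k variances + (k - 1) free weights.
    return 3 * k - 1

best_bic = pick_nclusters(
    log_liks, lambda k: bic(log_liks[k], n_params(k), n_samples))
best_aic = pick_nclusters(
    log_liks, lambda k: aic(log_liks[k], n_params(k)))
```

With these toy numbers both criteria select 3: the likelihood gain from a fourth or fifth component is too small to offset the penalty for the extra parameters.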

GPmix.misc.gmms_fit_plot_(weights, means, stdev, ax=None, **kwargs)[source]

Plot Gaussian mixture model (GMM) density curves.

Parameters:
  • weights (array-like) – Weights of each Gaussian component.

  • means (array-like) – Means of each Gaussian component.

  • stdev (array-like) – Standard deviations of each Gaussian component.

  • ax (matplotlib.axes.Axes, optional) – Axes to plot on. If None, uses the current axes.

  • **kwargs (dict) – Additional keyword arguments passed to matplotlib plot.

Return type:

None
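
The curve of a univariate Gaussian mixture is the weighted sum of its component densities. As a minimal sketch of the quantity such a plot shows, the code below evaluates that sum on a grid using only the standard library. The helper name gmm_density, the grid, and the parameter values are hypothetical; whether gmms_fit_plot_ draws the mixture, the individual components, or both is not stated here.

```python
from statistics import NormalDist

def gmm_density(x, weights, means, stdev):
    """Mixture density at x: sum over k of weight_k * Normal(mean_k, stdev_k).pdf(x)."""
    return sum(
        w * NormalDist(mu, sd).pdf(x)
        for w, mu, sd in zip(weights, means, stdev)
    )

# Hypothetical two-component mixture.
weights, means, stdev = [0.3, 0.7], [-2.0, 1.0], [0.5, 1.0]

# Evaluate on a regular grid covering [-6, 8] with step 0.01.
step = 0.01
xs = [-6.0 + step * i for i in range(1401)]
ys = [gmm_density(x, weights, means, stdev) for x in xs]

# Riemann-sum check: a valid density integrates to about 1.
area = sum(ys) * step
```

The resulting xs and ys can be passed to any line-plotting call, for example ax.plot(xs, ys) on a matplotlib Axes.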

GPmix.misc.hybrid_representative_selection(data, p, p1)[source]

Select representative samples from a large dataset using a hybrid approach combining random sampling and KMeans clustering.

This method first randomly samples a proportion p of the data, then applies KMeans clustering to this subset to select a smaller set of representative samples.
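
The two stages can be sketched in plain Python as follows. This is an illustration under stated assumptions, not the library's code: the helper name hybrid_select, the seed and n_iter arguments, the hand-rolled Lloyd iteration standing in for KMeans, and the choice to return the sampled row nearest each centroid are all hypothetical. Both p and p1 are read as proportions of the full dataset, as in the parameter descriptions.

```python
import math
import random

def hybrid_select(rows, p, p1, seed=0, n_iter=20):
    """Random subsample of size p*n, then k-means with p1*n clusters on it."""
    rng = random.Random(seed)
    n = len(rows)
    n_sample = max(1, round(p * n))   # size of the random subsample
    n_rep = max(1, round(p1 * n))     # number of representatives

    # Stage 1: random subsample of the data.
    subset = rng.sample(rows, n_sample)

    # Stage 2: Lloyd's k-means on the subsample (stand-in for KMeans).
    centroids = rng.sample(subset, n_rep)
    for _ in range(n_iter):
        groups = [[] for _ in centroids]
        for row in subset:
            nearest = min(range(n_rep),
                          key=lambda j: math.dist(row, centroids[j]))
            groups[nearest].append(row)
        # Recompute each centroid; keep the old one if its group is empty.
        centroids = [
            [sum(col) / len(g) for col in zip(*g)] if g else c
            for g, c in zip(groups, centroids)
        ]

    # Assumption: the representative is the sampled row nearest each centroid.
    return [min(subset, key=lambda row: math.dist(row, c)) for c in centroids]

# Toy data (hypothetical): 100 two-dimensional points.
rows = [[float(i), float(i % 5)] for i in range(100)]
reps = hybrid_select(rows, p=0.5, p1=0.1)
```

Running k-means on the subsample rather than on the full dataset is what keeps the clustering step cheap for large inputs.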

Parameters:
  • data (skfda.FDataGrid or np.ndarray) – Functional dataset or array to sample from.

  • p (float) – Proportion of data to randomly sample (0 < p < 1).

  • p1 (float) – Proportion of data to use as representative samples (0 < p1 < p).

Returns:

Representative samples selected by KMeans clustering.

Return type:

skfda.FDataGrid or np.ndarray

Raises:
GPmix.misc.match_labels(cluster_labels, true_class_labels, cluster_class_labels_perm)[source]

Permute cluster labels to match specified true class labels.

Parameters:
  • cluster_labels (array-like) – Cluster labels assigned by a clustering algorithm.

  • true_class_labels (array-like) – True class labels to match.

  • cluster_class_labels_perm (array-like) – Permutation of cluster labels to match true labels.

Returns:

Array of matched labels.

Return type:

np.ndarray

GPmix.misc.silhouette_score(fd, y, **kwargs)[source]

Compute the silhouette score for clustering results on functional data.

Parameters:
  • fd (skfda.FDataGrid) – Functional data object containing the data matrix.

  • y (array-like) – Cluster labels for each sample.

  • **kwargs (dict) – Additional keyword arguments passed to sklearn.metrics.silhouette_score.

Returns:

Silhouette score.

Return type:

float
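
For reference, the silhouette of a sample compares a, its mean distance to the other members of its own cluster, with b, its mean distance to the nearest other cluster, as (b - a) / max(a, b); the score is the mean over samples. The plain-Python sketch below computes this with Euclidean distance on the rows of a discretised data matrix. The helper name silhouette and the toy curves are hypothetical, and how GPmix passes fd to scikit-learn is not stated here.

```python
import math

def silhouette(rows, labels):
    """Mean silhouette coefficient with Euclidean distance; needs >= 2 clusters."""
    n = len(rows)
    scores = []
    for i in range(n):
        # Accumulate distances from sample i to every other sample, per cluster.
        sums, counts = {}, {}
        for j in range(n):
            if j == i:
                continue
            d = math.dist(rows[i], rows[j])
            sums[labels[j]] = sums.get(labels[j], 0.0) + d
            counts[labels[j]] = counts.get(labels[j], 0) + 1
        own = labels[i]
        if own not in counts:
            # Singleton cluster: the silhouette is defined as 0.
            scores.append(0.0)
            continue
        a = sums[own] / counts[own]                             # within-cluster
        b = min(sums[l] / counts[l] for l in sums if l != own)  # nearest other
        scores.append((b - a) / max(a, b))
    return sum(scores) / n

# Toy data (hypothetical): four curves on a three-point grid, two clusters.
rows = [[0.0, 0.0, 0.0], [0.0, 2.0, 0.0],
        [10.0, 10.0, 10.0], [10.0, 12.0, 10.0]]
labels = [0, 0, 1, 1]
score = silhouette(rows, labels)
```

Values near 1 indicate compact, well-separated clusters. In scikit-learn's own silhouette_score, keyword arguments such as metric and sample_size change the distance used and subsample the data.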