GPmix.misc
Functions
Compute the Davies-Bouldin score for clustering results on functional data.
Employs a systematic search to identify the number of clusters that minimizes the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).
Plot Gaussian mixture model (GMM) density curves.
Select representative samples from a large dataset using a hybrid approach combining random sampling and KMeans clustering.
Permute cluster labels to match specified true class labels.
Compute the silhouette score for clustering results on functional data.
- GPmix.misc.davies_bouldin_score(fd, y)
Compute the Davies-Bouldin score for clustering results on functional data.
- Parameters:
fd (skfda.FDataGrid) – Functional data object containing the data matrix.
y (array-like) – Cluster labels for each sample.
- Returns:
Davies-Bouldin score.
- Return type:
float
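The Davies-Bouldin index averages, over clusters, the worst ratio of within-cluster scatter to between-centroid separation; lower values mean more compact, better-separated clusters. The sketch below computes it with NumPy under the assumption that each curve is compared through its values on the shared grid (for an FDataGrid that would be `fd.data_matrix`, flattened per sample) using Euclidean distance. That representation is an assumption, not a description of GPmix's implementation, and `davies_bouldin_sketch` is a hypothetical name. On a plain feature matrix it follows the same definition as `sklearn.metrics.davies_bouldin_score`.

```python
import numpy as np

def davies_bouldin_sketch(X, y):
    """Davies-Bouldin index of labels y for the samples in X (lower is better).

    Assumption: each sample (e.g. a curve evaluated on a common grid) is
    flattened into one vector and compared with Euclidean distance.
    """
    X = np.asarray(X, dtype=float).reshape(len(X), -1)
    y = np.asarray(y)
    labels = np.unique(y)
    if len(labels) < 2:
        raise ValueError("need at least two clusters")
    centroids = np.array([X[y == c].mean(axis=0) for c in labels])
    # Within-cluster scatter: mean distance of members to their centroid.
    scatter = np.array([
        np.linalg.norm(X[y == c] - centroids[i], axis=1).mean()
        for i, c in enumerate(labels)
    ])
    # Pairwise centroid separations; zero separations are treated as infinite.
    sep = np.linalg.norm(centroids[:, None, :] - centroids[None, :, :], axis=2)
    ratios = (scatter[:, None] + scatter[None, :]) / np.where(sep == 0, np.inf, sep)
    np.fill_diagonal(ratios, -np.inf)  # exclude each cluster paired with itself
    # Worst (largest) ratio per cluster, averaged over clusters.
    return float(ratios.max(axis=1).mean())
```

For example, the points 0, 2, 10, 12 split into {0, 2} and {10, 12} each have scatter 1 and centroid separation 10, so the index is (1 + 1) / 10 = 0.2.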
- GPmix.misc.estimate_nclusters(fdata, ncluster_grid=None)
Employs a systematic search to identify the number of clusters that minimizes the Akaike Information Criterion (AIC) or the Bayesian Information Criterion (BIC).
- Parameters:
fdata (skfda.FDataGrid) – The functional dataset for which the number of clusters is to be estimated.
ncluster_grid (array-like, optional) – List or array specifying the grid within which the number of clusters is searched. Defaults to range(2, 15).
- Returns:
The estimated number of clusters in the functional dataset.
- Return type:
int
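This page does not say which model is fitted at each grid value, so the following is only a sketch of the general grid-search idea under stated assumptions: a univariate Gaussian mixture is fitted by EM to a 1-D sample (for instance, projection coefficients of the curves, which is an assumption here), and the candidate k minimizing BIC = p ln n - 2 ln L (or AIC = 2p - 2 ln L) is returned, with p = 3k - 1 free parameters. The names `fit_gmm_1d` and `estimate_nclusters_1d` are hypothetical and not part of GPmix.

```python
import numpy as np

def _component_log_density(x, weights, means, stds):
    # log(w_k * N(x | mu_k, sigma_k)) for every sample/component pair, shape (n, k).
    z = (x[:, None] - means) / stds
    return np.log(weights) - np.log(stds) - 0.5 * np.log(2 * np.pi) - 0.5 * z ** 2

def _logsumexp_rows(a):
    # Numerically stable log(sum(exp(a), axis=1)).
    m = a.max(axis=1, keepdims=True)
    return m[:, 0] + np.log(np.exp(a - m).sum(axis=1))

def fit_gmm_1d(x, k, n_iter=200, min_std=1e-3):
    """EM for a univariate k-component GMM; returns (weights, means, stds, loglik)."""
    x = np.asarray(x, dtype=float)
    # Deterministic start: means at evenly spaced quantiles, common std, equal weights.
    means = np.quantile(x, (np.arange(k) + 0.5) / k)
    stds = np.full(k, max(x.std(), min_std))
    weights = np.full(k, 1.0 / k)
    for _ in range(n_iter):
        # E-step: posterior responsibility of each component for each sample.
        log_p = _component_log_density(x, weights, means, stds)
        resp = np.exp(log_p - _logsumexp_rows(log_p)[:, None])
        # M-step: re-estimate weights, means and (floored) standard deviations.
        nk = resp.sum(axis=0) + 1e-12
        weights = nk / nk.sum()
        means = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - means) ** 2).sum(axis=0) / nk
        stds = np.maximum(np.sqrt(var), min_std)
    loglik = _logsumexp_rows(_component_log_density(x, weights, means, stds)).sum()
    return weights, means, stds, loglik

def estimate_nclusters_1d(x, ncluster_grid=range(2, 15), criterion="bic"):
    """Return the k in ncluster_grid with the smallest BIC (or AIC)."""
    n = np.asarray(x).size
    scores = {}
    for k in ncluster_grid:
        *_, loglik = fit_gmm_1d(x, k)
        n_params = 3 * k - 1  # k means + k stds + (k - 1) free weights
        penalty = n_params * np.log(n) if criterion == "bic" else 2 * n_params
        scores[k] = penalty - 2 * loglik
    return min(scores, key=scores.get)
```

BIC's penalty grows with ln n, so it tends to favour fewer components than AIC on large samples.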
- GPmix.misc.gmms_fit_plot_(weights, means, stdev, ax=None, **kwargs)
Plot Gaussian mixture model (GMM) density curves.
- Parameters:
weights (array-like) – Weights of each Gaussian component.
means (array-like) – Means of each Gaussian component.
stdev (array-like) – Standard deviations of each Gaussian component.
ax (matplotlib.axes.Axes, optional) – Axis to plot on. If None, uses the current axis.
**kwargs (dict) – Additional keyword arguments passed to matplotlib's plot function.
- Return type:
None
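A GMM density curve is the weighted sum f(x) = Σ_k w_k N(x; μ_k, σ_k) of the component normal densities. The sketch below computes, on a grid, the weighted component curves and their sum that such a plot would draw; it is a NumPy-only illustration, and `gmm_density_curves` is a hypothetical helper, not the code behind `gmms_fit_plot_`.

```python
import numpy as np

def gmm_density_curves(weights, means, stdev, x):
    """Return (weighted component densities of shape (k, len(x)), mixture density)."""
    w = np.asarray(weights, dtype=float)[:, None]
    mu = np.asarray(means, dtype=float)[:, None]
    sd = np.asarray(stdev, dtype=float)[:, None]
    x = np.asarray(x, dtype=float)[None, :]
    # w_k * N(x; mu_k, sd_k) evaluated for every component and grid point.
    comps = w * np.exp(-0.5 * ((x - mu) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    # The mixture density is the sum of the weighted components.
    return comps, comps.sum(axis=0)
```

To draw the curves, pass the grid together with each row of `comps` (and the total) to `ax.plot`, forwarding any styling keyword arguments.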
- GPmix.misc.hybrid_representative_selection(data, p, p1)
Select representative samples from a large dataset using a hybrid approach combining random sampling and KMeans clustering.
This method first randomly samples a proportion p of the data, then applies KMeans clustering to this subset to select a smaller set of representative samples.
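- Parameters:
data (skfda.FDataGrid or np.ndarray) – The dataset from which representative samples are selected.
p (float) – Proportion of the data sampled at random in the first stage.
p1 (float) – Proportion that determines the number of representative samples selected by KMeans; must be less than p.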
- Returns:
Representative samples selected by KMeans clustering.
- Return type:
skfda.FDataGrid or np.ndarray
- Raises:
AssertionError – If p1 is not less than p.
ValueError – If data is not of type skfda.FDataGrid or np.ndarray.
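The two-stage procedure described above can be sketched as follows. Several details are assumptions not stated on this page: the number of representatives is taken as round(p1 × n), the representative for each cluster is the subset member closest to its centroid, and KMeans is replaced by a plain NumPy implementation of Lloyd's algorithm. The function `hybrid_select_sketch` is hypothetical and works on NumPy arrays only (an FDataGrid would first need its `data_matrix` extracted).

```python
import numpy as np

def hybrid_select_sketch(data, p, p1, n_iter=50, seed=0):
    """Random subsample of ~p*n items, then k-means with k ~ p1*n (assumed meaning of p1)."""
    assert p1 < p, "p1 must be less than p"
    data = np.asarray(data, dtype=float)
    rng = np.random.default_rng(seed)
    n = data.shape[0]
    # Stage 1: uniform random subsample without replacement.
    n_sub = max(1, int(round(p * n)))
    subset = data[rng.choice(n, size=n_sub, replace=False)]
    # Stage 2: Lloyd's k-means on the subset; each sample is flattened to a vector.
    k = max(1, min(n_sub, int(round(p1 * n))))
    flat = subset.reshape(n_sub, -1)
    centers = flat[rng.choice(n_sub, size=k, replace=False)]
    for _ in range(n_iter):
        d = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Keep the old center if a cluster ends up empty.
        new_centers = np.array([
            flat[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Representatives: the actual subset member nearest each centroid (duplicates removed).
    d = ((flat[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    idx = np.unique(d.argmin(axis=0))
    return subset[idx]
```

Returning real samples rather than centroids keeps every representative an observed curve, which is one reasonable reading of "representative samples".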
- GPmix.misc.match_labels(cluster_labels, true_class_labels, cluster_class_labels_perm)
Permute cluster labels to match specified true class labels.
- Parameters:
cluster_labels (array-like) – Cluster labels assigned by a clustering algorithm.
true_class_labels (array-like) – True class labels to match.
cluster_class_labels_perm (array-like) – Permutation of cluster labels to match true labels.
- Returns:
Array of matched labels.
- Return type:
np.ndarray
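The page does not spell out how `cluster_class_labels_perm` is interpreted, so the sketch below assumes that cluster label `perm[i]` is renamed to `true_class_labels[i]`. It also shows one way such a permutation could be obtained: a brute-force search for the relabeling that maximizes agreement with the true labels. Both `relabel_sketch` and `best_permutation` are hypothetical helpers, not GPmix functions.

```python
import itertools
import numpy as np

def relabel_sketch(cluster_labels, true_class_labels, cluster_class_labels_perm):
    # Assumed semantics: cluster label perm[i] is renamed to true_class_labels[i].
    mapping = dict(zip(cluster_class_labels_perm, true_class_labels))
    return np.array([mapping[c] for c in cluster_labels])

def best_permutation(cluster_labels, y_true):
    """Brute-force the cluster-label permutation that maximizes accuracy against y_true."""
    cluster_labels = np.asarray(cluster_labels)
    y_true = np.asarray(y_true)
    classes = np.unique(y_true)
    clusters = np.unique(cluster_labels)
    assert len(classes) == len(clusters), "sketch assumes as many clusters as classes"
    best_perm, best_acc = None, -1.0
    for perm in itertools.permutations(clusters):  # k! candidates
        acc = float(np.mean(relabel_sketch(cluster_labels, classes, perm) == y_true))
        if acc > best_acc:
            best_perm, best_acc = perm, acc
    return best_perm, best_acc
```

Because the search visits k! permutations, it is only practical for a handful of clusters; `scipy.optimize.linear_sum_assignment` solves the same matching problem in polynomial time.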