pyagc.clusters.TorchKMeans

class TorchKMeans(metric: str = 'euclidean', init: Union[str, Tensor] = 'k-means++', random_state: Optional[int] = None, n_clusters: int = 8, n_init: int = 10, max_iter: int = 300, tol: float = 0.0001, distributed: bool = False, verbose: bool = False)

Bases: object

A PyTorch-based k-means clustering implementation that supports both Euclidean and cosine distance metrics, with optional distributed training. It is adapted from Hzzone/torch_clustering.

Parameters:

metric (str, optional) – Distance metric to use: 'euclidean' or 'cosine'. (default: 'euclidean')
init (str or torch.Tensor, optional) – Initialization method: 'k-means++', 'random', or a user-specified tensor of initial centers with shape (n_clusters, n_features). (default: 'k-means++')
random_state (int, optional) – Random seed for initialization. (default: None)
n_clusters (int, optional) – Number of clusters. (default: 8)
n_init (int, optional) – Number of times the algorithm will be run with different centroid seeds. (default: 10)
max_iter (int, optional) – Maximum number of iterations of the k-means algorithm for a single run. (default: 300)
tol (float, optional) – Relative tolerance with regard to inertia used to declare convergence. (default: 1e-4)
distributed (bool, optional) – Whether to use distributed training. (default: False)
verbose (bool, optional) – Whether to print progress information. (default: False)
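The two metric options differ in what they treat as "close". The dependency-free sketch below (plain Python, not the library's PyTorch code) contrasts them on two vectors that point in the same direction but differ in length. It assumes cosine distance is defined as 1 - cosine similarity, a common convention that this page does not state explicitly; the helper names are hypothetical.

```python
import math

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def cosine_distance(a, b):
    """Assumption: cosine distance = 1 - cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

# Same direction, different magnitude.
u, v = [1.0, 0.0], [3.0, 0.0]
print(squared_euclidean(u, v))  # 4.0
print(cosine_distance(u, v))    # 0.0
```

Under 'euclidean' these vectors are far apart, while under 'cosine' they are identical, so the choice of metric matters when feature magnitudes carry little meaning (for example, normalized embeddings).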

__init__(metric: str = 'euclidean', init: Union[str, Tensor] = 'k-means++', random_state: Optional[int] = None, n_clusters: int = 8, n_init: int = 10, max_iter: int = 300, tol: float = 0.0001, distributed: bool = False, verbose: bool = False)

Methods

`__init__`([metric, init, random_state, ...])
`fit_predict`(X)	Performs k-means clustering on the input data and returns cluster labels.
`initialize`(X, random_state)	Initializes the cluster centers.
`predict`(X[, soft])	Assigns samples to clusters based on fixed cluster centers.

initialize(X: Tensor, random_state: int) → Tensor

Initializes the cluster centers.

Parameters:

X (torch.Tensor) – The input data of shape (n_samples, n_features).
random_state (int) – The random seed.

Returns:

Tensor – Initialized cluster centers of shape (n_clusters, n_features).
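This page does not spell out how the 'k-means++' option picks its centers. The plain-Python sketch below shows the standard k-means++ seeding rule, assuming squared Euclidean distances; `kmeans_plusplus` is a hypothetical stand-in for illustration, not the library's `initialize`.

```python
import random

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans_plusplus(X, n_clusters, random_state):
    """Hypothetical helper: pick n_clusters initial centers from X by k-means++."""
    rng = random.Random(random_state)   # seeded for reproducible choices
    centers = [list(rng.choice(X))]     # first center: uniform at random
    while len(centers) < n_clusters:
        # Squared distance from each sample to its nearest chosen center.
        d2 = [min(squared_euclidean(x, c) for c in centers) for x in X]
        if sum(d2) == 0:
            # Every sample already coincides with a center; fall back to uniform.
            centers.append(list(rng.choice(X)))
        else:
            # Next center is drawn with probability proportional to d2.
            centers.append(list(rng.choices(X, weights=d2, k=1)[0]))
    return centers

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(kmeans_plusplus(X, n_clusters=2, random_state=0))
```

Weighting by squared distance makes it unlikely that two initial centers land in the same tight group, which is why k-means++ usually needs fewer restarts than uniform 'random' initialization. Fixing the seed makes the result reproducible, mirroring the role of `random_state`.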

fit_predict(X: Tensor) → Tensor

Performs k-means clustering on the input data and returns cluster labels.

Parameters:

X (torch.Tensor) – The input data of shape (n_samples, n_features).

Returns:

Tensor – Cluster assignments of shape (n_samples,).
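The parameters `n_init`, `max_iter`, and `tol` govern the loop inside `fit_predict`. The plain-Python sketch below illustrates the usual Lloyd iteration under two assumptions this page does not confirm: convergence is declared when the relative change in inertia is at most `tol`, and each restart uses uniform random initialization (a simplification of 'k-means++'). The functions `lloyd` and `fit_predict` here are hypothetical illustrations, not the library's implementation.

```python
import random

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def lloyd(X, centers, max_iter, tol):
    """One k-means run from the given centers; returns (labels, inertia)."""
    prev_inertia = None
    for _ in range(max_iter):
        # Assignment step: index of the nearest center for each sample.
        labels = [min(range(len(centers)), key=lambda k: squared_euclidean(x, centers[k]))
                  for x in X]
        inertia = sum(squared_euclidean(x, centers[l]) for x, l in zip(X, labels))
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [x for x, l in zip(X, labels) if l == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        # Assumed rule: stop when inertia changes by at most tol relative to before.
        if prev_inertia is not None and abs(prev_inertia - inertia) <= tol * prev_inertia:
            break
        prev_inertia = inertia
    return labels, inertia

def fit_predict(X, n_clusters=8, n_init=10, max_iter=300, tol=1e-4, random_state=None):
    """Hypothetical: run n_init restarts and keep the labels with the lowest inertia."""
    rng = random.Random(random_state)
    best_labels, best_inertia = None, float("inf")
    for _ in range(n_init):
        # Assumed 'random' init: n_clusters distinct samples as starting centers.
        centers = [list(c) for c in rng.sample(X, n_clusters)]
        labels, inertia = lloyd(X, centers, max_iter, tol)
        if inertia < best_inertia:
            best_labels, best_inertia = labels, inertia
    return best_labels

X = [[0.0, 0.0], [0.1, 0.0], [5.0, 5.0], [5.1, 5.0]]
print(fit_predict(X, n_clusters=2, random_state=0))
```

Keeping the restart with the lowest inertia is what makes a larger `n_init` more robust to poor starting centers, at a proportional cost in runtime.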

predict(X: Tensor, soft: bool = False) → Tensor

Assigns samples to clusters based on fixed cluster centers.

This method computes the squared Euclidean distance from each sample to each center and returns either hard assignments or soft probabilities.

Parameters:

X (torch.Tensor) – Input tensor of shape (n_samples, n_features).
soft (bool, optional) – If True, returns the soft assignment matrix; if False, returns hard cluster assignments. (default: False)

Returns:

Tensor –

If soft is False, a tensor of shape (n_samples,) containing cluster indices.
If soft is True, a tensor of shape (n_samples, n_clusters) containing assignment probabilities.
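The page does not say how the soft probabilities are derived from the distances. One common choice, assumed in the plain-Python sketch below, is a softmax over negative squared distances; the library may use a different scaling or kernel. `predict` here is a hypothetical stand-in operating on fixed centers, not the library method.

```python
import math

def squared_euclidean(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def predict(X, centers, soft=False):
    """Hypothetical: assign samples to fixed centers (hard labels or probabilities)."""
    # Distance matrix of shape (n_samples, n_clusters).
    D = [[squared_euclidean(x, c) for c in centers] for x in X]
    if not soft:
        # Hard assignment: index of the nearest center per sample.
        return [min(range(len(row)), key=row.__getitem__) for row in D]
    probs = []
    for row in D:
        # Assumption: softmax over negative distances, shifted by the row minimum
        # so the largest exponent is exp(0) and nothing underflows to all zeros.
        m = min(row)
        w = [math.exp(-(d - m)) for d in row]
        s = sum(w)
        probs.append([v / s for v in w])
    return probs

centers = [[0.0, 0.0], [10.0, 10.0]]
X = [[1.0, 0.0], [9.0, 10.0]]
print(predict(X, centers))             # [0, 1]
print(predict(X, centers, soft=True))  # each row sums to 1
```

With this construction the argmax of each soft row always equals the hard label, so the two modes agree on the most likely cluster.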