pyagc.clusters.TorchKMeans

class TorchKMeans(metric: str = 'euclidean', init: Union[str, Tensor] = 'k-means++', random_state: Optional[int] = None, n_clusters: int = 8, n_init: int = 10, max_iter: int = 300, tol: float = 0.0001, distributed: bool = False, verbose: bool = False)[source]

Bases: object

A PyTorch-based k-means clustering implementation that supports both Euclidean and cosine distance metrics, with optional distributed training. Adapted from Hzzone/torch_clustering.

Parameters:
  • metric (str, optional) – Distance metric to use: 'euclidean' or 'cosine'. (default: 'euclidean')

  • init (str or torch.Tensor, optional) – Method for initialization: 'k-means++', 'random' or user-specified tensor of shape (n_clusters, n_features). (default: 'k-means++')

  • random_state (int, optional) – Random seed for initialization. (default: None)

  • n_clusters (int, optional) – Number of clusters. (default: 8)

  • n_init (int, optional) – Number of times the algorithm will be run with different centroid seeds. (default: 10)

  • max_iter (int, optional) – Maximum number of iterations of the k-means algorithm for a single run. (default: 300)

  • tol (float, optional) – Relative tolerance with regard to inertia used to declare convergence. (default: 1e-4)

  • distributed (bool, optional) – Whether to use distributed training. (default: False)

  • verbose (bool, optional) – Whether to print progress information. (default: False)
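As a minimal usage sketch, the constructor arguments documented above can be combined as follows. The helper name cluster_features is hypothetical, and the sketch assumes that the centers fitted by fit_predict remain on the instance so that predict can reuse them; the documentation does not state this explicitly.

```python
def cluster_features(X, n_clusters=3, seed=0):
    """Cluster the rows of X and return (hard labels, soft assignments).

    X is expected to be a torch.Tensor of shape (n_samples, n_features).
    """
    # Imported inside the function so the sketch can be loaded
    # even where pyagc is not installed.
    from pyagc.clusters import TorchKMeans

    model = TorchKMeans(
        metric="euclidean",   # or "cosine"
        init="k-means++",     # or "random", or a (n_clusters, n_features) tensor
        n_clusters=n_clusters,
        n_init=10,
        max_iter=300,
        tol=1e-4,
        random_state=seed,
    )
    labels = model.fit_predict(X)        # shape (n_samples,)
    # Assumption: predict reuses the centers fitted above.
    soft = model.predict(X, soft=True)   # shape (n_samples, n_clusters)
    return labels, soft
```

Calling cluster_features(torch.randn(100, 16)) would then be expected to return a (100,) label tensor and a (100, 3) probability tensor, matching the return shapes documented for fit_predict and predict.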

Methods

__init__([metric, init, random_state, ...])

fit_predict(X)

Performs k-means clustering on the input data and returns cluster labels.

initialize(X, random_state)

Initializes the cluster centers.

predict(X[, soft])

Assigns samples to clusters based on fixed cluster centers.

initialize(X: Tensor, random_state: int) → Tensor[source]

Initializes the cluster centers.

Parameters:
  • X (torch.Tensor) – The input data of shape (n_samples, n_features).

  • random_state (int) – The random seed.

Returns:

Tensor – Initialized cluster centers of shape (n_clusters, n_features).
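For intuition, the 'k-means++' option conventionally means seeding in which each new center is sampled with probability proportional to its squared distance from the nearest center already chosen. The following is a pure-Python sketch of that general technique, not the library's implementation; the names kmeans_pp_init and squared_distance are hypothetical.

```python
import random


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans_pp_init(points, n_clusters, random_state):
    """Pick n_clusters starting centers from points using k-means++ seeding."""
    rng = random.Random(random_state)
    # The first center is drawn uniformly at random.
    centers = [list(rng.choice(points))]
    while len(centers) < n_clusters:
        # Squared distance from each point to its nearest chosen center.
        d2 = [min(squared_distance(p, c) for c in centers) for p in points]
        if sum(d2) == 0.0:
            # Every point coincides with a center; fall back to a uniform draw.
            centers.append(list(rng.choice(points)))
            continue
        # Draw the next center with probability proportional to d2.
        centers.append(list(rng.choices(points, weights=d2, k=1)[0]))
    return centers
```

Because already-chosen points have weight zero, distinct input points yield distinct centers, and a fixed random_state makes the selection reproducible.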

fit_predict(X: Tensor) → Tensor[source]

Performs k-means clustering on the input data and returns cluster labels.

Parameters:

X (torch.Tensor) – The input data of shape (n_samples, n_features).

Returns:

Tensor – Cluster assignments of shape (n_samples,).
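To illustrate how max_iter and tol typically interact in a single k-means run, here is a pure-Python sketch of Lloyd's algorithm with a relative-inertia stopping rule. It is an assumption about the general technique rather than the library's code; lloyd_kmeans and squared_distance are hypothetical names, and the exact convergence test used by TorchKMeans may differ.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def lloyd_kmeans(points, centers, max_iter=300, tol=1e-4):
    """Run one k-means pass from the given centers.

    Returns (labels, centers, inertia).
    """
    centers = [list(c) for c in centers]
    labels = [0] * len(points)
    inertia = float("inf")
    prev_inertia = None
    for _ in range(max_iter):
        # Assignment step: attach each point to its nearest center.
        inertia = 0.0
        for i, p in enumerate(points):
            dists = [squared_distance(p, c) for c in centers]
            labels[i] = min(range(len(centers)), key=dists.__getitem__)
            inertia += dists[labels[i]]
        # Update step: move each center to the mean of its members.
        for k in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == k]
            if members:  # an empty cluster keeps its previous center
                centers[k] = [sum(col) / len(members) for col in zip(*members)]
        # Assumed stopping rule: relative change in inertia below tol.
        if prev_inertia is not None:
            if abs(prev_inertia - inertia) <= tol * max(prev_inertia, 1e-12):
                break
        prev_inertia = inertia
    return labels, centers, inertia
```

With n_init greater than 1, a run like this would usually be repeated from different starting centers and the result with the lowest inertia kept; that selection rule is the common convention and is assumed here, not confirmed by the text above.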

predict(X: Tensor, soft: bool = False) → Tensor[source]

Assigns samples to clusters based on fixed cluster centers.

This method computes the squared Euclidean distance from each sample to each cluster center and returns either hard assignments or soft probabilities.

Parameters:
  • X (torch.Tensor) – Input tensor of shape (n_samples, n_features).

  • soft (bool, optional) – If True, returns the soft assignment matrix; if False, returns hard cluster assignments. (default: False)

Returns:

Tensor – The assignments, whose shape depends on soft:

  • If soft is False, a (n_samples,) tensor of cluster indices.

  • If soft is True, a (n_samples, n_clusters) tensor of probabilities.
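The difference between the two return modes can be sketched in pure Python. Hard assignment takes the index of the nearest center. For the soft case, this sketch assumes a softmax over negative squared distances, which is one common choice; the documentation does not say how TorchKMeans derives its probabilities, and the name assign is hypothetical.

```python
import math


def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign(points, centers, soft=False):
    """Return one cluster index per point, or one probability row if soft."""
    results = []
    for p in points:
        d2 = [squared_distance(p, c) for c in centers]
        if not soft:
            # Hard assignment: index of the nearest center (first wins ties).
            results.append(min(range(len(centers)), key=d2.__getitem__))
            continue
        # Assumed soft rule: softmax over negative squared distances,
        # shifted by the minimum distance for numerical stability.
        shift = min(d2)
        weights = [math.exp(-(d - shift)) for d in d2]
        total = sum(weights)
        results.append([w / total for w in weights])
    return results
```

Under this assumption each soft row sums to 1, and a point equidistant from two centers receives equal probability for both.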