pyagc.clusters

The pyagc.clusters package provides modular Cluster Head implementations for attributed graph clustering. Following the Encode-Cluster-Optimize framework, a cluster head transforms latent node embeddings \(\mathbf{Z} \in \mathbb{R}^{N \times H}\) into a cluster assignment matrix \(\mathbf{P} \in \mathbb{R}^{N \times K}\):

\[\mathbf{P} = \mathcal{C}(\mathbf{Z}; \Theta_{\mathcal{C}})\]

All cluster heads inherit from BaseClusterHead, ensuring a consistent interface for clustering, loss computation, and assignment retrieval. This modular design allows any encoder backbone to be easily paired with different clustering mechanisms — simply swap the cluster head to isolate the effect of the clustering objective:

from pyagc.clusters import DECClusterHead, DMoNClusterHead, KMeansClusterHead

# Differentiable prototypical clustering:
dec_head = DECClusterHead(n_clusters=7, n_features=256)

# Differentiable discriminative clustering:
dmon_head = DMoNClusterHead(n_clusters=7, n_features=256)

# Post-hoc discrete clustering:
kmeans_head = KMeansClusterHead(n_clusters=7, backend="torch")

Based on the differentiability of the cluster projection, which dictates the feasibility of end-to-end optimization, the cluster heads are organized into two categories:

Differentiable Cluster Heads treat clustering as a differentiable layer within a neural network, allowing gradients to flow from the clustering loss back to the encoder. This includes discriminative approaches (e.g., DMoNClusterHead, MinCutClusterHead, NeuromapClusterHead) and prototypical approaches (e.g., DECClusterHead, DinkClusterHead, INCClusterHead).
Discrete Cluster Heads apply a non-differentiable clustering algorithm to fixed representations, decoupling clustering from representation learning (e.g., KMeansClusterHead).

Base Class 

BaseClusterHead

Base class for clustering heads in neural clustering models.

class BaseClusterHead[source]

Bases: Module, ABC

Base class for clustering heads in neural clustering models.

abstract cluster(*args: Any, **kwargs: Any) → Tensor[source]

Predicts cluster assignments.

Returns:

Tensor –:

If soft=False, (n_samples,) tensor of cluster indices.
If soft=True, (n_samples, n_clusters) tensor of probabilities.

property predict: Alias for cluster().

abstract forward(*args: Any, **kwargs: Any) → Tensor[source]

Runs the forward pass of the module.

Return type:: Tensor

training: bool

Differentiable Cluster Heads 

`DECClusterHead`	Neural Clustering Layer proposed in the "Unsupervised Deep Embedding for Clustering Analysis" paper (Xie et al., ICML 2016).
`DinkClusterHead`	Neural Clustering Layer proposed in the "Dink-Net: Neural Clustering on Large Graphs" paper (Liu et al., ICML 2023).
`DMoNClusterHead`	Deep Modularity Network (DMoN) Clustering Head proposed in the "Graph Clustering with Graph Neural Networks" paper (Tsitsulin et al., JMLR 2023).
`INCClusterHead`	Neural Clustering Layer proposed in the "XAI Beyond Classification: Interpretable Neural Clustering" paper (Peng et al., JMLR 2022).
`MinCutClusterHead`	MinCut Clustering Head proposed in "Spectral Clustering in Graph Neural Networks for Graph Pooling" (Bianchi et al., ICML 2019).
`NeuromapClusterHead`	Neuromap Clustering Head from the paper "The Map Equation Goes Neural: Mapping Network Flows with Graph Neural Networks" paper (Blöcker et al., NeurIPS 2024).
`SBMClusterHead`	Stochastic Block Model (SBM) Clustering Head from the paper "Differentiable Community Detection with Graph Neural Networks and Stochastic Block Models" (Arliss & Mueller, LoG 2025).
`SBMMatchClusterHead`	Graph Matching SBM Clustering Head from the paper "Differentiable Community Detection with Graph Neural Networks and Stochastic Block Models" (Arliss & Mueller, LoG 2025).

class DECClusterHead(n_clusters: int, n_features: int, alpha: float = 1.0)[source]

Bases: BaseClusterHead

Neural Clustering Layer proposed in the “Unsupervised Deep Embedding for Clustering Analysis” paper (Xie et al., ICML 2016).

This layer learns cluster centers and computes soft assignment of input samples to clusters using Student’s t-distribution.

Specifically, the probability \(q_{ij}\) that a sample \(i\) belongs to cluster \(j\) is given by:

\[q_{ij} = \frac{(1 + \|z_i - \mu_j\|^2 / \alpha)^{-\frac{\alpha+1}{2}}} {\sum_{j'} (1 + \|z_i - \mu_{j'}\|^2 / \alpha)^{-\frac{\alpha+1}{2}}}\]

where \(z_i\) is the embedded point and \(\mu_j\) is the j-th cluster center, and \(\alpha\) is the degrees of freedom of the Student’s t-distribution (default is 1).

The target distribution \(p_{ij}\) is computed as:

\[p_{ij} = \frac{q_{ij}^2 / \sum_i q_{ij}}{\sum_j (q_{ij}^2 / \sum_i q_{ij})}\]

The loss is the KL divergence between the soft assignments \(q\) and the target distribution \(p\):

\[L = \text{KL}(P \| Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}\]

Parameters:

n_clusters (int) – Number of clusters.
n_features (int) – Feature dimension of the input.
alpha (float, optional) – Degrees of freedom for Student’s t-distribution. Default is 1.

reset_cluster_centers(cluster_centers: Optional[Tensor] = None) → None[source]

Manually sets the cluster centers.

Parameters:: cluster_centers (torch.Tensor, optional) – Tensor of shape (n_clusters, n_features) to initialize the cluster centers. If None, use Xavier uniform initialization.
Return type:: None

forward(z: Tensor, update_target: bool = True) → Tensor[source]

Computes the KL divergence loss between the soft assignments and the target distribution.

Parameters:

z (torch.Tensor) – Input tensor of shape (n_samples, n_features).
update_target (bool, optional) – Whether to recompute the target distribution P. If False, uses the cached distribution. This is useful for maintaining training stability by updating the target less frequently. (default: True)

Returns:

Tensor – Scalar loss tensor.

cluster(z: Tensor, soft: bool = False) → Tensor[source]

Predicts cluster assignments.

Parameters:

z (torch.Tensor) – Input tensor of shape (n_samples, n_features).
soft (bool, optional) – If True, returns the soft assignment matrix; if False, returns hard cluster assignments. (default: False)

Returns:

Tensor –:

If soft is False, (n_samples,) tensor of cluster indices.
If soft is True, (n_samples, n_clusters) tensor of probabilities.

training: bool

class DinkClusterHead(n_clusters: int, n_features: int)[source]

Bases: BaseClusterHead

Neural Clustering Layer proposed in the “Dink-Net: Neural Clustering on Large Graphs” paper (Liu et al., ICML 2023).

This layer models cluster centers as learnable parameters and optimizes clustering assignments via cluster dilation and shrink losses:

\[\mathcal{L}_\text{dilation} = \frac{-1}{(K-1)K} \sum_{i=1}^{K} \sum_{j=1, j\neq i}^{K} \| \mathbf{C}_i - \mathbf{C}_j \|_2^2\]

\[\mathcal{L}_\text{shrink} = \frac{1}{BK} \sum_{i=1}^{B} \sum_{j=1}^{K} \| \mathbf{Z}_i - \mathbf{C}_j \|_2^2\]

where \(\mathbf{C}_i\) is the i-th cluster center, \(\mathbf{Z}_i\) is the i-th embedding in a batch, \(B\) is the batch size, and \(K\) is the number of clusters. The cluster dilation loss pushes cluster centers away from each other to expand the clustering space, while the cluster shrink loss pulls node embeddings toward all cluster centers to avoid confirmation bias.

Parameters:

n_clusters (int) – Number of clusters.
n_features (int) – Feature dimension of the input.

reset_cluster_centers(cluster_centers: Optional[Tensor] = None) → None[source]

Manually sets the cluster centers.

Parameters:: cluster_centers (torch.Tensor, optional) – Tensor of shape (n_clusters, n_features) to initialize the cluster centers. If None, use Xavier uniform initialization.
Return type:: None

forward(z: Tensor) → Tuple[Tensor, Tensor][source]

Compute the clustering loss.

Parameters:: z (torch.Tensor) – Input tensor of shape (n_samples, n_features).
Returns:: dilation_loss and shrink_loss
Return type:: Tuple[torch.Tensor, torch.Tensor]

cluster(z: Tensor, soft: bool = False) → Tensor[source]

Predicts cluster assignments.

Parameters:

z (torch.Tensor) – Input tensor of shape (n_samples, n_features).
soft (bool, optional) – If True, returns the soft assignment matrix; if False, returns hard cluster assignments. (default: False)

Returns:

Tensor –:

If soft is False, (n_samples,) tensor of cluster indices.
If soft is True, (n_samples, n_clusters) tensor of probabilities.

training: bool

class DMoNClusterHead(n_clusters: int, n_features: int)[source]

Bases: BaseClusterHead

Deep Modularity Network (DMoN) Clustering Head proposed in the “Graph Clustering with Graph Neural Networks” paper (Tsitsulin et al., JMLR 2023).

This layer learns a soft cluster assignment matrix \(\mathbf{S}\) by projecting node embeddings \(\mathbf{Z}\) into \(K\) clusters using a linear transformation followed by a softmax. It optimizes the clustering structure with two objectives:

(1) Spectral modularity loss:

\[\mathcal{L}_s = - \frac{1}{2m} \mathrm{Tr}(\mathbf{S}^\top \mathbf{B} \mathbf{S})\]

where \(\mathbf{B} = \mathbf{A} - \frac{\mathbf{d}\mathbf{d}^\top}{2m}\) is the modularity matrix, and \(m\) is the total number of edges.

(2) Collapse regularization loss:

\[\mathcal{L}_c = \frac{\sqrt{K}}{N} \left\| \sum_i \mathbf{S}_i^\top \right\|_F - 1\]

which prevents unbalanced cluster sizes.

Parameters:

n_clusters (int) – Number of clusters.
n_features (int) – Feature dimension of input node embeddings.

reset_cluster_centers(cluster_centers: Optional[Tensor] = None) → None[source]

Manually sets the cluster centers.

Parameters:: cluster_centers (torch.Tensor, optional) – Tensor of shape (n_clusters, n_features) to initialize the cluster centers. If None, use Xavier uniform initialization.
Return type:: None

forward(z: Tensor, edge_index: Tensor) → Tuple[Tensor, Tensor][source]

Computes DMoN clustering objectives using node embeddings and graph structure.

Parameters:

z (torch.Tensor) – Node embeddings of shape (N, F).
edge_index (torch.Tensor) – Edge indices of shape (2, E).

Returns:

modularity_loss and collapse_loss

Return type:

Tuple[torch.Tensor, torch.Tensor]

cluster(z: Tensor, soft: bool = False) → Tensor[source]

Predicts cluster assignments.

Parameters:

z (torch.Tensor) – Input tensor of shape (n_samples, n_features).
soft (bool, optional) – If True, returns the soft assignment matrix; if False, returns hard cluster assignments. (default: False)

Returns:

Tensor –:

If soft is False, (n_samples,) tensor of cluster indices.
If soft is True, (n_samples, n_clusters) tensor of probabilities.

training: bool

class INCClusterHead(n_clusters: int, n_features: int, alpha: float = 0.001)[source]

Bases: BaseClusterHead

Neural Clustering Layer proposed in the “XAI Beyond Classification: Interpretable Neural Clustering” paper (Peng et al., JMLR 2022).

This layer implements a differentiable k-means reformulation, where cluster centers are learnable parameters.

The cluster centers \(\mathbf{\Omega}_j\) and weights \(\mathbf{W}_j\) are related via:

\[\mathbf{W}_j = 2 \alpha \mathbf{\Omega}_j\]

The loss is defined as:

\[L = \frac{1}{\alpha} \sum_{i} (2\alpha - \mathbf{W}_{c(i)}^T \mathbf{z}_i)\]

where \(c(i)\) is the nearest cluster to sample \(i\).

Parameters:

n_clusters (int) – Number of clusters.
n_features (int) – Feature dimension of the input.
alpha (float) – Scaling factor for cluster centers. Default is 0.001.

reset_cluster_centers(cluster_centers: Optional[Tensor] = None) → None[source]

Manually sets the cluster centers.

Parameters:: cluster_centers (torch.Tensor, optional) – Tensor of shape (n_clusters, n_features) to initialize the cluster centers. If None, use Xavier uniform initialization.
Return type:: None

normalize_cluster_centers() → None[source]

Normalize cluster centers and apply alpha scaling.

Return type:: None

compute_cluster_center() → Tensor[source]

Compute the actual cluster centers (scaled by alpha).

Returns:: Tensor – Scaled cluster centers.

forward(z: Tensor) → Tensor[source]

Compute the clustering loss.

Parameters:: z (torch.Tensor) – Input tensor of shape (n_samples, n_features).
Returns:: Tensor – Scalar loss tensor.

cluster(z: Tensor, soft: bool = False) → Tensor[source]

Predicts cluster assignments.

Parameters:

z (torch.Tensor) – Input tensor of shape (n_samples, n_features).
soft (bool, optional) – If True, returns the soft assignment matrix; if False, returns hard cluster assignments. (default: False)

Returns:

Tensor –:

If soft is False, (n_samples,) tensor of cluster indices.
If soft is True, (n_samples, n_clusters) tensor of probabilities.

gradient_normalize() → None[source]

Normalize the gradient of the cluster centers.

Return type:: None

training: bool

class MinCutClusterHead(n_clusters: int, n_features: int, temperature: float = 1.0)[source]

Bases: BaseClusterHead

MinCut Clustering Head proposed in “Spectral Clustering in Graph Neural Networks for Graph Pooling” (Bianchi et al., ICML 2019).

This layer learns a soft cluster assignment matrix \(\mathbf{S}\) by projecting node embeddings \(\mathbf{Z}\) into \(K\) clusters. It jointly optimizes two objectives:

(1) MinCut loss:

\[\mathcal{L}_{\text{mincut}} = - \frac{\mathrm{Tr}(\mathbf{S}^\top \mathbf{A} \mathbf{S})} {\mathrm{Tr}(\mathbf{S}^\top \mathbf{D} \mathbf{S})}\]

where \(\mathbf{D}\) is the degree matrix.

(2) Orthogonality loss:

\[\mathcal{L}_{\text{ortho}} = \left\| \frac{\mathbf{S}^\top \mathbf{S}}{\|\mathbf{S}^\top \mathbf{S}\|_F} - \frac{\mathbf{I}_K}{\sqrt{K}} \right\|_F\]

which encourages near-orthogonal cluster assignments.

Parameters:

n_clusters (int) – Number of clusters \(K\).
n_features (int) – Feature dimension of node embeddings \(F\).
temperature (float, optional) – Softmax temperature. (default: 1.0)

reset_cluster_centers(cluster_centers: Optional[Tensor] = None) → None[source]

Manually sets the cluster centers.

Parameters:: cluster_centers (torch.Tensor, optional) – Tensor of shape (n_clusters, n_features) to initialize the cluster centers. If None, use Xavier uniform initialization.
Return type:: None

forward(z: Tensor, edge_index: Tensor) → Tuple[Tensor, Tensor][source]

Compute MinCut and Orthogonality losses given node embeddings and graph structure.

Parameters:

z (torch.Tensor) – Node embeddings \((N, F)\).
edge_index (torch.Tensor) – Edge indices \((2, E)\).

Returns:

mincut_loss and ortho_loss

Return type:

Tuple[torch.Tensor, torch.Tensor]

cluster(z: Tensor, soft: bool = False) → Tensor[source]

Predict cluster assignments.

Parameters:

z (torch.Tensor) – Node embeddings of shape (N, F).
soft (bool, optional) – If True, return soft assignment probabilities.

Returns:

Hard cluster indices or soft assignment matrix.

Return type:

torch.Tensor

training: bool

class NeuromapClusterHead(n_clusters: int, n_features: int, alpha: float = 0.15, n_iters: int = 100)[source]

Bases: BaseClusterHead

Neuromap Clustering Head from the paper “The Map Equation Goes Neural: Mapping Network Flows with Graph Neural Networks” paper (Blöcker et al., NeurIPS 2024).

This module implements a differentiable version of the map equation for end-to-end optimization with (graph) neural networks.

It learns soft cluster assignments \(\mathbf{S}\) via a linear projection from node embeddings \(\mathbf{Z}\), and computes the Neuromap loss (expected per-step description length) following the Minimum Description Length principle:

\[\mathcal{L}(A, S) = q \log q - \sum_m q_m \log q_m - \sum_m m_{\text{exit}} \log m_{\text{exit}} - \sum_u p_u \log p_u + \sum_m p_m \log p_m\]

where all quantities are computed from the soft cluster assignment matrix.

Parameters:

n_clusters (int) – Maximum number of clusters.
n_features (int) – Feature dimension of input node embeddings.
alpha (float, optional) – Teleportation probability for flow. Default: 0.15.
n_iters (int, optional) – Number of power iterations for stationary distribution. Default: 100.

reset_cluster_centers(cluster_centers: Optional[Tensor] = None) → None[source]

Manually sets the cluster centers.

Parameters:: cluster_centers (torch.Tensor, optional) – Tensor of shape (n_clusters, n_features) to initialize the cluster centers. If None, use Xavier uniform initialization.
Return type:: None

build_flow(edge_index: Tensor, N: int)[source]: Construct sparse flow matrix F and stationary distribution p.

forward0(z: Tensor, edge_index: Tensor) → Tensor[source]

Compute the Neuromap (Map Equation) loss given node embeddings and adjacency.

Parameters:

z (torch.Tensor) – Node embeddings of shape (N, F).
edge_index (torch.Tensor) – Edge indices of shape (2, E).

Returns:

Map equation loss (codelength)

Return type:

torch.Tensor

forward(z: Tensor, edge_index: Tensor) → Tuple[Tensor, Tensor][source]

Compute the Neuromap (Map Equation) loss given node embeddings and adjacency.

Parameters:

z (torch.Tensor) – Node embeddings of shape (N, F).
edge_index (torch.Tensor) – Edge indices of shape (2, E).

Returns:

Map equation loss (codelength) and collapse_loss

Return type:

Tuple[torch.Tensor, torch.Tensor]

cluster(z: Tensor, soft: bool = False) → Tensor[source]

Predicts cluster assignments.

Parameters:

z (torch.Tensor) – Input tensor of shape (n_samples, n_features).
soft (bool, optional) – If True, returns the soft assignment matrix; if False, returns hard cluster assignments. (default: False)

Returns:

Tensor –:

If soft is False, (n_samples,) tensor of cluster indices.
If soft is True, (n_samples, n_clusters) tensor of probabilities.

training: bool

class SBMClusterHead(n_clusters: int, n_features: int, variant: str = 'bernoulli', eta: float = 3.0)[source]

Bases: BaseClusterHead

Stochastic Block Model (SBM) Clustering Head from the paper “Differentiable Community Detection with Graph Neural Networks and Stochastic Block Models” (Arliss & Mueller, LoG 2025).

This head learns cluster assignments by maximizing the likelihood of an SBM-based generative model. It supports both Bernoulli and Poisson variants, with optional degree correction.

The cluster assignment matrix \(\mathbf{P} \in [0,1]^{N \times K}\) is obtained via softmax transformation of the similarity between node embeddings and learnable cluster centers, and the structure matrix \(\mathbf{\Theta} \in \mathbb{R}^{K \times K}\) is estimated via MLE as:

\[\hat{\Theta}_{ij} = \frac{M_{ij}}{n_i n_j}\]

where \(M_{ij}\) is the number of edges between communities \(i\) and \(j\), and \(n_i\) is the number of nodes in community \(i\).

Loss Functions:

(1) Bernoulli SBM:

\[\mathcal{L}_B = -\sum_{(u,v) \in E} \ln(\pi_{uv}) - \eta^{-1} \sum_{(u,v) \notin E} \ln(1 - \pi_{uv})\]

where \(\pi_{uv} = \mathbf{P}_u^T \hat{\Theta} \mathbf{P}_v\).

(2) Poisson SBM:

\[\mathcal{L}_P = -\sum_{(u,v) \in E} [\ln(\pi_{uv}) - \pi_{uv}] + \eta^{-1} \sum_{(u,v) \notin E} \pi_{uv}\]

(3) Degree-Corrected variants:

For degree correction, the expected value becomes \(\phi_u \phi_v \mathbf{P}_u^T \hat{\Theta} \mathbf{P}_v\), where:

\[\hat{\phi}_u = (\mathbf{P}_u^T \mathbf{n}) \frac{d_u}{\mathbf{P}_u^T \boldsymbol{\delta}}\]

with \(\boldsymbol{\delta}\) being the sum of degrees in each community.

Parameters:

n_clusters (int) – Number of clusters.
n_features (int) – Feature dimension of input node embeddings.
variant (str, optional) – SBM variant to use. Options: 'bernoulli', 'poisson', 'bernoulli-dc', 'poisson-dc'. (default: 'bernoulli')
eta (float, optional) – Negative sampling ratio (number of negative samples per positive edge). (default: 3.0)

reset_cluster_centers(cluster_centers: Optional[Tensor] = None) → None[source]

Manually sets the cluster centers.

Parameters:: cluster_centers (torch.Tensor, optional) – Tensor of shape (n_clusters, n_features) to initialize the cluster centers. If None, use Xavier uniform initialization.
Return type:: None

forward(z: Tensor, edge_index: Tensor, num_neg_samples: Optional[int] = None) → Tuple[Tensor, Tensor][source]

Computes the SBM loss.

Parameters:

z (torch.Tensor) – Node embeddings of shape (N, F).
edge_index (torch.Tensor) – Edge indices of shape (2, E).
num_neg_samples (int, optional) – Number of negative samples. If None, uses eta * num_edges. (default: None)

Returns:

Tuple[Tensor, Tensor] – Tuple of (likelihood_loss, regularization_loss).

cluster(z: Tensor, soft: bool = False) → Tensor[source]

Predicts cluster assignments.

Parameters:

z (torch.Tensor) – Node embeddings of shape (N, F).
soft (bool, optional) – If True, returns soft assignments; otherwise hard assignments. (default: False)

Returns:

Tensor – Cluster assignments.

training: bool

class SBMMatchClusterHead(n_clusters: int, n_features: int)[source]

Bases: BaseClusterHead

Graph Matching SBM Clustering Head from the paper “Differentiable Community Detection with Graph Neural Networks and Stochastic Block Models” (Arliss & Mueller, LoG 2025).

This variant uses the Graph Matching objective, which aligns the graph with its community representation by minimizing:

\[\mathcal{L}_{Match} = -\mathrm{tr}(\mathbf{A}^T \mathbf{P} \hat{\Theta} \mathbf{P}^T)\]

This approach exploits matrix sparsity and is significantly faster than edge sampling methods.

Parameters:

n_clusters (int) – Number of clusters.
n_features (int) – Feature dimension of input node embeddings.

reset_cluster_centers(cluster_centers: Optional[Tensor] = None) → None[source]

Manually sets the cluster centers.

Parameters:: cluster_centers (torch.Tensor, optional) – Tensor of shape (n_clusters, n_features) to initialize the cluster centers. If None, use Xavier uniform initialization.
Return type:: None

forward(z: Tensor, edge_index: Tensor) → Tuple[Tensor, Tensor][source]

Computes the Graph Matching loss.

Parameters:

z (torch.Tensor) – Node embeddings of shape (N, F).
edge_index (torch.Tensor) – Edge indices of shape (2, E).

Returns:

Tuple[Tensor, Tensor] – Tuple of (likelihood_loss, regularization_loss).

cluster(z: Tensor, soft: bool = False) → Tensor[source]

Predicts cluster assignments.

Parameters:

z (torch.Tensor) – Node embeddings of shape (N, F).
soft (bool, optional) – If True, returns soft assignments; otherwise hard assignments. (default: False)

Returns:

Tensor – Cluster assignments.

training: bool

Discrete Cluster Heads 

KMeansClusterHead

The K-Means clustering head with fixed cluster centers.

class KMeansClusterHead(n_clusters: int, backend: str = 'torch', n_init: int = 10, max_iter: int = 300, random_state: Optional[int] = None)[source]

Bases: BaseClusterHead

The K-Means clustering head with fixed cluster centers.

This module performs clustering using the TorchKMeans or sklearn.cluster.KMeans algorithm, and stores the resulting cluster centers for inference. Once fitted, the cluster() method can be used to assign new points based on the stored centers.

Note

This class does not learn trainable parameters and does not define a clustering loss. It is typically used for post-hoc or plug-in clustering.

Parameters:

n_clusters (int) – Number of clusters.
backend (str, optional) – The backend to use for K-Means, either "torch" or "triton" or "sklearn". (default: "torch")
n_init (int, optional) – Number of K-Means initializations to run. (default: 10)
max_iter (int, optional) – Maximum number of iterations per K-Means run. (default: 300)
random_state (int, optional) – Random seed. (default: None)

forward(*args, **kwargs) → Tensor[source]

Runs the forward pass of the module.

Return type:: Tensor

fit_predict(z: Tensor) → Tensor[source]

Performs k-means clustering on the input data and returns cluster labels.

Parameters:: z (torch.Tensor) – The input data of shape (n_samples, n_features).
Returns:: Tensor – Cluster assignments of shape (n_samples,).

cluster(z: Tensor, soft: bool = False) → Tensor[source]

Assigns samples to clusters based on fixed cluster centers.

This function computes the squared Euclidean distance to each center and returns either hard assignments or soft probabilities.

Parameters:

z (torch.Tensor) – Input tensor of shape (n_samples, n_features).
soft (bool, optional) – If True, returns the soft assignment matrix; if False, returns hard cluster assignments. (default: False)

Returns:

Tensor –:

If soft is False, (n_samples,) tensor of cluster indices.
If soft is True, (n_samples, n_clusters) tensor of probabilities.

training: bool

GPU-Accelerated KMeans 

To overcome the computational bottleneck of CPU-based KMeans when \(N > 10^6\), pyagc.clusters provides GPU-accelerated KMeans implementations using PyTorch and OpenAI Triton, achieving multi-fold speedups over CPU baselines:

from pyagc.clusters import TorchKMeans, TritonKMeans

# PyTorch-based GPU KMeans:
kmeans = TorchKMeans(n_clusters=7, metric="euclidean", max_iter=300)
kmeans.fit(embeddings)
labels = kmeans.labels_

# Triton-accelerated GPU KMeans (requires CUDA):
kmeans = TritonKMeans(n_clusters=7, metric="euclidean", max_iter=300)
kmeans.fit(embeddings)
labels = kmeans.labels_

`TorchKMeans`	A PyTorch-based KMeans clustering implementation supporting both Euclidean and Cosine distance metrics, with optional distributed training.
`TritonKMeans`	Fast K-Means clustering implemented with Triton GPU kernels.

class TorchKMeans(metric: str = 'euclidean', init: Union[str, Tensor] = 'k-means++', random_state: Optional[int] = None, n_clusters: int = 8, n_init: int = 10, max_iter: int = 300, tol: float = 0.0001, distributed: bool = False, verbose: bool = False)[source]

Bases: object

A PyTorch-based KMeans clustering implementation supporting both Euclidean and Cosine distance metrics, with optional distributed training. This implementation is adapted from: Hzzone/torch_clustering.

Parameters:

metric (str, optional) – Distance metric to use: 'euclidean' or 'cosine'. (default: 'euclidean')
init (str or torch.Tensor, optional) – Method for initialization: 'k-means++', 'random' or user-specified tensor of shape (n_clusters, n_features). (default: 'k-means++')
random_state (int, optional) – Random seed for initialization. (default: None)
n_clusters (int, optional) – Number of clusters. (default: 8)
n_init (int, optional) – Number of times the algorithm will be run with different centroid seeds. (default: 10)
max_iter (int, optional) – Maximum number of iterations of the k-means algorithm for a single run. (default: 300)
tol (float, optional) – Relative tolerance with regards to inertia to declare convergence. (default: 1e-4)
distributed (bool, optional) – Whether to use distributed training. (default: False)
verbose (bool, optional) – Whether to print progress information. (default: False)

initialize(X: Tensor, random_state: int) → Tensor[source]

Initializes the cluster centers.

Parameters:

X (torch.Tensor) – The input data of shape (n_samples, n_features).
random_state (int) – The random seed.

Returns:

Tensor – Initialized cluster centers of shape (n_clusters, n_features).

fit_predict(X: Tensor) → Tensor[source]

Performs k-means clustering on the input data and returns cluster labels.

Parameters:: X (torch.Tensor) – The input data of shape (n_samples, n_features).
Returns:: Tensor – Cluster assignments of shape (n_samples,).

predict(X: Tensor, soft: bool = False) → Tensor[source]

Assigns samples to clusters based on fixed cluster centers.

This function computes the squared Euclidean distance to each center and returns either hard assignments or soft probabilities.

Parameters:

X (torch.Tensor) – Input tensor of shape (n_samples, n_features).
soft (bool, optional) – If True, returns the soft assignment matrix; if False, returns hard cluster assignments. (default: False)

Returns:

Tensor –:

If soft is False, (n_samples,) tensor of cluster indices.
If soft is True, (n_samples, n_clusters) tensor of probabilities.

class TritonKMeans(*args, **kwargs)[source]

Bases: object

Fast K-Means clustering implemented with Triton GPU kernels.

This implementation provides an interface compatible with TorchKMeans while leveraging Triton kernels for improved performance.

Parameters:

metric (str, default='euclidean') – Distance metric to use. Options: ‘euclidean’, ‘cosine’, ‘dot’
init (str or torch.Tensor, default='k-means++') – Method for initialization: ‘k-means++’, ‘random’ or user-specified tensor of shape (n_clusters, n_features).
random_state (int, optional) – Random seed for centroid initialization.
n_clusters (int, default=8) – Number of clusters (k).
n_init (int, default=10) – Number of times the algorithm will be run with different centroid seeds. The final result will be the best output of n_init consecutive runs.
max_iter (int, default=300) – Maximum number of iterations for a single run.
tol (float, default=1e-4) – Relative tolerance with regards to inertia to declare convergence.
verbose (bool, default=False) – Whether to print per-iteration info.
dtype (torch.dtype, optional) – Compute data type for algorithm.
device (torch.device, optional) – Target device. Defaults to “cuda:0” when available. Currently, only CUDA devices are supported.
distributed (bool, default=False) – Reserved for future distributed training support (currently not implemented).

fit(X)[source]: Fit k-means clustering.

fit_predict(X)[source]: Fit and predict cluster labels.

predict(X, soft=False)[source]: Predict cluster labels.

transform(X)[source]: Transform to cluster-distance space.

fit_transform(X)[source]: Fit and transform.

pyagc.clusters

Base Class

Differentiable Cluster Heads

Discrete Cluster Heads

GPU-Accelerated KMeans

Base Class 

Differentiable Cluster Heads 

Discrete Cluster Heads 

GPU-Accelerated KMeans 