pyagc.models

The pyagc.models module implements the high-level model classes that orchestrate the interaction between encoders and cluster heads within the Encode-Cluster-Optimize framework. Each model defines its own optimization strategy, managing the forward pass and the computation of joint losses.

Models are organized into the following categories:

  • Base Classes: Abstract interfaces for all models (BaseModel, TrainableModel, ClusteringModel).

  • Traditional Methods: Attribute-only or structure-only baselines (e.g., Node2Vec).

  • Non-Parametric Methods: Training-free models using fixed graph filters (e.g., SGC, SSGC, NAFS, SAGSC, S2CAG).

  • Deep Decoupled Methods: Models where the encoder is pre-trained with a self-supervised objective and clustering is applied post-hoc (e.g., GAE, DGI, CCASSG, S3GC, NS4GC, MAGI).

  • Deep Joint Methods: End-to-end models that jointly optimize representation learning and clustering (e.g., DAEGC, DinkNet, DMoN, MinCut, Neuromap, GCSBM).

Base Classes

LossOutput

Unified loss output format for training.

BaseModel

Base interface for all PyAGC models.

TrainableModel

Base class for trainable graph models.

ClusteringModel

Base class for end-to-end clustering models.

class LossOutput(total: Tensor, components: Dict[str, float])[source]

Bases: object

Unified loss output format for training.

This class encapsulates both the total loss (used for backpropagation) and individual loss components (used for logging and monitoring).

Parameters:
  • total (Tensor) – The total loss scalar for backpropagation.

  • components (Dict[str, float]) – Dictionary of individual loss components for logging purposes, e.g., {'reconstruction': 0.5, 'kl': 0.3}.

Example

>>> loss_output = LossOutput(
...     total=torch.tensor(1.5),
...     components={'ali': 0.8, 'nei': 0.5, 'spa': 0.2}
... )
>>> print(loss_output.log_string("Epoch 01: "))
Epoch 01: Loss: 1.5000, ALI: 0.8000, NEI: 0.5000, SPA: 0.2000
total: Tensor
components: Dict[str, float]
log_string(prefix: str = '') str[source]

Generates a formatted log string for printing.

Parameters:

prefix (str, optional) – Prefix string to prepend. (default: "")

Returns:

str – Formatted string like "Loss: 1.50, ALI: 0.80, NEI: 0.50".

class BaseModel(*args, **kwargs)[source]

Bases: ABC, Module

Base interface for all PyAGC models.

All models must implement embed() to produce node embeddings. For trainable models, inherit from TrainableModel instead.

Example

>>> class MyEncoder(BaseModel):
...     def embed(self, x, edge_index):
...         return self.encoder(x, edge_index)

See also

abstract embed(*args: Any, **kwargs: Any) Tensor[source]

Returns node embeddings.

Parameters:
  • models (For lookup-based) – x, edge_index, …

  • models – batch (node indices)

The output shape and type depend on the specific model implementation. Typically returns a Tensor of shape (num_nodes, hidden_dim).

Return type:

Tensor

reset_parameters()[source]

Resets learnable parameters. Override when needed.

infer_full(data: Data) Any[source]

Full-graph inference: returns embeddings or predictions for all nodes.

Parameters:

data (Data) – Input graph data.

Returns:

Any – Node embeddings or predictions, typically of shape (num_nodes, *).

infer_batch(loader: NeighborLoader, verbose: bool = True) Any[source]

Mini-batch inference over a NeighborLoader.

For node-level outputs, concatenates only the seed nodes of each batch.

Parameters:
  • loader (NeighborLoader) – Mini-batch data loader.

  • verbose (bool, optional) – If True, displays a progress bar. (default: True)

Returns:

Any – Node embeddings or predictions for all nodes in the loader.

class TrainableModel[source]

Bases: BaseModel

Base class for trainable graph models.

Subclasses must implement the loss() method. This class provides default implementations for train_full() and train_batch() that handle both single-loss and multi-component loss outputs.

The loss() method can return either:
  • A single Tensor for simple losses

  • A LossOutput object for losses with multiple components

set_logger(logger)[source]

Sets a custom logger for training output.

abstract loss(*args: Any, **kwargs: Any) Union[Tensor, LossOutput][source]

Computes the training loss.

Returns:

Union[Tensor, LossOutput] – Either a scalar loss Tensor, or a LossOutput object containing the total loss and individual components.

train_full(data: Data, optimizer: Optimizer, epoch: int, verbose: bool = True, **loss_kwargs: Any) float[source]

Runs one epoch of full-batch training.

Parameters:
  • data (Data) – The input full graph data.

  • optimizer (torch.optim.Optimizer) – The optimizer.

  • epoch (int) – Current epoch number.

  • verbose (bool, optional) – If True, prints training progress. (default: True)

  • **loss_kwargs – Additional keyword arguments passed to loss().

Returns:

float – Loss value of the epoch.

train_batch(loader: NeighborLoader, optimizer: Optimizer, epoch: int, verbose: bool = True, **loss_kwargs: Any) float[source]

Runs one epoch of mini-batch training.

Parameters:
  • loader (NeighborLoader) – The mini-batch loader.

  • optimizer (torch.optim.Optimizer) – The optimizer.

  • epoch (int) – Current epoch number.

  • verbose (bool, optional) – If True, prints training progress. (default: True)

  • **loss_kwargs – Additional keyword arguments passed to loss_batch().

Returns:

float – Average loss value of the epoch.

loss_batch(batch: Data, **kwargs: Any) Union[Tensor, LossOutput][source]

Computes loss for a mini-batch.

Default implementation simply calls loss(). Subclasses can override this method to handle batch-specific logic (e.g., slicing to seed nodes).

Parameters:
  • batch (Data) – A mini-batch from the loader.

  • **kwargs – Additional keyword arguments.

Returns:

Union[Tensor, LossOutput] – Loss output, same format as loss().

class ClusteringModel[source]

Bases: TrainableModel

Base class for end-to-end clustering models.

This class is designed for models that directly output cluster assignments (e.g., DMoN, MinCut) rather than embeddings. It provides a unified interface for clustering tasks by overriding the infer_full() and infer_batch() methods to return cluster assignments directly.

Subclasses should implement:
  • embed(): Returns node embeddings

  • forward(): Returns hard cluster assignments

  • loss(): Computes clustering loss

abstract forward(*args: Any, **kwargs: Any) Tensor[source]

Returns hard cluster assignments.

Returns:

Tensor – Cluster assignments of shape (num_nodes,).

infer_full(data: Data) Tensor[source]

Full-graph inference: returns cluster assignments for all nodes.

Parameters:

data (Data) – Input graph data.

Returns:

Tensor – Cluster assignments of shape (num_nodes,).

infer_batch(loader: NeighborLoader, verbose: bool = True) Tensor[source]

Mini-batch inference over a NeighborLoader.

Parameters:
  • loader (NeighborLoader) – Mini-batch data loader.

  • verbose (bool, optional) – If True, displays a progress bar. (default: True)

Returns:

Tensor – Cluster assignments for all nodes in the loader.

initialize_cluster_centers(data: Data, num_layers: int, train_idx: Tensor = None, batch_size: int = 4096, fan_out: int = -1, method: str = 'kmeans', verbose: bool = True)[source]

Initialize cluster centers using K-Means.

Supports two modes: 1. Small graphs: Use all nodes for initialization 2. Large graphs: Use only training nodes or mini-batch inference

Parameters:
  • data (Data) – Full graph data.

  • num_layers (int) – Number of encoder layers.

  • train_idx (torch.Tensor, optional) – Training node indices. If provided, only use these nodes for initialization. (default: None)

  • batch_size (int, optional) – Batch size for mini-batch inference. (default: 4096)

  • fan_out (int, optional) – Number of sampled neighbors. (default: -1)

  • method (str, optional) – Initialization method. (default: 'kmeans')

  • verbose (bool, optional) – If True, prints initializing progress. (default: True)

Traditional Methods

Node2Vec

The Node2Vec model from the "node2vec: Scalable Feature Learning for Networks" paper where random walks of length walk_length are sampled in a given graph, and node embeddings are learned via negative sampling optimization.

class Node2Vec(edge_index: Tensor, embedding_dim: int, walk_length: int, context_size: int, walks_per_node: int = 1, p: float = 1.0, q: float = 1.0, num_negative_samples: int = 1, num_nodes: Optional[int] = None, sparse: bool = False, cpu_embedding: bool = False)[source]

Bases: TrainableModel

The Node2Vec model from the “node2vec: Scalable Feature Learning for Networks” paper where random walks of length walk_length are sampled in a given graph, and node embeddings are learned via negative sampling optimization.

Parameters:
  • edge_index (torch.Tensor) – The edge indices.

  • embedding_dim (int) – The size of each embedding vector.

  • walk_length (int) – The walk length.

  • context_size (int) – The actual context size which is considered for positive samples. This parameter increases the effective sampling rate by reusing samples across different source nodes.

  • walks_per_node (int, optional) – The number of walks to sample for each node. (default: 1)

  • p (float, optional) – Likelihood of immediately revisiting a node in the walk. (default: 1)

  • q (float, optional) – Control parameter to interpolate between breadth-first strategy and depth-first strategy (default: 1)

  • num_negative_samples (int, optional) – The number of negative samples to use for each positive sample. (default: 1)

  • num_nodes (int, optional) – The number of nodes. (default: None)

  • sparse (bool, optional) – If set to True, gradients w.r.t. to the weight matrix will be sparse. (default: False)

  • cpu_embedding (bool, optional) – If set to True, stores embeddings on CPU and only moves required embeddings to GPU during training. Essential for very large graphs. (default: False)

reset_parameters()[source]

Resets all learnable parameters of the module.

to(device)[source]

Override to method to handle CPU embedding mode.

embed(batch: Optional[Tensor] = None, device: Optional[device] = None) Tensor[source]

Returns the embeddings for the nodes in batch.

Parameters:
  • batch (torch.Tensor, optional) – Node indices. If None, returns embeddings for all nodes. (default: None)

  • device (torch.device, optional) – Target device for the embeddings. If None, uses self.compute_device for cpu_embedding mode, or the embedding’s device for standard mode. (default: None)

Returns:

Tensor – Node embeddings of shape (num_nodes, embedding_dim) or (batch_size, embedding_dim).

loader(**kwargs) DataLoader[source]

Creates a DataLoader for training Node2Vec.

Returns:

DataLoader – DataLoader that samples positive and negative random walks.

pos_sample(batch: Tensor) Tensor[source]

Samples positive random walks.

Return type:

Tensor

neg_sample(batch: Tensor) Tensor[source]

Samples negative random walks.

Return type:

Tensor

sample(batch: Union[List[int], Tensor]) Tuple[Tensor, Tensor][source]

Samples positive and negative random walks for a batch of nodes.

Return type:

Tuple[Tensor, Tensor]

loss(pos_rw: Tensor, neg_rw: Tensor) LossOutput[source]

Computes the loss given positive and negative random walks.

Parameters:
  • pos_rw (torch.Tensor) – Positive random walks of shape (num_walks, context_size).

  • neg_rw (torch.Tensor) – Negative random walks of shape (num_walks, context_size).

Returns:

LossOutput – LossOutput containing total loss and individual components.

train_epoch(loader: DataLoader, optimizer: Optimizer, epoch: int, verbose: bool = True) float[source]

Runs one epoch of Node2Vec training using the custom DataLoader.

Parameters:
  • loader (DataLoader) – Node2Vec data loader created via loader().

  • optimizer (torch.optim.Optimizer) – The optimizer.

  • epoch (int) – Current epoch number.

  • verbose (bool, optional) – If True, prints training progress. (default: True)

Returns:

float – Average loss value of the epoch.

Non-Parametric Methods

These methods construct node representations via fixed filtering operations over the graph topology without learnable encoder parameters. Clustering is typically performed post-hoc on the resulting embeddings.

SGC

The non-parametric simple graph convolutional (SGC) operator from the "Simplifying Graph Convolutional Networks" paper (Wu et al., ICML 2019).

SSGC

The non-parametric simple spectral graph convolutional (SSGC) operator from the "Simple Spectral Graph Convolution" paper (Zhu, Hao, and Piotr Koniusz, ICLR 2021).

NAFS

Node-Adaptive Feature Smoothing (NAFS) from the "NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning" paper (Zhang et al., ICML 2022).

SAGSC

The Scalable Attributed-Graph Subspace Clustering (SAGSC) model from the "Scalable Attributed-Graph Subspace Clustering" paper (Fettal et al., AAAI 2023).

S2CAG

The S²CAG (Spectral Subspace Clustering for Attributed Graphs) model from the "Spectral Subspace Clustering for Attributed Graphs" paper (Lin et al., KDD 2025).

class SGC(K: int = 1, cached: bool = False, add_self_loops: bool = True, **kwargs)[source]

Bases: MessagePassing, BaseModel

The non-parametric simple graph convolutional (SGC) operator from the “Simplifying Graph Convolutional Networks” paper (Wu et al., ICML 2019). This implementation is adapted from: pyg/sg_conv.

\[\mathbf{X}^{\prime} = {\left(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \right)}^K \mathbf{X},\]

where \(\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}\) denotes the adjacency matrix with inserted self-loops and \(\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}\) its diagonal degree matrix. The adjacency matrix can include other values than 1 representing edge weights via the optional edge_weight tensor.

Parameters:
  • K (int, optional) – Number of hops \(K\). (default: 1)

  • cached (bool, optional) – If set to True, the layer will cache the computation of \({\left(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \right)}^K \mathbf{X}\) on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)

  • add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

Shapes:
  • input: node features \((|\mathcal{V}|, F_{in})\), edge indices \((2, |\mathcal{E}|)\), edge weights \((|\mathcal{E}|)\) (optional)

  • output: node features \((|\mathcal{V}|, F_{in})\)

reset_parameters()[source]

Resets all learnable parameters of the module.

embed(*args, **kwargs) Tensor[source]

Computes node embeddings.

Return type:

Tensor

forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None) Tensor[source]

Runs the forward pass of the module.

Return type:

Tensor

message(x_j: Tensor, edge_weight: Tensor) Tensor[source]

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, .e.g. x_i and x_j.

Return type:

Tensor

message_and_aggregate(adj_t: Union[Tensor, SparseTensor], x: Tensor) Tensor[source]

Fuses computations of message() and aggregate() into a single function. If applicable, this saves both time and memory since messages do not explicitly need to be materialized. This function will only gets called in case it is implemented and propagation takes place based on a torch_sparse.SparseTensor or a torch.sparse.Tensor.

Return type:

Tensor

class SSGC(alpha: float, K: int = 1, cached: bool = False, add_self_loops: bool = True, **kwargs)[source]

Bases: MessagePassing, BaseModel

The non-parametric simple spectral graph convolutional (SSGC) operator from the “Simple Spectral Graph Convolution” paper (Zhu, Hao, and Piotr Koniusz, ICLR 2021). This implementation is adapted from: pyg/ssg_conv.

\[\mathbf{X}^{\prime} = \frac{1}{K} \sum_{k=1}^K\left((1-\alpha) {\left(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \right)}^k \mathbf{X}+\alpha \mathbf{X}\right),\]

where \(\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}\) denotes the adjacency matrix with inserted self-loops and \(\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}\) its diagonal degree matrix. The adjacency matrix can include other values than 1 representing edge weights via the optional edge_weight tensor.

Parameters:
  • alpha (float) – Teleport probability \(\alpha \in [0, 1]\).

  • K (int, optional) – Number of hops \(K\). (default: 1)

  • cached (bool, optional) – If set to True, the layer will cache the computation of \(\frac{1}{K} \sum_{k=1}^K\left((1-\alpha) {\left(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \right)}^k \mathbf{X}+ \alpha \mathbf{X}\right)\) on first execution, and will use the cached version for further executions. This parameter should only be set to True in transductive learning scenarios. (default: False)

  • add_self_loops (bool, optional) – If set to False, will not add self-loops to the input graph. (default: True)

  • **kwargs (optional) – Additional arguments of torch_geometric.nn.conv.MessagePassing.

Shapes:
  • input: node features \((|\mathcal{V}|, F_{in})\), edge indices \((2, |\mathcal{E}|)\), edge weights \((|\mathcal{E}|)\) (optional)

  • output: node features \((|\mathcal{V}|, F_{in})\)

reset_parameters()[source]

Resets all learnable parameters of the module.

embed(*args, **kwargs) Tensor[source]

Computes node embeddings.

Return type:

Tensor

forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None) Tensor[source]

Runs the forward pass of the module.

Return type:

Tensor

message(x_j: Tensor, edge_weight: Tensor) Tensor[source]

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, .e.g. x_i and x_j.

Return type:

Tensor

message_and_aggregate(adj_t: Union[Tensor, SparseTensor], x: Tensor) Tensor[source]

Fuses computations of message() and aggregate() into a single function. If applicable, this saves both time and memory since messages do not explicitly need to be materialized. This function will only gets called in case it is implemented and propagation takes place based on a torch_sparse.SparseTensor or a torch.sparse.Tensor.

Return type:

Tensor

class NAFS(K: int = 20, r_list: Optional[List[float]] = None, ensemble: str = 'concat', distance: str = 'cosine')[source]

Bases: BaseModel

Node-Adaptive Feature Smoothing (NAFS) from the “NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning” paper (Zhang et al., ICML 2022).

NAFS is a training-free method that constructs node representations by adaptively smoothing node features over different propagation steps. It addresses two key limitations of GNNs: over-smoothing and scalability.

The method consists of two operations:

(1) Node-Adaptive Feature Smoothing:

For each smoothing step \(k\), compute:

\[X^{(k)} = \hat{A}^k X\]

where \(\hat{A} = \tilde{D}^{r-1} \tilde{A} \tilde{D}^{-r}\) is the normalized adjacency matrix with parameter \(r\).

Then adaptively combine them using smoothing weights:

\[\hat{X} = \sum_{k=0}^{K} W^{(k)} X^{(k)}\]

where \(W^{(k)}\) are diagonal weight matrices computed based on the over-smoothing distance:

\[D_i(k) = \text{dist}([X^{(k)}]_i, X_i)\]

(2) Feature Ensemble:

Combine smoothed features from different \(r\) values:

\[Z = \bigoplus_{t=1}^{T} \hat{X}^{(t)}\]

where \(\bigoplus\) can be concatenation, mean, or max pooling.

Parameters:
  • K (int) – Maximum number of smoothing steps. (default: 20)

  • r_list (List[float], optional) – List of normalization parameters for feature ensemble. (default: [0.0, 0.1, 0.2, 0.3, 0.4, 0.5])

  • ensemble (str, optional) – Ensemble strategy, one of 'mean', 'max', or 'concat'. (default: 'concat')

  • distance (str, optional) – Distance function for computing over-smoothing distance, one of 'cosine' or 'euclidean'. (default: 'cosine')

Example

>>> from pyagc.models import NAFS
>>> from pyagc.data import get_dataset
>>>
>>> # Load data
>>> x, edge_index, y = get_dataset('Cora', root='./data')
>>>
>>> # Create model
>>> model = NAFS(K=20, ensemble='concat')
>>>
>>> # Get embeddings
>>> embeddings = model.embed(x, edge_index)
>>>
>>> # Use for clustering
>>> from pyagc.clusters import KMeansClusterHead
>>> kmeans = KMeansClusterHead(n_clusters=7)
>>> pred = kmeans.fit_predict(embeddings)
embed(x: Tensor, edge_index: Union[Tensor, SparseTensor], **kwargs) Tensor[source]

Generate node embeddings using NAFS.

Parameters:
  • x (Tensor) – Node features of shape (num_nodes, num_features).

  • edge_index (Tensor) – Edge indices.

Returns:

Tensor – Node embeddings of shape (num_nodes, num_features * T) if ensemble=’concat’, otherwise (num_nodes, num_features).

forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], **kwargs) Tensor[source]

Alias for embed().

Return type:

Tensor

class SAGSC(p: int, k: int, bias: float = 0.7071067811865476)[source]

Bases: BaseModel

The Scalable Attributed-Graph Subspace Clustering (SAGSC) model from the “Scalable Attributed-Graph Subspace Clustering” paper (Fettal et al., AAAI 2023).

SAGSC consists of two stages:

(1) Node Embedding Generation (SGC):

\[H = S^p X\]

where \(S = \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}\) is the normalized adjacency and \(p\) is the propagation order.

(2) Implicit Subspace Clustering:

SAGSC avoids learning a dense coefficient matrix \(C \in \mathbb{R}^{n \times n}\) by imposing a low–rank factorization:

\[C = U U^\top,\quad U^\top U = I_k.\]

\(U\) is obtained from the top-\(k\) left singular vectors of \(H\).

To construct a non–negative affinity matrix, SAGSC uses a quadratic kernel applied row-wise:

\[Q = \Phi(U), \qquad M = Q Q^\top \ge 0.\]

Then the normalized matrix

\[\tilde{Q} = Q D^{-1/2},\]

is decomposed via SVD, and the rows of the top \(k\) singular vectors (excluding the trivial first one) are clustered using \(k\)-means.

SAGSC performs implicit spectral clustering without explicitly forming the affinity matrix \(M \in \mathbb{R}^{n \times n}\).

Parameters:
  • p (int) – Power of the SGC propagation matrix.

  • k (int) – Number of clusters (also feature dimension of encoder outputs).

  • bias (float) – Bias term in the quadratic feature map \(\Phi\).

embed(*args, **kwargs) Tensor[source]

Computes the SAGSC latent representation used for clustering.

Steps:
  1. \(H = S^p X\) (SGC propagation)

  2. \(U =\) top-\(k\) left singular vectors of \(H\)

  3. \(Q = \Phi(U)\)

  4. \(D =\) row sum vector of \(Q Q^\top\) (implemented implicitly)

  5. \(\tilde{Q} = Q D^{-1/2}\)

  6. \(Z =\) singular vectors 2 to \(k+1\) of \(\tilde{Q}\)

Parameters:
  • x (Tensor) – Node features of shape (n, d).

  • edge_index (Adj) – Edge indices.

  • edge_weight (Tensor, optional) – Edge weights.

Returns:

Tensor – Spectral embedding of shape (n, k).

forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None)[source]

Returns the spectral embedding.

class S2CAG(n_clusters: int, T: int = 10, alpha: float = 0.9, tau: int = 7, oversampling: int = 10)[source]

Bases: BaseModel

The S²CAG (Spectral Subspace Clustering for Attributed Graphs) model from the “Spectral Subspace Clustering for Attributed Graphs” paper (Lin et al., KDD 2025).

S²CAG performs unsupervised graph clustering by computing Normalized Smoothed Representations (NSR) via graph Laplacian smoothing, then extracting cluster structures through truncated SVD and spectral rounding.

The model optimizes the following objective:

\[\min_{S} \|Z - SZ\|_F^2 + \|S\|_* + \|S^T S - I\|_F^2\]

where \(Z\) is the NSR matrix, \(S\) is the self-expressive matrix, and \(\|\cdot\|_*\) denotes the nuclear norm.

Theoretically, S²CAG is equivalent to minimizing the total conductance of clusters on an affinity graph constructed from NSR.

Parameters:
  • n_clusters (int) – Number of clusters.

  • T (int, optional) – Number of propagation steps. (default: 10)

  • alpha (float, optional) – Decay factor for smoothing. (default: 0.9)

  • tau (int, optional) – Number of iterations for SVD approximation. (default: 7)

  • oversampling (int, optional) – Oversampling parameter for randomized SVD. (default: 10)

Example

>>> from pyagc.models import S2CAG
>>> from pyagc.data import get_dataset
>>>
>>> # Load data
>>> x, edge_index, y = get_dataset('Cora', root='./data')
>>> n_clusters = int(y.max()) + 1
>>>
>>> # Create model
>>> model = S2CAG(n_clusters=n_clusters, T=20, alpha=0.9)
>>>
>>> # Get cluster assignments
>>> clusters = model.embed(x, edge_index)
embed(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]

Computes cluster assignments via S²CAG.

Parameters:
  • x (Tensor) – Node features of shape (num_nodes, num_features).

  • edge_index (Tensor) – Edge indices of shape (2, num_edges).

Returns:

Tensor – Node embeddings of shape (num_nodes, n_clusters).

forward(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]

Alias for embed().

Return type:

Tensor

Deep Decoupled Methods

In decoupled methods, the encoder is pre-trained using a self-supervised objective \(\mathcal{L}_{\text{rep}}\), and a discrete clustering algorithm (e.g., KMeans) is applied to the learned embeddings post-hoc. The clustering process is not part of the training loop.

GAE

The Graph Auto-Encoder model from the "Variational Graph Auto-Encoders" paper based on user-defined encoder and decoder models.

ARGA

The Adversarially Regularized Graph Auto-Encoder model from the "Adversarially Regularized Graph Autoencoder for Graph Embedding" paper.

DGI

The Deep Graph Infomax (DGI) model from the "Deep Graph Infomax" paper (Veličković et al., ICLR 2019)based on user-defined encoder and summary model \(\mathcal{E}\) and \(\mathcal{R}\) respectively, and a corruption function \(\mathcal{C}\).

CCASSG

The CCA-SSG (Canonical Correlation Analysis-inspired Self-Supervised GNN) model is proposed in the From Canonical Correlation Analysis to Self-supervised Graph Neural Networks paper (Zhang et al., NeurIPS 2021).

GBT

The G-BT (Graph Barlow Twins) model is proposed in the "Graph Barlow Twins: A Self-Supervised Representation Learning Framework for Graphs" paper (Bielak et al., KBS 2022).

S3GC

The S3GC (Scalable Self-Supervised Graph Clustering) model from the "S3GC: Scalable Self-Supervised Graph Clustering" paper (Devvrit et al., NeurIPS 2022).

NS4GC

The NS4GC (Node Similarity-guided Contrastive Graph Clustering) model is proposed in the "Reliable Node Similarity Matrix Guided Contrastive Graph Clustering" paper (Liu et al., TKDE 2024).

MAGI

The MAGI (Modularity-Aware Graph clustering via contrastive learnIng) model from the "Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective" paper (Liu et al., KDD 2024).

class GAE(encoder: Module, decoder: Optional[Module] = None)[source]

Bases: TrainableModel

The Graph Auto-Encoder model from the “Variational Graph Auto-Encoders” paper based on user-defined encoder and decoder models.

Parameters:
reset_parameters()[source]

Resets all learnable parameters of the module.

forward(*args, **kwargs) Tensor[source]

Alias for embed().

Return type:

Tensor

embed(*args, **kwargs) Tensor[source]

Computes node embeddings via the encoder.

Return type:

Tensor

decode(*args, **kwargs) Tensor[source]

Runs the decoder and computes edge probabilities.

Return type:

Tensor

recon_loss(z: Tensor, pos_edge_index: Tensor, neg_edge_index: Optional[Tensor] = None) Tensor[source]

Given latent variables z, computes the binary cross entropy loss for positive edges pos_edge_index and negative sampled edges.

Parameters:
  • z (torch.Tensor) – The latent space \(\mathbf{Z}\).

  • pos_edge_index (torch.Tensor) – The positive edges to train against.

  • neg_edge_index (torch.Tensor, optional) – The negative edges to train against. If not given, uses negative sampling to calculate negative edges. (default: None)

Return type:

Tensor

loss(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]

Computes the reconstruction loss for GAE.

Parameters:
Returns:

Tensor – Reconstruction loss as a scalar tensor.

loss_batch(batch: Data) Tensor[source]

Computes loss for a mini-batch with seed node slicing.

Parameters:

batch (Data) – A mini-batch from the loader.

Returns:

Tensor – Reconstruction loss as a scalar tensor.

class ARGA(encoder: Module, discriminator: Module, decoder: Optional[Module] = None)[source]

Bases: GAE

The Adversarially Regularized Graph Auto-Encoder model from the “Adversarially Regularized Graph Autoencoder for Graph Embedding” paper.

Note

ARGA requires a two-phase training procedure (encoder + discriminator). Use train_encoder() and train_discriminator() separately, or implement a custom training loop.

Parameters:
reset_parameters()[source]

Resets all learnable parameters of the module.

reg_loss(z: Tensor) Tensor[source]

Computes the regularization loss of the encoder.

Parameters:

z (torch.Tensor) – The latent space \(\mathbf{Z}\).

Return type:

Tensor

discriminator_loss(z: Tensor) Tensor[source]

Computes the loss of the discriminator.

Parameters:

z (torch.Tensor) – The latent space \(\mathbf{Z}\).

Return type:

Tensor

loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]

Computes the ARGA encoder loss with reconstruction and regularization components.

Parameters:
Returns:

LossOutput – LossOutput containing total loss and individual components.

loss_batch(batch: Data) LossOutput[source]

Computes encoder loss for a mini-batch with seed node slicing.

Parameters:

batch (Data) – A mini-batch from the loader.

Returns:

LossOutput – LossOutput containing total loss and individual components.

train_encoder(data: Data, optimizer: Optimizer, epoch: int, verbose: bool = True) float[source]

Trains the encoder for one epoch.

This is equivalent to train_full() but provided for clarity in the two-phase ARGA training procedure.

Parameters:
  • data (Data) – The input full graph data.

  • optimizer (torch.optim.Optimizer) – The optimizer for encoder parameters.

  • epoch (int) – Current epoch number.

  • verbose (bool, optional) – If True, prints training progress. (default: True)

Returns:

float – Loss value of the epoch.

train_discriminator(data: Data, optimizer: Optimizer, epoch: int, verbose: bool = True) float[source]

Trains the discriminator for one epoch.

Parameters:
  • data (Data) – The input full graph data.

  • optimizer (torch.optim.Optimizer) – The optimizer for discriminator parameters.

  • epoch (int) – Current epoch number.

  • verbose (bool, optional) – If True, prints training progress. (default: True)

Returns:

float – Discriminator loss value of the epoch.

class DGI(hidden_channels: int, encoder: Module, summary: Optional[Callable] = None, corruption: Optional[Callable] = None)[source]

Bases: TrainableModel

The Deep Graph Infomax (DGI) model from the “Deep Graph Infomax” paper (Veličković et al., ICLR 2019)based on user-defined encoder and summary model \(\mathcal{E}\) and \(\mathcal{R}\) respectively, and a corruption function \(\mathcal{C}\).

DGI maximizes mutual information between patch representations and corresponding high-level summaries of the graph by training a discriminator to distinguish between positive samples (real node-graph pairs) and negative samples (corrupted node-graph pairs).

This implementation is adapted from: pyg/deep_graph_infomax.

Parameters:
  • hidden_channels (int) – The latent space dimensionality.

  • encoder (torch.nn.Module) – The encoder module \(\mathcal{E}\).

  • summary (callable, optional) – The readout function \(\mathcal{R}\) that computes graph-level representations. If None, uses global mean pooling with sigmoid activation. (default: None)

  • corruption (callable, optional) – The corruption function \(\mathcal{C}\) that generates negative samples. If None, uses random feature shuffling. (default: None)

Example

>>> from pyagc.models import DGI
>>> from torch_geometric.nn import GCN
>>>
>>> # With default summary and corruption
>>> encoder = GCN(in_channels=128, hidden_channels=64, num_layers=2)
>>> model = DGI(hidden_channels=64, encoder=encoder)
>>>
>>> # With custom functions
>>> def custom_corruption(x, edge_index):
...     return x + torch.randn_like(x) * 0.1, edge_index
>>>
>>> def custom_summary(z):
...     return z.max(dim=0)[0]
>>>
>>> model = DGI(
...     hidden_channels=64,
...     encoder=encoder,
...     summary=custom_summary,
...     corruption=custom_corruption
... )
reset_parameters()[source]

Resets all learnable parameters of the module.

embed(*args, **kwargs) Tensor[source]

Computes node embeddings.

Return type:

Tensor

forward(*args, **kwargs) Tuple[Tensor, Tensor, Tensor][source]

Returns the latent space for the input arguments, their corruptions and their summary representation.

Return type:

Tuple[Tensor, Tensor, Tensor]

discriminate(z: Tensor, summary: Tensor, sigmoid: bool = True) Tensor[source]

Given the patch-summary pair z and summary, computes the probability scores assigned to this patch-summary pair.

Parameters:
  • z (torch.Tensor) – The latent space.

  • summary (torch.Tensor) – The summary vector.

  • sigmoid (bool, optional) – If set to False, does not apply the logistic sigmoid function to the output. (default: True)

Return type:

Tensor

loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]

Computes the mutual information maximization objective.

Return type:

LossOutput

loss_batch(batch: Data) LossOutput[source]

Computes loss for a mini-batch with seed node slicing.

Return type:

LossOutput

class CCASSG(encoder: Module, transform1: BaseTransform, transform2: BaseTransform, lam: float = 0.001)[source]

Bases: TrainableModel

The CCA-SSG (Canonical Correlation Analysis-inspired Self-Supervised GNN) model is proposed in the From Canonical Correlation Analysis to Self-supervised Graph Neural Networks paper (Zhang et al., NeurIPS 2021).

CCASSG aims to learn node representations by maximizing the agreement between embeddings from two augmented graph views while decorrelating the feature dimensions of each view.

The loss function combines an invariance term and a decorrelation term:

\[\mathcal{L} = \underbrace{\| \tilde{\boldsymbol{Z}}^{(1)} - \tilde{\boldsymbol{Z}}^{(2)} \|_F^2}_{\text{invariance}} + \lambda \left( \| \tilde{\boldsymbol{Z}}^{(1)\top} \tilde{\boldsymbol{Z}}^{(1)} - \boldsymbol{I} \|_F^2 + \| \tilde{\boldsymbol{Z}}^{(2)\top} \tilde{\boldsymbol{Z}}^{(2)} - \boldsymbol{I} \|_F^2 \right)\]

where:

  • \(\tilde{\boldsymbol{Z}}^{(i)}\) is the column-normalized embedding of view \(i\);

  • \(\boldsymbol{I}\) is the identity matrix;

  • \(\lambda\) controls the strength of decorrelation regularization.

Parameters:
reset_parameters()[source]

Resets all learnable parameters of the module.

embed(*args, **kwargs) Tensor[source]

Computes node embeddings.

Return type:

Tensor

forward(*args, **kwargs) Tuple[Tensor, Tensor][source]

Generates embeddings from two graph augmentations.

Return type:

Tuple[Tensor, Tensor]

loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]

Computes the CCA-SSG loss with multiple components.

Parameters:
Returns:

LossOutput – LossOutput containing total loss and individual components.

loss_batch(batch: Data) LossOutput[source]

Computes loss for a mini-batch with seed node slicing.

Parameters:

batch (Data) – A mini-batch from the loader.

Returns:

LossOutput – LossOutput containing total loss and individual components.

class GBT(encoder: Module, transform1: BaseTransform, transform2: BaseTransform, lam: float = 0.005)[source]

Bases: TrainableModel

The G-BT (Graph Barlow Twins) model is proposed in the “Graph Barlow Twins: A Self-Supervised Representation Learning Framework for Graphs” paper (Bielak et al., KBS 2022).

G-BT learns node representations by maximizing the similarity between two embeddings of the same node under different augmentations, while reducing redundancy between dimensions via a decorrelation objective.

The loss function is:

\[\mathcal{L} = \sum_i (1 - \mathcal{C}_{ii})^2 + \lambda \sum_{i}\sum_{j \ne i} \mathcal{C}_{ij}^2\]

where:

  • \(\mathcal{C}\) is the cross-correlation matrix between normalized embeddings from two views: \(\mathcal{C}_{ij} = \sum_b \tilde{z}^{(1)}_{b,i} \tilde{z}^{(2)}_{b,j}\).

  • \(\tilde{z}\) denotes the batch-normalized node embedding.

  • \(\lambda\) balances invariance and redundancy reduction.

Parameters:
reset_parameters()[source]

Resets all learnable parameters of the module.

embed(*args, **kwargs) Tensor[source]

Computes node embeddings.

Return type:

Tensor

forward(*args, **kwargs) Tuple[Tensor, Tensor][source]

Generates embeddings from two graph augmentations.

Return type:

Tuple[Tensor, Tensor]

loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]

Computes the GBT loss with multiple components.

Parameters:
Returns:

LossOutput – LossOutput containing total loss and individual components.

loss_batch(batch: Data) LossOutput[source]

Computes loss for a mini-batch with seed node slicing.

Parameters:

batch (Data) – A mini-batch from the loader.

Returns:

LossOutput – LossOutput containing total loss and individual components.

class S3GC(edge_index: Tensor, num_nodes: int, in_channels: int, hidden_channels: int, walk_length: int, context_size: int, walks_per_node: int = 1, num_negative_samples: int = 1, p: float = 1.0, q: float = 1.0, scale: Literal['small', 'medium', 'large', 'auto'] = 'auto', keep_embeddings_on_cpu: bool = False)[source]

Bases: TrainableModel

The S3GC (Scalable Self-Supervised Graph Clustering) model from the “S3GC: Scalable Self-Supervised Graph Clustering” paper (Devvrit et al., NeurIPS 2022).

S3GC uses a simple GCN-based encoder combined with contrastive learning to learn clusterable node representations. The architecture adapts based on graph scale:

  • Small (< 50K nodes): Simple linear encoder

  • Medium (50K - 3M nodes): 2-layer MLP encoder

  • Large (> 3M nodes): 2-layer MLP with memory-efficient embeddings

The encoder combines:

  1. Direct attribute transformation: \(f(\tilde{A}X)\)

  2. Diffusion-based transformation: \(g(S_kX)\)

  3. Learnable node embeddings: \(I\)

Training uses SimCLR-style contrastive loss where nodes sampled via random walks are positives, and randomly sampled nodes are negatives.

Parameters:
  • edge_index (Tensor) – Edge indices of the graph.

  • num_nodes (int) – Number of nodes in the graph.

  • in_channels (int) – Input feature dimension.

  • hidden_channels (int) – Output embedding dimension.

  • walk_length (int) – Length of random walks for positive sampling.

  • context_size (int) – Context size for random walk (should be \(\leq\) walk_length).

  • walks_per_node (int, optional) – Number of random walks per node. (default: 1)

  • num_negative_samples (int, optional) – Number of negative samples per positive. (default: 1)

  • p (float, optional) – Return parameter for random walk. (default: 1.0)

  • q (float, optional) – In-out parameter for random walk. (default: 1.0)

  • scale (str, optional) – Graph scale, one of ['small', 'medium', 'large', 'auto']. If 'auto', automatically determined by num_nodes. (default: 'auto')

Example

>>> from pyagc.models.s3gc import S3GC, precompute_features
>>> from pyagc.data import get_dataset
>>>
>>> # Load data
>>> x, edge_index, y = get_dataset('Cora', root='./data')
>>>
>>> # Precompute features
>>> AX, SX = precompute_features(x, edge_index, x.size(0), method='ppr')
>>>
>>> # Create model (automatically detects 'small' scale)
>>> model = S3GC(
...     edge_index=edge_index,
...     num_nodes=x.size(0),
...     in_channels=x.size(1),
...     hidden_channels=256,
...     walk_length=3,
...     context_size=3
... )
>>>
>>> # Set precomputed features
>>> model.set_precomputed_features(AX, SX)
>>>
>>> # Train
>>> loader = model.loader(batch_size=2708, shuffle=True)
>>> optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
>>> for epoch in range(1, 201):
...     loss = model.train_epoch(loader, optimizer, epoch)
reset_parameters()[source]

Resets all learnable parameters.

set_precomputed_features(AX: Tensor, SX: Tensor, keep_on_cpu: bool = False)[source]

Sets precomputed \(\tilde{A}X\) and \(S_kX\) matrices.

This should be called before training to avoid recomputing these matrices in every forward pass.

Parameters:
  • AX (Tensor) – Precomputed \(\tilde{A}X\) of shape (num_nodes, in_channels).

  • SX (Tensor) – Precomputed \(S_kX\) of shape (num_nodes, in_channels).

  • keep_on_cpu (bool) – If True, keep features on CPU to save GPU memory

embed(indices: Optional[Tensor] = None) Tensor[source]

Returns node embeddings.

Parameters:

indices (Tensor, optional) – Node indices. If None, returns embeddings for all nodes. (default: None)

Returns:

Tensor – Node embeddings of shape (num_nodes, hidden_channels) or (len(indices), hidden_channels).

pos_sample(batch: Tensor) Tensor[source]

Samples positive nodes via biased random walks.

Return type:

Tensor

neg_sample(batch: Tensor) Tensor[source]

Samples negative nodes randomly or cluster-aware.

Return type:

Tensor

sample(batch: Union[List[int], Tensor]) Tuple[Tensor, Tensor][source]

Samples positive and negative random walks for a batch.

Return type:

Tuple[Tensor, Tensor]

loader(**kwargs) DataLoader[source]

Creates a DataLoader for S3GC training.

Returns:

DataLoader – DataLoader that samples nodes and generates positive/negative walks.

loss(pos_rw: Tensor, neg_rw: Tensor, node_mapping: Tensor) LossOutput[source]

Computes the SimCLR-style contrastive loss.

Parameters:
  • pos_rw (Tensor) – Positive random walks of shape (num_walks, context_size).

  • neg_rw (Tensor) – Negative random walks of shape (num_walks, context_size).

  • node_mapping (Tensor) – Mapping from global node IDs to batch-local IDs.

Returns:

LossOutput – LossOutput containing total loss and individual components.

train_epoch(loader: DataLoader, optimizer: Optimizer, epoch: int, verbose: bool = True) float[source]

Runs one epoch of S3GC training using the custom DataLoader.

Parameters:
  • loader (DataLoader) – S3GC data loader created via loader().

  • optimizer (torch.optim.Optimizer) – The optimizer.

  • epoch (int) – Current epoch number.

  • verbose (bool, optional) – If True, prints training progress. (default: True)

Returns:

float – Average loss value of the epoch.

class NS4GC(encoder: Module, transform1: BaseTransform, transform2: BaseTransform, s: float = 0.6, tau: float = 0.1, lam: float = 1.0, gam: float = 1.0)[source]

Bases: TrainableModel

The NS4GC (Node Similarity-guided Contrastive Graph Clustering) model is proposed in the “Reliable Node Similarity Matrix Guided Contrastive Graph Clustering” paper (Liu et al., TKDE 2024).

NS4GC aims to learn node representations suitable for clustering via contrastive learning guided by a node similarity matrix computed from two augmented views of the graph.

The model uses the following contrastive loss function:

\[\mathcal{L} = \mathcal{L}_{\text{ali}} + \lambda \mathcal{L}_{\text{nei}} + \gamma \mathcal{L}_{\text{spa}}\]

where:

  • \(\boldsymbol{S} = \boldsymbol{Z}^{(1)} (\boldsymbol{Z}^{(2)})^{\top}\) is the similarity matrix computed by inner product between two views;

  • \(\mathcal{L}_{\text{ali}} = -\frac{1}{N} \sum_i \boldsymbol{S}_{ii}\) encourages aligned embeddings from two views;

  • \(\mathcal{L}_{\text{nei}} = -\frac{1}{|\mathcal{E}|} \sum_{(i,j) \in \mathcal{E}} \boldsymbol{S}_{ij}\) enforces consistency for neighbors;

  • \(\mathcal{L}_{\text{spa}} = \mathbb{E}_{(i,j) \notin \mathcal{E}} \sigma((\boldsymbol{S}_{ij} - s) / \tau)\) encourages sparsity of similarity values between non-neighbor pairs. \(\sigma(x) = 1 / (1 + \exp(-x))\) is the sigmoid function.

Parameters:

Example

>>> from pyagc.models import NS4GC
>>> from pyagc.transforms import GSSLTransform
>>> from pyagc.encoders import create_tuned_gnn
>>>
>>> # Create encoder
>>> encoder = create_tuned_gnn('gcn', in_channels=128, hidden_channels=64, num_layers=2)
>>>
>>> # Create augmentation transforms
>>> transform1 = GSSLTransform(p_feat_mask=0.2, p_edge_drop=0.3)
>>> transform2 = GSSLTransform(p_feat_mask=0.2, p_edge_drop=0.3)
>>>
>>> # Create model
>>> model = NS4GC(
...     encoder=encoder,
...     transform1=transform1,
...     transform2=transform2,
...     s=0.6,
...     tau=0.1,
...     lam=1.0,
...     gam=1.0
... )
reset_parameters()[source]

Resets all learnable parameters of the module.

embed(*args, **kwargs) Tensor[source]

Computes node embeddings.

Return type:

Tensor

forward(*args, **kwargs) Tuple[Tensor, Tensor][source]

Generates embeddings from two graph augmentations.

Return type:

Tuple[Tensor, Tensor]

loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]

Computes the NS4GC loss with multiple components.

Parameters:
Returns:

LossOutput – LossOutput containing total loss and individual components.

loss_batch(batch: Data) LossOutput[source]

Computes loss for a mini-batch with seed node slicing.

Parameters:

batch (Data) – A mini-batch from the loader.

Returns:

LossOutput – LossOutput containing total loss and individual components.

class MAGI(encoder: Module, tau: float = 0.5, scale_embeddings: bool = True)[source]

Bases: TrainableModel

The MAGI (Modularity-Aware Graph clustering via contrastive learnIng) model from the “Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective” paper (Liu et al., KDD 2024).

MAGI establishes the connection between modularity maximization and graph contrastive learning, where positive and negative samples are naturally guided by the modularity matrix. The model uses a community-aware pretext task based on two-stage random walks to capture high-order proximity within communities.

The loss function follows InfoNCE-style contrastive learning:

\[\mathcal{L} = -\sum_{v \in \mathcal{B}} \sum_{u \in \mathcal{M}^+_v} \log \frac{\exp(\mathbf{z}_v^\top \mathbf{z}_u / \tau)} {\sum_{u \in \mathcal{M}^+_v} \exp(\mathbf{z}_v^\top \mathbf{z}_u / \tau) + \sum_{u' \in \mathcal{M}^-_v} \exp(\mathbf{z}_v^\top \mathbf{z}_{u'} / \tau)}\]

where:

  • \(\mathcal{M}^+_v = \{u \mid B_{vu} > 0\}\) are positive samples (same community).

  • \(\mathcal{M}^-_v = \{u \mid B_{vu} \leq 0\}\) are negative samples (different communities).

  • \(B_{vu}\) is the modularity coefficient computed via two-stage random walks.

  • \(\tau\) is the temperature parameter.

Parameters:
  • encoder (torch.nn.Module) – The GNN encoder module.

  • tau (float, optional) – Temperature parameter for contrastive loss. (default: 0.5)

  • scale_embeddings (bool, optional) – Whether to apply min-max scaling to embeddings before normalization. (default: True)

reset_parameters()[source]

Resets all learnable parameters of the module.

embed(*args, **kwargs) Tensor[source]

Computes node embeddings.

Returns:

Tensor – Node embeddings of shape (num_nodes, hidden_dim).

loss(x: Tensor, edge_index: Tensor, pos_pairs: Tensor, **kwargs) LossOutput[source]

Computes the MAGI contrastive loss for full-graph training.

Parameters:
  • x (torch.Tensor) – Node feature matrix of shape (num_nodes, num_features).

  • edge_index (torch.Tensor) – Edge indices of shape (2, num_edges).

  • pos_pairs (torch.Tensor) – Precomputed positive sample pairs of shape (2, num_pos_edges) based on modularity matrix.

  • **kwargs – Additional arguments for the encoder.

Returns:

LossOutput – LossOutput containing the total loss.

loss_batch(batch: Data) LossOutput[source]

Computes loss for a mini-batch with expanded seed nodes.

In mini-batch training, the positive pairs are defined only among the expanded batch seed nodes (obtained via two-stage random walks).

Parameters:

batch (Data) – A mini-batch from the MAGI loader containing: - x: Node features (including neighbors) - edge_index: Sampled edges - pos_pairs: Positive pairs for expanded batch seed nodes - expanded_batch_size: Number of expanded seed nodes

Returns:

LossOutput – LossOutput containing the total loss.

Deep Joint Methods

In joint methods, the encoder and cluster head are optimized end-to-end. The clustering objective \(\mathcal{L}_{\text{clust}}\) provides supervisory signals that directly refine the learned representations. These models inherit from ClusteringModel and output cluster assignments directly.

DAEGC

Deep Attentional Embedded Graph Clustering model from the "Attributed Graph Clustering: A Deep Attentional Embedding Approach" paper (Wang et al., IJCAI 2019).

DinkNet

The Dink-Net model from the "Dink-Net: Neural Clustering on Large Graphs" paper (Liu et al., ICML 2023).

DMoN

The Deep Modularity Network (DMoN) is proposed in the "Graph Clustering with Graph Neural Networks" paper (Tsitsulin et al., JMLR 2023).

MinCut

The MinCut model is based on "Spectral Clustering in Graph Neural Networks for Graph Pooling" (Bianchi et al., 2019).

Neuromap

The Neuromap model implements the differentiable map equation from "The Map Equation Goes Neural: Mapping Network Flows with Graph Neural Networks" (Blöcker et al., NeurIPS 2024).

GCSBM

Stochastic Block Model (GCSBM) based clustering model from the paper "Differentiable Community Detection with Graph Neural Networks and Stochastic Block Models" (Arliss & Mueller, LoG 2025).

class DAEGC(encoder: Module, n_clusters: int, hidden_channels: int, decoder: Optional[Module] = None, gamma: float = 10.0, update_interval: int = 5)[source]

Bases: ClusteringModel

Deep Attentional Embedded Graph Clustering model from the “Attributed Graph Clustering: A Deep Attentional Embedding Approach” paper (Wang et al., IJCAI 2019).

DAEGC jointly optimizes graph embedding and clustering through:

  1. Graph Attentional Autoencoder: Learns representations by encoding both structure and content with attention mechanism, then reconstructs the graph structure via inner product decoder.

  2. Self-training Clustering: Uses confident cluster assignments as soft labels to guide the optimization, iteratively refining clustering results.

The total loss combines reconstruction and clustering objectives:

\[\mathcal{L} = \mathcal{L}_r + \gamma \mathcal{L}_c\]

where:

  • \(\mathcal{L}_r\) is the binary cross-entropy reconstruction loss

  • \(\mathcal{L}_c = KL(P||Q)\) is the clustering loss with Student’s t-distribution

  • \(\gamma\) is the clustering coefficient

Parameters:
  • encoder (torch.nn.Module) – The graph attention encoder (typically GAT-based).

  • decoder (torch.nn.Module, optional) – The decoder module. If set to None, will default to InnerProductDecoder. (default: None)

  • n_clusters (int) – Number of clusters.

  • hidden_channels (int) – Hidden dimension of node embeddings.

  • gamma (float, optional) – Weight for clustering loss. (default: 10.0)

  • update_interval (int, optional) – Number of iterations between target distribution updates. (default: 5)

reset_parameters()[source]

Resets all learnable parameters of the module.

embed(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]

Computes node embeddings via the encoder.

Return type:

Tensor

decode(z: Tensor, edge_index: Tensor, sigmoid: bool = True) Tensor[source]

Reconstructs edge probabilities via the decoder.

Return type:

Tensor

forward(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]

Returns cluster assignments (hard labels).

Return type:

Tensor

recon_loss(z: Tensor, pos_edge_index: Tensor, neg_edge_index: Optional[Tensor] = None) Tensor[source]

Computes the binary cross-entropy reconstruction loss.

Given latent variables z, computes the binary cross entropy loss for positive edges pos_edge_index and negative sampled edges.

Parameters:
  • z (torch.Tensor) – The latent space \(\mathbf{Z}\).

  • pos_edge_index (torch.Tensor) – The positive edges to train against.

  • neg_edge_index (torch.Tensor, optional) – The negative edges to train against. If not given, uses negative sampling to calculate negative edges. (default: None)

Return type:

Tensor

pretrain_loss(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]

Computes pretraining loss (reconstruction only).

Parameters:
Returns:

Tensor – Reconstruction loss.

loss(x: Tensor, edge_index: Tensor, pretrain: bool = False, **kwargs) LossOutput[source]

Computes the total loss.

Parameters:
  • x (torch.Tensor) – Node features.

  • edge_index (torch.Tensor) – Edge indices.

  • pretrain (bool, optional) – If True, only compute reconstruction loss for pretraining. (default: False)

Returns:

LossOutput – LossOutput containing total loss and individual components.

loss_batch(batch: Data, pretrain: bool = False) LossOutput[source]

Computes loss for a mini-batch.

Parameters:
  • batch (Data) – A mini-batch from the loader.

  • pretrain (bool, optional) – If True, only compute reconstruction loss for pretraining.

Returns:

LossOutput – LossOutput containing total loss and individual components.

class DinkNet(encoder: Module, projector: Module, n_clusters: int, hidden_channels: int, corruption: Optional[Callable] = None, alpha: float = 1e-10)[source]

Bases: ClusteringModel

The Dink-Net model from the “Dink-Net: Neural Clustering on Large Graphs” paper (Liu et al., ICML 2023).

Dink-Net unifies representation learning and clustering optimization via:

  1. Node Discriminate Module: Learns discriminative features by distinguishing original vs. augmented nodes

  2. Neural Clustering Module: Optimizes clustering via dilation (push centers apart) and shrink (pull nodes to centers) losses

The total loss is:

\[\mathcal{L} = \mathcal{L}_\text{dilation} + \mathcal{L}_\text{shrink} + \alpha \mathcal{L}_\text{discri}\]
Parameters:
  • encoder (torch.nn.Module) – The encoder module (e.g., GCN, GraphSAGE).

  • projector (torch.nn.Module) – The projection head for discriminative learning.

  • n_clusters (int) – Number of clusters.

  • hidden_channels (int) – Hidden dimension of node embeddings.

  • corruption (Callable, optional) – Corruption function for data augmentation. If None, uses random feature shuffling. (default: None)

  • alpha (float, optional) – Trade-off weight for discriminative loss. (default: 1e-10)

reset_parameters()[source]

Resets all learnable parameters of the module.

embed(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]

Computes node embeddings.

Return type:

Tensor

forward(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]

Returns cluster assignments.

Return type:

Tensor

pretrain_loss(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]

Computes pretraining loss (discriminative only).

Return type:

Tensor

loss(x: Tensor, edge_index: Tensor, pretrain: bool = False, **kwargs) LossOutput[source]

Computes the total loss.

Parameters:
  • x (torch.Tensor) – Node features.

  • edge_index (torch.Tensor) – Edge indices.

  • pretrain (bool, optional) – If True, only compute discriminative loss. (default: False)

Returns:

LossOutput – LossOutput containing total loss and individual components.

loss_batch(batch: Data, pretrain: bool = False) LossOutput[source]

Computes loss for a mini-batch with seed node slicing.

Parameters:
  • batch (Data) – A mini-batch from the loader.

  • pretrain (bool, optional) – If True, only compute discriminative loss.

Returns:

LossOutput – LossOutput containing total loss and individual components.

class DMoN(encoder: Module, n_features: int, n_clusters: int, lam: float = 1.0)[source]

Bases: ClusteringModel

The Deep Modularity Network (DMoN) is proposed in the “Graph Clustering with Graph Neural Networks” paper (Tsitsulin et al., JMLR 2023).

This model performs unsupervised graph clustering by combining a graph encoder (e.g., GCN, GraphSAGE) with the DMoNClusterHead. The encoder produces node embeddings \(\mathbf{Z}\), which are then projected into soft cluster assignments \(\mathbf{S}\). The model jointly optimizes the modularity-based and collapse regularization objectives to learn meaningful community structures in the graph.

The optimization objective consists of two losses:

(1) Spectral modularity loss:

\[\mathcal{L}_m = - \frac{1}{2m} \mathrm{Tr}(\mathbf{S}^\top \mathbf{B} \mathbf{S})\]

where \(\mathbf{B} = \mathbf{A} - \frac{\mathbf{d}\mathbf{d}^\top}{2m}\) is the modularity matrix and \(m\) is the total number of edges. This term encourages nodes that are more densely connected than random to be assigned to the same cluster.

(2) Collapse regularization loss:

\[\mathcal{L}_c = \frac{\sqrt{K}}{N} \left\| \sum_i \mathbf{S}_i^\top \right\|_F - 1\]

which prevents degenerate solutions by penalizing unbalanced or collapsed cluster assignments.

The final training objective is a weighted combination of the two terms:

\[\mathcal{L} = \mathcal{L}_m + \lambda \mathcal{L}_c\]

where \(\lambda\) controls the strength of the regularization.

Parameters:
  • encoder (torch.nn.Module) – Node encoder that outputs node embeddings.

  • n_features (int) – Feature dimension of the encoder outputs.

  • n_clusters (int) – Number of clusters.

  • lam (float, optional) – Regularization coefficient for the collapse loss \(\mathcal{L}_c\). (default: 1.0)

reset_parameters()[source]

Resets all learnable parameters of the module.

embed(*args, **kwargs) Tensor[source]

Compute node embeddings via the encoder.

Return type:

Tensor

forward(*args, **kwargs) Tensor[source]

Predict hard cluster assignments from current parameters.

Return type:

Tensor

loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]

Computes the DMoN loss with multiple components.

Parameters:
Returns:

LossOutput – LossOutput containing total loss and individual components.

loss_batch(batch: Data)[source]

DMoN currently does not support mini-batch training.

class MinCut(encoder: Module, n_features: int, n_clusters: int, lam: float = 1.0, temperature: float = 1.0)[source]

Bases: ClusteringModel

The MinCut model is based on “Spectral Clustering in Graph Neural Networks for Graph Pooling” (Bianchi et al., 2019).

It performs unsupervised graph clustering by coupling a graph encoder (e.g., GCN, GraphSAGE) with the MinCutClusterHead. The encoder produces node embeddings \(\mathbf{Z}\), which are projected into soft cluster assignments \(\mathbf{S}\). The model jointly optimizes the MinCut objective and the orthogonality regularizer to yield compact, well-separated clusters.

The optimization objective consists of two losses:

(1) MinCut loss:

\[\mathcal{L}_{\text{mincut}} = - \frac{\mathrm{Tr}(\mathbf{S}^\top \mathbf{A} \mathbf{S})} {\mathrm{Tr}(\mathbf{S}^\top \mathbf{D} \mathbf{S})}\]

encouraging large within-cluster connectivity relative to cluster volume.

(2) Orthogonality regularization:

\[\mathcal{L}_{\text{ortho}} = \left\| \frac{\mathbf{S}^\top \mathbf{S}}{\|\mathbf{S}^\top \mathbf{S}\|_F} - \frac{\mathbf{I}_K}{\sqrt{K}} \right\|_F\]

encouraging near-orthogonal cluster assignment columns to avoid collapse.

The final training objective is a weighted combination of the two terms:

\[\mathcal{L} = \mathcal{L}_{\text{mincut}} + \lambda \mathcal{L}_{\text{ortho}}\]

where \(\lambda\) controls the strength of the orthogonality regularization.

Parameters:
  • encoder (torch.nn.Module) – Node encoder that outputs node embeddings.

  • n_features (int) – Feature dimension of the encoder outputs.

  • n_clusters (int) – Number of clusters.

  • lam (float, optional) – Regularization coefficient for the orthogonality loss \(\mathcal{L}_{\text{ortho}}\). (default: 1.0)

  • temperature (float, optional) – Softmax temperature used in the MinCut head. (default: 1.0)

reset_parameters()[source]

Resets all learnable parameters of the module.

embed(*args, **kwargs) Tensor[source]

Compute node embeddings via the encoder.

Return type:

Tensor

forward(*args, **kwargs) Tensor[source]

Predict hard cluster assignments from current parameters.

Return type:

Tensor

loss(x: Tensor, edge_index: Tensor, **kwargs: Any) LossOutput[source]

Computes the MinCut loss with multiple components.

Parameters:
Returns:

LossOutput – LossOutput containing total loss and individual components.

loss_batch(batch: Data, **kwargs: Any)[source]

MinCut currently does not support mini-batch training.

class Neuromap(encoder: Module, n_features: int, n_clusters: int, lam: float = 1.0, alpha: float = 0.15, n_iters: int = 100)[source]

Bases: ClusteringModel

The Neuromap model implements the differentiable map equation from “The Map Equation Goes Neural: Mapping Network Flows with Graph Neural Networks” (Blöcker et al., NeurIPS 2024).

This model performs unsupervised graph clustering by combining a graph encoder (e.g., GCN, GraphSAGE) with the NeuromapClusterHead.

The encoder produces node embeddings \(\mathbf{Z}\), which are projected into soft cluster assignments \(\mathbf{S}\). The model then minimizes the differentiable Map Equation loss, which measures the expected description length of random walks on the graph according to the Minimum Description Length (MDL) principle:

\[\mathcal{L}(A, S) = q \log q - \sum_m q_m \log q_m - \sum_m m_{\text{exit}} \log m_{\text{exit}} - \sum_u p_u \log p_u + \sum_m p_m \log p_m\]

where all terms are derived from the soft community flow structure induced by \(S\).

This loss naturally balances model complexity and data fit without explicit regularization, allowing Neuromap to infer the number of effective communities automatically.

Parameters:
  • encoder (torch.nn.Module) – Node encoder producing node embeddings.

  • n_features (int) – Feature dimension of encoder outputs.

  • n_clusters (int) – Maximum number of clusters to consider.

  • lam (float, optional) – Regularization coefficient for the collapse loss. (default: 1.0)

  • alpha (float, optional) – Teleportation probability. (default: 0.15)

  • n_iters (int, optional) – Power iteration steps for stationary distribution. (default: 100)

reset_parameters()[source]

Resets all learnable parameters of the encoder and the cluster head.

embed(*args, **kwargs) Tensor[source]

Compute node embeddings via the encoder.

Return type:

Tensor

forward(*args, **kwargs) Tensor[source]

Predicts hard cluster assignments from current parameters.

Return type:

Tensor

loss(x: Tensor, edge_index: Tensor, **kwargs: Any) LossOutput[source]

Computes the Neuromap loss with multiple components.

Parameters:
Returns:

LossOutput – LossOutput containing total loss and individual components.

loss_batch(batch: Data, **kwargs: Any)[source]

Neuromap currently does not support mini-batch training.

class GCSBM(encoder: Module, n_features: int, n_clusters: int, variant: str = 'bernoulli', eta: float = 3.0, alpha: float = 1.0)[source]

Bases: ClusteringModel

Stochastic Block Model (GCSBM) based clustering model from the paper “Differentiable Community Detection with Graph Neural Networks and Stochastic Block Models” (Arliss & Mueller, LoG 2025).

This model learns node embeddings via a GNN encoder and performs clustering by maximizing the likelihood of an GCSBM-based generative model. It supports multiple GCSBM variants including Bernoulli, Poisson, and their degree-corrected versions.

The model optimizes:

\[\mathcal{L} = m^{-1} \mathcal{L}_{GCSBM} + \alpha \|\mathbf{1}_k - \mathrm{diag}(\hat{\Theta})\|_F\]

where \(\mathcal{L}_{GCSBM}\) is the negative log-likelihood, \(m\) is the number of edges, and \(\alpha\) controls regularization strength.

Supported Variants:

  • 'bernoulli': Standard Bernoulli GCSBM

  • 'poisson': Partial Poisson GCSBM (suitable for simple graphs)

  • 'bernoulli-dc': Degree-corrected Bernoulli GCSBM

  • 'poisson-dc': Degree-corrected Poisson GCSBM

  • 'match': Graph Matching variant (fastest)

Parameters:
  • encoder (torch.nn.Module) – Node encoder that outputs node embeddings.

  • n_features (int) – Feature dimension of the encoder outputs.

  • n_clusters (int) – Number of clusters.

  • variant (str, optional) – GCSBM variant to use. Options: 'bernoulli', 'poisson', 'bernoulli-dc', 'poisson-dc', 'match'. (default: 'bernoulli')

  • eta (float, optional) – Negative sampling ratio. (default: 3.0)

  • alpha (float, optional) – Regularization strength. (default: 1.0)

reset_parameters()[source]

Resets all learnable parameters of the module.

embed(*args, **kwargs) Tensor[source]

Compute node embeddings via the encoder.

Return type:

Tensor

forward(*args, **kwargs) Tensor[source]

Predict hard cluster assignments from current parameters.

Return type:

Tensor

loss(x: Tensor, edge_index: Tensor, **kwargs: Any) LossOutput[source]

Computes the GCSBM loss with multiple components.

Parameters:
Returns:

LossOutput – LossOutput containing total loss and individual components.

loss_batch(batch: Data, **kwargs: Any) LossOutput[source]

Computes loss for a mini-batch with seed node slicing.

Parameters:

batch (Data) – A mini-batch from the loader.

Returns:

LossOutput – LossOutput containing total loss and individual components.