pyagc.models
The pyagc.models module implements the high-level model classes that
orchestrate the interaction between encoders and cluster heads within the
Encode-Cluster-Optimize framework. Each model defines its own
optimization strategy, managing the forward pass and the computation of
joint losses.
Models are organized into the following categories:
Base Classes: Abstract interfaces for all models (
BaseModel,TrainableModel,ClusteringModel).Traditional Methods: Attribute-only or structure-only baselines (e.g.,
Node2Vec).Non-Parametric Methods: Training-free models using fixed graph filters (e.g.,
SGC,SSGC,NAFS,SAGSC,S2CAG).Deep Decoupled Methods: Models where the encoder is pre-trained with a self-supervised objective and clustering is applied post-hoc (e.g.,
GAE,DGI,CCASSG,S3GC,NS4GC,MAGI).Deep Joint Methods: End-to-end models that jointly optimize representation learning and clustering (e.g.,
DAEGC,DinkNet,DMoN,MinCut,Neuromap,GCSBM).
Base Classes
Unified loss output format for training. |
|
Base interface for all PyAGC models. |
|
Base class for trainable graph models. |
|
Base class for end-to-end clustering models. |
- class LossOutput(total: Tensor, components: Dict[str, float])[source]
Bases:
objectUnified loss output format for training.
This class encapsulates both the total loss (used for backpropagation) and individual loss components (used for logging and monitoring).
- Parameters:
Example
>>> loss_output = LossOutput( ... total=torch.tensor(1.5), ... components={'ali': 0.8, 'nei': 0.5, 'spa': 0.2} ... ) >>> print(loss_output.log_string("Epoch 01: ")) Epoch 01: Loss: 1.5000, ALI: 0.8000, NEI: 0.5000, SPA: 0.2000
- class BaseModel(*args, **kwargs)[source]
-
Base interface for all PyAGC models.
All models must implement
embed()to produce node embeddings. For trainable models, inherit fromTrainableModelinstead.Example
>>> class MyEncoder(BaseModel): ... def embed(self, x, edge_index): ... return self.encoder(x, edge_index)
See also
TrainableModel: For models with loss computationClusteringModel: For end-to-end clustering
- abstract embed(*args: Any, **kwargs: Any) Tensor[source]
Returns node embeddings.
- Parameters:
models (For lookup-based) – x, edge_index, …
models – batch (node indices)
The output shape and type depend on the specific model implementation. Typically returns a
Tensorof shape(num_nodes, hidden_dim).- Return type:
- infer_full(data: Data) Any[source]
Full-graph inference: returns embeddings or predictions for all nodes.
- Parameters:
data (Data) – Input graph data.
- Returns:
Any– Node embeddings or predictions, typically of shape(num_nodes, *).
- infer_batch(loader: NeighborLoader, verbose: bool = True) Any[source]
Mini-batch inference over a NeighborLoader.
For node-level outputs, concatenates only the seed nodes of each batch.
- class TrainableModel[source]
Bases:
BaseModelBase class for trainable graph models.
Subclasses must implement the
loss()method. This class provides default implementations fortrain_full()andtrain_batch()that handle both single-loss and multi-component loss outputs.- The
loss()method can return either: A single
Tensorfor simple lossesA
LossOutputobject for losses with multiple components
- abstract loss(*args: Any, **kwargs: Any) Union[Tensor, LossOutput][source]
Computes the training loss.
- Returns:
Union[Tensor,LossOutput] – Either a scalar lossTensor, or aLossOutputobject containing the total loss and individual components.
- train_full(data: Data, optimizer: Optimizer, epoch: int, verbose: bool = True, **loss_kwargs: Any) float[source]
Runs one epoch of full-batch training.
- Parameters:
- Returns:
float– Loss value of the epoch.
- train_batch(loader: NeighborLoader, optimizer: Optimizer, epoch: int, verbose: bool = True, **loss_kwargs: Any) float[source]
Runs one epoch of mini-batch training.
- Parameters:
loader (NeighborLoader) – The mini-batch loader.
optimizer (torch.optim.Optimizer) – The optimizer.
epoch (int) – Current epoch number.
verbose (bool, optional) – If
True, prints training progress. (default:True)**loss_kwargs – Additional keyword arguments passed to
loss_batch().
- Returns:
float– Average loss value of the epoch.
- loss_batch(batch: Data, **kwargs: Any) Union[Tensor, LossOutput][source]
Computes loss for a mini-batch.
Default implementation simply calls
loss(). Subclasses can override this method to handle batch-specific logic (e.g., slicing to seed nodes).- Parameters:
batch (Data) – A mini-batch from the loader.
**kwargs – Additional keyword arguments.
- Returns:
Union[Tensor,LossOutput] – Loss output, same format asloss().
- The
- class ClusteringModel[source]
Bases:
TrainableModelBase class for end-to-end clustering models.
This class is designed for models that directly output cluster assignments (e.g., DMoN, MinCut) rather than embeddings. It provides a unified interface for clustering tasks by overriding the
infer_full()andinfer_batch()methods to return cluster assignments directly.- Subclasses should implement:
embed(): Returns node embeddingsforward(): Returns hard cluster assignmentsloss(): Computes clustering loss
- abstract forward(*args: Any, **kwargs: Any) Tensor[source]
Returns hard cluster assignments.
- Returns:
Tensor– Cluster assignments of shape(num_nodes,).
- infer_full(data: Data) Tensor[source]
Full-graph inference: returns cluster assignments for all nodes.
- Parameters:
data (Data) – Input graph data.
- Returns:
Tensor– Cluster assignments of shape(num_nodes,).
- infer_batch(loader: NeighborLoader, verbose: bool = True) Tensor[source]
Mini-batch inference over a NeighborLoader.
- initialize_cluster_centers(data: Data, num_layers: int, train_idx: Tensor = None, batch_size: int = 4096, fan_out: int = -1, method: str = 'kmeans', verbose: bool = True)[source]
Initialize cluster centers using K-Means.
Supports two modes: 1. Small graphs: Use all nodes for initialization 2. Large graphs: Use only training nodes or mini-batch inference
- Parameters:
data (Data) – Full graph data.
num_layers (int) – Number of encoder layers.
train_idx (torch.Tensor, optional) – Training node indices. If provided, only use these nodes for initialization. (default:
None)batch_size (int, optional) – Batch size for mini-batch inference. (default:
4096)fan_out (int, optional) – Number of sampled neighbors. (default:
-1)method (str, optional) – Initialization method. (default:
'kmeans')verbose (bool, optional) – If
True, prints initializing progress. (default:True)
Traditional Methods
The Node2Vec model from the "node2vec: Scalable Feature Learning for Networks" paper where random walks of length |
- class Node2Vec(edge_index: Tensor, embedding_dim: int, walk_length: int, context_size: int, walks_per_node: int = 1, p: float = 1.0, q: float = 1.0, num_negative_samples: int = 1, num_nodes: Optional[int] = None, sparse: bool = False, cpu_embedding: bool = False)[source]
Bases:
TrainableModelThe Node2Vec model from the “node2vec: Scalable Feature Learning for Networks” paper where random walks of length
walk_lengthare sampled in a given graph, and node embeddings are learned via negative sampling optimization.- Parameters:
edge_index (torch.Tensor) – The edge indices.
embedding_dim (int) – The size of each embedding vector.
walk_length (int) – The walk length.
context_size (int) – The actual context size which is considered for positive samples. This parameter increases the effective sampling rate by reusing samples across different source nodes.
walks_per_node (int, optional) – The number of walks to sample for each node. (default:
1)p (float, optional) – Likelihood of immediately revisiting a node in the walk. (default:
1)q (float, optional) – Control parameter to interpolate between breadth-first strategy and depth-first strategy (default:
1)num_negative_samples (int, optional) – The number of negative samples to use for each positive sample. (default:
1)num_nodes (int, optional) – The number of nodes. (default:
None)sparse (bool, optional) – If set to
True, gradients w.r.t. to the weight matrix will be sparse. (default:False)cpu_embedding (bool, optional) – If set to
True, stores embeddings on CPU and only moves required embeddings to GPU during training. Essential for very large graphs. (default:False)
- embed(batch: Optional[Tensor] = None, device: Optional[device] = None) Tensor[source]
Returns the embeddings for the nodes in
batch.- Parameters:
batch (torch.Tensor, optional) – Node indices. If
None, returns embeddings for all nodes. (default:None)device (torch.device, optional) – Target device for the embeddings. If None, uses self.compute_device for cpu_embedding mode, or the embedding’s device for standard mode. (default:
None)
- Returns:
Tensor– Node embeddings of shape(num_nodes, embedding_dim)or(batch_size, embedding_dim).
- loader(**kwargs) DataLoader[source]
Creates a DataLoader for training Node2Vec.
- Returns:
DataLoader– DataLoader that samples positive and negative random walks.
- sample(batch: Union[List[int], Tensor]) Tuple[Tensor, Tensor][source]
Samples positive and negative random walks for a batch of nodes.
- loss(pos_rw: Tensor, neg_rw: Tensor) LossOutput[source]
Computes the loss given positive and negative random walks.
- Parameters:
pos_rw (torch.Tensor) – Positive random walks of shape
(num_walks, context_size).neg_rw (torch.Tensor) – Negative random walks of shape
(num_walks, context_size).
- Returns:
LossOutput– LossOutput containing total loss and individual components.
Non-Parametric Methods
These methods construct node representations via fixed filtering operations over the graph topology without learnable encoder parameters. Clustering is typically performed post-hoc on the resulting embeddings.
The non-parametric simple graph convolutional (SGC) operator from the "Simplifying Graph Convolutional Networks" paper (Wu et al., ICML 2019). |
|
The non-parametric simple spectral graph convolutional (SSGC) operator from the "Simple Spectral Graph Convolution" paper (Zhu, Hao, and Piotr Koniusz, ICLR 2021). |
|
Node-Adaptive Feature Smoothing (NAFS) from the "NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning" paper (Zhang et al., ICML 2022). |
|
The Scalable Attributed-Graph Subspace Clustering (SAGSC) model from the "Scalable Attributed-Graph Subspace Clustering" paper (Fettal et al., AAAI 2023). |
|
The S²CAG (Spectral Subspace Clustering for Attributed Graphs) model from the "Spectral Subspace Clustering for Attributed Graphs" paper (Lin et al., KDD 2025). |
- class SGC(K: int = 1, cached: bool = False, add_self_loops: bool = True, **kwargs)[source]
Bases:
MessagePassing,BaseModelThe non-parametric simple graph convolutional (SGC) operator from the “Simplifying Graph Convolutional Networks” paper (Wu et al., ICML 2019). This implementation is adapted from: pyg/sg_conv.
\[\mathbf{X}^{\prime} = {\left(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \right)}^K \mathbf{X},\]where \(\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}\) denotes the adjacency matrix with inserted self-loops and \(\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}\) its diagonal degree matrix. The adjacency matrix can include other values than
1representing edge weights via the optionaledge_weighttensor.- Parameters:
K (int, optional) – Number of hops \(K\). (default:
1)cached (bool, optional) – If set to
True, the layer will cache the computation of \({\left(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \right)}^K \mathbf{X}\) on first execution, and will use the cached version for further executions. This parameter should only be set toTruein transductive learning scenarios. (default:False)add_self_loops (bool, optional) – If set to
False, will not add self-loops to the input graph. (default:True)**kwargs (optional) – Additional arguments of
torch_geometric.nn.conv.MessagePassing.
- Shapes:
input: node features \((|\mathcal{V}|, F_{in})\), edge indices \((2, |\mathcal{E}|)\), edge weights \((|\mathcal{E}|)\) (optional)
output: node features \((|\mathcal{V}|, F_{in})\)
- forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None) Tensor[source]
Runs the forward pass of the module.
- Return type:
- message(x_j: Tensor, edge_weight: Tensor) Tensor[source]
Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in
edge_index. This function can take any argument as input which was initially passed topropagate(). Furthermore, tensors passed topropagate()can be mapped to the respective nodes \(i\) and \(j\) by appending_ior_jto the variable name, .e.g.x_iandx_j.- Return type:
- message_and_aggregate(adj_t: Union[Tensor, SparseTensor], x: Tensor) Tensor[source]
Fuses computations of
message()andaggregate()into a single function. If applicable, this saves both time and memory since messages do not explicitly need to be materialized. This function will only gets called in case it is implemented and propagation takes place based on atorch_sparse.SparseTensoror atorch.sparse.Tensor.- Return type:
- class SSGC(alpha: float, K: int = 1, cached: bool = False, add_self_loops: bool = True, **kwargs)[source]
Bases:
MessagePassing,BaseModelThe non-parametric simple spectral graph convolutional (SSGC) operator from the “Simple Spectral Graph Convolution” paper (Zhu, Hao, and Piotr Koniusz, ICLR 2021). This implementation is adapted from: pyg/ssg_conv.
\[\mathbf{X}^{\prime} = \frac{1}{K} \sum_{k=1}^K\left((1-\alpha) {\left(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \right)}^k \mathbf{X}+\alpha \mathbf{X}\right),\]where \(\mathbf{\hat{A}} = \mathbf{A} + \mathbf{I}\) denotes the adjacency matrix with inserted self-loops and \(\hat{D}_{ii} = \sum_{j=0} \hat{A}_{ij}\) its diagonal degree matrix. The adjacency matrix can include other values than
1representing edge weights via the optionaledge_weighttensor.- Parameters:
alpha (float) – Teleport probability \(\alpha \in [0, 1]\).
K (int, optional) – Number of hops \(K\). (default:
1)cached (bool, optional) – If set to
True, the layer will cache the computation of \(\frac{1}{K} \sum_{k=1}^K\left((1-\alpha) {\left(\mathbf{\hat{D}}^{-1/2} \mathbf{\hat{A}} \mathbf{\hat{D}}^{-1/2} \right)}^k \mathbf{X}+ \alpha \mathbf{X}\right)\) on first execution, and will use the cached version for further executions. This parameter should only be set toTruein transductive learning scenarios. (default:False)add_self_loops (bool, optional) – If set to
False, will not add self-loops to the input graph. (default:True)**kwargs (optional) – Additional arguments of
torch_geometric.nn.conv.MessagePassing.
- Shapes:
input: node features \((|\mathcal{V}|, F_{in})\), edge indices \((2, |\mathcal{E}|)\), edge weights \((|\mathcal{E}|)\) (optional)
output: node features \((|\mathcal{V}|, F_{in})\)
- forward(x: Tensor, edge_index: Union[Tensor, SparseTensor], edge_weight: Optional[Tensor] = None) Tensor[source]
Runs the forward pass of the module.
- Return type:
- message(x_j: Tensor, edge_weight: Tensor) Tensor[source]
Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in
edge_index. This function can take any argument as input which was initially passed topropagate(). Furthermore, tensors passed topropagate()can be mapped to the respective nodes \(i\) and \(j\) by appending_ior_jto the variable name, .e.g.x_iandx_j.- Return type:
- message_and_aggregate(adj_t: Union[Tensor, SparseTensor], x: Tensor) Tensor[source]
Fuses computations of
message()andaggregate()into a single function. If applicable, this saves both time and memory since messages do not explicitly need to be materialized. This function will only gets called in case it is implemented and propagation takes place based on atorch_sparse.SparseTensoror atorch.sparse.Tensor.- Return type:
- class NAFS(K: int = 20, r_list: Optional[List[float]] = None, ensemble: str = 'concat', distance: str = 'cosine')[source]
Bases:
BaseModelNode-Adaptive Feature Smoothing (NAFS) from the “NAFS: A Simple yet Tough-to-beat Baseline for Graph Representation Learning” paper (Zhang et al., ICML 2022).
NAFS is a training-free method that constructs node representations by adaptively smoothing node features over different propagation steps. It addresses two key limitations of GNNs: over-smoothing and scalability.
The method consists of two operations:
(1) Node-Adaptive Feature Smoothing:
For each smoothing step \(k\), compute:
\[X^{(k)} = \hat{A}^k X\]where \(\hat{A} = \tilde{D}^{r-1} \tilde{A} \tilde{D}^{-r}\) is the normalized adjacency matrix with parameter \(r\).
Then adaptively combine them using smoothing weights:
\[\hat{X} = \sum_{k=0}^{K} W^{(k)} X^{(k)}\]where \(W^{(k)}\) are diagonal weight matrices computed based on the over-smoothing distance:
\[D_i(k) = \text{dist}([X^{(k)}]_i, X_i)\](2) Feature Ensemble:
Combine smoothed features from different \(r\) values:
\[Z = \bigoplus_{t=1}^{T} \hat{X}^{(t)}\]where \(\bigoplus\) can be concatenation, mean, or max pooling.
- Parameters:
K (int) – Maximum number of smoothing steps. (default:
20)r_list (List[float], optional) – List of normalization parameters for feature ensemble. (default:
[0.0, 0.1, 0.2, 0.3, 0.4, 0.5])ensemble (str, optional) – Ensemble strategy, one of
'mean','max', or'concat'. (default:'concat')distance (str, optional) – Distance function for computing over-smoothing distance, one of
'cosine'or'euclidean'. (default:'cosine')
Example
>>> from pyagc.models import NAFS >>> from pyagc.data import get_dataset >>> >>> # Load data >>> x, edge_index, y = get_dataset('Cora', root='./data') >>> >>> # Create model >>> model = NAFS(K=20, ensemble='concat') >>> >>> # Get embeddings >>> embeddings = model.embed(x, edge_index) >>> >>> # Use for clustering >>> from pyagc.clusters import KMeansClusterHead >>> kmeans = KMeansClusterHead(n_clusters=7) >>> pred = kmeans.fit_predict(embeddings)
- embed(x: Tensor, edge_index: Union[Tensor, SparseTensor], **kwargs) Tensor[source]
Generate node embeddings using NAFS.
- Parameters:
x (Tensor) – Node features of shape
(num_nodes, num_features).edge_index (Tensor) – Edge indices.
- Returns:
Tensor– Node embeddings of shape(num_nodes, num_features * T)if ensemble=’concat’, otherwise(num_nodes, num_features).
- class SAGSC(p: int, k: int, bias: float = 0.7071067811865476)[source]
Bases:
BaseModelThe Scalable Attributed-Graph Subspace Clustering (SAGSC) model from the “Scalable Attributed-Graph Subspace Clustering” paper (Fettal et al., AAAI 2023).
SAGSC consists of two stages:
(1) Node Embedding Generation (SGC):
\[H = S^p X\]where \(S = \hat{D}^{-1/2} \hat{A} \hat{D}^{-1/2}\) is the normalized adjacency and \(p\) is the propagation order.
(2) Implicit Subspace Clustering:
SAGSC avoids learning a dense coefficient matrix \(C \in \mathbb{R}^{n \times n}\) by imposing a low–rank factorization:
\[C = U U^\top,\quad U^\top U = I_k.\]\(U\) is obtained from the top-\(k\) left singular vectors of \(H\).
To construct a non–negative affinity matrix, SAGSC uses a quadratic kernel applied row-wise:
\[Q = \Phi(U), \qquad M = Q Q^\top \ge 0.\]Then the normalized matrix
\[\tilde{Q} = Q D^{-1/2},\]is decomposed via SVD, and the rows of the top \(k\) singular vectors (excluding the trivial first one) are clustered using \(k\)-means.
SAGSC performs implicit spectral clustering without explicitly forming the affinity matrix \(M \in \mathbb{R}^{n \times n}\).
- Parameters:
- embed(*args, **kwargs) Tensor[source]
Computes the SAGSC latent representation used for clustering.
- Steps:
\(H = S^p X\) (SGC propagation)
\(U =\) top-\(k\) left singular vectors of \(H\)
\(Q = \Phi(U)\)
\(D =\) row sum vector of \(Q Q^\top\) (implemented implicitly)
\(\tilde{Q} = Q D^{-1/2}\)
\(Z =\) singular vectors 2 to \(k+1\) of \(\tilde{Q}\)
- Parameters:
x (Tensor) – Node features of shape
(n, d).edge_index (Adj) – Edge indices.
edge_weight (Tensor, optional) – Edge weights.
- Returns:
Tensor– Spectral embedding of shape(n, k).
- class S2CAG(n_clusters: int, T: int = 10, alpha: float = 0.9, tau: int = 7, oversampling: int = 10)[source]
Bases:
BaseModelThe S²CAG (Spectral Subspace Clustering for Attributed Graphs) model from the “Spectral Subspace Clustering for Attributed Graphs” paper (Lin et al., KDD 2025).
S²CAG performs unsupervised graph clustering by computing Normalized Smoothed Representations (NSR) via graph Laplacian smoothing, then extracting cluster structures through truncated SVD and spectral rounding.
The model optimizes the following objective:
\[\min_{S} \|Z - SZ\|_F^2 + \|S\|_* + \|S^T S - I\|_F^2\]where \(Z\) is the NSR matrix, \(S\) is the self-expressive matrix, and \(\|\cdot\|_*\) denotes the nuclear norm.
Theoretically, S²CAG is equivalent to minimizing the total conductance of clusters on an affinity graph constructed from NSR.
- Parameters:
n_clusters (int) – Number of clusters.
T (int, optional) – Number of propagation steps. (default:
10)alpha (float, optional) – Decay factor for smoothing. (default:
0.9)tau (int, optional) – Number of iterations for SVD approximation. (default:
7)oversampling (int, optional) – Oversampling parameter for randomized SVD. (default:
10)
Example
>>> from pyagc.models import S2CAG >>> from pyagc.data import get_dataset >>> >>> # Load data >>> x, edge_index, y = get_dataset('Cora', root='./data') >>> n_clusters = int(y.max()) + 1 >>> >>> # Create model >>> model = S2CAG(n_clusters=n_clusters, T=20, alpha=0.9) >>> >>> # Get cluster assignments >>> clusters = model.embed(x, edge_index)
- embed(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]
Computes cluster assignments via S²CAG.
- Parameters:
x (Tensor) – Node features of shape
(num_nodes, num_features).edge_index (Tensor) – Edge indices of shape
(2, num_edges).
- Returns:
Tensor– Node embeddings of shape(num_nodes, n_clusters).
Deep Decoupled Methods
In decoupled methods, the encoder is pre-trained using a self-supervised objective \(\mathcal{L}_{\text{rep}}\), and a discrete clustering algorithm (e.g., KMeans) is applied to the learned embeddings post-hoc. The clustering process is not part of the training loop.
The Graph Auto-Encoder model from the "Variational Graph Auto-Encoders" paper based on user-defined encoder and decoder models. |
|
The Adversarially Regularized Graph Auto-Encoder model from the "Adversarially Regularized Graph Autoencoder for Graph Embedding" paper. |
|
The Deep Graph Infomax (DGI) model from the "Deep Graph Infomax" paper (Veličković et al., ICLR 2019)based on user-defined encoder and summary model \(\mathcal{E}\) and \(\mathcal{R}\) respectively, and a corruption function \(\mathcal{C}\). |
|
The CCA-SSG (Canonical Correlation Analysis-inspired Self-Supervised GNN) model is proposed in the From Canonical Correlation Analysis to Self-supervised Graph Neural Networks paper (Zhang et al., NeurIPS 2021). |
|
The G-BT (Graph Barlow Twins) model is proposed in the "Graph Barlow Twins: A Self-Supervised Representation Learning Framework for Graphs" paper (Bielak et al., KBS 2022). |
|
The S3GC (Scalable Self-Supervised Graph Clustering) model from the "S3GC: Scalable Self-Supervised Graph Clustering" paper (Devvrit et al., NeurIPS 2022). |
|
The NS4GC (Node Similarity-guided Contrastive Graph Clustering) model is proposed in the "Reliable Node Similarity Matrix Guided Contrastive Graph Clustering" paper (Liu et al., TKDE 2024). |
|
The MAGI (Modularity-Aware Graph clustering via contrastive learnIng) model from the "Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective" paper (Liu et al., KDD 2024). |
- class GAE(encoder: Module, decoder: Optional[Module] = None)[source]
Bases:
TrainableModelThe Graph Auto-Encoder model from the “Variational Graph Auto-Encoders” paper based on user-defined encoder and decoder models.
- Parameters:
encoder (torch.nn.Module) – The encoder module.
decoder (torch.nn.Module, optional) – The decoder module. If set to
None, will default to thetorch_geometric.nn.models.InnerProductDecoder. (default:None)
- decode(*args, **kwargs) Tensor[source]
Runs the decoder and computes edge probabilities.
- Return type:
- recon_loss(z: Tensor, pos_edge_index: Tensor, neg_edge_index: Optional[Tensor] = None) Tensor[source]
Given latent variables
z, computes the binary cross entropy loss for positive edgespos_edge_indexand negative sampled edges.- Parameters:
z (torch.Tensor) – The latent space \(\mathbf{Z}\).
pos_edge_index (torch.Tensor) – The positive edges to train against.
neg_edge_index (torch.Tensor, optional) – The negative edges to train against. If not given, uses negative sampling to calculate negative edges. (default:
None)
- Return type:
- loss(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]
Computes the reconstruction loss for GAE.
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices (positive edges).
- Returns:
Tensor– Reconstruction loss as a scalar tensor.
- class ARGA(encoder: Module, discriminator: Module, decoder: Optional[Module] = None)[source]
Bases:
GAEThe Adversarially Regularized Graph Auto-Encoder model from the “Adversarially Regularized Graph Autoencoder for Graph Embedding” paper.
Note
ARGA requires a two-phase training procedure (encoder + discriminator). Use
train_encoder()andtrain_discriminator()separately, or implement a custom training loop.- Parameters:
encoder (torch.nn.Module) – The encoder module.
discriminator (torch.nn.Module) – The discriminator module.
decoder (torch.nn.Module, optional) – The decoder module. If set to
None, will default to thetorch_geometric.nn.models.InnerProductDecoder. (default:None)
- reg_loss(z: Tensor) Tensor[source]
Computes the regularization loss of the encoder.
- Parameters:
z (torch.Tensor) – The latent space \(\mathbf{Z}\).
- Return type:
- discriminator_loss(z: Tensor) Tensor[source]
Computes the loss of the discriminator.
- Parameters:
z (torch.Tensor) – The latent space \(\mathbf{Z}\).
- Return type:
- loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]
Computes the ARGA encoder loss with reconstruction and regularization components.
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices (positive edges).
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- loss_batch(batch: Data) LossOutput[source]
Computes encoder loss for a mini-batch with seed node slicing.
- Parameters:
batch (Data) – A mini-batch from the loader.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- train_encoder(data: Data, optimizer: Optimizer, epoch: int, verbose: bool = True) float[source]
Trains the encoder for one epoch.
This is equivalent to
train_full()but provided for clarity in the two-phase ARGA training procedure.
- train_discriminator(data: Data, optimizer: Optimizer, epoch: int, verbose: bool = True) float[source]
Trains the discriminator for one epoch.
- Parameters:
data (Data) – The input full graph data.
optimizer (torch.optim.Optimizer) – The optimizer for discriminator parameters.
epoch (int) – Current epoch number.
verbose (bool, optional) – If
True, prints training progress. (default:True)
- Returns:
float– Discriminator loss value of the epoch.
- class DGI(hidden_channels: int, encoder: Module, summary: Optional[Callable] = None, corruption: Optional[Callable] = None)[source]
Bases:
TrainableModelThe Deep Graph Infomax (DGI) model from the “Deep Graph Infomax” paper (Veličković et al., ICLR 2019)based on user-defined encoder and summary model \(\mathcal{E}\) and \(\mathcal{R}\) respectively, and a corruption function \(\mathcal{C}\).
DGI maximizes mutual information between patch representations and corresponding high-level summaries of the graph by training a discriminator to distinguish between positive samples (real node-graph pairs) and negative samples (corrupted node-graph pairs).
This implementation is adapted from: pyg/deep_graph_infomax.
- Parameters:
hidden_channels (int) – The latent space dimensionality.
encoder (torch.nn.Module) – The encoder module \(\mathcal{E}\).
summary (callable, optional) – The readout function \(\mathcal{R}\) that computes graph-level representations. If
None, uses global mean pooling with sigmoid activation. (default:None)corruption (callable, optional) – The corruption function \(\mathcal{C}\) that generates negative samples. If
None, uses random feature shuffling. (default:None)
Example
>>> from pyagc.models import DGI >>> from torch_geometric.nn import GCN >>> >>> # With default summary and corruption >>> encoder = GCN(in_channels=128, hidden_channels=64, num_layers=2) >>> model = DGI(hidden_channels=64, encoder=encoder) >>> >>> # With custom functions >>> def custom_corruption(x, edge_index): ... return x + torch.randn_like(x) * 0.1, edge_index >>> >>> def custom_summary(z): ... return z.max(dim=0)[0] >>> >>> model = DGI( ... hidden_channels=64, ... encoder=encoder, ... summary=custom_summary, ... corruption=custom_corruption ... )
- forward(*args, **kwargs) Tuple[Tensor, Tensor, Tensor][source]
Returns the latent space for the input arguments, their corruptions and their summary representation.
- discriminate(z: Tensor, summary: Tensor, sigmoid: bool = True) Tensor[source]
Given the patch-summary pair
zandsummary, computes the probability scores assigned to this patch-summary pair.- Parameters:
z (torch.Tensor) – The latent space.
summary (torch.Tensor) – The summary vector.
sigmoid (bool, optional) – If set to
False, does not apply the logistic sigmoid function to the output. (default:True)
- Return type:
- loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]
Computes the mutual information maximization objective.
- Return type:
- loss_batch(batch: Data) LossOutput[source]
Computes loss for a mini-batch with seed node slicing.
- Return type:
- class CCASSG(encoder: Module, transform1: BaseTransform, transform2: BaseTransform, lam: float = 0.001)[source]
Bases:
TrainableModelThe CCA-SSG (Canonical Correlation Analysis-inspired Self-Supervised GNN) model is proposed in the From Canonical Correlation Analysis to Self-supervised Graph Neural Networks paper (Zhang et al., NeurIPS 2021).
CCASSG aims to learn node representations by maximizing the agreement between embeddings from two augmented graph views while decorrelating the feature dimensions of each view.
The loss function combines an invariance term and a decorrelation term:
\[\mathcal{L} = \underbrace{\| \tilde{\boldsymbol{Z}}^{(1)} - \tilde{\boldsymbol{Z}}^{(2)} \|_F^2}_{\text{invariance}} + \lambda \left( \| \tilde{\boldsymbol{Z}}^{(1)\top} \tilde{\boldsymbol{Z}}^{(1)} - \boldsymbol{I} \|_F^2 + \| \tilde{\boldsymbol{Z}}^{(2)\top} \tilde{\boldsymbol{Z}}^{(2)} - \boldsymbol{I} \|_F^2 \right)\]where:
\(\tilde{\boldsymbol{Z}}^{(i)}\) is the column-normalized embedding of view \(i\);
\(\boldsymbol{I}\) is the identity matrix;
\(\lambda\) controls the strength of decorrelation regularization.
- Parameters:
encoder (torch.nn.Module) – The encoder shared across both views.
transform1 (torch_geometric.transforms.BaseTransform) – The 1-st graph view transformation.
transform2 (torch_geometric.transforms.BaseTransform) – The 2-nd graph view transformation.
lam (float) – Weight of decorrelation loss (default:
1e-3).
- forward(*args, **kwargs) Tuple[Tensor, Tensor][source]
Generates embeddings from two graph augmentations.
- loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]
Computes the CCA-SSG loss with multiple components.
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- loss_batch(batch: Data) LossOutput[source]
Computes loss for a mini-batch with seed node slicing.
- Parameters:
batch (Data) – A mini-batch from the loader.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- class GBT(encoder: Module, transform1: BaseTransform, transform2: BaseTransform, lam: float = 0.005)[source]
Bases:
TrainableModelThe G-BT (Graph Barlow Twins) model is proposed in the “Graph Barlow Twins: A Self-Supervised Representation Learning Framework for Graphs” paper (Bielak et al., KBS 2022).
G-BT learns node representations by maximizing the similarity between two embeddings of the same node under different augmentations, while reducing redundancy between dimensions via a decorrelation objective.
The loss function is:
\[\mathcal{L} = \sum_i (1 - \mathcal{C}_{ii})^2 + \lambda \sum_{i}\sum_{j \ne i} \mathcal{C}_{ij}^2\]where:
\(\mathcal{C}\) is the cross-correlation matrix between normalized embeddings from two views: \(\mathcal{C}_{ij} = \sum_b \tilde{z}^{(1)}_{b,i} \tilde{z}^{(2)}_{b,j}\).
\(\tilde{z}\) denotes the batch-normalized node embedding.
\(\lambda\) balances invariance and redundancy reduction.
- Parameters:
encoder (torch.nn.Module) – The shared encoder used to produce node embeddings.
transform1 (torch_geometric.transforms.BaseTransform) – The first graph view transformation.
transform2 (torch_geometric.transforms.BaseTransform) – The second graph view transformation.
lam (float) – Weight for redundancy reduction term. (default:
5e-3)
- forward(*args, **kwargs) Tuple[Tensor, Tensor][source]
Generates embeddings from two graph augmentations.
- loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]
Computes the GBT loss with multiple components.
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- loss_batch(batch: Data) LossOutput[source]
Computes loss for a mini-batch with seed node slicing.
- Parameters:
batch (Data) – A mini-batch from the loader.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- class S3GC(edge_index: Tensor, num_nodes: int, in_channels: int, hidden_channels: int, walk_length: int, context_size: int, walks_per_node: int = 1, num_negative_samples: int = 1, p: float = 1.0, q: float = 1.0, scale: Literal['small', 'medium', 'large', 'auto'] = 'auto', keep_embeddings_on_cpu: bool = False)[source]
Bases:
TrainableModelThe S3GC (Scalable Self-Supervised Graph Clustering) model from the “S3GC: Scalable Self-Supervised Graph Clustering” paper (Devvrit et al., NeurIPS 2022).
S3GC uses a simple GCN-based encoder combined with contrastive learning to learn clusterable node representations. The architecture adapts based on graph scale:
Small (< 50K nodes): Simple linear encoder
Medium (50K - 3M nodes): 2-layer MLP encoder
Large (> 3M nodes): 2-layer MLP with memory-efficient embeddings
The encoder combines:
Direct attribute transformation: \(f(\tilde{A}X)\)
Diffusion-based transformation: \(g(S_kX)\)
Learnable node embeddings: \(I\)
Training uses SimCLR-style contrastive loss where nodes sampled via random walks are positives, and randomly sampled nodes are negatives.
- Parameters:
edge_index (Tensor) – Edge indices of the graph.
num_nodes (int) – Number of nodes in the graph.
in_channels (int) – Input feature dimension.
hidden_channels (int) – Output embedding dimension.
walk_length (int) – Length of random walks for positive sampling.
context_size (int) – Context size for random walk (should be \(\leq\) walk_length).
walks_per_node (int, optional) – Number of random walks per node. (default:
1)num_negative_samples (int, optional) – Number of negative samples per positive. (default:
1)p (float, optional) – Return parameter for random walk. (default:
1.0)q (float, optional) – In-out parameter for random walk. (default:
1.0)scale (str, optional) – Graph scale, one of
['small', 'medium', 'large', 'auto']. If'auto', automatically determined by num_nodes. (default:'auto')
Example
>>> from pyagc.models.s3gc import S3GC, precompute_features >>> from pyagc.data import get_dataset >>> >>> # Load data >>> x, edge_index, y = get_dataset('Cora', root='./data') >>> >>> # Precompute features >>> AX, SX = precompute_features(x, edge_index, x.size(0), method='ppr') >>> >>> # Create model (automatically detects 'small' scale) >>> model = S3GC( ... edge_index=edge_index, ... num_nodes=x.size(0), ... in_channels=x.size(1), ... hidden_channels=256, ... walk_length=3, ... context_size=3 ... ) >>> >>> # Set precomputed features >>> model.set_precomputed_features(AX, SX) >>> >>> # Train >>> loader = model.loader(batch_size=2708, shuffle=True) >>> optimizer = torch.optim.Adam(model.parameters(), lr=0.01) >>> for epoch in range(1, 201): ... loss = model.train_epoch(loader, optimizer, epoch)
- set_precomputed_features(AX: Tensor, SX: Tensor, keep_on_cpu: bool = False)[source]
Sets precomputed \(\tilde{A}X\) and \(S_kX\) matrices.
This should be called before training to avoid recomputing these matrices in every forward pass.
- Parameters:
AX (Tensor) – Precomputed \(\tilde{A}X\) of shape
(num_nodes, in_channels).SX (Tensor) – Precomputed \(S_kX\) of shape
(num_nodes, in_channels).keep_on_cpu (bool) – If True, keep features on CPU to save GPU memory
- pos_sample(batch: Tensor) Tensor[source]
Samples positive nodes via biased random walks.
- Return type:
- neg_sample(batch: Tensor) Tensor[source]
Samples negative nodes randomly or cluster-aware.
- Return type:
- sample(batch: Union[List[int], Tensor]) Tuple[Tensor, Tensor][source]
Samples positive and negative random walks for a batch.
- loader(**kwargs) DataLoader[source]
Creates a DataLoader for S3GC training.
- Returns:
DataLoader– DataLoader that samples nodes and generates positive/negative walks.
- loss(pos_rw: Tensor, neg_rw: Tensor, node_mapping: Tensor) LossOutput[source]
Computes the SimCLR-style contrastive loss.
- Parameters:
pos_rw (Tensor) – Positive random walks of shape
(num_walks, context_size).neg_rw (Tensor) – Negative random walks of shape
(num_walks, context_size).node_mapping (Tensor) – Mapping from global node IDs to batch-local IDs.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- class NS4GC(encoder: Module, transform1: BaseTransform, transform2: BaseTransform, s: float = 0.6, tau: float = 0.1, lam: float = 1.0, gam: float = 1.0)[source]
Bases:
TrainableModelThe NS4GC (Node Similarity-guided Contrastive Graph Clustering) model is proposed in the “Reliable Node Similarity Matrix Guided Contrastive Graph Clustering” paper (Liu et al., TKDE 2024).
NS4GC aims to learn node representations suitable for clustering via contrastive learning guided by a node similarity matrix computed from two augmented views of the graph.
The model uses the following contrastive loss function:
\[\mathcal{L} = \mathcal{L}_{\text{ali}} + \lambda \mathcal{L}_{\text{nei}} + \gamma \mathcal{L}_{\text{spa}}\]where:
\(\boldsymbol{S} = \boldsymbol{Z}^{(1)} (\boldsymbol{Z}^{(2)})^{\top}\) is the similarity matrix computed by inner product between two views;
\(\mathcal{L}_{\text{ali}} = -\frac{1}{N} \sum_i \boldsymbol{S}_{ii}\) encourages aligned embeddings from two views;
\(\mathcal{L}_{\text{nei}} = -\frac{1}{|\mathcal{E}|} \sum_{(i,j) \in \mathcal{E}} \boldsymbol{S}_{ij}\) enforces consistency for neighbors;
\(\mathcal{L}_{\text{spa}} = \mathbb{E}_{(i,j) \notin \mathcal{E}} \sigma((\boldsymbol{S}_{ij} - s) / \tau)\) encourages sparsity of similarity values between non-neighbor pairs. \(\sigma(x) = 1 / (1 + \exp(-x))\) is the sigmoid function.
- Parameters:
encoder (torch.nn.Module) – The encoder shared across both views.
transform1 (torch_geometric.transforms.BaseTransform) – The 1-st graph view transformation.
transform2 (torch_geometric.transforms.BaseTransform) – The 2-nd graph view transformation.
s (float) – Threshold for the sparsity loss term (default:
0.6).tau (float) – Temperature for the sigmoid function in sparsity loss (default:
0.1).lam (float) – Weight for the neighborhood consistency term (default:
1.0).gam (float) – Weight for the sparsity loss term (default:
1.0).
Example
>>> from pyagc.models import NS4GC >>> from pyagc.transforms import GSSLTransform >>> from pyagc.encoders import create_tuned_gnn >>> >>> # Create encoder >>> encoder = create_tuned_gnn('gcn', in_channels=128, hidden_channels=64, num_layers=2) >>> >>> # Create augmentation transforms >>> transform1 = GSSLTransform(p_feat_mask=0.2, p_edge_drop=0.3) >>> transform2 = GSSLTransform(p_feat_mask=0.2, p_edge_drop=0.3) >>> >>> # Create model >>> model = NS4GC( ... encoder=encoder, ... transform1=transform1, ... transform2=transform2, ... s=0.6, ... tau=0.1, ... lam=1.0, ... gam=1.0 ... )
- forward(*args, **kwargs) Tuple[Tensor, Tensor][source]
Generates embeddings from two graph augmentations.
- loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]
Computes the NS4GC loss with multiple components.
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- loss_batch(batch: Data) LossOutput[source]
Computes loss for a mini-batch with seed node slicing.
- Parameters:
batch (Data) – A mini-batch from the loader.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- class MAGI(encoder: Module, tau: float = 0.5, scale_embeddings: bool = True)[source]
Bases:
TrainableModelThe MAGI (Modularity-Aware Graph clustering via contrastive learnIng) model from the “Revisiting Modularity Maximization for Graph Clustering: A Contrastive Learning Perspective” paper (Liu et al., KDD 2024).
MAGI establishes the connection between modularity maximization and graph contrastive learning, where positive and negative samples are naturally guided by the modularity matrix. The model uses a community-aware pretext task based on two-stage random walks to capture high-order proximity within communities.
The loss function follows InfoNCE-style contrastive learning:
\[\mathcal{L} = -\sum_{v \in \mathcal{B}} \sum_{u \in \mathcal{M}^+_v} \log \frac{\exp(\mathbf{z}_v^\top \mathbf{z}_u / \tau)} {\sum_{u \in \mathcal{M}^+_v} \exp(\mathbf{z}_v^\top \mathbf{z}_u / \tau) + \sum_{u' \in \mathcal{M}^-_v} \exp(\mathbf{z}_v^\top \mathbf{z}_{u'} / \tau)}\]where:
\(\mathcal{M}^+_v = \{u \mid B_{vu} > 0\}\) are positive samples (same community).
\(\mathcal{M}^-_v = \{u \mid B_{vu} \leq 0\}\) are negative samples (different communities).
\(B_{vu}\) is the modularity coefficient computed via two-stage random walks.
\(\tau\) is the temperature parameter.
- Parameters:
encoder (torch.nn.Module) – The GNN encoder module.
tau (float, optional) – Temperature parameter for contrastive loss. (default:
0.5)scale_embeddings (bool, optional) – Whether to apply min-max scaling to embeddings before normalization. (default:
True)
- embed(*args, **kwargs) Tensor[source]
Computes node embeddings.
- Returns:
Tensor– Node embeddings of shape(num_nodes, hidden_dim).
- loss(x: Tensor, edge_index: Tensor, pos_pairs: Tensor, **kwargs) LossOutput[source]
Computes the MAGI contrastive loss for full-graph training.
- Parameters:
x (torch.Tensor) – Node feature matrix of shape
(num_nodes, num_features).edge_index (torch.Tensor) – Edge indices of shape
(2, num_edges).pos_pairs (torch.Tensor) – Precomputed positive sample pairs of shape
(2, num_pos_edges)based on modularity matrix.**kwargs – Additional arguments for the encoder.
- Returns:
LossOutput– LossOutput containing the total loss.
- loss_batch(batch: Data) LossOutput[source]
Computes loss for a mini-batch with expanded seed nodes.
In mini-batch training, the positive pairs are defined only among the expanded batch seed nodes (obtained via two-stage random walks).
- Parameters:
batch (Data) – A mini-batch from the MAGI loader containing: -
x: Node features (including neighbors) -edge_index: Sampled edges -pos_pairs: Positive pairs for expanded batch seed nodes -expanded_batch_size: Number of expanded seed nodes- Returns:
LossOutput– LossOutput containing the total loss.
Deep Joint Methods
In joint methods, the encoder and cluster head are optimized end-to-end.
The clustering objective \(\mathcal{L}_{\text{clust}}\) provides
supervisory signals that directly refine the learned representations.
These models inherit from ClusteringModel and output cluster
assignments directly.
Deep Attentional Embedded Graph Clustering model from the "Attributed Graph Clustering: A Deep Attentional Embedding Approach" paper (Wang et al., IJCAI 2019). |
|
The Dink-Net model from the "Dink-Net: Neural Clustering on Large Graphs" paper (Liu et al., ICML 2023). |
|
The Deep Modularity Network (DMoN) is proposed in the "Graph Clustering with Graph Neural Networks" paper (Tsitsulin et al., JMLR 2023). |
|
The MinCut model is based on "Spectral Clustering in Graph Neural Networks for Graph Pooling" (Bianchi et al., 2019). |
|
The Neuromap model implements the differentiable map equation from "The Map Equation Goes Neural: Mapping Network Flows with Graph Neural Networks" (Blöcker et al., NeurIPS 2024). |
|
Stochastic Block Model (GCSBM) based clustering model from the paper "Differentiable Community Detection with Graph Neural Networks and Stochastic Block Models" (Arliss & Mueller, LoG 2025). |
- class DAEGC(encoder: Module, n_clusters: int, hidden_channels: int, decoder: Optional[Module] = None, gamma: float = 10.0, update_interval: int = 5)[source]
Bases:
ClusteringModelDeep Attentional Embedded Graph Clustering model from the “Attributed Graph Clustering: A Deep Attentional Embedding Approach” paper (Wang et al., IJCAI 2019).
DAEGC jointly optimizes graph embedding and clustering through:
Graph Attentional Autoencoder: Learns representations by encoding both structure and content with attention mechanism, then reconstructs the graph structure via inner product decoder.
Self-training Clustering: Uses confident cluster assignments as soft labels to guide the optimization, iteratively refining clustering results.
The total loss combines reconstruction and clustering objectives:
\[\mathcal{L} = \mathcal{L}_r + \gamma \mathcal{L}_c\]where:
\(\mathcal{L}_r\) is the binary cross-entropy reconstruction loss
\(\mathcal{L}_c = KL(P||Q)\) is the clustering loss with Student’s t-distribution
\(\gamma\) is the clustering coefficient
- Parameters:
encoder (torch.nn.Module) – The graph attention encoder (typically GAT-based).
decoder (torch.nn.Module, optional) – The decoder module. If set to
None, will default toInnerProductDecoder. (default:None)n_clusters (int) – Number of clusters.
hidden_channels (int) – Hidden dimension of node embeddings.
gamma (float, optional) – Weight for clustering loss. (default:
10.0)update_interval (int, optional) – Number of iterations between target distribution updates. (default:
5)
- embed(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]
Computes node embeddings via the encoder.
- Return type:
- decode(z: Tensor, edge_index: Tensor, sigmoid: bool = True) Tensor[source]
Reconstructs edge probabilities via the decoder.
- Return type:
- forward(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]
Returns cluster assignments (hard labels).
- Return type:
- recon_loss(z: Tensor, pos_edge_index: Tensor, neg_edge_index: Optional[Tensor] = None) Tensor[source]
Computes the binary cross-entropy reconstruction loss.
Given latent variables
z, computes the binary cross entropy loss for positive edgespos_edge_indexand negative sampled edges.- Parameters:
z (torch.Tensor) – The latent space \(\mathbf{Z}\).
pos_edge_index (torch.Tensor) – The positive edges to train against.
neg_edge_index (torch.Tensor, optional) – The negative edges to train against. If not given, uses negative sampling to calculate negative edges. (default:
None)
- Return type:
- pretrain_loss(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]
Computes pretraining loss (reconstruction only).
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices.
- Returns:
Tensor– Reconstruction loss.
- loss(x: Tensor, edge_index: Tensor, pretrain: bool = False, **kwargs) LossOutput[source]
Computes the total loss.
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices.
pretrain (bool, optional) – If
True, only compute reconstruction loss for pretraining. (default:False)
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- loss_batch(batch: Data, pretrain: bool = False) LossOutput[source]
Computes loss for a mini-batch.
- Parameters:
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- class DinkNet(encoder: Module, projector: Module, n_clusters: int, hidden_channels: int, corruption: Optional[Callable] = None, alpha: float = 1e-10)[source]
Bases:
ClusteringModelThe Dink-Net model from the “Dink-Net: Neural Clustering on Large Graphs” paper (Liu et al., ICML 2023).
Dink-Net unifies representation learning and clustering optimization via:
Node Discriminate Module: Learns discriminative features by distinguishing original vs. augmented nodes
Neural Clustering Module: Optimizes clustering via dilation (push centers apart) and shrink (pull nodes to centers) losses
The total loss is:
\[\mathcal{L} = \mathcal{L}_\text{dilation} + \mathcal{L}_\text{shrink} + \alpha \mathcal{L}_\text{discri}\]- Parameters:
encoder (torch.nn.Module) – The encoder module (e.g., GCN, GraphSAGE).
projector (torch.nn.Module) – The projection head for discriminative learning.
n_clusters (int) – Number of clusters.
hidden_channels (int) – Hidden dimension of node embeddings.
corruption (Callable, optional) – Corruption function for data augmentation. If None, uses random feature shuffling. (default:
None)alpha (float, optional) – Trade-off weight for discriminative loss. (default:
1e-10)
- embed(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]
Computes node embeddings.
- Return type:
- forward(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]
Returns cluster assignments.
- Return type:
- pretrain_loss(x: Tensor, edge_index: Tensor, **kwargs) Tensor[source]
Computes pretraining loss (discriminative only).
- Return type:
- loss(x: Tensor, edge_index: Tensor, pretrain: bool = False, **kwargs) LossOutput[source]
Computes the total loss.
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices.
pretrain (bool, optional) – If True, only compute discriminative loss. (default:
False)
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- loss_batch(batch: Data, pretrain: bool = False) LossOutput[source]
Computes loss for a mini-batch with seed node slicing.
- Parameters:
batch (Data) – A mini-batch from the loader.
pretrain (bool, optional) – If True, only compute discriminative loss.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- class DMoN(encoder: Module, n_features: int, n_clusters: int, lam: float = 1.0)[source]
Bases:
ClusteringModelThe Deep Modularity Network (DMoN) is proposed in the “Graph Clustering with Graph Neural Networks” paper (Tsitsulin et al., JMLR 2023).
This model performs unsupervised graph clustering by combining a graph encoder (e.g., GCN, GraphSAGE) with the
DMoNClusterHead. The encoder produces node embeddings \(\mathbf{Z}\), which are then projected into soft cluster assignments \(\mathbf{S}\). The model jointly optimizes the modularity-based and collapse regularization objectives to learn meaningful community structures in the graph.The optimization objective consists of two losses:
(1) Spectral modularity loss:
\[\mathcal{L}_m = - \frac{1}{2m} \mathrm{Tr}(\mathbf{S}^\top \mathbf{B} \mathbf{S})\]where \(\mathbf{B} = \mathbf{A} - \frac{\mathbf{d}\mathbf{d}^\top}{2m}\) is the modularity matrix and \(m\) is the total number of edges. This term encourages nodes that are more densely connected than random to be assigned to the same cluster.
(2) Collapse regularization loss:
\[\mathcal{L}_c = \frac{\sqrt{K}}{N} \left\| \sum_i \mathbf{S}_i^\top \right\|_F - 1\]which prevents degenerate solutions by penalizing unbalanced or collapsed cluster assignments.
The final training objective is a weighted combination of the two terms:
\[\mathcal{L} = \mathcal{L}_m + \lambda \mathcal{L}_c\]where \(\lambda\) controls the strength of the regularization.
- Parameters:
encoder (torch.nn.Module) – Node encoder that outputs node embeddings.
n_features (int) – Feature dimension of the encoder outputs.
n_clusters (int) – Number of clusters.
lam (float, optional) – Regularization coefficient for the collapse loss \(\mathcal{L}_c\). (default:
1.0)
- forward(*args, **kwargs) Tensor[source]
Predict hard cluster assignments from current parameters.
- Return type:
- loss(x: Tensor, edge_index: Tensor, **kwargs) LossOutput[source]
Computes the DMoN loss with multiple components.
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- class MinCut(encoder: Module, n_features: int, n_clusters: int, lam: float = 1.0, temperature: float = 1.0)[source]
Bases:
ClusteringModelThe MinCut model is based on “Spectral Clustering in Graph Neural Networks for Graph Pooling” (Bianchi et al., 2019).
It performs unsupervised graph clustering by coupling a graph encoder (e.g., GCN, GraphSAGE) with the
MinCutClusterHead. The encoder produces node embeddings \(\mathbf{Z}\), which are projected into soft cluster assignments \(\mathbf{S}\). The model jointly optimizes the MinCut objective and the orthogonality regularizer to yield compact, well-separated clusters.The optimization objective consists of two losses:
(1) MinCut loss:
\[\mathcal{L}_{\text{mincut}} = - \frac{\mathrm{Tr}(\mathbf{S}^\top \mathbf{A} \mathbf{S})} {\mathrm{Tr}(\mathbf{S}^\top \mathbf{D} \mathbf{S})}\]encouraging large within-cluster connectivity relative to cluster volume.
(2) Orthogonality regularization:
\[\mathcal{L}_{\text{ortho}} = \left\| \frac{\mathbf{S}^\top \mathbf{S}}{\|\mathbf{S}^\top \mathbf{S}\|_F} - \frac{\mathbf{I}_K}{\sqrt{K}} \right\|_F\]encouraging near-orthogonal cluster assignment columns to avoid collapse.
The final training objective is a weighted combination of the two terms:
\[\mathcal{L} = \mathcal{L}_{\text{mincut}} + \lambda \mathcal{L}_{\text{ortho}}\]where \(\lambda\) controls the strength of the orthogonality regularization.
- Parameters:
encoder (torch.nn.Module) – Node encoder that outputs node embeddings.
n_features (int) – Feature dimension of the encoder outputs.
n_clusters (int) – Number of clusters.
lam (float, optional) – Regularization coefficient for the orthogonality loss \(\mathcal{L}_{\text{ortho}}\). (default:
1.0)temperature (float, optional) – Softmax temperature used in the MinCut head. (default:
1.0)
- forward(*args, **kwargs) Tensor[source]
Predict hard cluster assignments from current parameters.
- Return type:
- loss(x: Tensor, edge_index: Tensor, **kwargs: Any) LossOutput[source]
Computes the MinCut loss with multiple components.
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- class Neuromap(encoder: Module, n_features: int, n_clusters: int, lam: float = 1.0, alpha: float = 0.15, n_iters: int = 100)[source]
Bases:
ClusteringModelThe Neuromap model implements the differentiable map equation from “The Map Equation Goes Neural: Mapping Network Flows with Graph Neural Networks” (Blöcker et al., NeurIPS 2024).
This model performs unsupervised graph clustering by combining a graph encoder (e.g., GCN, GraphSAGE) with the
NeuromapClusterHead.The encoder produces node embeddings \(\mathbf{Z}\), which are projected into soft cluster assignments \(\mathbf{S}\). The model then minimizes the differentiable Map Equation loss, which measures the expected description length of random walks on the graph according to the Minimum Description Length (MDL) principle:
\[\mathcal{L}(A, S) = q \log q - \sum_m q_m \log q_m - \sum_m m_{\text{exit}} \log m_{\text{exit}} - \sum_u p_u \log p_u + \sum_m p_m \log p_m\]where all terms are derived from the soft community flow structure induced by \(S\).
This loss naturally balances model complexity and data fit without explicit regularization, allowing Neuromap to infer the number of effective communities automatically.
- Parameters:
encoder (torch.nn.Module) – Node encoder producing node embeddings.
n_features (int) – Feature dimension of encoder outputs.
n_clusters (int) – Maximum number of clusters to consider.
lam (float, optional) – Regularization coefficient for the collapse loss. (default:
1.0)alpha (float, optional) – Teleportation probability. (default:
0.15)n_iters (int, optional) – Power iteration steps for stationary distribution. (default:
100)
- forward(*args, **kwargs) Tensor[source]
Predicts hard cluster assignments from current parameters.
- Return type:
- loss(x: Tensor, edge_index: Tensor, **kwargs: Any) LossOutput[source]
Computes the Neuromap loss with multiple components.
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- class GCSBM(encoder: Module, n_features: int, n_clusters: int, variant: str = 'bernoulli', eta: float = 3.0, alpha: float = 1.0)[source]
Bases:
ClusteringModelStochastic Block Model (GCSBM) based clustering model from the paper “Differentiable Community Detection with Graph Neural Networks and Stochastic Block Models” (Arliss & Mueller, LoG 2025).
This model learns node embeddings via a GNN encoder and performs clustering by maximizing the likelihood of an GCSBM-based generative model. It supports multiple GCSBM variants including Bernoulli, Poisson, and their degree-corrected versions.
The model optimizes:
\[\mathcal{L} = m^{-1} \mathcal{L}_{GCSBM} + \alpha \|\mathbf{1}_k - \mathrm{diag}(\hat{\Theta})\|_F\]where \(\mathcal{L}_{GCSBM}\) is the negative log-likelihood, \(m\) is the number of edges, and \(\alpha\) controls regularization strength.
Supported Variants:
'bernoulli': Standard Bernoulli GCSBM'poisson': Partial Poisson GCSBM (suitable for simple graphs)'bernoulli-dc': Degree-corrected Bernoulli GCSBM'poisson-dc': Degree-corrected Poisson GCSBM'match': Graph Matching variant (fastest)
- Parameters:
encoder (torch.nn.Module) – Node encoder that outputs node embeddings.
n_features (int) – Feature dimension of the encoder outputs.
n_clusters (int) – Number of clusters.
variant (str, optional) – GCSBM variant to use. Options:
'bernoulli','poisson','bernoulli-dc','poisson-dc','match'. (default:'bernoulli')eta (float, optional) – Negative sampling ratio. (default:
3.0)alpha (float, optional) – Regularization strength. (default:
1.0)
- forward(*args, **kwargs) Tensor[source]
Predict hard cluster assignments from current parameters.
- Return type:
- loss(x: Tensor, edge_index: Tensor, **kwargs: Any) LossOutput[source]
Computes the GCSBM loss with multiple components.
- Parameters:
x (torch.Tensor) – Node features.
edge_index (torch.Tensor) – Edge indices.
- Returns:
LossOutput– LossOutput containing total loss and individual components.
- loss_batch(batch: Data, **kwargs: Any) LossOutput[source]
Computes loss for a mini-batch with seed node slicing.
- Parameters:
batch (Data) – A mini-batch from the loader.
- Returns:
LossOutput– LossOutput containing total loss and individual components.