Understanding the ECO Framework

The Encode-Cluster-Optimize (ECO) framework is the foundation of PyAGC. This tutorial explains how PyAGC’s modular design implements this framework.

The Three Pillars

Encoder: Learns node representations
Cluster Head: Projects embeddings to cluster assignments
Optimization Strategy: Defines the training objective and coordination

Encoder Module

The encoder transforms raw graph data into latent representations:

\[\mathbf{Z} = \mathcal{E}(\mathbf{A}, \mathbf{X}; \Theta_{\mathcal{E}})\]

Parametric Encoders

Use learnable graph neural networks:

from pyagc.encoders import create_tuned_gnn
from pyagc.data import get_dataset
from torch_geometric.data import Data

# Load dataset
x, edge_index, y = get_dataset('Cora', root='./data')
data = Data(x=x, edge_index=edge_index)

# Create GCN encoder
gcn_encoder = create_tuned_gnn(
    gnn_type='gcn',
    in_channels=data.num_features,
    hidden_channels=256,
    num_layers=2,
    out_channels=128,
    dropout=0.5,
    norm='batch'
)

# Create GAT encoder with attention
gat_encoder = create_tuned_gnn(
    gnn_type='gat',
    in_channels=data.num_features,
    hidden_channels=256,
    num_layers=2,
    out_channels=128,
    heads=8,
    concat=False,
    dropout=0.6
)

# Forward pass
z = gcn_encoder(data.x, data.edge_index)  # [num_nodes, 128]

Non-Parametric Encoders

Use fixed graph filtering operations without learnable parameters:

from pyagc.models import SSGC
import torch

# Simple Spectral Graph Convolution
# No learnable parameters - purely based on graph structure
model = SSGC(
    alpha=0.05,      # Teleport probability
    K=2,             # Number of propagation steps
    cached=True,     # Cache propagation matrix
    add_self_loops=True
)

# Computes: Z = (I - alpha·D^{-1/2}AD^{-1/2})^K X
# This is a smoothed version of node features
z = model.embed(data.x, data.edge_index)

Cluster Head Module

The cluster head maps embeddings to cluster assignments:

\[\mathbf{P} = \mathcal{C}(\mathbf{Z}; \Theta_{\mathcal{C}})\]

Differentiable Cluster Heads

Allow end-to-end gradient-based training:

from pyagc.clusters import DECClusterHead, DMoNClusterHead

# Get embeddings (assuming z is already computed)
num_nodes, embedding_dim = z.shape
num_clusters = 7

# 1. DEC-style prototype clustering
# Uses Student's t-distribution to compute soft assignments
dec_head = DECClusterHead(
    n_clusters=num_clusters,
    n_features=embedding_dim,
    alpha=1.0  # Degrees of freedom
)

# Initialize cluster centers (e.g., from KMeans)
from pyagc.clusters import TorchKMeans
kmeans = TorchKMeans(n_clusters=num_clusters)
kmeans.fit(z)
dec_head.reset_cluster_centers(kmeans.cluster_centers_)

# Forward: compute clustering loss
loss = dec_head(z)  # KL divergence loss

# Get cluster assignments
clusters = dec_head.cluster(z, soft=False)  # Hard assignments
probs = dec_head.cluster(z, soft=True)      # Soft assignments

# 2. DMoN-style differentiable pooling
# Uses modularity maximization
dmon_head = DMoNClusterHead(
    n_clusters=num_clusters,
    n_features=embedding_dim
)

# Forward: compute modularity and collapse losses
modularity_loss, collapse_loss = dmon_head(z, data.edge_index)
total_loss = modularity_loss + collapse_loss

# Get cluster assignments
clusters = dmon_head.cluster(z, soft=False)

Discrete Cluster Heads

Apply post-hoc clustering algorithms (non-differentiable):

from pyagc.clusters import KMeansClusterHead, TorchKMeans

# 1. Using KMeansClusterHead wrapper
kmeans_head = KMeansClusterHead(
    n_clusters=7,
    backend='torch',  # 'torch' or 'sklearn'
    n_init=10,
    max_iter=300,
    random_state=42
)

# Fit and predict in one step
clusters = kmeans_head.fit_predict(z)

# Or use separately
kmeans_head.fit(z)
clusters = kmeans_head.predict(z)
centers = kmeans_head.cluster_centers_

# 2. Using TorchKMeans directly (GPU-accelerated)
kmeans = TorchKMeans(
    n_clusters=7,
    max_iter=300,
    tol=1e-4,
    random_state=42
)

kmeans.fit(z)
clusters = kmeans.labels_            # [num_nodes]
centers = kmeans.cluster_centers_    # [num_clusters, embedding_dim]
inertia = kmeans.inertia_            # Sum of squared distances

Optimization Strategy

The optimization strategy defines how encoder and cluster head interact during training.

Decoupled Training (Two-Stage)

Pre-train encoder with self-supervised objectives, then apply discrete clustering:

from pyagc.models import NS4GC
from pyagc.data import get_dataset
from pyagc.encoders import create_tuned_gnn
from pyagc.transforms import GSSLTransform
from torch_geometric.data import Data
import torch

# Load data
x, edge_index, y = get_dataset('Cora', root='./data')
data = Data(x=x, edge_index=edge_index)

# Create encoder
encoder = create_tuned_gnn(
    gnn_type='gcn',
    in_channels=data.num_features,
    hidden_channels=64,
    num_layers=2,
    norm='batch'
)

# Create data augmentation
transform1 = GSSLTransform(p_feat_mask=0.2, p_edge_drop=0.3)
transform2 = GSSLTransform(p_feat_mask=0.2, p_edge_drop=0.3)

# Create NS4GC model
model = NS4GC(
    encoder=encoder,
    transform1=transform1,
    transform2=transform2,
    lam=1.0,      # Weight for neighbor loss
    gam=1.0       # Weight for sparsity loss
).to('cuda')

# Stage 1: Pre-train encoder with contrastive learning
# Objective: L_rep = L_ali + λ·L_nei + γ·L_spa
optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=0.0)

for epoch in range(200):
    loss = model.train_full(data, optimizer, epoch, verbose=True)

# Stage 2: Generate embeddings and apply KMeans
model.eval()
with torch.no_grad():
    z = model.infer_full(data)  # [num_nodes, hidden_channels]

# Clustering is completely decoupled from encoder training
from pyagc.clusters import KMeansClusterHead
kmeans = KMeansClusterHead(n_clusters=7)
clusters = kmeans.fit_predict(z)

The two stages optimize separate objectives:

\[\text{Stage 1: } \min_{\Theta_{\mathcal{E}}} \mathcal{L}_{\text{rep}}(\mathbf{Z}) = \mathcal{L}_{\text{ali}} + \lambda \mathcal{L}_{\text{nei}} + \gamma \mathcal{L}_{\text{spa}}\]

\[\text{Stage 2: } \min_{\Theta_{\mathcal{C}}} \sum_{i} \|\mathbf{z}_i - \boldsymbol{\mu}_{c_i}\|^2\]

Joint Training (End-to-End)

Train encoder and cluster head together with a combined objective:

from pyagc.models import DAEGC
from pyagc.encoders import create_tuned_gnn
from pyagc.data import get_dataset
from torch_geometric.data import Data
import torch

# Load data
x, edge_index, y = get_dataset('Cora', root='./data')
data = Data(x=x, edge_index=edge_index).to('cuda')

# Get number of clusters
num_clusters = int(y[~torch.isnan(y)].max().item()) + 1

# Create encoder
encoder = create_tuned_gnn(
    gnn_type='gat',
    in_channels=data.num_features,
    hidden_channels=256,
    num_layers=2,
    heads=8,
)

# Create DAEGC model
# Combines GAE reconstruction + DEC clustering
model = DAEGC(
    encoder=encoder,
    n_clusters=num_clusters,
    hidden_channels=256,
    gamma=10.0,           # Weight for clustering loss
    update_interval=5     # Update target distribution every N epochs
).to('cuda')

# Stage 1: Pre-train autoencoder
# Objective: L_pretrain = ||A - decoder(encoder(X, A))||^2
print("Stage 1: Pre-training autoencoder...")
optimizer_pretrain = torch.optim.Adam(
    model.parameters(),
    lr=0.001,
    weight_decay=5e-4
)

for epoch in range(1, 201):
    # Pretrain with reconstruction loss only
    loss = model.train_full(
        data,
        optimizer_pretrain,
        epoch,
        verbose=(epoch % 10 == 0),
        pretrain=True  # Only use reconstruction loss
    )

    if epoch % 50 == 0:
        print(f'Pretrain Epoch {epoch:03d}, Loss: {loss:.4f}')

# Stage 2: Initialize cluster centers using K-Means
print("\nStage 2: Initializing cluster centers with K-Means...")
model.eval()
with torch.no_grad():
    # Get pretrained embeddings
    z = model.embed(data.x, data.edge_index)
    # Normalize for better clustering
    z = torch.nn.functional.normalize(z, p=2, dim=1)

# Initialize cluster centers via K-Means
from pyagc.clusters import TorchKMeans
kmeans = TorchKMeans(n_clusters=num_clusters, random_state=42)
kmeans.fit(z)

# Set initialized centers to the DEC cluster head
model.cluster_head.reset_cluster_centers(kmeans.cluster_centers_)
print(f"✓ Cluster centers initialized: {model.cluster_head.cluster_centers.shape}")

# Stage 3: Joint fine-tuning
# Objective: L_total = L_reconstruction + γ·KL(P || Q)
print("\nStage 3: Joint fine-tuning with clustering loss...")
optimizer_finetune = torch.optim.Adam(
    model.parameters(),
    lr=0.0001,
    weight_decay=0.0
)

for epoch in range(1, 201):
    # Joint training with both reconstruction and clustering losses
    loss = model.train_full(
        data,
        optimizer_finetune,
        epoch,
        verbose=(epoch % 10 == 0),
        pretrain=False  # Use both reconstruction + clustering losses
    )

    if epoch % 10 == 0:
        print(f'Finetune Epoch {epoch:03d}, Loss: {loss:.4f}')

# Get final cluster assignments
model.eval()
clusters = model.infer_full(data)  # Hard cluster assignments

The joint loss simultaneously optimizes representation and clustering:

\[\min_{\Theta_{\mathcal{E}}, \Theta_{\mathcal{C}}} \mathcal{L}_{\text{total}} = \underbrace{\|\mathbf{A} - \sigma(\mathbf{Z})\|^2}_{\text{reconstruction}} + \gamma \underbrace{\text{KL}(\mathbf{P} \| \mathbf{Q})}_{\text{clustering}}\]

Composing ECO Components

PyAGC’s modular design enables flexible composition of components.

Example 1: Custom Model with Swappable Encoders

from pyagc.models import ClusteringModel, LossOutput
from pyagc.encoders import create_tuned_gnn
from pyagc.clusters import DECClusterHead

class MyClusteringModel(ClusteringModel):
    """Custom clustering model with flexible encoder."""

    def __init__(self, in_channels, hidden_channels, num_clusters,
                 gnn_type='gcn'):
        super().__init__()

        # Easily swap between different GNN types
        self.encoder = create_tuned_gnn(
            gnn_type=gnn_type,  # Try: 'gcn', 'gat', 'sage', 'gin'
            in_channels=in_channels,
            hidden_channels=hidden_channels,
            num_layers=2,
            out_channels=128
        )

        self.cluster_head = DECClusterHead(
            n_clusters=num_clusters,
            n_features=128
        )

    def forward(self, data):
        z = self.encoder(data.x, data.edge_index)
        return z

    def loss(self, data):
        z = self.forward(data)
        cluster_loss = self.cluster_head(z)
        return LossOutput(total=cluster_loss)

    def predict(self, data):
        z = self.forward(data)
        return self.cluster_head.cluster(z, soft=False)

# Use different encoders with same model structure
model_gcn = MyClusteringModel(1433, 256, 7, gnn_type='gcn')
model_gat = MyClusteringModel(1433, 256, 7, gnn_type='gat')
model_sage = MyClusteringModel(1433, 256, 7, gnn_type='sage')

Example 2: Comparing Different Cluster Heads

from pyagc.clusters import (
    DECClusterHead,
    DMoNClusterHead,
    KMeansClusterHead,
    DinkClusterHead
)

# Shared encoder for fair comparison
encoder = create_tuned_gnn('gcn', data.num_features, 256, 2)

# Get embeddings once
with torch.no_grad():
    z = encoder(data.x, data.edge_index)

# Compare different clustering approaches

# 1. DEC: Prototype-based with Student's t-distribution
dec_head = DECClusterHead(n_clusters=7, n_features=256)
dec_head.reset_cluster_centers()  # Random init or use KMeans
clusters_dec = dec_head.cluster(z, soft=False)

# 2. DMoN: Modularity-aware differentiable pooling
dmon_head = DMoNClusterHead(n_clusters=7, n_features=256)
clusters_dmon = dmon_head.cluster(z, soft=False)

# 3. DinkNet: Dilation and shrink regularization
dink_head = DinkClusterHead(n_clusters=7, n_features=256)
clusters_dink = dink_head.cluster(z, soft=False)

# 4. KMeans: Classic centroid-based
kmeans_head = KMeansClusterHead(n_clusters=7)
clusters_kmeans = kmeans_head.fit_predict(z)

# Evaluate all methods
from pyagc.metrics import label_metrics
for name, clusters in [
    ('DEC', clusters_dec),
    ('DMoN', clusters_dmon),
    ('DinkNet', clusters_dink),
    ('KMeans', clusters_kmeans)
]:
    results = label_metrics(y, clusters, metrics=['NMI', 'ARI', 'ACC'])
    print(f"{name:8s} - NMI: {results['NMI']:.4f}, "
          f"ARI: {results['ARI']:.4f}, ACC: {results['ACC']:.4f}")

Example 3: Custom Multi-Objective Optimization

from pyagc.models import TrainableModel, LossOutput
from pyagc.encoders import create_tuned_gnn
from pyagc.clusters import DECClusterHead
import torch
import torch.nn.functional as F

class MultiObjectiveModel(TrainableModel):
    """Custom model with multiple loss components."""

    def __init__(self, in_channels, hidden_channels, num_clusters):
        super().__init__()

        self.encoder = create_tuned_gnn(
            gnn_type='gcn',
            in_channels=in_channels,
            hidden_channels=hidden_channels,
            num_layers=2,
            out_channels=128
        )

        self.cluster_head = DECClusterHead(
            n_clusters=num_clusters,
            n_features=128,
            alpha=1.0
        )

        # Decoder for reconstruction
        self.decoder = torch.nn.Linear(128, in_channels)

    def forward(self, data):
        z = self.encoder(data.x, data.edge_index)
        return z

    def loss(self, data):
        z = self.forward(data)

        # 1. Clustering loss (KL divergence)
        loss_cluster = self.cluster_head(z, update_target=True)

        # 2. Reconstruction loss
        x_recon = self.decoder(z)
        loss_recon = F.mse_loss(x_recon, data.x)

        # 3. Contrastive loss (InfoNCE-style)
        # Normalize embeddings
        z_norm = F.normalize(z, p=2, dim=1)
        # Compute similarity matrix
        sim_matrix = torch.matmul(z_norm, z_norm.t()) / 0.5
        # Create positive pairs from neighbors
        adj = torch.sparse_coo_tensor(
            data.edge_index,
            torch.ones(data.edge_index.shape[1], device=z.device),
            (data.num_nodes, data.num_nodes)
        ).to_dense()
        # Positive pairs: neighbors in graph
        pos_mask = adj > 0
        # Negative pairs: non-neighbors
        neg_mask = ~pos_mask
        neg_mask.fill_diagonal_(False)

        # Compute contrastive loss
        pos_sim = sim_matrix[pos_mask].mean() if pos_mask.sum() > 0 else 0
        neg_sim = torch.logsumexp(sim_matrix[neg_mask], dim=0).mean()
        loss_contrast = -pos_sim + neg_sim

        # 4. Regularization: encourage balanced clusters
        q = self.cluster_head.cluster(z, soft=True)
        cluster_sizes = q.sum(dim=0)
        target_size = q.shape[0] / q.shape[1]
        loss_balance = F.mse_loss(cluster_sizes,
                                 torch.full_like(cluster_sizes, target_size))

        # Combined loss with weights
        total_loss = (loss_cluster +
                     0.1 * loss_recon +
                     0.05 * loss_contrast +
                     0.01 * loss_balance)

        return LossOutput(
            total=total_loss,
            loss_cluster=loss_cluster,
            loss_recon=loss_recon,
            loss_contrast=loss_contrast,
            loss_balance=loss_balance
        )

    def predict(self, data):
        z = self.forward(data)
        return self.cluster_head.cluster(z, soft=False)

# Train the model
model = MultiObjectiveModel(
    in_channels=data.num_features,
    hidden_channels=256,
    num_clusters=7
).to('cuda')

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

for epoch in range(200):
    loss_output = model.train_full(data, optimizer, epoch)

    if epoch % 10 == 0:
        print(f'Epoch {epoch:03d}:')
        print(f'  Total: {loss_output.total:.4f}')
        print(f'  Cluster: {loss_output.loss_cluster:.4f}')
        print(f'  Recon: {loss_output.loss_recon:.4f}')
        print(f'  Contrast: {loss_output.loss_contrast:.4f}')
        print(f'  Balance: {loss_output.loss_balance:.4f}')

Example 4: Mini-Batch Training for Large Graphs

from torch_geometric.loader import NeighborLoader

# For large graphs, use mini-batch training
train_loader = NeighborLoader(
    data,
    num_neighbors=[15, 10],  # 2-layer sampling
    batch_size=1024,
    shuffle=True,
    num_workers=4
)

# Create model
model = NS4GC(
    encoder=encoder,
    transform1=transform1,
    transform2=transform2
).to('cuda')

optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Train with mini-batches
for epoch in range(200):
    avg_loss = model.train_batch(train_loader, optimizer, epoch)

    if epoch % 10 == 0:
        print(f'Epoch {epoch:03d}, Loss: {avg_loss:.4f}')

# Inference can still use full-batch or mini-batch
inference_loader = NeighborLoader(
    data,
    num_neighbors=[-1],  # Sample all neighbors
    batch_size=2048,
    shuffle=False
)

z = model.infer_batch(inference_loader)

ECO Taxonomy of Methods

PyAGC organizes 20+ state-of-the-art algorithms into the ECO framework:

Method	Encoder	Cluster Head	Optimization	Key Innovation
KMeans	None	Discrete	N/A	Attribute-only baseline
Node2Vec	Non-param	Discrete	Decoupled	Structure-only baseline
SSGC	Non-param	Discrete	Decoupled	Markov diffusion-based spectral filtering
NAFS	Non-param	Discrete	Decoupled	Adaptive filter selection with ensemble
SAGSC	Non-param	Discrete	Decoupled	Graph regularized subspace clustering
S2CAG	Non-param	Discrete	Decoupled	Conductance minimization for subspace clustering
MS2CAG	Non-param	Discrete	Decoupled	Modularity maximization for subspace clustering
GAE/VGAE	Parametric	Discrete	Decoupled	Graph autoencoder with optional variational
ARGA/ARGVA	Parametric	Discrete	Decoupled	Adversarially regularized GAE/VGAE
DGI	Parametric	Discrete	Decoupled	Mutual information maximization
CCASSG	Parametric	Discrete	Decoupled	Canonical correlation for redundancy reduction
GBT	Parametric	Discrete	Decoupled	Barlow Twins for redundancy reduction
S3GC	Parametric	Discrete	Decoupled	Scalable contrastive learning
NS4GC	Parametric	Discrete	Decoupled	Node similarity preserving contrastive
MAGI	Parametric	Discrete	Decoupled	Modularity-aware contrastive clustering
DAEGC	Parametric	Differentiable	Joint	GAT + DEC clustering
DinkNet	Parametric	Differentiable	Joint	Dilation and shrink regularization
MinCut	Parametric	Differentiable	Joint	Spectral cut minimization
DMoN	Parametric	Differentiable	Joint	Modularity maximization
Neuromap	Parametric	Differentiable	Joint	Neural map equation
GCSBM	Parametric	Differentiable	Joint	Stochastic block model

Conclusion

The ECO framework provides a unified lens for understanding and implementing attributed graph clustering methods. By decomposing algorithms into Encoder, Cluster Head, and Optimization Strategy components, PyAGC enables:

✅ Modularity: Swap components without rewriting code
✅ Extensibility: Easy to add new encoders, cluster heads, or optimization strategies
✅ Reproducibility: Standardized evaluation protocols and benchmarking
✅ Scalability: Support for graphs from thousands to billions of nodes
✅ Flexibility: From research prototyping to production deployment

Start experimenting with the ECO framework today and build state-of-the-art graph clustering solutions!

Next Steps

Create a custom cluster head for novel objectives
Scale to massive graphs with mini-batch training