PyAGC Documentation
PyAGC (PyTorch Attributed Graph Clustering) is a comprehensive, modular library for attributed graph clustering built on PyTorch and PyTorch Geometric. It provides a unified framework for implementing, evaluating, and comparing state-of-the-art graph clustering algorithms at scale.
Key Features
📊 Diverse Dataset Collection: A benchmark spanning five orders of magnitude in scale across multiple domains. Includes both academic benchmarks and real-world industrial datasets with heterogeneous attributes and varying structural properties.
🧩 Unified Algorithm Framework: Implements 20+ state-of-the-art attributed graph clustering (AGC) methods under a single Encode-Cluster-Optimize framework. Covers the full spectrum from traditional approaches to recent deep learning methods, with modular components that make experimentation and method composition easy.
📏 Holistic Evaluation Protocol: Goes beyond standard supervised metrics by incorporating unsupervised structural quality metrics and comprehensive efficiency profiling. Addresses the real-world scenario where ground-truth labels are unavailable.
🚀 Production-Grade Scalability: Supports GPU-accelerated clustering and mini-batch training. Scales deep clustering methods to graphs with 111 million nodes on a single 32 GB GPU, making industrial deployment feasible.
🛠️ Developer-Friendly Design: Built on PyTorch and PyTorch Geometric with a clean, modular architecture. Features plug-and-play encoders, cluster heads, and optimization strategies. Configuration-driven experiments via YAML files ensure full reproducibility.
📖 Complete Documentation & Reproducibility: Provides extensive documentation, standardized data loaders, unified preprocessing pipelines, and reproducible experiment configurations. The codebase is open source and includes detailed tutorials, so researchers and practitioners can quickly prototype, benchmark, and deploy AGC solutions.
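To make the idea of label-free structural quality metrics concrete, here is a minimal sketch in plain Python. It assumes modularity and conductance as representative examples; this page does not say which structural metrics PyAGC reports, and the function names `modularity` and `conductance` are hypothetical illustrations, not the PyAGC API. Both scores depend only on the graph and the predicted cluster assignment, so they can be computed when ground-truth labels are unavailable.

```python
from collections import defaultdict


def modularity(edges, labels):
    """Newman modularity of a hard partition (undirected, unweighted graph).

    edges:  list of (u, v) pairs, each undirected edge listed once.
    labels: dict mapping node -> predicted cluster id.
    Hypothetical helper for illustration; not part of PyAGC.
    """
    m = len(edges)
    if m == 0:
        return 0.0
    intra = defaultdict(int)  # number of edges fully inside each cluster
    vol = defaultdict(int)    # sum of node degrees in each cluster
    for u, v in edges:
        vol[labels[u]] += 1
        vol[labels[v]] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    # Q = sum_c [ e_c / m - (vol_c / 2m)^2 ]
    return sum(intra[c] / m - (vol[c] / (2 * m)) ** 2 for c in vol)


def conductance(edges, labels, cluster):
    """Conductance of one cluster: cut edges / min(vol inside, vol outside)."""
    cut = 0
    vol_in = 0
    for u, v in edges:
        in_u = labels[u] == cluster
        in_v = labels[v] == cluster
        vol_in += int(in_u) + int(in_v)
        if in_u != in_v:
            cut += 1
    denom = min(vol_in, 2 * len(edges) - vol_in)
    return cut / denom if denom > 0 else 0.0


# Toy graph: two triangles joined by a single bridge edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
labels = {0: 0, 1: 0, 2: 0, 3: 1, 4: 1, 5: 1}

q = modularity(edges, labels)        # 5/14 for this toy partition
phi = conductance(edges, labels, 0)  # 1/7: one cut edge over volume 7
print(f"modularity={q:.4f} conductance={phi:.4f}")
```

Higher modularity and lower conductance both indicate clusters that are dense inside and sparsely connected to the rest of the graph. On large graphs the same quantities would normally be computed with sparse tensor operations rather than Python loops.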
Get Started
Tutorials