pyagc.utils
===========

.. contents:: Contents
    :local:

The :mod:`pyagc.utils` package provides utility functions and classes for experiment management,
including checkpoint management for single-stage and multi-stage training, configuration loading,
logging, reproducibility, and common mathematical operations.

.. code-block:: python

    from pyagc.utils import (
        CheckpointManager,
        MultiStageCheckpointManager,
        set_seed,
        get_training_config,
        get_logger,
    )

    # Set random seeds for reproducibility:
    set_seed(42)

    # Load dataset-specific training config from YAML:
    config = get_training_config("Cora", config_path="train.conf.yaml")

    # Create a logger with file and console output:
    logger = get_logger("experiment.log", log_level=1)

    # Manage checkpoints during training
    # (`model` and `optimizer` are assumed to be constructed already):
    ckpt_mgr = CheckpointManager(ckpt_dir="./checkpoints", model_name="dmon")
    ckpt_mgr.save_checkpoint(model, optimizer, epoch=10, loss=0.35, is_best=True)

Checkpoint Management
---------------------

PyAGC provides two checkpoint managers to handle model persistence during training.
:class:`CheckpointManager` supports standard single-stage training workflows,
while :class:`MultiStageCheckpointManager` extends it for multi-stage pipelines
common in decoupled AGC methods (*e.g.*, pre-training followed by fine-tuning).

Both managers automatically track the best model, support intra-epoch saving for
mini-batch training on large graphs, and allow seamless training resumption.

.. code-block:: python

    from pyagc.utils import CheckpointManager, MultiStageCheckpointManager

    # `model` and `optimizer` are assumed to be constructed already.
    # Single-stage checkpoint management:
    ckpt = CheckpointManager("./ckpts", "dmon")
    ckpt.save_checkpoint(model, optimizer, epoch=5, loss=0.42, is_best=True)
    ckpt.load_checkpoint(model, optimizer, load_best=True, device="cuda")

    # Multi-stage checkpoint management (e.g., pretrain + finetune):
    ckpt = MultiStageCheckpointManager(
        "./ckpts", "daegc", stages=["pretrain", "finetune"]
    )
    ckpt.save_checkpoint(model, optimizer, epoch=100, loss=0.5, stage="pretrain", is_best=True)
    ckpt.load_checkpoint(model, stage="pretrain", load_best=True, device="cuda")
    ckpt.save_checkpoint(model, optimizer, epoch=50, loss=0.3, stage="finetune", is_best=True)
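
The idea of keeping a resumable "last" checkpoint next to a "best" one can be pictured with
a small stand-alone sketch. It is not PyAGC's implementation: the class name ``TinyCheckpointer``,
the file names ``last.pkl`` and ``best.pkl``, and the use of ``pickle`` in place of PyTorch
serialization are all assumptions chosen so the example runs without PyTorch. The sketch also
decides "best" by comparing losses itself, whereas the calls above pass ``is_best`` explicitly.

.. code-block:: python

    import pickle
    import tempfile
    from pathlib import Path


    class TinyCheckpointer:
        """Hypothetical stand-in that keeps a 'last' and a 'best' checkpoint."""

        def __init__(self, ckpt_dir):
            self.ckpt_dir = Path(ckpt_dir)
            self.ckpt_dir.mkdir(parents=True, exist_ok=True)
            self.best_loss = float("inf")

        def save(self, state, epoch, loss):
            payload = {"state": state, "epoch": epoch, "loss": loss}
            # Always overwrite the most recent checkpoint (used for resumption).
            with open(self.ckpt_dir / "last.pkl", "wb") as f:
                pickle.dump(payload, f)
            # Overwrite the best checkpoint only when the loss improves.
            if loss < self.best_loss:
                self.best_loss = loss
                with open(self.ckpt_dir / "best.pkl", "wb") as f:
                    pickle.dump(payload, f)

        def load(self, best=False):
            name = "best.pkl" if best else "last.pkl"
            with open(self.ckpt_dir / name, "rb") as f:
                return pickle.load(f)


    ckpt = TinyCheckpointer(tempfile.mkdtemp())
    for epoch, loss in enumerate([0.9, 0.4, 0.6], start=1):
        ckpt.save({"weight": epoch}, epoch, loss)

    best = ckpt.load(best=True)   # checkpoint with the lowest loss
    last = ckpt.load()            # checkpoint from the final epoch
    print(best["epoch"], last["epoch"])  # 2 3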

.. currentmodule:: pyagc.utils

.. autosummary::
   :nosignatures:
   :toctree: ../generated
   :template: autosummary/class.rst

   CheckpointManager
   MultiStageCheckpointManager

Configuration & Logging
-----------------------

PyAGC adopts a **configuration-driven** experiment design. All hyperparameters are
specified in YAML files with a hierarchical structure: a ``default`` section provides
base configurations, and dataset-specific sections selectively override these defaults.

.. code-block:: yaml

    # train.conf.yaml
    default:
      learning_rate: 0.001
      hidden_dim: 128
      model:
        num_layers: 2
        dropout: 0.5

    Cora:
      learning_rate: 0.01
      model:
        num_layers: 3

    CiteSeer:
      hidden_dim: 256
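
The override rule described above, where nested sections are merged key by key instead of being
replaced wholesale, can be sketched in plain Python. The helper name ``deep_merge`` is
hypothetical and is not the library's :func:`deep_update_dict`; in particular, returning a
copy rather than updating in place is an assumption of this sketch.

.. code-block:: python

    import copy


    def deep_merge(base, override):
        """Return a copy of ``base`` with ``override`` merged in recursively."""
        merged = copy.deepcopy(base)
        for key, value in override.items():
            if isinstance(value, dict) and isinstance(merged.get(key), dict):
                # Both sides are mappings: merge key by key instead of replacing.
                merged[key] = deep_merge(merged[key], value)
            else:
                # Scalars, lists, and new keys replace or extend the base entry.
                merged[key] = copy.deepcopy(value)
        return merged


    # The ``default`` and ``Cora`` sections of the YAML file, written as dicts.
    default = {
        "learning_rate": 0.001,
        "hidden_dim": 128,
        "model": {"num_layers": 2, "dropout": 0.5},
    }
    cora = {"learning_rate": 0.01, "model": {"num_layers": 3}}

    config = deep_merge(default, cora)
    print(config)
    # {'learning_rate': 0.01, 'hidden_dim': 128, 'model': {'num_layers': 3, 'dropout': 0.5}}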

.. code-block:: python

    from pyagc.utils import get_training_config, get_logger

    # Load merged configuration (default + dataset-specific overrides):
    config = get_training_config("Cora", config_path="train.conf.yaml")
    # config == {'learning_rate': 0.01, 'hidden_dim': 128, 'model': {'num_layers': 3, 'dropout': 0.5}}

    # Create a logger with both file and console output:
    logger = get_logger("logs/experiment.log", log_level=1, name="pyagc")
    logger.info("Training started")
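
For readers unfamiliar with the pattern, a logger that writes to both a file and the console
can be assembled from the standard :mod:`logging` module as sketched below. The helper
``make_logger`` is hypothetical and only illustrates the two-handler setup; how
:func:`get_logger` maps ``log_level=1`` to a logging level is not covered here, so the sketch
takes a standard level constant instead.

.. code-block:: python

    import logging
    import os
    import tempfile


    def make_logger(log_path, level=logging.INFO, name="demo"):
        """Hypothetical helper: one logger, two handlers (file and console)."""
        logger = logging.getLogger(name)
        logger.setLevel(level)
        logger.propagate = False  # avoid duplicate lines via the root logger
        if not logger.handlers:   # do not stack handlers on repeated calls
            fmt = logging.Formatter("%(asctime)s %(levelname)s %(message)s")
            for handler in (logging.FileHandler(log_path), logging.StreamHandler()):
                handler.setFormatter(fmt)
                logger.addHandler(handler)
        return logger


    log_path = os.path.join(tempfile.mkdtemp(), "experiment.log")
    logger = make_logger(log_path)
    logger.info("Training started")

    # The same record was printed to the console and appended to the file.
    with open(log_path) as f:
        log_text = f.read()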

.. currentmodule:: pyagc.utils

.. autosummary::
   :nosignatures:
   :toctree: ../generated

   get_training_config
   get_logger
   deep_update_dict

.. autofunction:: get_training_config

.. autofunction:: get_logger

.. autofunction:: deep_update_dict

Reproducibility
---------------

.. currentmodule:: pyagc.utils

.. autosummary::
   :nosignatures:
   :toctree: ../generated

   set_seed

.. autofunction:: set_seed
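
Seeding helpers of this kind usually seed every random number generator a training script
touches. The sketch below shows that general pattern and is not the body of :func:`set_seed`:
the name ``seed_everything`` is hypothetical, and which libraries the real function seeds is an
assumption. NumPy and PyTorch are seeded only if they are installed.

.. code-block:: python

    import os
    import random


    def seed_everything(seed):
        """Hypothetical helper that seeds whichever RNG libraries are installed."""
        # Only affects subprocesses started later; the hash seed of the
        # current interpreter is fixed at startup.
        os.environ["PYTHONHASHSEED"] = str(seed)
        random.seed(seed)
        try:
            import numpy as np
            np.random.seed(seed)
        except ImportError:
            pass
        try:
            import torch
            torch.manual_seed(seed)
            if torch.cuda.is_available():
                torch.cuda.manual_seed_all(seed)
        except ImportError:
            pass


    seed_everything(42)
    first = [random.random() for _ in range(3)]
    seed_everything(42)
    second = [random.random() for _ in range(3)]
    print(first == second)  # True: re-seeding reproduces the same draws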

Mathematical Utilities
----------------------

Common mathematical operations used across the library, including distance computation
and matrix manipulation.

.. code-block:: python

    import torch

    from pyagc.utils import pairwise_squared_distance, off_diagonal

    # Compute pairwise squared Euclidean distances (e.g., for KMeans):
    x = torch.randn(1000, 128)   # node embeddings
    centers = torch.randn(7, 128)  # cluster centers
    dists = pairwise_squared_distance(x, centers)  # (1000, 7)

    # Extract off-diagonal elements (e.g., for regularization losses):
    corr = torch.randn(128, 128)
    off_diag = off_diagonal(corr)  # (128 * 127,)
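
The shapes in the comments follow from two small facts. A distance matrix between rows
:math:`x_i` of ``x`` and rows :math:`c_k` of ``centers`` is commonly computed by expanding the
square, which needs only one matrix product; whether :func:`pairwise_squared_distance` uses
this expansion internally is an assumption. The symbols :math:`x_i`, :math:`c_k`, and :math:`n`
are introduced here for illustration.

.. math::

    \lVert x_i - c_k \rVert_2^2
    = \lVert x_i \rVert_2^2 - 2\, x_i^\top c_k + \lVert c_k \rVert_2^2

An :math:`n \times n` matrix has :math:`n^2 - n = n(n - 1)` off-diagonal entries, which for
:math:`n = 128` gives :math:`128 \cdot 127 = 16256` elements.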

.. currentmodule:: pyagc.utils

.. autosummary::
   :nosignatures:
   :toctree: ../generated

   pairwise_squared_distance
   off_diagonal
   filter_kwargs

.. autofunction:: pairwise_squared_distance

.. autofunction:: off_diagonal

.. autofunction:: filter_kwargs