pyagc.utils.CheckpointManager

class CheckpointManager(ckpt_dir: str, model_name: str, logger=None)

Bases: object

Manages model checkpoints with support for resuming training.

__init__(ckpt_dir: str, model_name: str, logger=None)
Parameters:
  • ckpt_dir (str) – Directory to save checkpoints

  • model_name (str) – Base name for checkpoint files

  • logger (optional) – Logger instance used for status messages; defaults to None

Methods

__init__(ckpt_dir, model_name[, logger])

Set the checkpoint directory and the base name for checkpoint files.

has_checkpoint([load_best])

Check if a checkpoint exists.

load_checkpoint(model[, optimizer, ...])

Load a checkpoint and restore training state.

save_checkpoint(model, optimizer, epoch, loss[, ...])

Save a checkpoint with full training state.

save_checkpoint(model, optimizer, epoch: int, loss: float, is_best: bool = False, batch_idx: Optional[int] = None, additional_info: Optional[Dict[str, Any]] = None)

Save a checkpoint with full training state.

Parameters:
  • model – The model to save

  • optimizer – The optimizer state

  • epoch (int) – Current epoch number

  • loss (float) – Current loss value

  • is_best (bool) – Whether this is the best model so far

  • batch_idx (int, optional) – Current batch index within epoch (for intra-epoch saves)

  • additional_info (dict, optional) – Additional information to store alongside the training state
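
As a rough illustration of how these parameters might be combined in a training loop, the sketch below saves after every epoch and sets is_best whenever the loss improves. RecordingManager is a hypothetical in-memory stand-in that only mirrors the documented save_checkpoint signature; it is not the library's implementation. The behaviour it models (every call overwrites the "last" record, and is_best=True additionally overwrites the "best" record) is an assumption, as is the "lr" entry passed through additional_info.

```python
from typing import Any, Dict, List, Optional


class RecordingManager:
    """Hypothetical stand-in mirroring the documented save_checkpoint signature.

    It keeps metadata in memory only; the real CheckpointManager writes files.
    """

    def __init__(self) -> None:
        self.last: Optional[Dict[str, Any]] = None
        self.best: Optional[Dict[str, Any]] = None

    def save_checkpoint(self, model, optimizer, epoch: int, loss: float,
                        is_best: bool = False, batch_idx: Optional[int] = None,
                        additional_info: Optional[Dict[str, Any]] = None) -> None:
        record = {
            "epoch": epoch,
            "loss": loss,
            "batch_idx": batch_idx,
            "additional_info": dict(additional_info or {}),
        }
        self.last = record      # assumed: every save replaces the last checkpoint
        if is_best:
            self.best = record  # assumed: the best checkpoint is kept separately


def train(manager, model, optimizer, losses: List[float]) -> None:
    """Save once per epoch, flagging the lowest loss seen so far as best."""
    best_loss = float("inf")
    for epoch, loss in enumerate(losses):
        is_best = loss < best_loss
        if is_best:
            best_loss = loss
        manager.save_checkpoint(model, optimizer, epoch, loss,
                                is_best=is_best,
                                additional_info={"lr": 0.01})


manager = RecordingManager()
# model and optimizer are placeholders here; the stand-in never touches them.
train(manager, model=None, optimizer=None, losses=[0.9, 0.5, 0.7])
```

With the real class, the same train function should work unchanged as long as model and optimizer are the objects the library expects.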

load_checkpoint(model, optimizer=None, load_best: bool = True, device: Optional[Union[str, device]] = 'cpu')

Load a checkpoint and restore training state.

Parameters:
  • model – The model to load weights into

  • optimizer – The optimizer whose state should be restored (optional)

  • load_best (bool) – If True, load best checkpoint; otherwise load last

  • device (Union[str, device, None], default: 'cpu') – Device to map the checkpoint to

Returns:

Dictionary with checkpoint information (epoch, loss, batch_idx, etc.), or None if the checkpoint does not exist.
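
Because the return value can be None, callers need to handle a fresh start and a resume separately. The helper below is a minimal sketch of that decision. resume_point is a hypothetical name, and its interpretation of the fields is an assumption: "epoch" is treated as the last epoch that was saved, and a non-None "batch_idx" is treated as a save made partway through that epoch.

```python
from typing import Any, Dict, Optional, Tuple


def resume_point(info: Optional[Dict[str, Any]]) -> Tuple[int, int]:
    """Return (start_epoch, start_batch) from load_checkpoint's return value."""
    if info is None:
        # No checkpoint was found: start training from scratch.
        return 0, 0
    batch_idx = info.get("batch_idx")
    if batch_idx is not None:
        # Assumed: an intra-epoch save, so continue inside the same epoch.
        return int(info["epoch"]), int(batch_idx) + 1
    # Assumed: the saved epoch finished, so continue with the next one.
    return int(info["epoch"]) + 1, 0
```

A typical call would pass the result of load_checkpoint(model, optimizer, load_best=False, device='cpu'). Loading the last checkpoint rather than the best one is usually what a resumed run wants, though that depends on the workflow.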

has_checkpoint(load_best: bool = True) -> bool

Check if a checkpoint exists.

Parameters:

load_best (bool) – If True, check for best checkpoint; otherwise check last

Returns:

True if the requested checkpoint exists, False otherwise

Return type:

bool
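
One possible use of has_checkpoint is to choose which checkpoint to load before calling load_checkpoint. The sketch below prefers the best checkpoint and falls back to the last one. pick_load_best is a hypothetical helper that accepts any callable with the documented has_checkpoint signature, so it can be tried without the library installed.

```python
from typing import Callable, Optional


def pick_load_best(has_checkpoint: Callable[..., bool]) -> Optional[bool]:
    """Return the load_best value to use, or None if no checkpoint exists."""
    if has_checkpoint(load_best=True):
        return True   # a best checkpoint is available
    if has_checkpoint(load_best=False):
        return False  # only the last checkpoint is available
    return None       # nothing has been saved yet
```

With a manager instance, this would be called as pick_load_best(manager.has_checkpoint), and a non-None result can be passed on as the load_best argument of load_checkpoint.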