pyagc.utils.CheckpointManager
- class CheckpointManager(ckpt_dir: str, model_name: str, logger=None)
Bases: object
Manages model checkpoints with support for resuming training.
Methods
__init__(ckpt_dir, model_name[, logger])
has_checkpoint([load_best]) – Check if a checkpoint exists.
load_checkpoint(model[, optimizer, ...]) – Load a checkpoint and restore training state.
save_checkpoint(model, optimizer, epoch, loss) – Save a checkpoint with full training state.
- save_checkpoint(model, optimizer, epoch: int, loss: float, is_best: bool = False, batch_idx: Optional[int] = None, additional_info: Optional[Dict[str, Any]] = None)
Save a checkpoint with full training state.
- Parameters:
model – The model to save
optimizer – The optimizer state
epoch (int) – Current epoch number
loss (float) – Current loss value
is_best (bool, optional) – Whether this is the best model so far. Defaults to False.
batch_idx (int, optional) – Current batch index within epoch (for intra-epoch saves)
additional_info (dict, optional) – Additional information to save
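Example (sketch)
The parameters above suggest what a checkpoint carrying "full training state" might contain. The following is a minimal illustration only, not pyagc's implementation: SketchCheckpointManager is a hypothetical stand-in that mirrors the documented save_checkpoint signature, and the file naming scheme, the use of pickle, the dictionary keys, and the Stub helper are all assumptions made for this sketch. It assumes only that the model and optimizer expose a state_dict() method.

```python
import os
import pickle
import shutil
import tempfile
from typing import Any, Dict, Optional


class SketchCheckpointManager:
    """Hypothetical stand-in for illustration; NOT pyagc's implementation."""

    def __init__(self, ckpt_dir: str, model_name: str, logger=None):
        self.ckpt_dir = ckpt_dir
        self.model_name = model_name
        self.logger = logger
        os.makedirs(ckpt_dir, exist_ok=True)

    def _path(self, best: bool = False) -> str:
        # Assumed naming scheme, chosen only for this sketch.
        suffix = "best" if best else "latest"
        return os.path.join(self.ckpt_dir, f"{self.model_name}_{suffix}.pkl")

    def save_checkpoint(self, model, optimizer, epoch: int, loss: float,
                        is_best: bool = False,
                        batch_idx: Optional[int] = None,
                        additional_info: Optional[Dict[str, Any]] = None) -> str:
        # Bundle everything needed to resume training into one record.
        state = {
            "model_state": model.state_dict(),
            "optimizer_state": optimizer.state_dict(),
            "epoch": epoch,
            "loss": loss,
            "batch_idx": batch_idx,  # None unless saving mid-epoch
            "additional_info": additional_info or {},
        }
        path = self._path()
        with open(path, "wb") as f:
            pickle.dump(state, f)
        if is_best:
            # Keep a separate copy of the best checkpoint so far.
            shutil.copyfile(path, self._path(best=True))
        return path


class Stub:
    """Minimal object exposing state_dict(), standing in for a model or optimizer."""

    def __init__(self, **state):
        self._state = dict(state)

    def state_dict(self) -> Dict[str, Any]:
        return dict(self._state)


def demo() -> Dict[str, Any]:
    """Save one intra-epoch checkpoint marked as best, then read it back."""
    with tempfile.TemporaryDirectory() as tmp:
        manager = SketchCheckpointManager(tmp, "demo_model")
        model = Stub(weights=[0.1, 0.2])
        optimizer = Stub(lr=0.01)
        manager.save_checkpoint(model, optimizer, epoch=3, loss=0.42,
                                is_best=True, batch_idx=120,
                                additional_info={"seed": 7})
        with open(manager._path(best=True), "rb") as f:
            return pickle.load(f)
```

Calling demo() returns the saved record, so a resumed run could read epoch and batch_idx from it to decide where to continue. How the real CheckpointManager stores and names its files may differ.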