ray.rllib.algorithms.algorithm.Algorithm.restore

Algorithm.restore(checkpoint_path: Union[str, ray.air.checkpoint.Checkpoint], checkpoint_node_ip: Optional[str] = None, fallback_to_latest: bool = False)

Restores training state from a given model checkpoint.

These checkpoints are returned from calls to save().

Subclasses should override load_checkpoint() instead to restore state. This method restores additional metadata saved with the checkpoint.
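The override pattern that note refers to looks roughly like the sketch below. This is a hypothetical Tune Trainable subclass for illustration only; the class name and state fields are assumptions, not part of this reference:

    from ray.tune import Trainable

    class MyTrainable(Trainable):  # hypothetical subclass for illustration
        def setup(self, config):
            self.num_steps = 0

        def step(self):
            self.num_steps += 1
            return {"num_steps": self.num_steps}

        def save_checkpoint(self, checkpoint_dir):
            # Persist the state that load_checkpoint() reads back;
            # restore() takes care of the surrounding metadata.
            with open(f"{checkpoint_dir}/state.txt", "w") as f:
                f.write(str(self.num_steps))
            return checkpoint_dir

        def load_checkpoint(self, checkpoint):
            # Override this (rather than restore()) to rebuild training state.
            with open(f"{checkpoint}/state.txt") as f:
                self.num_steps = int(f.read())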

checkpoint_path should match the return value of save().

checkpoint_path can be, for example, /ray_results/exp/MyTrainable_abc/checkpoint_00000/checkpoint or /ray_results/exp/MyTrainable_abc/checkpoint_00000.

self.logdir should generally correspond to checkpoint_path, for example /ray_results/exp/MyTrainable_abc.

self.remote_checkpoint_dir, in this case, would be something like REMOTE_CHECKPOINT_BUCKET/exp/MyTrainable_abc.

Parameters
  • checkpoint_path – Path to restore checkpoint from. If this path does not exist on the local node, it will be fetched from external (cloud) storage if available, or restored from a remote node.

  • checkpoint_node_ip – If given, try to restore the checkpoint from this node if it doesn’t exist locally or on cloud storage.

  • fallback_to_latest – If True, try to recover the latest available checkpoint if the given checkpoint_path cannot be found.
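A minimal usage sketch pairing restore() with save(). PPO and the CartPole-v1 environment are assumptions for illustration, not part of this reference:

    from ray.rllib.algorithms.ppo import PPOConfig

    # Build and briefly train an Algorithm, then checkpoint it.
    config = PPOConfig().environment("CartPole-v1")
    algo = config.build()
    algo.train()
    checkpoint = algo.save()  # save() returns a checkpoint that restore() accepts

    # Later (or in another process): build a fresh Algorithm from the same
    # config and restore the saved training state into it.
    restored_algo = config.build()
    restored_algo.restore(checkpoint)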