ray.rllib.algorithms.algorithm_config.AlgorithmConfig.evaluation

AlgorithmConfig.evaluation(*, evaluation_interval: int | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_duration: int | str | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_duration_unit: str | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_auto_duration_min_env_steps_per_sample: int | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_auto_duration_max_env_steps_per_sample: int | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_sample_timeout_s: float | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_parallel_to_training: bool | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_force_reset_envs_before_iteration: bool | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_config: ~ray.rllib.algorithms.algorithm_config.AlgorithmConfig | dict | None = <ray.rllib.utils.from_config._NotProvided object>, off_policy_estimation_methods: ~typing.Dict | None = <ray.rllib.utils.from_config._NotProvided object>, ope_split_batch_by_episode: bool | None = <ray.rllib.utils.from_config._NotProvided object>, evaluation_num_env_runners: int | None = <ray.rllib.utils.from_config._NotProvided object>, custom_evaluation_function: ~typing.Callable | None = <ray.rllib.utils.from_config._NotProvided object>, offline_evaluation_interval: int | None = <ray.rllib.utils.from_config._NotProvided object>, num_offline_eval_runners: int | None = <ray.rllib.utils.from_config._NotProvided object>, offline_loss_for_module_fn: ~typing.Callable | None = <ray.rllib.utils.from_config._NotProvided object>, offline_eval_batch_size_per_runner: int | None = <ray.rllib.utils.from_config._NotProvided object>, dataset_num_iters_per_offline_eval_runner: int | None = <ray.rllib.utils.from_config._NotProvided object>, offline_eval_rl_module_inference_only: bool | None = <ray.rllib.utils.from_config._NotProvided object>, num_cpus_per_offline_eval_runner: int | None = <ray.rllib.utils.from_config._NotProvided object>, custom_resources_per_offline_eval_runner: ~typing.Dict[str, ~typing.Any] | None = <ray.rllib.utils.from_config._NotProvided object>, offline_evaluation_timeout_s: float | None = <ray.rllib.utils.from_config._NotProvided object>, max_requests_in_flight_per_offline_eval_runner: int | None = <ray.rllib.utils.from_config._NotProvided object>, broadcast_offline_eval_runner_states: bool | None = <ray.rllib.utils.from_config._NotProvided object>, validate_offline_eval_runners_after_construction: bool | None = <ray.rllib.utils.from_config._NotProvided object>, restart_failed_offline_eval_runners: bool | None = <ray.rllib.utils.from_config._NotProvided object>, ignore_offline_eval_runner_failures: bool | None = <ray.rllib.utils.from_config._NotProvided object>, max_num_offline_eval_runner_restarts: int | None = <ray.rllib.utils.from_config._NotProvided object>, offline_eval_runner_health_probe_timeout_s: float | None = <ray.rllib.utils.from_config._NotProvided object>, offline_eval_runner_restore_timeout_s: float | None = <ray.rllib.utils.from_config._NotProvided object>, always_attach_evaluation_results=-1, evaluation_num_workers=-1) AlgorithmConfig[source]#

Sets the config’s evaluation settings.
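
For orientation, here is a minimal usage sketch (not part of the generated signature above). It assumes a PPO algorithm and the CartPole-v1 environment, both chosen purely for illustration, and combines several of the parameters documented below:

from ray.rllib.algorithms.algorithm_config import AlgorithmConfig
from ray.rllib.algorithms.ppo import PPOConfig

config = (
    PPOConfig()
    .environment("CartPole-v1")
    # All evaluation-related settings go through `.evaluation()`.
    .evaluation(
        # Evaluate on every training iteration ...
        evaluation_interval=1,
        # ... using two dedicated evaluation EnvRunners ...
        evaluation_num_env_runners=2,
        # ... running in parallel to `training_step()` and sampling as many
        # timesteps as fit into the training time ("auto" requires
        # evaluation_parallel_to_training=True).
        evaluation_parallel_to_training=True,
        evaluation_duration="auto",
        evaluation_force_reset_envs_before_iteration=True,
        # Override settings for the evaluation EnvRunners only, e.g. act greedily.
        evaluation_config=AlgorithmConfig.overrides(explore=False),
    )
)
algo = config.build()  # Evaluation results appear under the "evaluation" key of `algo.train()`.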

Parameters:
  • evaluation_interval – Run one evaluation every evaluation_interval training iterations. The evaluation stats are reported under the “evaluation” metric key. Set to None (or 0) to disable evaluation.

  • evaluation_duration – Duration for which to run evaluation each evaluation_interval. The unit for the duration can be set via evaluation_duration_unit to either “episodes” (default) or “timesteps”. If you use more than one evaluation worker (evaluation_num_env_runners > 1), the number of episodes/timesteps to run is split amongst these workers. A special value of “auto” can be used in case evaluation_parallel_to_training=True. This is the recommended way to save as much time on evaluation as possible: the Algorithm then runs as many timesteps via the evaluation workers as fit into the duration of the concurrently running training step, so neither the training nor the evaluation workers ever sit idle. When using evaluation_duration="auto", it is strongly advised to also set evaluation_interval=1 and evaluation_force_reset_envs_before_iteration=True (see the usage sketch above).

  • evaluation_duration_unit – The unit with which to count the evaluation duration. Either “episodes” (default) or “timesteps”. Note that this setting is ignored if evaluation_duration="auto".

  • evaluation_auto_duration_min_env_steps_per_sample – If evaluation_duration is “auto” (in which case evaluation_duration_unit is always “timesteps”), the minimum number of timesteps to run per remote sample() call.

  • evaluation_auto_duration_max_env_steps_per_sample – If evaluation_duration is “auto” (in which case evaluation_duration_unit is always “timesteps”), the maximum number of timesteps to run per remote sample() call.

  • evaluation_sample_timeout_s – The timeout (in seconds) for evaluation workers to sample a complete episode, in case your config sets evaluation_duration != "auto" and evaluation_duration_unit="episodes". After this time, the user receives a warning and instructions on how to fix the issue.

  • evaluation_parallel_to_training – Whether to run evaluation in parallel to the Algorithm.training_step() call, using threading. Default is False. For example, with evaluation_interval=1, every call to Algorithm.train() runs Algorithm.training_step() and Algorithm.evaluate() in parallel. Note that this setting, albeit highly efficient because it wastes no extra time on evaluation, causes the evaluation results to lag one iteration behind the rest of the training results. This matters when picking a good checkpoint: for example, if iteration 42 reports a good evaluation episode_return_mean, be aware that these results were achieved on the weights trained in iteration 41, so you should probably pick the iteration 41 checkpoint instead.

  • evaluation_force_reset_envs_before_iteration – Whether all environments should be force-reset (even if they are not done yet) right before the evaluation step of the iteration begins. Setting this to True (default) makes sure that the evaluation results aren’t polluted with episode statistics that were actually (at least partially) achieved with an earlier set of weights. Note that this setting is only supported on the new API stack with EnvRunners and ConnectorV2 (config.enable_rl_module_and_learner=True AND config.enable_env_runner_and_connector_v2=True).

  • evaluation_config – Typical usage is to pass extra args to the evaluation env creator and to disable exploration by computing deterministic actions, usually via AlgorithmConfig.overrides() as in the usage sketch above. IMPORTANT NOTE: Policy gradient algorithms are able to find the optimal policy, even if this is a stochastic one. Setting “explore=False” here results in the evaluation workers not using this optimal policy!

  • off_policy_estimation_methods – Specify how to evaluate the current policy, along with any optional config parameters. This only has an effect when reading offline experiences (“input” is not “sampler”). Available keys: {ope_method_name: {“type”: ope_type, …}}, where ope_method_name is a user-defined string under which to save the OPE results, and ope_type can be any subclass of OffPolicyEstimator, e.g. ray.rllib.offline.estimators.is::ImportanceSampling or your own custom subclass, or the full class path to the subclass. You can also add additional config arguments to be passed to the OffPolicyEstimator in the dict, e.g. {“qreg_dr”: {“type”: DoublyRobust, “q_model_type”: “qreg”, “k”: 5}} (see the sketch at the bottom of this page).

  • ope_split_batch_by_episode – Whether to use SampleBatch.split_by_episode() to split the input batch into episodes before estimating the OPE metrics. For bandits, set this to False to speed up OPE evaluation; since each record is already one timestep, there is no need to split by episode. The default is True.

  • evaluation_num_env_runners – Number of parallel EnvRunners to use for evaluation. Note that this is set to zero by default, which means evaluation runs in the algorithm process (only if evaluation_interval is not 0 or None). Increasing this also increases the Ray resource usage of the Algorithm, since the evaluation workers are created separately from the EnvRunners used to sample data for training.

  • custom_evaluation_function – Customize the evaluation method. This must be a function with signature (algo: Algorithm, eval_workers: EnvRunnerGroup) -> (metrics: dict, env_steps: int, agent_steps: int) (or just metrics: dict if enable_env_runner_and_connector_v2=True), where env_steps and agent_steps define the number of sampled steps during the evaluation iteration. See the Algorithm.evaluate() method for the default implementation, and the sketch at the bottom of this page for a simple custom variant. The Algorithm guarantees that all eval workers have the latest policy state before this function is called.

  • offline_evaluation_interval – Run offline evaluation every offline_evaluation_interval training iterations. The offline evaluation stats are reported under the “evaluation/offline_evaluation” metric key. Set to None (or 0) to disable offline evaluation. See the offline-evaluation sketch at the bottom of this page.

  • num_offline_eval_runners – Number of OfflineEvaluationRunner actors to create for parallel evaluation. Setting this to 0 forces sampling to be done in the local OfflineEvaluationRunner (main process or the Algorithm’s actor when using Tune).

  • offline_loss_for_module_fn – A callable to compute the loss per RLModule in offline evaluation. If not provided, the training loss function (Learner.compute_loss_for_module) is used. The signature must be (runner: OfflineEvaluationRunner, module_id: ModuleID, config: AlgorithmConfig, batch: Dict[str, Any], fwd_out: Dict[str, TensorType]).

  • offline_eval_batch_size_per_runner – Evaluation batch size per individual OfflineEvaluationRunner worker. This setting only applies to the new API stack. The number of OfflineEvaluationRunner workers can be set via config.evaluation(num_offline_eval_runners=...). The total effective batch size is then num_offline_eval_runners x offline_eval_batch_size_per_runner.

  • dataset_num_iters_per_offline_eval_runner – Number of batches to evaluate in each OfflineEvaluationRunner during a single evaluation. If None, each runner runs a complete epoch over its data block (the dataset is partitioned into at least as many blocks as there are runners). The default is 1.

  • offline_eval_rl_module_inference_only – If True, the module spec is used in an inference-only setting (no-loss) and the RLModule can thus be built in its light version (if available). For example, the inference_only version of an RLModule might only contain the networks required for computing actions, but misses additional target- or critic networks. Also, if True, the module does NOT contain those (sub) RLModules that have their learner_only flag set to True.

  • num_cpus_per_offline_eval_runner – Number of CPUs to allocate per OfflineEvaluationRunner.

  • custom_resources_per_offline_eval_runner – Any custom Ray resources to allocate per OfflineEvaluationRunner.

  • offline_evaluation_timeout_s – The timeout in seconds for calling run() on remote OfflineEvaluationRunner workers. Results from runners that take longer than this time are discarded.

  • max_requests_in_flight_per_offline_eval_runner – Max number of in-flight requests to each OfflineEvaluationRunner (actor). See the ray.rllib.utils.actor_manager.FaultTolerantActorManager class for more details. Tuning this value is important when running experiments with large evaluation batches, where there is a risk that the object store fills up, causing objects to spill to disk. This can make any asynchronous requests very slow, and thus your whole experiment slow as well. You can inspect the object store during your experiment through a call to ray memory on your head node, and by using the Ray dashboard. If you’re seeing that the object store is filling up, turn down the number of remote requests in flight, enable compression, or increase the object store memory through, for example: ray.init(object_store_memory=10 * 1024 * 1024 * 1024)  # =10 GB.

  • broadcast_offline_eval_runner_states – Whether merged OfflineEvaluationRunner states (from the central connector pipelines) should be broadcast back to all remote OfflineEvaluationRunner actors.

  • validate_offline_eval_runners_after_construction – Whether to validate that each created remote OfflineEvaluationRunner is healthy after its construction process.

  • restart_failed_offline_eval_runners – Whether, upon an OfflineEvaluationRunner failure, RLlib should try to restart the lost OfflineEvaluationRunner(s) as an identical copy of the failed one(s). You should set this to True when training on SPOT instances that may preempt at any time, and/or if you need to always evaluate a complete dataset, because OfflineEvaluationRunner(s) evaluate through streaming split iterators on disjoint batches. The new, recreated OfflineEvaluationRunner(s) only differ from the failed one(s) in their self.recreated_worker=True property value and keep the same worker_index as the original(s). If this setting is True, the value of the ignore_offline_eval_runner_failures setting is ignored. See the fault-tolerance sketch at the bottom of this page.

  • ignore_offline_eval_runner_failures – Whether to ignore any OfflineEvaluationRunner failures and continue running with the remaining OfflineEvaluationRunners. This setting is ignored if restart_failed_offline_eval_runners=True.

  • max_num_offline_eval_runner_restarts – The maximum number of times any OfflineEvaluationRunner is allowed to be restarted (if restart_failed_offline_eval_runners is True).

  • offline_eval_runner_health_probe_timeout_s – Max amount of time in seconds to spend waiting for OfflineEvaluationRunner health probe calls (OfflineEvaluationRunner.ping.remote()) to respond. Health pings are very cheap; however, the health check is performed via a blocking ray.get(), so the default value should not be too large.

  • offline_eval_runner_restore_timeout_s – Max amount of time we should wait to restore states on recovered OfflineEvaluationRunner actors. Default is 30 mins.

Returns:

This updated AlgorithmConfig object.
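
A sketch of a custom_evaluation_function for the new API stack (enable_env_runner_and_connector_v2=True), where only a metrics dict is returned. The sampling and metric logic below is a simplified assumption, not the default implementation (see Algorithm.evaluate()), and EnvRunnerGroup method details such as foreach_env_runner and its local_env_runner flag may differ between RLlib versions:

from ray.rllib.algorithms.algorithm import Algorithm
from ray.rllib.env.env_runner_group import EnvRunnerGroup


def my_custom_eval(algo: Algorithm, eval_workers: EnvRunnerGroup) -> dict:
    # Sample one episode on each remote evaluation EnvRunner (the Algorithm has
    # already pushed the latest weights to these workers).
    results = eval_workers.foreach_env_runner(
        lambda runner: runner.sample(num_episodes=1),
        local_env_runner=False,
    )
    # Flatten the per-runner episode lists and compute a simple mean return.
    episodes = [ep for episode_list in results for ep in episode_list]
    mean_return = sum(ep.get_return() for ep in episodes) / max(len(episodes), 1)
    return {"episode_return_mean": mean_return}


config = config.evaluation(custom_evaluation_function=my_custom_eval)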
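
A sketch for off_policy_estimation_methods, using the estimator classes named in the parameter description. Note that this only takes effect when evaluating from offline data (“input” is not “sampler”); the dict keys ("is", "qreg_dr") are arbitrary user-chosen names under which the OPE results are reported:

from ray.rllib.offline.estimators import DoublyRobust, ImportanceSampling

config = config.evaluation(
    off_policy_estimation_methods={
        # Plain importance sampling, saved under the user-chosen key "is".
        "is": {"type": ImportanceSampling},
        # Doubly-robust estimation with extra constructor arguments.
        "qreg_dr": {"type": DoublyRobust, "q_model_type": "qreg", "k": 5},
    },
)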
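
A sketch of the offline-evaluation settings together with a hypothetical offline_loss_for_module_fn. The loss body (a negative log-likelihood over discrete actions, computed directly from batch and fwd_out, and returned as a scalar tensor) is purely an illustrative assumption; by default, Learner.compute_loss_for_module is used instead:

from typing import Any, Dict

import torch

from ray.rllib.core.columns import Columns


def my_offline_eval_loss(runner, module_id, config, batch: Dict[str, Any], fwd_out: Dict[str, Any]):
    # Hypothetical per-module loss: negative log-likelihood of the dataset's
    # actions under the module's action logits. Assumes a discrete action space
    # and that `fwd_out` contains Columns.ACTION_DIST_INPUTS.
    logits = fwd_out[Columns.ACTION_DIST_INPUTS]
    actions = batch[Columns.ACTIONS].long()
    # Assumed to return a scalar loss tensor.
    return torch.nn.functional.cross_entropy(logits, actions)


config = config.evaluation(
    # Run offline evaluation every other training iteration ...
    offline_evaluation_interval=2,
    # ... on two remote OfflineEvaluationRunner actors ...
    num_offline_eval_runners=2,
    # ... each evaluating one batch of 512 rows per evaluation round.
    offline_eval_batch_size_per_runner=512,
    dataset_num_iters_per_offline_eval_runner=1,
    offline_loss_for_module_fn=my_offline_eval_loss,
)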
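
Finally, a sketch of the fault-tolerance settings for offline evaluation on preemptible (e.g. SPOT) nodes; the concrete numbers are arbitrary:

config = config.evaluation(
    # Recreate failed OfflineEvaluationRunners as identical copies ...
    restart_failed_offline_eval_runners=True,
    # ... but give up after 5 restarts per runner.
    max_num_offline_eval_runner_restarts=5,
    # Health probes block via ray.get(), so keep this timeout small.
    offline_eval_runner_health_probe_timeout_s=30.0,
    # Allow up to 30 minutes for restoring state on recovered runners.
    offline_eval_runner_restore_timeout_s=1800.0,
)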