ray.rllib.algorithms.algorithm.Algorithm.compute_actions
- Algorithm.compute_actions(observations: Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple], state: Optional[List[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]]] = None, *, prev_action: Optional[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]] = None, prev_reward: Optional[Union[numpy.array, jnp.ndarray, tf.Tensor, torch.Tensor, dict, tuple]] = None, info: Optional[dict] = None, policy_id: str = 'default_policy', full_fetch: bool = False, explore: Optional[bool] = None, timestep: Optional[int] = None, episodes: Optional[List[ray.rllib.evaluation.episode.Episode]] = None, unsquash_actions: Optional[bool] = None, clip_actions: Optional[bool] = None, normalize_actions=None, **kwargs)
Computes actions for the specified policy on the local worker.
Note that you can also access the policy object through self.get_policy(policy_id) and call compute_actions() on it directly.
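For illustration, a minimal, hedged sketch of calling compute_actions() on a freshly built Algorithm (this assumes the Ray RLlib 2.x PPOConfig builder API; the obs_batch keys and the zero/one observations are made-up placeholders, not part of this API's contract):

    import numpy as np
    from ray.rllib.algorithms.ppo import PPOConfig

    # Build a small Algorithm (PPO here, but any Algorithm subclass works the same way).
    algo = PPOConfig().environment("CartPole-v1").build()

    # compute_actions() takes a dict of observations and returns a dict of actions
    # under the same keys. The keys ("env_0", "env_1") are purely illustrative.
    obs_batch = {
        "env_0": np.zeros(4, dtype=np.float32),  # CartPole-v1 observations are 4-dim vectors
        "env_1": np.ones(4, dtype=np.float32),
    }
    actions = algo.compute_actions(obs_batch, explore=False)
    print(actions)  # e.g. {"env_0": 0, "env_1": 1}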
- Parameters
observations – Observations from the environment.
state – RNN hidden state, if any. If state is not None, the full output of compute_actions(…) is returned (computed actions, RNN state(s), logits dictionary). Otherwise, only the computed actions are returned.
prev_action – Previous action value, if any.
prev_reward – Previous reward, if any.
info – Env info dict, if any.
policy_id – Policy to query (only applies to multi-agent).
full_fetch – Whether to return extra action fetch results. This is always set to True if RNN state is specified.
explore – Whether to pick an exploitation or exploration action (default: None -> use self.config.explore).
timestep – The current (sampling) time step.
episodes – This provides access to all of the internal episodes’ state, which may be useful for model-based or multi-agent algorithms.
unsquash_actions – Should actions be unsquashed according to the env’s/Policy’s action space? If None, use self.config.normalize_actions.
clip_actions – Should actions be clipped according to the env’s/Policy’s action space? If None, use self.config.clip_actions.
- Keyword Arguments
kwargs – Forward-compatibility placeholder.
- Returns
The computed actions if full_fetch=False, or a tuple containing the full output of policy.compute_actions_from_input_dict() if full_fetch=True or the Policy is RNN-based.
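Continuing the sketch above, a hedged example of the full_fetch=True path (the exact contents of the extra fetches vary by RLlib version and policy; the 3-tuple unpacking below is an assumption to be verified against your installed version):

    # With full_fetch=True (or when RNN state is passed), the full policy output is
    # returned instead of just the actions dict; assumed here to unpack as
    # (actions, rnn_state_outs, extra_fetches).
    actions, state_outs, extra_fetches = algo.compute_actions(obs_batch, full_fetch=True)
    print(actions)        # dict of actions, keyed like obs_batch
    print(extra_fetches)  # e.g. action-distribution inputs, log-probs, value estimates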