api.config.reward_function

Classes

RewardFunction

The reward function. This class is responsible for computing the reward for each agent in the environment.

Module Contents

class api.config.reward_function.RewardFunction

Bases: Generic[api.typing.AgentID, api.typing.StateType, api.typing.RewardType]

The reward function. This class is responsible for computing the reward for each agent in the environment.

abstract reset(agents: List[api.typing.AgentID], initial_state: api.typing.StateType, shared_info: Dict[str, Any]) → None

Function to be called each time the environment is reset. This is meant to enable users to design stateful reward functions that maintain information about the game throughout an episode to determine a reward.

Parameters:

agents – List of AgentIDs for which this RewardFunction will return a reward.
initial_state – The initial state of the reset environment.
shared_info – A dictionary with shared information across all config objects.
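As an illustration of the stateful pattern described above, here is a minimal sketch of a reward that pays each agent the change in its position since the previous call. It mirrors the documented method signatures but does not import the real base class, so that it runs standalone; in real use it would subclass RewardFunction. The string agent IDs, the state layout (a dict mapping each agent to a scalar position), and the class name ProgressReward are assumptions made for the example, not part of this API.

```python
from typing import Any, Dict, List

AgentID = str                 # assumption: agents are identified by strings
StateType = Dict[str, float]  # assumption: state maps each agent to a position


class ProgressReward:
    """Hypothetical reward: change in an agent's position since the last call.

    In real use this would subclass
    RewardFunction[AgentID, StateType, float]; the base class is left out
    here so the sketch has no external dependencies.
    """

    def __init__(self) -> None:
        self._last_position: Dict[AgentID, float] = {}

    def reset(self, agents: List[AgentID], initial_state: StateType,
              shared_info: Dict[str, Any]) -> None:
        # Replace anything remembered from the previous episode.
        self._last_position = {agent: initial_state[agent] for agent in agents}

    def get_rewards(self, agents: List[AgentID], state: StateType,
                    is_terminated: Dict[AgentID, bool],
                    is_truncated: Dict[AgentID, bool],
                    shared_info: Dict[str, Any]) -> Dict[AgentID, float]:
        rewards: Dict[AgentID, float] = {}
        for agent in agents:
            rewards[agent] = state[agent] - self._last_position[agent]
            # Remember the new position for the next call.
            self._last_position[agent] = state[agent]
        return rewards


reward_fn = ProgressReward()
reward_fn.reset(["a", "b"], {"a": 0.0, "b": 1.0}, {})
not_done = {"a": False, "b": False}
step_rewards = reward_fn.get_rewards(
    ["a", "b"], {"a": 2.0, "b": 0.5}, not_done, not_done, {}
)
print(step_rewards)
```

Because reset overwrites the stored positions, no information leaks from one episode into the next.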

abstract get_rewards(agents: List[api.typing.AgentID], state: api.typing.StateType, is_terminated: Dict[api.typing.AgentID, bool], is_truncated: Dict[api.typing.AgentID, bool], shared_info: Dict[str, Any]) → Dict[api.typing.AgentID, api.typing.RewardType]

Function to compute the rewards for a set of agents. This function is given a list of agents, and it is expected to return a reward for each agent in that list.

Parameters:

agents – List of AgentIDs for which this RewardFunction should return a reward.
state – The current state of the game.
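is_terminated – A dict mapping each AgentID to a bool indicating whether that agent's episode has terminated.
is_truncated – A dict mapping each AgentID to a bool indicating whether that agent's episode has been truncated.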
shared_info – A dictionary with shared information across all config objects.

Returns:

A dict of rewards, one for each AgentID in agents.
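The contract above (one reward per AgentID in agents) can be sketched with a stateless win/loss reward. This is only an illustration under stated assumptions: the state is taken to be a dict with a hypothetical "winner" entry, agent IDs are strings, a terminated episode is treated as a decided game, and the class name WinLossReward is invented for the example. The real base class is not imported so that the sketch runs standalone.

```python
from typing import Any, Dict, List

AgentID = str               # assumption: agents are identified by strings
StateType = Dict[str, Any]  # assumption: state carries a "winner" entry


class WinLossReward:
    """Hypothetical reward: +1 for the winner, -1 for everyone else,
    paid only once an agent's episode has terminated.

    In real use this would subclass
    RewardFunction[AgentID, StateType, float].
    """

    def reset(self, agents: List[AgentID], initial_state: StateType,
              shared_info: Dict[str, Any]) -> None:
        # Stateless reward: nothing to reinitialize between episodes.
        pass

    def get_rewards(self, agents: List[AgentID], state: StateType,
                    is_terminated: Dict[AgentID, bool],
                    is_truncated: Dict[AgentID, bool],
                    shared_info: Dict[str, Any]) -> Dict[AgentID, float]:
        rewards: Dict[AgentID, float] = {}
        for agent in agents:
            if is_terminated[agent]:
                rewards[agent] = 1.0 if state.get("winner") == agent else -1.0
            else:
                # Ongoing or truncated episodes have no decided outcome here.
                rewards[agent] = 0.0
        return rewards


win_loss = WinLossReward()
win_loss.reset(["blue", "orange"], {"winner": None}, {})
final_rewards = win_loss.get_rewards(
    ["blue", "orange"],
    {"winner": "blue"},
    is_terminated={"blue": True, "orange": True},
    is_truncated={"blue": False, "orange": False},
    shared_info={},
)
print(final_rewards)
```

Every agent in agents receives an entry in the returned dict, including agents whose reward is zero, so callers can index the result by any requested AgentID.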