popgym.baselines.ray_models.ray_linear_attention
Module Contents
Classes
- LinearAttention: The Fast Autoregressive Transformer (FART, lol) from Katharopoulos et al. (2020).
- DeepLinearAttention: A multi-layer version of linear attention.
- class popgym.baselines.ray_models.ray_linear_attention.LinearAttention(obs_space: gymnasium.spaces.Space, action_space: gymnasium.spaces.Space, num_outputs: int, model_config: ray.rllib.utils.typing.ModelConfigDict, name: str, **custom_model_kwargs)
Bases: popgym.baselines.ray_models.base_model.BaseModel
The Fast Autoregressive Transformer (FART, lol) from
@inproceedings{katharopoulos_transformers_2020,
  title = {Transformers are {RNNs}: Fast Autoregressive Transformers with Linear Attention},
  shorttitle = {Transformers are {RNNs}},
  url = {https://proceedings.mlr.press/v119/katharopoulos20a.html},
  language = {en},
  urldate = {2022-09-21},
  booktitle = {Proceedings of the 37th {International} {Conference} on {Machine} {Learning}},
  publisher = {PMLR},
  author = {Katharopoulos, Angelos and Vyas, Apoorv and Pappas, Nikolaos and Fleuret, François},
  month = nov,
  year = {2020},
  note = {ISSN: 2640-3498},
  pages = {5156--5165},
}
- MODEL_CONFIG
- initial_state() List[ray.rllib.utils.typing.TensorType]
Return the initial states for your memory model.
The shape of the states returned here should NOT include the batch dimension. RLlib prepends the batch dimension, depending on the number of episodes/rollouts in the batch.
- Returns:
List of tensors denoting the t=0 recurrent state
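The "no batch dimension" rule can be illustrated with a minimal NumPy sketch. It assumes the state follows the recurrent formulation of the cited paper (a running matrix sum plus a running normalizer); the actual state layout of this class is not documented here, and `initial_state_sketch`, `prepend_batch`, `feature_dim`, and `value_dim` are hypothetical names used only for illustration.

```python
import numpy as np


def initial_state_sketch(feature_dim: int, value_dim: int) -> list:
    """Hypothetical t=0 state for a linear-attention memory (no batch dimension)."""
    s0 = np.zeros((feature_dim, value_dim), dtype=np.float32)  # running sum of phi(k) v^T
    n0 = np.zeros((feature_dim,), dtype=np.float32)  # running sum of phi(k)
    return [s0, n0]


def prepend_batch(state: list, batch_size: int) -> list:
    """Mimic the batch dimension added by the framework: [...] -> [B, ...]."""
    return [np.repeat(s[None, ...], batch_size, axis=0) for s in state]


state = initial_state_sketch(feature_dim=4, value_dim=3)
batched = prepend_batch(state, batch_size=2)
print([s.shape for s in state])    # [(4, 3), (4,)]
print([s.shape for s in batched])  # [(2, 4, 3), (2, 4)]
```

A real model would return framework tensors (for example torch tensors) rather than NumPy arrays; the shapes are the point of the sketch.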
- forward_memory(z: ray.rllib.utils.typing.TensorType, state: List[ray.rllib.utils.typing.TensorType], t_starts: ray.rllib.utils.typing.TensorType, seq_lens: ray.rllib.utils.typing.TensorType) Tuple[ray.rllib.utils.typing.TensorType, List[ray.rllib.utils.typing.TensorType]]
Forward pass for your custom memory model.
- Args:
- z: Preprocessed features of shape [B, T, F], with padding along the time dimension.
- state: Recurrent states of shape [B, …].
- t_starts: Tensor of size [B] denoting the length of the rollout so far. Note that in some cases RLlib might chunk a long rollout into multiple forward passes; t_starts tracks the length across all forward passes.
- seq_lens: Tensor of size [B] denoting the number of non-padding elements in z. E.g. seq_lens == [100, 60] with z.shape[1] == 128 means the first 100 timesteps of the first batch element and the first 60 timesteps of the second batch element are valid. The rest are padding.
- Returns:
- (output, state)
where output is [B, T, D] and state is [B, …]. Note that the padding must be present in the output. The state must be exactly the same shape as the input state.
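The contract above (padded output of shape [B, T, D], state shape unchanged) can be sketched in NumPy using the recurrent form of linear attention from the cited paper, with feature map elu(x) + 1. This is not the class's actual implementation: `forward_memory_sketch`, the projection matrices `w_q`, `w_k`, `w_v`, and the `eps` term are assumptions made for illustration, and `t_starts` is omitted because the sketch does not need it.

```python
import numpy as np


def phi(x):
    # Feature map elu(x) + 1; strictly positive for every input.
    return np.where(x > 0, x + 1.0, np.exp(np.minimum(x, 0.0)))


def forward_memory_sketch(z, state, seq_lens, w_q, w_k, w_v, eps=1e-6):
    """z: [B, T, F_in]; state: [S of [B, F, D], normalizer of [B, F]]; seq_lens: [B]."""
    B, T, _ = z.shape
    s, norm = state[0].copy(), state[1].copy()
    q, k, v = phi(z @ w_q), phi(z @ w_k), z @ w_v  # [B, T, F], [B, T, F], [B, T, D]
    # valid[b, t] is True for non-padding timesteps
    valid = np.arange(T)[None, :] < np.asarray(seq_lens)[:, None]
    out = np.zeros_like(v)
    for t in range(T):
        m = valid[:, t]
        # Accumulate the state only where the timestep is valid
        s = np.where(m[:, None, None], s + k[:, t, :, None] * v[:, t, None, :], s)
        norm = np.where(m[:, None], norm + k[:, t], norm)
        num = np.einsum("bf,bfd->bd", q[:, t], s)
        den = np.einsum("bf,bf->b", q[:, t], norm)[:, None] + eps
        # Padded positions stay in the output, filled with zeros
        out[:, t] = np.where(m[:, None], num / den, 0.0)
    return out, [s, norm]


rng = np.random.default_rng(0)
B, T, F_in, F, D = 2, 5, 3, 4, 2
z = rng.normal(size=(B, T, F_in))
w_q, w_k, w_v = (rng.normal(size=(F_in, n)) for n in (F, F, D))
state = [np.zeros((B, F, D)), np.zeros((B, F))]
out, new_state = forward_memory_sketch(z, state, [5, 2], w_q, w_k, w_v)
print(out.shape)  # (2, 5, 2)
```

Because padded timesteps never touch the state, the second batch element ends with the same state it would have after a forward pass over only its first two timesteps.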
- class popgym.baselines.ray_models.ray_linear_attention.DeepLinearAttention(obs_space: gymnasium.spaces.Space, action_space: gymnasium.spaces.Space, num_outputs: int, model_config: ray.rllib.utils.typing.ModelConfigDict, name: str, **custom_model_kwargs)
Bases: LinearAttention
A multi-layer version of linear attention.
- MODEL_CONFIG
- initial_state() List[ray.rllib.utils.typing.TensorType]
Return the initial states for your memory model.
The shape of the states returned here should NOT include the batch dimension. RLlib prepends the batch dimension, depending on the number of episodes/rollouts in the batch.
- Returns:
List of tensors denoting the t=0 recurrent state
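For a multi-layer model, the returned list has to carry the recurrent state of every layer. One plausible layout, assumed here rather than taken from the source, is a flat list holding one (matrix, normalizer) pair per layer. The names `deep_initial_state_sketch`, `split_per_layer`, and `num_layers` are hypothetical.

```python
import numpy as np


def deep_initial_state_sketch(num_layers: int, feature_dim: int, value_dim: int) -> list:
    """Hypothetical flat t=0 state: [S_0, n_0, S_1, n_1, ...], no batch dimension."""
    state = []
    for _ in range(num_layers):
        state.append(np.zeros((feature_dim, value_dim), dtype=np.float32))
        state.append(np.zeros((feature_dim,), dtype=np.float32))
    return state


def split_per_layer(state: list) -> list:
    """Group the flat list back into (S, normalizer) pairs, one per layer."""
    return [(state[i], state[i + 1]) for i in range(0, len(state), 2)]


deep_state = deep_initial_state_sketch(num_layers=3, feature_dim=4, value_dim=4)
pairs = split_per_layer(deep_state)
print(len(deep_state), len(pairs))  # 6 3
```

Keeping the list flat matches the documented return type (a list of tensors) while still letting each layer recover its own state.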
- forward_memory(z: ray.rllib.utils.typing.TensorType, state: List[ray.rllib.utils.typing.TensorType], t_starts: ray.rllib.utils.typing.TensorType, seq_lens: ray.rllib.utils.typing.TensorType) Tuple[ray.rllib.utils.typing.TensorType, List[ray.rllib.utils.typing.TensorType]]
Forward pass for your custom memory model.
- Args:
- z: Preprocessed features of shape [B, T, F], with padding along the time dimension.
- state: Recurrent states of shape [B, …].
- t_starts: Tensor of size [B] denoting the length of the rollout so far. Note that in some cases RLlib might chunk a long rollout into multiple forward passes; t_starts tracks the length across all forward passes.
- seq_lens: Tensor of size [B] denoting the number of non-padding elements in z. E.g. seq_lens == [100, 60] with z.shape[1] == 128 means the first 100 timesteps of the first batch element and the first 60 timesteps of the second batch element are valid. The rest are padding.
- Returns:
- (output, state)
where output is [B, T, D] and state is [B, …]. Note that the padding must be present in the output. The state must be exactly the same shape as the input state.
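The relationship between t_starts and seq_lens when a long rollout is split across several forward passes can be illustrated with plain Python. This is only a sketch of the bookkeeping described in the arguments, not RLlib's internal chunking code; `chunk_rollout`, `rollout_len`, and `max_seq_len` are hypothetical names.

```python
def chunk_rollout(rollout_len: int, max_seq_len: int) -> list:
    """Return one (t_start, seq_len) pair per forward pass over a single rollout."""
    chunks = []
    t_start = 0
    while t_start < rollout_len:
        # The last chunk may be shorter; the remainder would be padding
        seq_len = min(max_seq_len, rollout_len - t_start)
        chunks.append((t_start, seq_len))
        t_start += seq_len
    return chunks


print(chunk_rollout(250, 100))  # [(0, 100), (100, 100), (200, 50)]
```

In this illustration the third forward pass would see t_start == 200 and seq_len == 50, so with a padded time dimension of 100 only the first 50 timesteps are valid.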