Baselines Quickstart
The baselines contain 13 different memory models, built to work with RLlib. In the following example, we will show you how to run a GRU but with different hidden and recurrent sizes than the original paper. See the ray_models directory for other models.
import popgym
import ray
from torch import nn
from popgym.baselines.ray_models.ray_gru import GRU
# See what GRU-specific hyperparameters we can set
print(GRU.MODEL_CONFIG)
# Show other settable model hyperparameters like
# what the actor/critic branches look like,
# what hidden size to use,
# whether to add a positional embedding, etc.
print(GRU.BASE_CONFIG)
# How long the temporal window for backprop is
# This doesn't need to be longer than 1024
bptt_size = 1024
config = {
"model": {
"max_seq_len": bptt_size,
"custom_model": GRU,
"custom_model_config": {
# Override the hidden_size from BASE_CONFIG
# The input and output sizes of the MLP feeding the memory model
"preprocessor_input_size": 128,
"preprocessor_output_size": 64,
"preprocessor": nn.Sequential(nn.Linear(128, 64), nn.ReLU()),
# this is the size of the recurrent state in most cases
"hidden_size": 128,
# We should also change other parts of the architecture to use
# this new hidden size
# For the GRU, the output is of size hidden_size
"postprocessor": nn.Sequential(nn.Linear(128, 64), nn.ReLU()),
"postprocessor_output_size": 64,
# Actor and critic networks
"actor": nn.Linear(64, 64),
"critic": nn.Linear(64, 64),
# We can also override GRU-specific hyperparams
"num_recurrent_layers": 1,
},
},
# Some other rllib defaults you might want to change
# See https://docs.ray.io/en/latest/rllib/rllib-training.html#common-parameters
# for a full list of rllib settings
#
# These should be a factor of bptt_size
"sgd_minibatch_size": bptt_size * 4,
# Should be a factor of sgd_minibatch_size
"train_batch_size": bptt_size * 8,
# The environment we are training on
"env": "popgym-ConcentrationEasy-v0",
# You probably don't want to change these values
"rollout_fragment_length": bptt_size,
"framework": "torch",
"horizon": bptt_size,
"batch_mode": "complete_episodes",
}
# Stop after 50k environment steps
ray.tune.run("PPO", config=config, stop={"timesteps_total": 50_000})
To add your own custom memory model, inherit from
popgym.baselines.ray_models.base_model and implement the initial_state and memory_forward functions,
as well as define your model configuration using MODEL_CONFIG. To use any of these or your own custom model
in ray, make it the custom_model in the rllib config.