[RLlib; docs] Docs do-over (new API stack): Add new AlgorithmConfig rst page and redo package_ref page for algo configs. #49464

Open

wants to merge 15 commits into master
281 changes: 281 additions & 0 deletions doc/source/rllib/algorithm-config.rst
@@ -0,0 +1,281 @@
.. include:: /_includes/rllib/we_are_hiring.rst

.. include:: /_includes/rllib/new_api_stack.rst

.. _rllib-algo-configuration-docs:

AlgorithmConfig API
===================

RLlib's :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` API is
the auto-validated and type-safe gateway into configuring and building an RLlib
:py:class:`~ray.rllib.algorithms.algorithm.Algorithm`.

In essence, you first create an instance of :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`
and then call some of its methods to set various configuration options. RLlib uses the following
`black <https://github.com/psf/black>`__-compliant format throughout its code.

Note that you can chain together more than one method call, including the constructor:

.. testcode::

from ray.rllib.algorithms.algorithm_config import AlgorithmConfig

config = (
# Create an `AlgorithmConfig` instance.
AlgorithmConfig()
# Change the learning rate.
.training(lr=0.0005)
# Change the number of Learner actors.
.learners(num_learners=2)
)

.. hint::

For value checking and type-safety reasons, never set attributes on your
:py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`
directly. Always go through the proper methods:

.. testcode::

# WRONG!
config.env = "CartPole-v1" # <- don't set attributes directly

# CORRECT!
config.environment(env="CartPole-v1") # call the proper method
Collaborator

Great example!

Algorithm-specific config classes
---------------------------------

In practice, you don't use the base ``AlgorithmConfig`` class directly, but always one of its algorithm-specific
subclasses, such as :py:class:`~ray.rllib.algorithms.ppo.ppo.PPOConfig`. Each subclass comes
with its own set of additional arguments to the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.training`
method.

Pick the :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`
subclass that matches the :py:class:`~ray.rllib.algorithms.algorithm.Algorithm`
you want to run your learning experiments with. For example, to use
:ref:`IMPALA <impala>` as your algorithm, import its specific config class:

.. testcode::

from ray.rllib.algorithms.impala import IMPALAConfig

config = (
# Create an `IMPALAConfig` instance.
IMPALAConfig()
# Specify the RL environment.
.environment("CartPole-v1")
# Change the learning rate.
.training(lr=0.0004)
)

To change algorithm-specific settings, here for ``IMPALA``, also use the
:py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.training` method:

.. testcode::

# Change an IMPALA-specific setting (the entropy coefficient).
config.training(entropy_coeff=0.01)


You can build the :py:class:`~ray.rllib.algorithms.impala.IMPALA` instance directly from the
config object by calling the
:py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.build_algo` method:

.. testcode::

# Build the algorithm instance.
impala = config.build_algo()
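
Once built, you can run training iterations on the ``impala`` instance by calling its ``train()`` method.
The following minimal sketch isn't executed as part of this page; the layout of the returned results dict,
including the ``env_runners`` and ``episode_return_mean`` keys shown here, is an assumption based on the
new API stack's metrics structure:

.. code-block:: python

    # Minimal sketch: run a few training iterations and print a learning metric.
    # The result-dict keys below assume the new API stack's metrics layout.
    for _ in range(3):
        results = impala.train()
        print(results["env_runners"]["episode_return_mean"])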

.. testcode::
:hide:

impala.stop()

The config object stored inside any built :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` instance
is a copy of your original config. This allows you to further alter your original config object and
build another algorithm instance without affecting the previously built one:

.. testcode::

# Further alter the config without affecting the previously built IMPALA object ...
config.env_runners(num_env_runners=4)
# ... and build a new IMPALA from it.
another_impala = config.build_algo()
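
Because each built :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` holds its own config copy, the change
above only shows up on the new instance and on your original config object. A small sketch to illustrate this
(the first printed value depends on your settings at build time):

.. testcode::

    # The first IMPALA instance still holds the config state from its build time ...
    print(impala.config.num_env_runners)
    # ... while the original config object carries the later change.
    print(config.num_env_runners)  # expect: 4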

.. testcode::
:hide:

another_impala.stop()

If you are working with `Ray Tune <https://docs.ray.io/en/latest/tune/index.html>`__,
pass your :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`
instance into the constructor of the :py:class:`~ray.tune.tuner.Tuner`:

.. code-block:: python

    from ray import tune

    tuner = tune.Tuner(
        "IMPALA",
        param_space=config,  # <- your RLlib AlgorithmConfig object
        ..
    )
    # Run the experiment with Ray Tune.
    results = tuner.fit()

Generic config settings
-----------------------

Most config settings are generic and apply to all of RLlib's :py:class:`~ray.rllib.algorithms.algorithm.Algorithm` classes.
The following sections walk you through the most important config settings to pay close attention to
before diving into the remaining options and starting hyperparameter fine-tuning.

RL Environment
~~~~~~~~~~~~~~

To configure which :ref:`RL environment <rllib-environments-doc>` your algorithm trains against, use the ``env`` argument to the
:py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.environment` method:

.. testcode::

    config.environment(env="CartPole-v1")

See this :ref:`RL environment guide <rllib-environments-doc>` for more details.

Learning rate `lr`
~~~~~~~~~~~~~~~~~~

Set the learning rate for updating your models through the ``lr`` argument to the
:py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.training` method:

.. testcode::

config.training(lr=0.0001)

.. _rllib-algo-configuration-train-batch-size:

Train batch size
~~~~~~~~~~~~~~~~

Set the train batch size, per Learner actor,
through the ``train_batch_size_per_learner`` argument to the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.training`
method:

.. testcode::

config.training(train_batch_size_per_learner=256)

.. note::
    You can compute the total, effective train batch size by multiplying
    ``train_batch_size_per_learner`` by ``(num_learners or 1)``.
    Alternatively, check the value of your config's
    :py:attr:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.total_train_batch_size` property:

.. testcode::

config.training(train_batch_size_per_learner=256)
config.learners(num_learners=2)
print(config.total_train_batch_size) # expect: 512 = 256 * 2


Discount factor `gamma`
~~~~~~~~~~~~~~~~~~~~~~~

Set the `RL discount factor <https://www.envisioning.io/vocab/discount-factor>`__
through the ``gamma`` argument to the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.training`
method:

.. testcode::

config.training(gamma=0.995)

Scaling with `num_env_runners` and `num_learners`
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

.. todo (sven): link to scaling guide, once separated out in its own rst.

Set the number of :py:class:`~ray.rllib.env.env_runner.EnvRunner` actors used to collect training samples
through the ``num_env_runners`` argument to the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.env_runners`
method:

.. testcode::

config.env_runners(num_env_runners=4)

# Also use `num_envs_per_env_runner` to vectorize your environment on each EnvRunner actor.
# Note that this option is only available in single-agent setups.
# The Ray Team is working on a solution for this restriction.
config.env_runners(num_envs_per_env_runner=10)

Set the number of :py:class:`~ray.rllib.core.learner.learner.Learner` actors used to update your models
through the ``num_learners`` argument to the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.learners`
method. This should correspond to the number of GPUs you have available for training.

.. testcode::

config.learners(num_learners=2)
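
If you train on GPUs, you can additionally set the number of GPUs per Learner actor. The following is a
minimal sketch; it assumes that two GPUs are actually available in your cluster:

.. code-block:: python

    # Sketch (not executed here): two Learner actors, each using one GPU.
    config.learners(
        num_learners=2,
        num_gpus_per_learner=1,
    )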

Disable `explore` behavior
~~~~~~~~~~~~~~~~~~~~~~~~~~

Switch exploratory behavior on or off
through the ``explore`` argument to the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.env_runners`
method. To compute actions, the :py:class:`~ray.rllib.env.env_runner.EnvRunner` calls ``forward_exploration()`` on the RLModule when ``explore=True``
and ``forward_inference()`` when ``explore=False``. The default value is ``explore=True``.

.. testcode::

# Disable exploration behavior.
# When False, the EnvRunner calls `forward_inference()` on the RLModule to compute
# actions instead of `forward_exploration()`.
config.env_runners(explore=False)

Rollout length
~~~~~~~~~~~~~~

Set the number of timesteps that each :py:class:`~ray.rllib.env.env_runner.EnvRunner` steps
through each of its RL environment copies per sampling cycle with the ``rollout_fragment_length`` argument.
Pass this argument to the :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.env_runners`
method. Note that some algorithms, like :py:class:`~ray.rllib.algorithms.ppo.PPO`,
set this value automatically, based on the :ref:`train batch size <rllib-algo-configuration-train-batch-size>`,
the number of :py:class:`~ray.rllib.env.env_runner.EnvRunner` actors, and the number of envs per
:py:class:`~ray.rllib.env.env_runner.EnvRunner`.

.. testcode::

config.env_runners(rollout_fragment_length=50)
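
As a rough sketch of this automatic computation, and assuming the formula is the total train batch size
divided by the total number of env copies, two EnvRunner actors with ten envs each and a train batch size
per Learner of 4000 yield fragments of about ``4000 / (2 * 10) = 200`` timesteps. You can inspect the
computed value through :py:meth:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig.get_rollout_fragment_length`:

.. testcode::

    from ray.rllib.algorithms.ppo import PPOConfig

    auto_config = (
        PPOConfig()
        .training(train_batch_size_per_learner=4000)
        .env_runners(
            num_env_runners=2,
            num_envs_per_env_runner=10,
            rollout_fragment_length="auto",
        )
    )
    # Expect roughly: 200 = 4000 / (2 * 10).
    print(auto_config.get_rollout_fragment_length())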

All available methods and their settings
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Besides the previously described most common settings, the :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig`
class and its algo-specific subclasses come with many more configuration options.

To structure things more semantically, :py:class:`~ray.rllib.algorithms.algorithm_config.AlgorithmConfig` groups
its various config settings into the following categories, each represented by its own method:

- :ref:`Config settings for the RL environment <rllib-config-env>`
- :ref:`Config settings for training behavior (including algo-specific settings) <rllib-config-training>`
- :ref:`Config settings for EnvRunners <rllib-config-env-runners>`
- :ref:`Config settings for Learners <rllib-config-learners>`
- :ref:`Config settings for adding callbacks <rllib-config-callbacks>`
- :ref:`Config settings for multi-agent setups <rllib-config-multi_agent>`
- :ref:`Config settings for offline RL <rllib-config-offline_data>`
- :ref:`Config settings for evaluating policies <rllib-config-evaluation>`
- :ref:`Config settings for the DL framework <rllib-config-framework>`
- :ref:`Config settings for reporting and logging behavior <rllib-config-reporting>`
- :ref:`Config settings for checkpointing <rllib-config-checkpointing>`
- :ref:`Config settings for debugging <rllib-config-debugging>`
- :ref:`Experimental config settings <rllib-config-experimental>`
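
The following sketch chains a few of these category methods on a :py:class:`~ray.rllib.algorithms.ppo.ppo.PPOConfig`.
The particular settings are illustrative assumptions only, not recommended values:

.. testcode::

    from ray.rllib.algorithms.ppo import PPOConfig

    ppo_config = (
        PPOConfig()
        # RL environment settings.
        .environment("CartPole-v1")
        # Training settings, including PPO-specific ones.
        .training(lr=0.0003, train_batch_size_per_learner=2000)
        # EnvRunner settings.
        .env_runners(num_env_runners=2)
        # Learner settings.
        .learners(num_learners=1)
        # Evaluation settings.
        .evaluation(evaluation_interval=1, evaluation_num_env_runners=1)
        # Reporting settings.
        .reporting(min_time_s_per_iteration=5)
        # Debugging settings.
        .debugging(log_level="INFO")
    )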

To familiarize yourself with RLlib's many config options, browse through
`RLlib's examples folder <https://github.com/ray-project/ray/tree/master/rllib/examples>`__ or take a look at the
:ref:`examples folder overview page <rllib-examples-overview-docs>`.

Each example script usually introduces a new config setting or shows you how to implement specific customizations through
a combination of setting certain config options and adding custom code to your experiment.
1 change: 1 addition & 0 deletions doc/source/rllib/index.rst
@@ -46,6 +46,7 @@ RLlib: Industry-Grade, Scalable Reinforcement Learning
rllib-training
key-concepts
rllib-env
algorithm-config
rllib-algorithms
user-guides
rllib-examples