Table of Contents:
Repository Link: RL-X
Most documentation is available in the README.md files in the respective directories:
/: Overview, Getting Started and Citation sections
/experiments/: Information on the different ways to run experiments, experiment tracking and saving/loading of models
/experiments/docker/: Information on how to run experiments in a Docker container
/rl_x/algorithms/: Information on the folder structure of algorithms, how to add new algorithms and how to mix and match them with environments
/rl_x/algorithms/aqe/: Implementation details of the Aggressive Q-Learning with Ensembles (AQE) algorithm
/rl_x/algorithms/c51/: Implementation details of the Categorical Deep Q-Network (C51) algorithm
/rl_x/algorithms/crossq/: Implementation details of the CrossQ algorithm
/rl_x/algorithms/ddpg/: Implementation details of the Deep Deterministic Policy Gradient (DDPG) algorithm
/rl_x/algorithms/dqn/: Implementation details of the Deep Q-Network (DQN) algorithm
/rl_x/algorithms/ddqn/: Implementation details of the Double Deep Q-Network (DDQN) algorithm
/rl_x/algorithms/dqn_hl_gauss/: Implementation details of the Deep Q-Network with Histogram Loss using Gaussians (DQN HL-Gauss) algorithm
/rl_x/algorithms/droq/: Implementation details of the Dropout Q-Functions (DroQ) algorithm
/rl_x/algorithms/espo/: Implementation details of the Early Stopping Policy Optimization (ESPO) algorithm
/rl_x/algorithms/fastsac/: Implementation details of the Fast Soft Actor-Critic (FastSAC) algorithm
/rl_x/algorithms/fasttd3/: Implementation details of the Fast Twin Delayed Deep Deterministic Gradient (FastTD3) algorithm
/rl_x/algorithms/mpo/: Implementation details of the Maximum a Posteriori Policy Optimization (MPO) algorithm
/rl_x/algorithms/ppo/: Implementation details of the Proximal Policy Optimization (PPO) algorithm
/rl_x/algorithms/ppo_dtrl/: Implementation details of Differentiable Trust Region Layers in combination with the Proximal Policy Optimization (PPO+DTRL) algorithm
/rl_x/algorithms/pqn/: Implementation details of the Parallelized Q-Network (PQN) algorithm
/rl_x/algorithms/redq/: Implementation details of the Randomized Ensembled Double Q-Learning (REDQ) algorithm
/rl_x/algorithms/sac/: Implementation details of the Soft Actor Critic (SAC) algorithm
/rl_x/algorithms/td3/: Implementation details of the Twin Delayed Deep Deterministic Gradient (TD3) algorithm
/rl_x/algorithms/tqc/: Implementation details of the Truncated Quantile Critics (TQC) algorithm
/rl_x/environments/: Information on the folder structure of environments, how to add new environments and how to mix and match them with algorithms
/rl_x/environments/custom_interface/: Implementation details of the custom environment interface with simple socket communication
/rl_x/environments/custom_isaac_lab/: Implementation details of the custom Isaac Lab environment examples
/rl_x/environments/custom_maniskill/: Implementation details of the custom ManiSkill environment examples
/rl_x/environments/custom_mujoco/: Implementation details of the custom MuJoCo environment examples (with and without MJX)
/rl_x/environments/custom_mujoco/robot_locomotion/: Details on the robot locomotion MuJoCo and MJX environments, to train a quadruped (Unitree Go2) or humanoid (Unitree G1) robot to walk
/rl_x/environments/custom_mujoco/robot_locomotion/deployment/unitree_go2/: Instructions on how to use a trained policy to deploy it on a real Unitree Go2 robot
/rl_x/environments/envpool/: Details of the EnvPool environments
/rl_x/environments/gym/: Details of the Gymnasium environments
/rl_x/environments/mujoco_playground/: Details of the MuJoCo Playground environments
/rl_x/runner/: Information on the folder structure of the runner class and how to use it to run experiments

For Linux, MacOS and Windows, a conda environment is recommended.
All the code was tested with Python 3.11.4; other versions might work as well.
conda create -n rlx python=3.11.4
conda activate rlx
For Linux, MacOS and Windows, RL-X has to be cloned.
git clone git@github.com:nico-bohlinger/RL-X.git
cd RL-X
For Linux, all dependencies can be installed with the following command:
pip install -e .[all]
For MacOS and Windows, EnvPool is currently not supported. Therefore, the following command has to be used:
pip install -e .
To keep linting support when registering algorithms or environments outside of RL-X, add the editable_mode=compat argument, e.g.:
pip install -e .[all] --config-settings editable_mode=compat
For Linux, MacOS and Windows, PyTorch has to be installed separately with the CUDA 11.8 version so that there are no conflicts with JAX. If PyTorch was previously installed with CUDA 12.X (potentially even through pip install -e .), the related CUDA 12 packages need to be uninstalled first.
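To see beforehand which CUDA 12 packages are installed and would be removed, the grep part of the command can be run on its own:
pip freeze | grep -i '\-cu12'
The following command then uninstalls all of them: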
pip uninstall $(pip freeze | grep -i '\-cu12' | cut -d '=' -f 1) -y
Afterwards, PyTorch can be installed with the following command:
pip install "torch>=2.7.0" --index-url https://download.pytorch.org/whl/cu118 --upgrade
For Linux, JAX with GPU support can be installed with the following command:
pip install "jax[cuda12]"
For MacOS and Windows, JAX with GPU support does not work out-of-the-box, but it can be set up with some extra effort (see the official JAX installation instructions for more information).
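Independent of the platform, whether JAX actually picks up a GPU can be checked with jax.devices(), which is part of the public JAX API:
python -c "import jax; print(jax.devices())"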
Only needed when using Isaac Lab environments. For Linux systems with NVIDIA GPUs, Isaac Sim can be installed with the following command:
pip install "isaacsim[all,extscache]==5.1.0" --extra-index-url https://pypi.nvidia.com
Make sure to install the compatible version of torch alongside it:
pip install -U torch==2.7.0 torchvision==0.22.0 --index-url https://download.pytorch.org/whl/cu128
Verify the Isaac Sim installation by running:
isaacsim
Clone Isaac Lab into any directory (outside of the RL-X repository):
git clone git@github.com:isaac-sim/IsaacLab.git
Install the Isaac Lab package:
cd IsaacLab/
./isaaclab.sh --install
Isaac Sim / Lab requires a newer version of gymnasium and an older version of numpy. After installing Isaac Sim / Lab, the gymnasium version should be downgraded back to the RL-X compatible version:
pip install "gymnasium[mujoco,classic-control,atari,accept-rom-license,other]<=0.29.1"
Isaac Sim / Lab requires the specific numpy version it comes with. This causes a conflict with RL-X, which needs a newer numpy version to run the newest version of JAX. Furthermore, Isaac Sim / Lab builds on PyTorch with CUDA 12, while RL-X by default uses PyTorch with CUDA 11 to avoid conflicts with JAX. For these reasons, Isaac Lab environments should always be used with algorithms running on PyTorch. Make sure numpy is at the correct version:
pip install numpy==1.26.0
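As an optional sanity check that the version juggling worked out (numpy.__version__, torch.__version__ and torch.version.cuda are standard attributes), the installed versions can be printed:
python -c "import numpy, torch; print(numpy.__version__, torch.__version__, torch.version.cuda)"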
Verify the Isaac Lab installation works by running:
python scripts/tutorials/00_sim/create_empty.py
It looks like ManiSkill doesn’t work with the newest numpy version. Therefore, it might be necessary to downgrade numpy when using ManiSkill environments:
pip install "numpy<2"
To run experiments in Google Colab, take a look at experiments/colab_experiment.ipynb or open the notebook directly in Colab.
The gymnasium, custom MuJoCo and custom interface environments support parallel asynchronous vectorized environments with skipping.
When using many parallel environments, it can happen that some environments are faster than others at a given time step.
With the default implementation of the AsyncVectorEnv wrapper from gymnasium, a combined step is only completed once all environments have finished their step, which can lead to a lot of idle waiting time.
Therefore, the AsyncVectorEnvWithSkipping wrapper allows skipping up to the slowest x% of environments and sends dummy values for the skipped environments to the algorithm instead.
Be careful: this can decrease learning performance, depending on how many environments are skipped and how well the dummy values align with the environment.
Even when no environment should be skipped, the AsyncVectorEnvWithSkipping wrapper can still lead to a runtime improvement compared to the default gymnasium wrapper, because the latter waits sequentially for each environment to finish its step, while the former keeps looping over all environments until they are all finished.
Therefore, it can already collect the data from some environments while the others are still running their step.
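The following is a minimal, self-contained sketch of that skipping idea, not RL-X's actual implementation (the real AsyncVectorEnvWithSkipping wrapper manages subprocess environments and the full gymnasium vector API); names such as fake_env_step and step_with_skipping are purely illustrative:

import concurrent.futures
import random
import time

import numpy as np


def fake_env_step(env_id):
    # Stand-in for a real env.step() call; some environments are slower than others.
    time.sleep(random.uniform(0.001, 0.01))
    return np.zeros(4), 0.0, False, False  # observation, reward, terminated, truncated


def step_with_skipping(executor, num_envs, async_skip_percentage):
    futures = [executor.submit(fake_env_step, i) for i in range(num_envs)]
    # At least this many environments have to finish before the combined step returns.
    min_finished = int(np.ceil(num_envs * (1.0 - async_skip_percentage)))

    # Keep looping over all environments instead of waiting for each one sequentially.
    while sum(future.done() for future in futures) < min_finished:
        time.sleep(0.0001)

    transitions, skipped = [], []
    for i, future in enumerate(futures):
        if future.done():
            transitions.append(future.result())
        else:
            # Dummy transition for a skipped (still running) environment.
            transitions.append((np.zeros(4), 0.0, False, False))
            skipped.append(i)
    return transitions, skipped


if __name__ == "__main__":
    with concurrent.futures.ThreadPoolExecutor(max_workers=8) as executor:
        transitions, skipped = step_with_skipping(executor, num_envs=8, async_skip_percentage=0.25)
        print(f"collected {len(transitions)} transitions, skipped environments: {skipped}")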
To set the maximum percentage of environments that can be skipped, set the corresponding command line argument:
No environment is skipped:
--environment.async_skip_percentage=0.0
Up to 25% of the environments can be skipped:
--environment.async_skip_percentage=0.25
Up to 100% of the environments can be skipped:
--environment.async_skip_percentage=1.0
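For example, assuming experiments are started through the experiment script in the experiments folder (here called experiment.py; adjust to your own setup), the argument is passed like any other command line option:
python experiment.py --environment.async_skip_percentage=0.25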