RLPy is a framework for conducting sequential decision-making experiments that involve value-function-based approaches. It provides a modular toolbox in which various components can be linked together to create experiments.

The Big Picture


Reinforcement Learning (RL)

Setting up an RL experiment requires selecting the following four key components:

  1. Agent: This is the box where learning happens. Learning is often done by updating the weight vector corresponding to the features.
  2. Policy: This box is responsible for generating actions based on the current state. The action selection mechanism often depends on the estimated value function.
  3. Representation: In this framework, we assume the use of linear function approximators to represent the value function. This box realizes the underlying representation used for capturing the value function. Note that the features used for approximation can be non-linear.
  4. Domain: This box is the MDP that we are interested in solving.
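
The Representation item above can be illustrated with a minimal sketch: the value estimate is linear in the weights even though the features are a non-linear function of the state. The function names, the Gaussian radial-basis features, and the numbers are assumptions chosen for illustration, not RLPy's API.

```python
import math


def rbf_features(state, centers, width=0.5):
    """Map a scalar state to Gaussian radial-basis features.

    The features are a non-linear function of the state.
    """
    return [math.exp(-((state - c) ** 2) / (2.0 * width ** 2)) for c in centers]


def linear_value(weights, features):
    """Value estimate as a dot product, so it is linear in the weights."""
    return sum(w * f for w, f in zip(weights, features))


centers = [0.0, 0.5, 1.0]   # hypothetical feature centers
weights = [1.0, -2.0, 0.5]  # hypothetical learned weights

phi = rbf_features(0.5, centers)
value = linear_value(weights, phi)
print(phi, value)
```

Scaling every weight by a constant scales the estimate by the same constant, which is the sense in which the approximator stays linear.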

The Experiment class works as the glue that connects all these pieces together.
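
As a rough illustration of how the four pieces and the glue fit together, here is a self-contained toy sketch. Every class and function name in it (ChainDomain, TabularRepresentation, EpsilonGreedyPolicy, QLearningAgent, run_experiment) is hypothetical and written for this example; it does not reproduce RLPy's classes or signatures, and Q-learning with an epsilon-greedy policy is just one possible choice.

```python
import random


class ChainDomain:
    """Domain: a 5-state chain MDP; reward 1 on reaching the right end."""
    num_states = 5
    num_actions = 2  # 0 = left, 1 = right

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        move = 1 if action == 1 else -1
        self.state = min(max(self.state + move, 0), self.num_states - 1)
        terminal = self.state == self.num_states - 1
        return (1.0 if terminal else 0.0), self.state, terminal


class TabularRepresentation:
    """Representation: one weight per (state, action) pair."""

    def __init__(self, domain):
        self.num_actions = domain.num_actions
        self.weights = [0.0] * (domain.num_states * domain.num_actions)

    def index(self, state, action):
        return state * self.num_actions + action

    def q(self, state, action):
        return self.weights[self.index(state, action)]


class EpsilonGreedyPolicy:
    """Policy: picks actions from the current value estimates."""

    def __init__(self, representation, epsilon, rng):
        self.representation = representation
        self.epsilon = epsilon
        self.rng = rng

    def pi(self, state):
        rep = self.representation
        if self.rng.random() < self.epsilon:
            return self.rng.randrange(rep.num_actions)
        qs = [rep.q(state, a) for a in range(rep.num_actions)]
        best = max(qs)
        # Break ties randomly so the untrained agent still explores.
        return self.rng.choice([a for a, q in enumerate(qs) if q == best])


class QLearningAgent:
    """Agent: learns by updating the representation's weight vector."""

    def __init__(self, representation, policy, alpha=0.5, gamma=0.9):
        self.representation = representation
        self.policy = policy
        self.alpha = alpha
        self.gamma = gamma

    def learn(self, s, a, r, ns, terminal):
        rep = self.representation
        target = r
        if not terminal:
            target += self.gamma * max(rep.q(ns, b) for b in range(rep.num_actions))
        i = rep.index(s, a)
        rep.weights[i] += self.alpha * (target - rep.weights[i])


def run_experiment(domain, agent, episodes=200, max_steps=100):
    """Glue: runs the interaction loop between domain, policy, and agent."""
    for _ in range(episodes):
        s = domain.reset()
        for _ in range(max_steps):
            a = agent.policy.pi(s)
            r, ns, terminal = domain.step(a)
            agent.learn(s, a, r, ns, terminal)
            s = ns
            if terminal:
                break


rng = random.Random(0)
domain = ChainDomain()
representation = TabularRepresentation(domain)
policy = EpsilonGreedyPolicy(representation, epsilon=0.1, rng=rng)
agent = QLearningAgent(representation, policy)
run_experiment(domain, agent)

# Greedy action in each non-terminal state after learning.
greedy_actions = [
    max(range(domain.num_actions), key=lambda a, s=s: representation.q(s, a))
    for s in range(domain.num_states - 1)
]
print(greedy_actions)
```

Because each piece only talks to its neighbours through a small interface, swapping the domain, the policy, or the learning rule does not require touching the others, which is the point of the modular design described above.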

Dynamic Programming

If the full model of the MDP is known, Dynamic Programming techniques can be used to solve the MDP. To set up a DP experiment, the following three components have to be set:

  1. MDP Solver: Dynamic programming algorithm
  2. Representation: Same as the RL case. Note that the Value Iteration and Policy Iteration techniques can only be coupled with the tabular representation.
  3. Domain: Same as the RL case.
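
To make the DP setting concrete, the following is a minimal sketch of value iteration over a tabular representation (one value per state) when the full model is known. The transition format, the helper build_chain, and the 5-state chain are assumptions made for this example and are not taken from RLPy.

```python
def value_iteration(transitions, gamma=0.9, tol=1e-8):
    """Tabular value iteration on a fully known MDP.

    transitions[s][a] is a list of (probability, next_state, reward, terminal)
    tuples; this layout is a hypothetical choice for the sketch.
    """
    values = [0.0] * len(transitions)
    while True:
        delta = 0.0
        for s, per_action in enumerate(transitions):
            # Bellman optimality backup: best expected one-step return.
            best = max(
                sum(p * (r + (0.0 if done else gamma * values[ns]))
                    for p, ns, r, done in outcomes)
                for outcomes in per_action
            )
            delta = max(delta, abs(best - values[s]))
            values[s] = best
        if delta < tol:
            return values


def build_chain(n):
    """Deterministic n-state chain; entering the last state pays reward 1."""
    transitions = []
    for s in range(n - 1):
        per_action = []
        for move in (-1, 1):  # action 0 = left, action 1 = right
            ns = min(max(s + move, 0), n - 1)
            done = ns == n - 1
            per_action.append([(1.0, ns, 1.0 if done else 0.0, done)])
        transitions.append(per_action)
    # The last state is absorbing and yields no further reward.
    transitions.append([[(1.0, n - 1, 0.0, True)], [(1.0, n - 1, 0.0, True)]])
    return transitions


values = value_iteration(build_chain(5))
print(values)
```

The sweep reads the model directly instead of sampling from the domain, which is what distinguishes this setting from the RL case.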


Each of the components mentioned here has several realizations in RLPy, yet this website provides guidance only on the main abstract classes, namely: Agent, MDP Solver, Representation, Policy, Domain, and Experiment.

The tutorial page provides simple 10-15 minute examples of how various experiments can be set up and used.


The project was partially funded by ONR and AFOSR grants.

