Creating a New Domain¶

This tutorial describes the standard RLPy Domain interface, and illustrates a brief example of creating a new problem domain.

The Domain controls the environment in which the Agent resides as well as the reward function the Agent is subject to.

The Agent interacts with the Domain in discrete timesteps called episodes (see step()). At each step, the Agent informs the Domain what indexed action it wants to perform. The Domain then calculates the effects this action has on the environment and updates its internal state accordingly. It also returns the new state (ns) to the agent, along with a reward/penalty, (r) and whether or not the episode is over (terminal), in which case the agent is reset to its initial state.

This process repeats until the Domain determines that the Agent has either completed its goal or failed. The Experiment controls this cycle.

Because Agents are designed to be agnostic to the Domain that they are acting within and the problem they are trying to solve, the Domain needs to completely describe everything related to the task. Therefore, the Domain must not only define the observations that the Agent receives, but also the states it can be in, the actions that it can perform, and the relationships between the three.

Warning

While each dimension of the state s is either continuous or discrete, discrete dimensions are assume to take nonnegative integer values (ie, the index of the discrete state).

Note

You may want to review the namespace / inheritance / scoping rules in Python.

Requirements¶

Each Domain must be a subclass of Domain and call the __init__() function of the Domain superclass.
Any randomization that occurs at object construction MUST occur in the init_randomization() function, which can be called by __init__().
Any random calls should use self.random_state, not random() or np.random(), as this will ensure consistent seeded results during experiments.
After your agent is complete, you should define a unit test to ensure future revisions do not alter behavior. See rlpy/tests/test_domains for some examples.

REQUIRED Instance Variables¶

The new Domain MUST set these variables BEFORE calling the superclass __init__() function:

self.statespace_limits - Bounds on each dimension of the state space. Each row corresponds to one dimension and has two elements [min, max]. Used for discretization of continuous dimensions.
self.continuous_dims - array of integers; each element is the index (eg, row in statespace_limits above) of a continuous-valued dimension. This array is empty if all states are discrete.
self.DimNames - array of strings, a name corresponding to each dimension (eg one for each row in statespace_limits above)
self.episodeCap - integer, maximum number of steps before an episode terminated (even if not in a terminal state).
actions_num - integer, the total number of possible actions (ie, the size of the action space). This number MUST be a finite integer - continuous action spaces are not currently supported.
discount_factor - float, the discount factor (gamma in literature) by which rewards are reduced.

REQUIRED Functions¶

s0(), (see linked documentation), which returns a (possibly random) state in the domain, to be used at the start of an episode.
step(), (see linked documentation), which returns the tuple (r,ns,terminal, pa) that results from taking action a from the current state (internal to the Domain).
- r is the reward obtained during the transition
- ns is the new state after the transition
- terminal, a boolean, is true if the new state ns is a terminal one to end the episode
- pa, an array of possible actions to take from the new state ns.

SPECIAL Functions¶

In many cases, the Domain will also override the functions:

isTerminal() - returns a boolean whether or not the current (internal) state is terminal. Default is always return False.
possibleActions() - returns an array of possible action indices, which often depend on the current state. Default is to enumerate every possible action, regardless of current state.

OPTIONAL Functions¶

Optionally, define / override the following functions, used for visualization:

showDomain() - Visualization of domain based on current internal state and an action, a. Often the header will include an optional argument s to display instead of the current internal state. RLPy frequently uses matplotlib to accomplish this - see the example below.
showLearning() - Visualization of the “learning” obtained so far on this domain, usually a value function plot and policy plot. See the introductory tutorial for an example on GridWorld

XX expectedStep(), XX

Additional Information¶

As always, the Domain can log messages using self.logger.info(<str>), see Python logger doc.
You should log values assigned to custom parameters when __init__() is called.
See Domain for functions provided by the superclass, especially before defining helper functions which might be redundant.

Example: Creating the `ChainMDP` Domain¶

In this example we will recreate the simple ChainMDP Domain, which consists of n states that can only transition to n-1 or n+1: s0 <-> s1 <-> ... <-> sn n The goal is to reach state sn from s0, after which the episode terminates. The agent can select from two actions: left [0] and right [1] (it never remains in same state). But the transitions are noisy, and the opposite of the desired action is taken instead with some probability. Note that the optimal policy is to always go right.

Create a new file in your current working directory, ChainMDPTut.py. Add the header block at the top:

__copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
__credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
               "William Dabney", "Jonathan P. How"]
__license__ = "BSD 3-Clause"
__author__ = "Ray N. Forcement"

from rlpy.Tools import plt, mpatches, fromAtoB
from rlpy.Domains.Domain import Domain
import numpy as np

Declare the class, create needed members variables (here several objects to be used for visualization and a few domain reward parameters), and write a docstring description:

class ChainMDPTut(Domain):
    """
    Tutorial Domain - nearly identical to ChainMDP.py
    """
    #: Reward for each timestep spent in the goal region
    GOAL_REWARD = 0
    #: Reward for each timestep
    STEP_REWARD = -1
    #: Set by the domain = min(100,rows*cols)
    episodeCap = 0
    # Used for graphical normalization
    MAX_RETURN  = 1
    # Used for graphical normalization
    MIN_RETURN  = 0
    # Used for graphical shifting of arrows
    SHIFT       = .3
    #:Used for graphical radius of states
    RADIUS      = .5
    # Stores the graphical pathes for states so that we can later change their colors
    circles     = None
    #: Number of states in the chain
    chainSize   = 0
    # Y values used for drawing circles
    Y           = 1

Copy the __init__ declaration from Domain.py, add needed parameters (here the number of states in the chain, chainSize), and log them. Assign self.statespace_limits, self.episodeCap, self.continuous_dims, self.DimNames, self.actions_num, and self.discount_factor. Then call the superclass constructor:

def __init__(self, chainSize=2):
    """
    :param chainSize: Number of states \'n\' in the chain.
    """
    self.chainSize          = chainSize
    self.start              = 0
    self.goal               = chainSize - 1
    self.statespace_limits  = np.array([[0,chainSize-1]])
    self.episodeCap         = 2*chainSize
    self.continuous_dims    = []
    self.DimNames           = ['State']
    self.actions_num        = 2
    self.discount_factor    = 0.9
    super(ChainMDPTut,self).__init__()

Copy the step() and function declaration and implement it accordingly to return the tuple (r,ns,isTerminal,possibleActions), and similarly for s0(). We want the agent to always start at state [0] to begin, and only achieves reward and terminates when s = [n-1]:

def step(self,a):
    s = self.state[0]
    if a == 0: #left
        ns = max(0,s-1)
    if a == 1: #right
        ns = min(self.chainSize-1,s+1)
    self.state = np.array([ns])

    terminal = self.isTerminal()
    r = self.GOAL_REWARD if terminal else self.STEP_REWARD
    return r, ns, terminal, self.possibleActions()

def s0(self):
    self.state = np.array([0])
    return self.state, self.isTerminal(), self.possibleActions()

In accordance with the above termination condition, override the isTerminal() function by copying its declaration from Domain.py:
```
def isTerminal(self):
    s = self.state
    return (s[0] == self.chainSize - 1)
```

For debugging convenience, demonstration, and entertainment, create a domain visualization by overriding the default (which is to do nothing). With matplotlib, generally this involves first performing a check to see if the figure object needs to be created (and adding objects accordingly), otherwise merely updating existing plot objects based on the current self.state and action a:

def showDomain(self, a = 0):
    #Draw the environment
    s = self.state
    s = s[0]
    if self.circles is None: # We need to draw the figure for the first time
       fig = plt.figure(1, (self.chainSize*2, 2))
       ax = fig.add_axes([0, 0, 1, 1], frameon=False, aspect=1.)
       ax.set_xlim(0, self.chainSize*2)
       ax.set_ylim(0, 2)
       ax.add_patch(mpatches.Circle((1+2*(self.chainSize-1), self.Y), self.RADIUS*1.1, fc="w")) #Make the last one double circle
       ax.xaxis.set_visible(False)
       ax.yaxis.set_visible(False)
       self.circles = [mpatches.Circle((1+2*i, self.Y), self.RADIUS, fc="w") for i in np.arange(self.chainSize)]
       for i in np.arange(self.chainSize):
           ax.add_patch(self.circles[i])
           if i != self.chainSize-1:
                fromAtoB(1+2*i+self.SHIFT,self.Y+self.SHIFT,1+2*(i+1)-self.SHIFT, self.Y+self.SHIFT)
                if i != self.chainSize-2: fromAtoB(1+2*(i+1)-self.SHIFT,self.Y-self.SHIFT,1+2*i+self.SHIFT, self.Y-self.SHIFT, 'r')
           fromAtoB(.75,self.Y-1.5*self.SHIFT,.75,self.Y+1.5*self.SHIFT,'r',connectionstyle='arc3,rad=-1.2')
           plt.show()

    [p.set_facecolor('w') for p in self.circles]
    self.circles[s].set_facecolor('k')
    plt.draw()

Note

When first creating a matplotlib figure, you must call pl.show(); when updating the figure on subsequent steps, use pl.draw().

That’s it! Now test it by creating a simple settings file on the domain of your choice. An example experiment is given below:

#!/usr/bin/env python
"""
Domain Tutorial for RLPy
=================================

Assumes you have created the ChainMDPTut.py domain according to the
tutorial and placed it in the current working directory.
Tests the agent using SARSA with a tabular representation.
"""
__author__ = "Robert H. Klein"
from rlpy.Agents import SARSA
from rlpy.Representations import Tabular
from rlpy.Policies import eGreedy
from rlpy.Experiments import Experiment
from ChainMDPTut import ChainMDPTut
import os
import logging


def make_experiment(exp_id=1, path="./Results/Tutorial/ChainMDPTut-SARSA"):
    """
    Each file specifying an experimental setup should contain a
    make_experiment function which returns an instance of the Experiment
    class with everything set up.

    @param id: number used to seed the random number generators
    @param path: output directory where logs and results are stored
    """
    opt = {}
    opt["exp_id"] = exp_id
    opt["path"] = path

    ## Domain:
    chainSize = 50
    domain = ChainMDPTut(chainSize=chainSize)
    opt["domain"] = domain

    ## Representation
    # discretization only needed for continuous state spaces, discarded otherwise
    representation  = Tabular(domain)

    ## Policy
    policy = eGreedy(representation, epsilon=0.2)

    ## Agent
    opt["agent"] = SARSA(policy=policy, representation=representation, discount_factor=domain.discount_factor, initial_learn_rate=0.1)
    opt["checks_per_policy"] = 100
    opt["max_steps"] = 2000
    opt["num_policy_checks"] = 10
    experiment = Experiment(**opt)
    return experiment

if __name__ == '__main__':
    experiment = make_experiment(1)
    experiment.run(visualize_steps=False,  # should each learning step be shown?
                   visualize_learning=True,  # show policy / value function?
                   visualize_performance=1)  # show performance runs?
    experiment.plot()
    experiment.save()

What to do next?¶

In this Domain tutorial, we have seen how to

Write a Domain that inherits from the RLPy base Domain class
Override several base functions
Create a visualization
Add the Domain to RLPy and test it

Adding your component to RLPy¶

If you would like to add your component to RLPy, we recommend developing on the development version (see Development Version). Please use the following header template at the top of each file:

__copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
__credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
                "William Dabney", "Jonathan P. How"]
__license__ = "BSD 3-Clause"
__author__ = "Tim Beaver"

Fill in the appropriate __author__ name and __credits__ as needed. Note that RLPy requires the BSD 3-Clause license.

If you installed RLPy in a writeable directory, the className of the new domain can be added to the __init__.py file in the Domains/ directory. (This allows other files to import the new domain).
If available, please include a link or reference to the publication associated with this implementation (and note differences, if any).

If you would like to add your new domain to the RLPy project, we recommend you branch the project and create a pull request to the RLPy repository.

You can also email the community list rlpy@mit.edu for comments or questions. To subscribe click here.

RLPy

Table Of Contents

Previous topic

Next topic

This Page

Creating a New Domain¶

Requirements¶

REQUIRED Instance Variables¶

REQUIRED Functions¶

SPECIAL Functions¶

OPTIONAL Functions¶

Additional Information¶

Example: Creating the `ChainMDP` Domain¶

What to do next?¶

Adding your component to RLPy¶

RLPy

Creating a New Domain¶

Requirements¶

REQUIRED Instance Variables¶

REQUIRED Functions¶

SPECIAL Functions¶

OPTIONAL Functions¶

Additional Information¶

Example: Creating the ChainMDP Domain¶

What to do next?¶

Adding your component to RLPy¶

Example: Creating the `ChainMDP` Domain¶