.. _make_domain:

.. this is a comment. see http://sphinx-doc.org/rest.html for markup instructions

Creating a New Domain
=====================

This tutorial describes the standard RLPy
:class:`~rlpy.Domains.Domain.Domain` interface,
and illustrates a brief example of creating a new problem domain.

.. Below taken directly from Domain.py

The Domain controls the environment in which the
:class:`~rlpy.Agents.Agent.Agent` resides as well as the reward function the
Agent is subject to.

The Agent interacts with the Domain over a sequence of discrete timesteps
called an *episode* (see :func:`~rlpy.Domains.Domain.Domain.step`).
At each step, the Agent informs the Domain what indexed action it wants to
perform.  The Domain then calculates the effects this action has on the
environment and updates its internal state accordingly.
It also returns the new state (*ns*) to the agent, along with a reward/penalty
(*r*) and whether or not the episode is over (*terminal*), in which case the
agent is reset to its initial state.

This process repeats until the Domain determines that the Agent has either
completed its goal or failed.
The :py:class:`~rlpy.Experiments.Experiment.Experiment` controls this cycle.

Because Agents are designed to be agnostic to the Domain that they are acting
within and the problem they are trying to solve, the Domain needs to
completely describe everything related to the task.
Therefore, the Domain must not only define the observations that the Agent
receives, but also the states it can be in, the actions that it can perform,
and the relationships between the three.

.. warning::
    While each dimension of the state *s* is either *continuous* or
    *discrete*, discrete dimensions are assumed to take nonnegative
    **integer** values (ie, the index of the discrete state).

.. note::
    You may want to review the namespace / inheritance / scoping
    `rules in Python `_.


Requirements
------------

* Each Domain must be a subclass of
  :class:`~rlpy.Domains.Domain.Domain` and call the
  :func:`~rlpy.Domains.Domain.Domain.__init__` function of the
  Domain superclass.

* Any randomization that occurs at object construction *MUST* occur in
  the :func:`~rlpy.Domains.Domain.Domain.init_randomization` function,
  which can be called by ``__init__()``.

* Any random calls should use ``self.random_state``, not ``random()`` or
  ``np.random()``, as this will ensure consistent seeded results during
  experiments.

* After your domain is complete, you should define a unit test to ensure
  future revisions do not alter behavior.  See rlpy/tests/test_domains for
  some examples.


REQUIRED Instance Variables
"""""""""""""""""""""""""""

The new Domain *MUST* set these variables *BEFORE* calling the superclass
``__init__()`` function:

#. ``self.statespace_limits`` - Bounds on each dimension of the state space.
   Each row corresponds to one dimension and has two elements [min, max].
   Used for discretization of continuous dimensions.

#. ``self.continuous_dims`` - array of integers; each element is the index
   (eg, row in ``statespace_limits`` above) of a continuous-valued dimension.
   This array is empty if all dimensions are discrete.

#. ``self.DimNames`` - array of strings, a name corresponding to each
   dimension (eg, one for each row in ``statespace_limits`` above).

#. ``self.episodeCap`` - integer, maximum number of steps before an episode is
   terminated (even if the agent is not in a terminal state).

#. ``self.actions_num`` - integer, the total number of possible actions (ie,
   the size of the action space).  This number **MUST** be a finite integer -
   continuous action spaces are not currently supported.

#. ``self.discount_factor`` - float, the discount factor (gamma in the
   literature) by which rewards are reduced.

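As a quick illustration (the full ``ChainMDP`` walkthrough appears below), a
hypothetical domain with one discrete and one continuous dimension might set
these variables as follows before calling the superclass constructor.  This is
only a minimal sketch; the class name ``MySketchDomain`` and the specific
values are placeholders, not part of RLPy::

    from rlpy.Domains.Domain import Domain
    import numpy as np

    class MySketchDomain(Domain):  # hypothetical name, for illustration only
        def __init__(self):
            # Each row is [min, max] for one state dimension
            self.statespace_limits = np.array([[0, 9], [-1.0, 1.0]])
            self.continuous_dims   = [1]            # dimension 1 is continuous
            self.DimNames          = ['Cell', 'Velocity']
            self.episodeCap        = 100            # hard cap on episode length
            self.actions_num       = 4              # finite, integer-indexed actions
            self.discount_factor   = 0.95
            super(MySketchDomain, self).__init__()
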
REQUIRED Functions
""""""""""""""""""

#. :func:`~rlpy.Domains.Domain.Domain.s0` (see linked documentation), which
   returns a (possibly random) state in the domain, to be used at the start of
   an *episode*.

#. :func:`~rlpy.Domains.Domain.Domain.step` (see linked documentation), which
   returns the tuple ``(r, ns, terminal, pa)`` that results from taking action
   *a* from the current state (internal to the Domain).

   * *r* is the reward obtained during the transition
   * *ns* is the new state after the transition
   * *terminal* is a boolean, True if the new state *ns* is terminal and the
     episode should end
   * *pa* is an array of possible actions to take from the new state *ns*


SPECIAL Functions
"""""""""""""""""

In many cases, the Domain will also override the functions:

#. :func:`~rlpy.Domains.Domain.Domain.isTerminal` - returns a boolean
   indicating whether or not the current (internal) state is terminal.
   The default is to always return False.

#. :func:`~rlpy.Domains.Domain.Domain.possibleActions` - returns an array of
   possible action indices, which often depend on the current state.
   The default is to enumerate **every** possible action, regardless of the
   current state.

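For example, a domain in which some actions are only legal in certain states
could override ``possibleActions()`` along the following lines.  This is only
an illustrative sketch, assuming ``numpy`` is imported as ``np`` as in the
example below; the attribute ``self.restricted_states`` is hypothetical and
not part of RLPy::

    def possibleActions(self):
        # Sketch: all actions are legal unless the current state is in a
        # hypothetical set of "restricted" states where only action 0 applies.
        if self.state[0] in self.restricted_states:
            return np.array([0])
        return np.arange(self.actions_num)
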
OPTIONAL Functions
""""""""""""""""""

Optionally, define / override the following functions, used for visualization:

#. :func:`~rlpy.Domains.Domain.Domain.showDomain` - Visualization of the
   domain based on the current internal state and an action, *a*.
   Often the header will include an optional argument *s* to display instead
   of the current internal state.
   RLPy frequently uses `matplotlib `_ to accomplish this - see the example
   below.

#. :func:`~rlpy.Domains.Domain.Domain.showLearning` - Visualization of the
   "learning" obtained so far on this domain, usually a value function plot
   and a policy plot.
   See the introductory tutorial for an example on
   :class:`~rlpy.Domains.GridWorld.GridWorld`.


Additional Information
----------------------

* As always, the Domain can log messages using ``self.logger.info()``; see
  the Python ``logging`` module documentation.

* You should log the values assigned to custom parameters when ``__init__()``
  is called.

* See :class:`~rlpy.Domains.Domain.Domain` for functions provided by the
  superclass, especially before defining helper functions which might be
  redundant.


Example: Creating the ``ChainMDP`` Domain
-----------------------------------------------------------

In this example we will recreate the simple ``ChainMDP`` Domain, which
consists of *n* states arranged in a chain, where each state can only
transition to its left or right neighbor:
``s0 <-> s1 <-> ... <-> sn-1``

The goal is to reach state ``sn-1`` from ``s0``, after which the episode
terminates.
The agent can select from two actions: left [0] and right [1] (it never
remains in the same state).
But the transitions are noisy, and the opposite of the desired action is taken
instead with some probability.
Note that the optimal policy is to always go right.
(For simplicity, the ``step()`` implementation in this tutorial is
deterministic.)

#. Create a new file in your current working directory, ``ChainMDPTut.py``.
   Add the header block at the top::

       __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
       __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
                      "William Dabney", "Jonathan P. How"]
       __license__ = "BSD 3-Clause"
       __author__ = "Ray N. Forcement"

       from rlpy.Tools import plt, mpatches, fromAtoB
       from rlpy.Domains.Domain import Domain
       import numpy as np

#. Declare the class, create the needed member variables (here several objects
   to be used for visualization and a few domain reward parameters), and write
   a docstring description::

       class ChainMDPTut(Domain):
           """
           Tutorial Domain - nearly identical to ChainMDP.py
           """
           #: Reward for each timestep spent in the goal region
           GOAL_REWARD = 0
           #: Reward for each timestep
           STEP_REWARD = -1
           #: Maximum number of steps per episode; set in __init__() below
           episodeCap  = 0
           # Used for graphical normalization
           MAX_RETURN  = 1
           # Used for graphical normalization
           MIN_RETURN  = 0
           # Used for graphical shifting of arrows
           SHIFT       = .3
           #: Used for the graphical radius of states
           RADIUS      = .5
           # Stores the graphical patches for states so that we can later change their colors
           circles     = None
           #: Number of states in the chain
           chainSize   = 0
           # Y value used for drawing circles
           Y           = 1

#. Copy the ``__init__()`` declaration from ``Domain.py``, add the needed
   parameters (here the number of states in the chain, ``chainSize``), and log
   them.
   Assign ``self.statespace_limits``, ``self.episodeCap``,
   ``self.continuous_dims``, ``self.DimNames``, ``self.actions_num``, and
   ``self.discount_factor``.
   Then call the superclass constructor::

       def __init__(self, chainSize=2):
           """
           :param chainSize: Number of states 'n' in the chain.
           """
           self.chainSize          = chainSize
           self.start              = 0
           self.goal               = chainSize - 1
           self.statespace_limits  = np.array([[0, chainSize-1]])
           self.episodeCap         = 2*chainSize
           self.continuous_dims    = []
           self.DimNames           = ['State']
           self.actions_num        = 2
           self.discount_factor    = 0.9
           super(ChainMDPTut, self).__init__()

#. Copy the ``step()`` function declaration and implement it to return the
   tuple ``(r, ns, terminal, possibleActions)``; do the same for ``s0()``.
   We want the agent to always start at state *[0]*, and to receive the goal
   reward and terminate only when *s = [n-1]* (a noisy variant of ``step()``
   is sketched at the end of this step)::

       def step(self, a):
           s = self.state[0]
           if a == 0: # left
               ns = max(0, s-1)
           if a == 1: # right
               ns = min(self.chainSize-1, s+1)
           self.state = np.array([ns])

           terminal = self.isTerminal()
           r = self.GOAL_REWARD if terminal else self.STEP_REWARD
           # Return the new state as an array, consistent with s0() below
           return r, self.state.copy(), terminal, self.possibleActions()

       def s0(self):
           self.state = np.array([0])
           return self.state, self.isTerminal(), self.possibleActions()

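   As noted in the Requirements above, any randomness should use
   ``self.random_state`` rather than ``random()`` or ``np.random``.
   If you wanted the noisy transitions described in the problem statement, a
   hypothetical variant of ``step()`` might look like the sketch below; the
   slip probability ``p_slip`` is an illustrative parameter, not part of the
   tutorial domain::

       def step(self, a):
           s = self.state[0]
           # With a (hypothetical) probability p_slip, the opposite action is taken
           p_slip = 0.1
           if self.random_state.random_sample() < p_slip:
               a = 1 - a
           if a == 0: # left
               ns = max(0, s-1)
           else:      # right
               ns = min(self.chainSize-1, s+1)
           self.state = np.array([ns])

           terminal = self.isTerminal()
           r = self.GOAL_REWARD if terminal else self.STEP_REWARD
           return r, self.state.copy(), terminal, self.possibleActions()
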
#. In accordance with the above termination condition, override the
   ``isTerminal()`` function by copying its declaration from ``Domain.py``::

       def isTerminal(self):
           s = self.state
           return (s[0] == self.chainSize - 1)

#. For debugging convenience, demonstration, and entertainment, create a
   domain visualization by overriding the default (which is to do nothing).
   With matplotlib, this generally involves first checking whether the figure
   object needs to be created (and adding plot objects accordingly);
   otherwise, merely update the existing plot objects based on the current
   ``self.state`` and action *a*::

       def showDomain(self, a=0):
           # Draw the environment
           s = self.state
           s = s[0]
           if self.circles is None: # We need to draw the figure for the first time
               fig = plt.figure(1, (self.chainSize*2, 2))
               ax = fig.add_axes([0, 0, 1, 1], frameon=False, aspect=1.)
               ax.set_xlim(0, self.chainSize*2)
               ax.set_ylim(0, 2)
               # Make the last one a double circle to mark the goal state
               ax.add_patch(mpatches.Circle((1+2*(self.chainSize-1), self.Y),
                                            self.RADIUS*1.1, fc="w"))
               ax.xaxis.set_visible(False)
               ax.yaxis.set_visible(False)
               self.circles = [mpatches.Circle((1+2*i, self.Y), self.RADIUS, fc="w")
                               for i in np.arange(self.chainSize)]
               for i in np.arange(self.chainSize):
                   ax.add_patch(self.circles[i])
                   if i != self.chainSize-1:
                       fromAtoB(1+2*i+self.SHIFT, self.Y+self.SHIFT,
                                1+2*(i+1)-self.SHIFT, self.Y+self.SHIFT)
                       if i != self.chainSize-2:
                           fromAtoB(1+2*(i+1)-self.SHIFT, self.Y-self.SHIFT,
                                    1+2*i+self.SHIFT, self.Y-self.SHIFT, 'r')
               # Self-loop arrow at the leftmost state (taking "left" in s0)
               fromAtoB(.75, self.Y-1.5*self.SHIFT, .75, self.Y+1.5*self.SHIFT,
                        'r', connectionstyle='arc3,rad=-1.2')
               plt.show()

           [p.set_facecolor('w') for p in self.circles]
           self.circles[s].set_facecolor('k')
           plt.draw()

.. note::
    When first creating a matplotlib figure, you must call ``plt.show()``;
    when updating the figure on subsequent steps, use ``plt.draw()``.

That's it! Now test your new domain by creating a simple experiment script
with the agent of your choice. An example experiment is given below:

.. literalinclude:: ../examples/tutorial/ChainMDPTut_example.py
    :language: python
    :linenos:


What to do next?
----------------

In this Domain tutorial, we have seen how to

* Write a Domain that inherits from the RLPy base ``Domain`` class
* Override several base functions
* Create a visualization
* Add the Domain to RLPy and test it


Adding your component to RLPy
"""""""""""""""""""""""""""""

If you would like to add your component to RLPy, we recommend developing on
the development version (see :ref:`devInstall`).
Please use the following header template at the top of each file::

    __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
    __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
                   "William Dabney", "Jonathan P. How"]
    __license__ = "BSD 3-Clause"
    __author__ = "Tim Beaver"

Fill in the appropriate ``__author__`` name and ``__credits__`` as needed.
Note that RLPy requires the BSD 3-Clause license.

* If you installed RLPy in a writeable directory, the class name of the new
  domain can be added to the ``__init__.py`` file in the ``Domains/``
  directory.  (This allows other files to import the new domain.)

* If available, please include a link or reference to the publication
  associated with this implementation (and note differences, if any).

If you would like to add your new domain to the RLPy project, we recommend you
branch the project and create a pull request to the `RLPy repository `_.

You can also email the community list ``rlpy@mit.edu`` for comments or
questions. To subscribe `click here `_.