.. _make_agent:

.. this is a comment. see http://sphinx-doc.org/rest.html for markup instructions

Creating a New Agent
====================

This tutorial describes the standard RLPy :class:`~rlpy.Agents.Agent.Agent`
interface, and illustrates a brief example of creating a new learning agent.

.. Below taken directly from Agent.py

The Agent receives observations from the Domain and updates the Representation
accordingly.

In a typical Experiment, the Agent interacts with the Domain in discrete
timesteps. At each Experiment timestep, the Agent receives some observations
from the Domain, which it uses to update the value function Representation of
the Domain (i.e., on each call to its :func:`~rlpy.Agents.Agent.Agent.learn`
function). The Policy is used to select an action to perform. This process
(observe, update, act) repeats until some goal or fail state, determined by
the Domain, is reached. At this point the Experiment determines whether the
agent starts over or has its current policy tested (without any exploration).

.. note::

    You may want to review the namespace / inheritance / scoping
    `rules in Python `_.


Requirements
------------

* Each learning agent must be a subclass of :class:`~rlpy.Agents.Agent.Agent`
  and call the :func:`~rlpy.Agents.Agent.Agent.__init__` function of the
  Agent superclass.

* Accordingly, each Agent must be instantiated with a Representation, Policy,
  and Domain in its ``__init__()`` function.

* Any randomization that occurs at object construction *MUST* occur in the
  :func:`~rlpy.Agents.Agent.Agent.init_randomization` function, which can be
  called by ``__init__()``.

* Any random calls should use ``self.random_state``, not ``random()`` or
  ``np.random()``, as this ensures consistent seeded results across
  experiments.

* After your agent is complete, you should define a unit test to ensure that
  future revisions do not alter its behavior. See ``rlpy/tests`` for some
  examples.

REQUIRED Instance Variables
"""""""""""""""""""""""""""

---

REQUIRED Functions
""""""""""""""""""

:func:`~rlpy.Agents.Agent.Agent.learn` - called on every timestep
(see its documentation).

.. note::

    The Agent *MUST* call the (inherited)
    :func:`~rlpy.Agents.Agent.Agent.episodeTerminated` function after learning
    if the transition led to a terminal state (i.e., ``learn()`` was called
    with ``terminal=True``).

.. note::

    The ``learn()`` function *MUST* call the
    :func:`~rlpy.Representations.Representation.Representation.pre_discover`
    function at its beginning and
    :func:`~rlpy.Representations.Representation.Representation.post_discover`
    at its end. This allows adaptive representations to add new features
    (it has no effect on fixed representations).

Additional Information
----------------------

* As always, the agent can log messages using ``self.logger.info()``; see the
  documentation of the Python ``logging`` module.

* You should log the values assigned to custom parameters when ``__init__()``
  is called.

* See :class:`~rlpy.Agents.Agent.Agent` for functions provided by the
  superclass.


Example: Creating the ``SARSA0`` Agent
--------------------------------------

In this example, we will create the standard SARSA learning agent without
eligibility traces (i.e., the λ parameter is always 0).
This algorithm first computes the temporal-difference (TD) error, essentially
the difference between the prediction under the current value function and
what was actually observed (see e.g.
`Sutton and Barto's *Reinforcement Learning* (1998) `_ or `Wikipedia `_).
It then adjusts the current value function estimate by this TD error, weighted
by a factor called the *learning rate*.
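For reference, with a linear value function estimate these two steps can be
written as follows. This is the textbook SARSA(0) formulation, stated here for
orientation rather than taken verbatim from the RLPy source:

.. math::

    \delta = r + \gamma\, \theta^\top \phi(s', a') - \theta^\top \phi(s, a)

    \theta \leftarrow \theta + \alpha\, \delta\, \phi(s, a)

where :math:`\phi(s, a)` is the feature vector of a state-action pair,
:math:`\theta` the weight vector representing the value function,
:math:`\gamma` the discount factor, and :math:`\alpha` the learning rate.
(The implementation below additionally normalizes the update by the number of
active features.)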
#. Create a new file in the current working directory, ``SARSA0.py``.
   Add the header block at the top::

       __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
       __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
                      "William Dabney", "Jonathan P. How"]
       __license__ = "BSD 3-Clause"
       __author__ = "Ray N. Forcement"

       from rlpy.Agents.Agent import Agent, DescentAlgorithm
       import numpy as np

#. Declare the class, create the needed member variables (here, a learning
   rate, as described above), and write a docstring description::

       class SARSA0(DescentAlgorithm, Agent):
           """
           Standard SARSA algorithm without eligibility traces (i.e., lambda = 0)
           """

#. Copy the ``__init__()`` declarations of ``Agent`` and ``DescentAlgorithm``
   from ``Agent.py``, add any needed parameters (here, ``initial_learn_rate``),
   and log them. (``**kwargs`` is a catch-all for initialization parameters.)
   Then call the superclass constructor::

       def __init__(self, policy, representation, discount_factor,
                    initial_learn_rate=0.1, **kwargs):
           super(SARSA0, self).__init__(policy=policy,
                                        representation=representation,
                                        discount_factor=discount_factor,
                                        initial_learn_rate=initial_learn_rate,
                                        **kwargs)
           self.logger.info("Initial learning rate:\t\t%0.2f" % initial_learn_rate)

#. Copy the ``learn()`` declaration and implement it accordingly. Here, compute
   the TD error and use it to update the value function estimate (by adjusting
   the feature weights)::

       def learn(self, s, p_actions, a, r, ns, np_actions, na, terminal):

           # The previous state could never be terminal
           # (otherwise the episode would have already terminated)
           prevStateTerminal = False

           # MUST call this at the start of learn()
           self.representation.pre_discover(s, prevStateTerminal, a, ns, terminal)

           # Compute feature function values and next action to be taken
           discount_factor  = self.discount_factor            # 'gamma' in the literature
           feat_weights     = self.representation.weight_vec  # value function, expressed as feature weights
           features_s       = self.representation.phi(s, prevStateTerminal)                   # active features of state s
           features         = self.representation.phi_sa(s, prevStateTerminal, a, features_s) # active features of the (s, a) pair
           features_prime_s = self.representation.phi(ns, terminal)
           features_prime   = self.representation.phi_sa(ns, terminal, na, features_prime_s)
           nnz              = np.count_nonzero(features_s)    # number of non-zero elements

           # Compute the td-error
           td_error = r + np.dot(discount_factor * features_prime - features, feat_weights)

           # Update the value function (or, if TD-learning diverges, take no action)
           if nnz > 0:
               feat_weights_old = feat_weights.copy()
               feat_weights    += self.learn_rate * td_error * features / nnz
               if not np.all(np.isfinite(feat_weights)):
                   feat_weights[:] = feat_weights_old  # restore the previous weights in place
                   print("WARNING: TD-Learning diverged, theta reached infinity!")

           # MUST call this at the end of learn() - add new features to the representation as required.
           expanded = self.representation.post_discover(s, False, a, td_error, features_s)

           # MUST call this at the end of learn() - handle episode termination cleanup as required.
           if terminal:
               self.episodeTerminated()

.. note::

    You can and should define helper functions in your agents as needed, and
    arrange the class hierarchy accordingly (see e.g. ``TDControlAgent.py``).

That's it! Now test your agent by creating a simple settings file on the
domain of your choice. An example experiment is given below:

.. literalinclude:: ../examples/tutorial/SARSA0_example.py
   :language: python
   :linenos:
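The included file may not be available in every checkout, so the following
minimal sketch shows the general shape of such a settings file. It follows the
pattern of the RLPy GridWorld tutorial; the particular domain, representation,
policy, and the numeric settings used here are illustrative assumptions, so
adapt them to your installation::

    from rlpy.Domains import GridWorld
    from rlpy.Representations import Tabular
    from rlpy.Policies import eGreedy
    from rlpy.Experiments import Experiment
    from SARSA0 import SARSA0   # the agent written in this tutorial


    def make_experiment(exp_id=1, path="./Results/Tutorial/SARSA0"):
        """Return a configured Experiment that uses the new SARSA0 agent."""
        opt = {}
        opt["exp_id"] = exp_id
        opt["path"] = path

        # Domain: a small GridWorld with its default map (any discrete domain works)
        domain = GridWorld()
        opt["domain"] = domain

        # Representation: one feature per state (tabular)
        representation = Tabular(domain)

        # Policy: epsilon-greedy exploration over the represented values
        policy = eGreedy(representation, epsilon=0.2)

        # Agent: the SARSA0 class defined above
        opt["agent"] = SARSA0(policy=policy, representation=representation,
                              discount_factor=domain.discount_factor,
                              initial_learn_rate=0.1)

        # Experiment bookkeeping (values are illustrative)
        opt["max_steps"] = 10000
        opt["num_policy_checks"] = 10
        opt["checks_per_policy"] = 50

        return Experiment(**opt)


    if __name__ == "__main__":
        experiment = make_experiment(1)
        experiment.run()
        experiment.plot()
        experiment.save()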
What to do next?
----------------

In this Agent tutorial, we have seen how to

* Write a learning agent that inherits from the RLPy base ``Agent`` class
* Add the agent to RLPy and test it

Adding your component to RLPy
"""""""""""""""""""""""""""""

If you would like to add your component to RLPy, we recommend developing on
the development version (see :ref:`devInstall`).
Please use the following header at the top of each file::

    __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
    __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
                   "William Dabney", "Jonathan P. How"]
    __license__ = "BSD 3-Clause"
    __author__ = "Tim Beaver"

* Fill in the appropriate ``__author__`` name and ``__credits__`` as needed.
  Note that RLPy requires the BSD 3-Clause license.

* If you installed RLPy in a writeable directory, the class name of the new
  agent can be added to the ``__init__.py`` file in the ``Agents/`` directory,
  which allows other files to import the new agent (a one-line sketch is shown
  at the end of this page).

* If available, please include a link or reference to the publication
  associated with this implementation (and note any differences from it).

If you would like to add your new agent to the RLPy project, we recommend you
branch the project and create a pull request to the `RLPy repository `_.

You can also email the community list ``rlpy@mit.edu`` with comments or
questions. To subscribe, `click here `_.
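For reference, registering the agent in ``rlpy/Agents/__init__.py`` (as
mentioned above) amounts to a single import line. The line below is only a
sketch; match the module path and import style of the existing entries in
that file::

    from .SARSA0 import SARSA0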