Creating a New Agent

This tutorial describes the standard RLPy Agent interface and walks through a brief example of creating a new learning agent. The Agent receives observations from the Domain and updates the Representation accordingly.

In a typical Experiment, the Agent interacts with the Domain in discrete timesteps. At each timestep the Agent receives some observations from the Domain, which it uses to update the value function Representation of the Domain (i.e., on each call to its learn() function). The Policy is used to select an action to perform. This process (observe, update, act) repeats until some goal or failure state, determined by the Domain, is reached. At that point the Experiment determines whether the agent starts over or has its current policy tested (without any exploration).
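The observe-update-act loop described above can be sketched with hypothetical stand-in classes; the real rlpy.Experiments.Experiment additionally handles seeding, logging, and policy checks, so this is only an illustration of the control flow:

```python
# Hypothetical sketch of the observe-update-act loop; ToyDomain and ToyAgent
# are illustrative stand-ins, not RLPy classes.
class ToyDomain(object):
    """A 3-step chain: states 0 -> 1 -> 2, reward 1.0 on reaching state 2."""
    def s0(self):
        self.s = 0
        return self.s, False, [0]          # state, isTerminal, possible actions

    def step(self, a):
        self.s += 1
        terminal = self.s == 2
        r = 1.0 if terminal else 0.0
        return r, self.s, terminal, [0]

class ToyAgent(object):
    def __init__(self):
        self.total_reward = 0.0

    def pi(self, s, terminal, p_actions):
        return p_actions[0]                # trivial policy: first legal action

    def learn(self, s, p_actions, a, r, ns, np_actions, na, terminal):
        self.total_reward += r             # stand-in for a real value update

def run_episode(agent, domain):
    s, terminal, p_actions = domain.s0()
    a = agent.pi(s, terminal, p_actions)             # act
    while not terminal:
        r, ns, terminal, np_actions = domain.step(a)  # observe
        na = agent.pi(ns, terminal, np_actions)
        agent.learn(s, p_actions, a, r, ns, np_actions, na, terminal)  # update
        s, p_actions, a = ns, np_actions, na

agent, domain = ToyAgent(), ToyDomain()
run_episode(agent, domain)
print(agent.total_reward)  # 1.0
```
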
Note
You may want to review the namespace / inheritance / scoping rules in Python.
Requirements

- Each learning agent must be a subclass of Agent and call the __init__() function of the Agent superclass.
- Accordingly, each Agent must be instantiated with a Representation, Policy, and Domain in the __init__() function.
- Any randomization that occurs at object construction MUST occur in the init_randomization() function, which can be called by __init__().
- Any random calls should use self.random_state, not random() or np.random(), as this will ensure consistent seeded results during experiments.
- After your agent is complete, you should define a unit test to ensure future revisions do not alter behavior. See rlpy/tests for some examples.
REQUIRED Instance Variables

(None.)
REQUIRED Functions

learn() - called on every timestep (see the documentation)

Note

The Agent MUST call the (inherited) episodeTerminated() function after learning if the transition led to a terminal state (i.e., learn() is called with terminal=True).

Note

The learn() function MUST call pre_discover() at its beginning and post_discover() at its end. This allows adaptive representations to add new features (it has no effect on fixed representations).
Additional Information

- As always, the agent can log messages using self.logger.info(<str>); see the Python logger documentation.
- You should log the values assigned to custom parameters when __init__() is called.
- See Agent for functions provided by the superclass.
Example: Creating the SARSA0 Agent

In this example, we will create the standard SARSA learning agent without eligibility traces (i.e., the λ parameter is always 0). This algorithm first computes the temporal difference (TD) error: essentially, the difference between the prediction under the current value function and what was actually observed (see, e.g., Sutton and Barto's *Reinforcement Learning* (1998) or Wikipedia). It then updates the value function by adjusting the feature weights in the direction of this TD error, scaled by a factor called the learning rate.
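To make the update concrete, here is a toy numeric computation of the SARSA TD error and weight update for a linear value function; the variable names are illustrative, not RLPy API:

```python
import numpy as np

gamma = 0.9                        # discount factor
alpha = 0.1                        # learning rate
w = np.array([0.5, 0.0])           # feature weights (the value function)
phi_sa = np.array([1.0, 0.0])      # features of the current (s, a) pair
phi_nsna = np.array([0.0, 1.0])    # features of the next (ns, na) pair
r = 1.0                            # observed reward

# TD error: delta = r + gamma * Q(ns, na) - Q(s, a)
td_error = r + gamma * w.dot(phi_nsna) - w.dot(phi_sa)

# Move Q(s, a) toward the observed target, scaled by the learning rate
w += alpha * td_error * phi_sa

print(td_error)  # 1.0 + 0.9 * 0.0 - 0.5 = 0.5
print(w)         # [0.55, 0.0]
```
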
Create a new file in the current working directory, SARSA0.py. Add the header block at the top (note: the code below uses numpy as np, so import it under that name):

    __copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
    __credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
                   "William Dabney", "Jonathan P. How"]
    __license__ = "BSD 3-Clause"
    __author__ = "Ray N. Forcement"

    from rlpy.Agents.Agent import Agent, DescentAlgorithm
    import numpy as np
Declare the class, create the needed member variables (here, a learning rate, as described above), and write a docstring description:

    class SARSA0(DescentAlgorithm, Agent):
        """
        Standard SARSA algorithm without eligibility traces (i.e., lambda=0)
        """
Copy the __init__ declaration from Agent and DescentAlgorithm in Agent.py, add any needed parameters (here, initial_learn_rate), and log them. (kwargs is a catch-all for initialization parameters.) Then call the superclass constructor:

    def __init__(self, policy, representation, discount_factor,
                 initial_learn_rate=0.1, **kwargs):
        super(SARSA0, self).__init__(policy=policy,
                                     representation=representation,
                                     discount_factor=discount_factor,
                                     **kwargs)
        self.logger.info("Initial learning rate:\t\t%0.2f" % initial_learn_rate)
Copy the learn() declaration and implement it accordingly. Here, compute the TD error and use it to update the value function estimate (by adjusting the feature weights):
    def learn(self, s, p_actions, a, r, ns, np_actions, na, terminal):

        # The previous state could never be terminal
        # (otherwise the episode would have already terminated)
        prevStateTerminal = False

        # MUST call this at the start of learn()
        self.representation.pre_discover(s, prevStateTerminal, a, ns, terminal)

        # Compute feature function values and the next action to be taken
        discount_factor  = self.discount_factor            # 'gamma' in the literature
        feat_weights     = self.representation.weight_vec  # value function, expressed as feature weights
        features_s       = self.representation.phi(s, prevStateTerminal)  # active features in state s
        features         = self.representation.phi_sa(s, prevStateTerminal, a, features_s)  # active features of the (s, a) pair
        features_prime_s = self.representation.phi(ns, terminal)
        features_prime   = self.representation.phi_sa(ns, terminal, na, features_prime_s)
        nnz              = np.count_nonzero(features_s)    # number of non-zero elements

        # Compute the td-error
        td_error = r + np.dot(discount_factor * features_prime - features, feat_weights)

        # Update the value function (or, if TD-learning diverges, take no action)
        if nnz > 0:
            feat_weights_old = feat_weights.copy()
            feat_weights += self.learn_rate * td_error * features / nnz
            if not np.all(np.isfinite(feat_weights)):
                feat_weights = feat_weights_old
                print("WARNING: TD-Learning diverged, theta reached infinity!")

        # MUST call this at the end of learn() - add new features to the representation as required.
        expanded = self.representation.post_discover(s, False, a, td_error, features_s)

        # MUST call this at the end of learn() - handle episode termination cleanup as required.
        if terminal:
            self.episodeTerminated()
Note

You can and should define helper functions in your agents as needed, and arrange the class hierarchy accordingly. (See, e.g., TDControlAgent.py.)
That’s it! Now test the agent by creating a simple settings file on the domain of your choice. An example experiment is given below:
#!/usr/bin/env python
"""
Agent Tutorial for RLPy
=================================
Assumes you have created the SARSA0.py agent according to the tutorial and
placed it in the current working directory.
Tests the agent on the GridWorld domain.
"""
__author__ = "Robert H. Klein"
from rlpy.Domains import GridWorld
from SARSA0 import SARSA0
from rlpy.Representations import Tabular
from rlpy.Policies import eGreedy
from rlpy.Experiments import Experiment
import os
def make_experiment(exp_id=1, path="./Results/Tutorial/gridworld-sarsa0"):
    """
    Each file specifying an experimental setup should contain a
    make_experiment function which returns an instance of the Experiment
    class with everything set up.

    @param exp_id: number used to seed the random number generators
    @param path: output directory where logs and results are stored
    """
    opt = {}
    opt["exp_id"] = exp_id
    opt["path"] = path

    ## Domain:
    maze = os.path.join(GridWorld.default_map_dir, '4x5.txt')
    domain = GridWorld(maze, noise=0.3)
    opt["domain"] = domain

    ## Representation
    # discretization only needed for continuous state spaces, discarded otherwise
    representation = Tabular(domain, discretization=20)

    ## Policy
    policy = eGreedy(representation, epsilon=0.2)

    ## Agent
    opt["agent"] = SARSA0(representation=representation, policy=policy,
                          discount_factor=domain.discount_factor,
                          initial_learn_rate=0.1)
    opt["checks_per_policy"] = 100
    opt["max_steps"] = 2000
    opt["num_policy_checks"] = 10

    experiment = Experiment(**opt)
    return experiment

if __name__ == '__main__':
    experiment = make_experiment(1)
    experiment.run(visualize_steps=False,     # should each learning step be shown?
                   visualize_learning=True,   # show policy / value function?
                   visualize_performance=1)   # show performance runs?
    experiment.plot()
    experiment.save()
What to do next?

In this Agent tutorial, we have seen how to

- Write a learning agent that inherits from the RLPy base Agent class
- Add the agent to RLPy and test it
Adding your component to RLPy

If you would like to add your component to RLPy, we recommend developing against the development version (see Development Version). Please use the following header at the top of each file:
__copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
__credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
"William Dabney", "Jonathan P. How"]
__license__ = "BSD 3-Clause"
__author__ = "Tim Beaver"
- Fill in the appropriate __author__ name and __credits__ as needed. Note that RLPy requires the BSD 3-Clause license.
- If you installed RLPy in a writeable directory, the class name of the new agent can be added to the __init__.py file in the Agents/ directory. (This allows other files to import the new agent.)
- If available, please include a link or reference to the publication associated with this implementation (and note differences, if any).
If you would like to add your new agent to the RLPy project, we recommend you fork the project and create a pull request to the RLPy repository. You can also email the community list rlpy@mit.edu with comments or questions.