This tutorial describes the standard RLPy Representation interface and walks through a brief example of creating a new value function representation.
The Representation is the approximation of the value function associated with a Domain, usually in some lower-dimensional feature space.
The Agent receives observations from the Domain on each step and calls its learn() function, which is responsible for updating the Representation accordingly. Agents can later query the Representation for the value of being in a state, V(s), or the value of taking an action in a particular state, known as the Q-function, Q(s,a).
Note
At present, it is assumed that the Linear Function approximator family of representations is being used.
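For intuition, a linear function approximator computes each Q-value as a weighted sum of feature values. The sketch below assumes one row of weights per action; the names `weights`, `phi_s`, `q_values` and `state_value`, and this storage layout, are illustrative assumptions and not RLPy's internal representation.

```python
import numpy as np

def q_values(weights, phi_s):
    """Return Q(s, a) for every action: each action's weight row
    dotted with the feature vector phi(s)."""
    return np.dot(weights, phi_s)

def state_value(weights, phi_s):
    """V(s) under a greedy policy: the largest Q(s, a)."""
    return q_values(weights, phi_s).max()

# Two actions, three features (hypothetical numbers).
weights = np.array([[0.5, 0.0, 1.0],
                    [0.2, 0.3, 0.1]])
phi_s = np.array([1.0, 0.0, 1.0])   # feature vector of some state s

print(q_values(weights, phi_s))     # one Q-value per action
print(state_value(weights, phi_s))  # value of the best action
```

Because the value is linear in the weights, learning only has to adjust `weights`; the feature function itself is what a new Representation supplies.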
Note
You may want to review the namespace / inheritance / scoping rules in Python.
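The new Representation MUST set the following variables BEFORE calling the superclass __init__() function: self.isDynamic (True if feature functions can change during execution) and self.features_num (the initial number of features).
The new Representation MUST define two functions: phi_nonTerminal(), which returns the vector of feature function values for a given state, and featureType(), which returns the datatype of each feature function.
Representations whose feature functions may change over the course of execution (termed adaptive or dynamic Representations) should override one or both of the discovery functions as needed; the example in this tutorial overrides pre_discover(). Note that self.isDynamic should be set to True for such Representations.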
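The requirements above can be sketched with a stand-in base class. `StubRepresentation` and `FixedFeatures` are hypothetical names, and the stub's behaviour (sizing a weight vector from `features_num` inside the constructor) is an assumption made only to show why the ordering could matter; it is not the real RLPy base class.

```python
import numpy as np

class StubRepresentation(object):
    """Hypothetical stand-in for the RLPy base class (not the real one)."""
    def __init__(self, actions_num):
        # Assumed behaviour: the parent sizes its weights from features_num,
        # so the subclass must already have set it.
        self.weight_vec = np.zeros(self.features_num * actions_num)

class FixedFeatures(StubRepresentation):
    def __init__(self, actions_num):
        self.features_num = 4      # set BEFORE the superclass constructor
        self.isDynamic = False     # features never change in this sketch
        super(FixedFeatures, self).__init__(actions_num)

    def phi_nonTerminal(self, s):
        # One binary feature per state index 0..3.
        F_s = np.zeros(self.features_num, bool)
        F_s[s] = True
        return F_s

    def featureType(self):
        return bool

rep = FixedFeatures(actions_num=2)
print(rep.weight_vec.shape)
print(rep.phi_nonTerminal(2))
```

If `features_num` were assigned after the `super()` call in this sketch, the stub constructor would raise an AttributeError.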
In this example we will recreate the simple IncrementalTabular Representation, which creates a binary feature function fd() for each discrete state d encountered so far. fd(s) = 1 when d = s and 0 elsewhere, i.e., the vector of feature functions evaluated at s is all zeros except for a single one. This is identical to the Tabular Representation, except that feature functions are created only as needed rather than instantiated for every state at the outset. Though simple, neither the Tabular nor the IncrementalTabular representation generalizes to nearby states in the domain, and both can be intractable on large domains (there can be as many feature functions as there are states in the entire space). Continuous dimensions of s (assumed to be bounded in this Representation) are discretized.
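The idea can be tried outside RLPy with a few lines of NumPy. This is a minimal sketch under stated assumptions: `discretize`, `feature_vector` and `seen` are hypothetical helpers, and uniform binning is just one simple way to discretize a bounded dimension.

```python
import numpy as np

def discretize(x, low, high, bins):
    """Map a bounded continuous value to a bin index in [0, bins - 1]."""
    idx = int((x - low) / float(high - low) * bins)
    return min(max(idx, 0), bins - 1)

seen = {}   # discrete state -> feature index, filled in as states appear

def feature_vector(d):
    """Return the one-hot vector for discrete state d, adding d if new."""
    if d not in seen:
        seen[d] = len(seen)
    F = np.zeros(len(seen), bool)
    F[seen[d]] = True
    return F

d1 = discretize(0.12, low=0.0, high=1.0, bins=20)
d2 = discretize(0.97, low=0.0, high=1.0, bins=20)
print(feature_vector(d1))   # one feature so far
print(feature_vector(d2))   # a second feature is created for the new state
print(feature_vector(d1))   # same state as before, now in a longer vector
```

The vector grows only when a previously unseen state arrives, which is the difference from a fully instantiated tabular representation.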
Create a new file in the Representations/ directory, IncrTabularTut.py. Add the header block at the top:
__copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
__credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
               "William Dabney", "Jonathan P. How"]
__license__ = "BSD 3-Clause"
__author__ = "Ray N. Forcement"
Declare the class, create any needed member variables (here an optional hash table used to look up previously computed feature function values), and write a docstring description:
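import numpy as np
from copy import deepcopy
# Base class defined in Representations/Representation.py
from .Representation import Representation

class IncrTabularTut(Representation):
    """
    Tutorial representation: identical to IncrementalTabular
    """
    hash = None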
Copy the __init__ declaration from Representation.py and add any needed parameters (here none). Assign self.features_num and self.isDynamic, then call the superclass constructor:
    def __init__(self, domain, discretization=20):
        self.hash = {}
        self.features_num = 0   # no features until a state is encountered
        self.isDynamic = True   # features are added during execution
        super(IncrTabularTut, self).__init__(domain, discretization)
Copy the phi_nonTerminal() function declaration and implement it to return the vector of feature function values for a given state. Here, look up feature function values using self.hashState(s), provided by the parent class. Note that self.hash should always contain hash_id if pre_discover() is called as required:
    def phi_nonTerminal(self, s):
        hash_id = self.hashState(s)
        # Index of this state's feature, or None if the state is unseen
        index = self.hash.get(hash_id)
        F_s = np.zeros(self.features_num, bool)
        if index is not None:
            F_s[index] = 1
        return F_s
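The lookup above relies on hashState() giving each discrete state a unique key. The following is only a sketch of how such a key could be computed, assuming the state is a tuple of discrete components; `hash_state` and `dims` are hypothetical names, and this is not a description of RLPy's own hashState() implementation.

```python
import numpy as np

def hash_state(s, dims):
    """Map a tuple of discrete state components to one unique integer.

    dims[i] is the number of values component i can take (an assumed
    input; a real Representation would derive it from the Domain).
    """
    return int(np.ravel_multi_index(tuple(s), dims))

dims = (4, 5)                      # e.g. a 4x5 grid of (row, col) cells
print(hash_state((0, 0), dims))
print(hash_state((2, 3), dims))
```

Any one-to-one mapping from discrete states to hashable keys would serve the same purpose for the dictionary lookup.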
Copy the featureType() function declaration and implement it accordingly to return the datatype returned by each feature function. Here, feature functions are binary, so the datatype is boolean:
    def featureType(self):
        return bool
Override parent functions as necessary; here we require a pre_discover() function to populate the hash table for each new encountered state:
    def pre_discover(self, s, terminal, a, sn, terminaln):
        return self._add_state(s) + self._add_state(sn)
Finally, define any needed helper functions:
    def _add_state(self, s):
        hash_id = self.hashState(s)
        index = self.hash.get(hash_id)
        if index is None:
            # New state: its feature index is the next free position
            self.features_num += 1
            index = self.features_num - 1
            self.hash[hash_id] = index
            # Add a new element to the feature weight vector
            self.addNewWeight()
            return 1
        return 0
    def __deepcopy__(self, memo):
        # Build a fresh instance of this class, then copy the hash table
        new_copy = IncrTabularTut(self.domain,
                                  discretization=self.discretization)
        new_copy.hash = deepcopy(self.hash)
        return new_copy
That’s it! Now add your new Representation to Representations/__init__.py:
from IncrTabularTut import IncrTabularTut
Finally, create a unit test for your Representation as described in Creating a Unit Test.
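Before writing the full unit test, the feature-growing logic can be checked in isolation. The sketch below re-implements that logic against a stand-in parent: `StubParent` and `IncrTabularSketch` are hypothetical, and the stub's hashState() and addNewWeight() are simplified assumptions (for example, one weight per feature rather than whatever the real base class allocates).

```python
import numpy as np

class StubParent(object):
    """Hypothetical stand-in for the RLPy base class, just enough to run."""
    def __init__(self):
        self.weight_vec = np.zeros(self.features_num)

    def hashState(self, s):
        return tuple(s)            # any hashable key works for this sketch

    def addNewWeight(self):
        self.weight_vec = np.append(self.weight_vec, 0.0)

class IncrTabularSketch(StubParent):
    def __init__(self):
        self.hash = {}
        self.features_num = 0
        self.isDynamic = True
        super(IncrTabularSketch, self).__init__()

    def phi_nonTerminal(self, s):
        F_s = np.zeros(self.features_num, bool)
        index = self.hash.get(self.hashState(s))
        if index is not None:
            F_s[index] = True
        return F_s

    def pre_discover(self, s, terminal, a, sn, terminaln):
        return self._add_state(s) + self._add_state(sn)

    def _add_state(self, s):
        key = self.hashState(s)
        if key in self.hash:
            return 0
        self.hash[key] = self.features_num
        self.features_num += 1
        self.addNewWeight()
        return 1

rep = IncrTabularSketch()
added = rep.pre_discover((0, 0), False, 1, (0, 1), False)
print(added)                          # number of features created
print(rep.phi_nonTerminal((0, 1)))    # one-hot vector for a seen state
```

A real unit test would make the same kinds of checks against the actual class: new states add exactly one feature each, repeated states add none, and the weight vector grows in step with features_num.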
Now test it by creating a simple settings file on the domain of your choice. An example experiment is given below:
#!/usr/bin/env python
"""
Representation Tutorial for RLPy
================================
Assumes you have created the IncrTabularTut.py representation according to the tutorial and
placed it in the Representations/ directory.
Tests the Representation on the GridWorld domain using SARSA
"""
__author__ = "Robert H. Klein"
from rlpy.Domains import GridWorld
from rlpy.Agents import SARSA
from rlpy.Representations import IncrTabularTut
from rlpy.Policies import eGreedy
from rlpy.Experiments import Experiment
import os
def make_experiment(exp_id=1, path="./Results/Tutorial/gridworld-IncrTabularTut"):
    """
    Each file specifying an experimental setup should contain a
    make_experiment function which returns an instance of the Experiment
    class with everything set up.
    @param exp_id: number used to seed the random number generators
    @param path: output directory where logs and results are stored
    """
    opt = {}
    opt["exp_id"] = exp_id
    opt["path"] = path
    ## Domain:
    maze = os.path.join(GridWorld.default_map_dir, '4x5.txt')
    domain = GridWorld(maze, noise=0.3)
    opt["domain"] = domain
    ## Representation
    # discretization only needed for continuous state spaces, discarded otherwise
    representation = IncrTabularTut(domain)
    ## Policy
    policy = eGreedy(representation, epsilon=0.2)
    ## Agent
    opt["agent"] = SARSA(representation=representation, policy=policy,
                         discount_factor=domain.discount_factor,
                         learn_rate=0.1)
    opt["checks_per_policy"] = 100
    opt["max_steps"] = 2000
    opt["num_policy_checks"] = 10
    experiment = Experiment(**opt)
    return experiment
if __name__ == '__main__':
    experiment = make_experiment(1)
    experiment.run(visualize_steps=False,  # should each learning step be shown?
                   visualize_learning=True,  # show policy / value function?
                   visualize_performance=1)  # show performance runs?
    experiment.plot()
    experiment.save()
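In this Representation tutorial, we have seen how to write an adaptive Representation that inherits from the RLPy Representation base class, add it to RLPy, and test it in an experiment.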
If you would like to add your component to RLPy, we recommend developing on the development version (see Development Version). Please use the following header at the top of each file:
__copyright__ = "Copyright 2013, RLPy http://www.acl.mit.edu/RLPy"
__credits__ = ["Alborz Geramifard", "Robert H. Klein", "Christoph Dann",
               "William Dabney", "Jonathan P. How"]
__license__ = "BSD 3-Clause"
__author__ = "Tim Beaver"
If you would like to add your new representation to the RLPy project, we recommend you branch the project and create a pull request to the RLPy repository.
You can also email the community list rlpy@mit.edu with comments or questions.