CPSC422 Spring 2007
Assignment 2
Due: 2:00pm, Thursday 1 February 2007.
The aim of this assignment is to learn about reinforcement learning. You are to write
code for the simple game at:
You are free to use any of the code for that applet. You
can also refer to and use any of the code for the reinforcement learning applets, e.g.,
Please read and post to the bulletin board on the course WebCT site for
hints on how to modify the controllers.
You can either do this assignment by yourself or with someone else
(i.e., in pairs).
Write a reinforcement-learning controller for the simple game (modifying only
SGameController.java) that uses function approximation. You need to:
- select a number of reasonable features
- implement the function approximation
- show how the answer or the learning speed varies as a function of any parameters
- test it both on the policy it finds in the long term and on how quickly it learns
If you are working by yourself, then you only need to do one set of features.
If you are working in a pair, you need to compare three sets of features.
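As a starting point for the function-approximation part, here is a minimal sketch of a linear approximator with a Q-learning-style weight update. It is not the applet's code: the class LinearQSketch, its method names, and the discount parameter gamma are assumptions made for illustration. How you compute the feature vector from the game state, and how you hook this into SGameController.java, is left to you.

```java
/** Hypothetical helper (not part of the applet): a linear Q-function over hand-picked features. */
final class LinearQSketch {
    private final double[] weights;   // one weight per feature, initially 0

    LinearQSketch(int numFeatures) {
        weights = new double[numFeatures];
    }

    /** Approximates Q(s,a) as the weighted sum of the features F_i(s,a). */
    double qValue(double[] features) {
        double q = 0.0;
        for (int i = 0; i < weights.length; i++) {
            q += weights[i] * features[i];
        }
        return q;
    }

    /**
     * One update step:
     *   delta = reward + gamma * maxNextQ - Q(s,a)
     *   w_i  += alpha * delta * F_i(s,a)
     * maxNextQ is the caller's estimate of max over a' of Q(s',a').
     * Returns delta so the caller can log it.
     */
    double update(double[] features, double reward, double maxNextQ,
                  double alpha, double gamma) {
        double delta = reward + gamma * maxNextQ - qValue(features);
        for (int i = 0; i < weights.length; i++) {
            weights[i] += alpha * delta * features[i];
        }
        return delta;
    }

    double weight(int i) {
        return weights[i];
    }
}
```

Keeping the features and the update separate like this makes it easier to swap in a different feature set when comparing several of them.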
In this question, you are to think about the effect of changing
alpha on your implementation. Compare the following values for alpha:
- 1/n
- 10/(9+n)
- a fixed value of 0.2
- a fixed value of 0.02
- 0.2 for a while, then 0.02
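The schedules above could be written as small functions of n, as in the sketch below. It assumes that n counts updates starting from 1, which the assignment does not state. The class name AlphaSchedules and the parameter switchAfter are hypothetical; the point at which to change from 0.2 to 0.02 is left open here and is something you should choose and justify.

```java
/** Hypothetical helper: the alpha schedules from the list, as functions of the update count n (n >= 1). */
final class AlphaSchedules {
    private AlphaSchedules() { }

    /** alpha = 1/n */
    static double oneOverN(int n) {
        return 1.0 / n;
    }

    /** alpha = 10/(9+n) */
    static double tenOverNinePlusN(int n) {
        return 10.0 / (9.0 + n);
    }

    /** alpha = 0.2 for the first switchAfter updates, then 0.02 (switchAfter is an assumption). */
    static double switched(int n, int switchAfter) {
        return (n <= switchAfter) ? 0.2 : 0.02;
    }
}
```

The two fixed values need no function; pass 0.2 or 0.02 directly.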
You should compare these for the following cases:
- The values whose expected value you want to estimate are all 10
- The values alternate between 2 and 10
- The values are 5 for the first 100 steps and 10 from then on
- The values are 10 on every 10th step and 0 otherwise
- The values are those generated by Q-learning
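One possible way to experiment with the first four cases is sketched below. It assumes a running estimate of the form v <- v + alpha_n * (x_n - v) starting from v = 0, reads the fourth case as a 10 on every 10th step, and starts the alternating sequence with 2. All names are hypothetical. The Q-learning case is not covered, since those values have to come from your own controller.

```java
/** Hypothetical experiment harness: value sequences plus a running estimate with a pluggable alpha. */
final class RunningEstimateSketch {
    private RunningEstimateSketch() { }

    /** Applies v <- v + alpha(n) * (x_n - v) for n = 1..values.length, starting from v = 0. */
    static double estimate(double[] values,
                           java.util.function.IntToDoubleFunction alpha) {
        double v = 0.0;
        for (int n = 1; n <= values.length; n++) {
            v += alpha.applyAsDouble(n) * (values[n - 1] - v);
        }
        return v;
    }

    /** Every value is 10. */
    static double[] constant(int steps) {
        double[] xs = new double[steps];
        java.util.Arrays.fill(xs, 10.0);
        return xs;
    }

    /** 2, 10, 2, 10, ... (starting with 2 is an assumption). */
    static double[] alternating(int steps) {
        double[] xs = new double[steps];
        for (int i = 0; i < steps; i++) {
            xs[i] = (i % 2 == 0) ? 2.0 : 10.0;
        }
        return xs;
    }

    /** 5 for the first 100 steps, 10 afterwards. */
    static double[] stepChange(int steps) {
        double[] xs = new double[steps];
        for (int i = 0; i < steps; i++) {
            xs[i] = (i < 100) ? 5.0 : 10.0;
        }
        return xs;
    }

    /** 10 on steps 10, 20, 30, ... and 0 on all other steps. */
    static double[] periodicSpike(int steps) {
        double[] xs = new double[steps];
        for (int i = 0; i < steps; i++) {
            xs[i] = ((i + 1) % 10 == 0) ? 10.0 : 0.0;
        }
        return xs;
    }
}
```

For example, RunningEstimateSketch.estimate(RunningEstimateSketch.stepChange(1000), n -> 0.2) gives the final estimate under a fixed alpha of 0.2. Note that with a fixed alpha the starting value of 0 influences the early estimates, whereas with 1/n or 10/(9+n) the first update overwrites it.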
Write full sentences, and justify your answers with specific examples
from running your code.
[Note that this question is worth marks, so don't forget to do it.]
- For each question and each part in this assignment, say how long you spent
on it. Was this reasonable? What did you learn?
- If there was more than one person in your group, say what each
person did.
- Tell us every person you discussed this with (including the TA
and fellow students), and every external source (e.g., web site,
research paper, book) you consulted to do this assignment.
David Poole