CPSC422 Spring 2007
Assignment 1
Due: 2:00pm, Thursday 25 January 2007.
The aim of this assignment is to learn about MDPs. You are to write
code for the simple game at:
You are free to use any of the code for that applet. You
can also refer to and use any of the code for the value iteration applet at:
Please read and post to the bulletin board on the course WebCT site for
hints on how to modify the controllers.
You can either do this assignment by yourself or with someone else
(i.e., in pairs).
Write a controller for the simple game (modifying only
SameController.java) that uses:
- value iteration
- asynchronous value iteration
- modified policy iteration
If you are working by yourself, then you only need to do one of these.
If you are working in a pair, you need to do all three.
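
As a reminder of the basic technique, here is a minimal sketch of
synchronous value iteration on a made-up two-state MDP. It does not use
the applet's classes: the class name ValueIterationSketch, the arrays
p and r, and the parameters gamma and epsilon are all hypothetical names
chosen for illustration, and the game's real state and reward
representation will differ.

```java
public class ValueIterationSketch {

    /**
     * Synchronous value iteration on a small, explicitly tabulated MDP.
     * p[s][a][t] is the probability of reaching state t from state s
     * under action a; r[s][a] is the expected immediate reward.
     * All names here are illustrative assumptions, not the applet's API.
     */
    public static double[] valueIteration(double[][][] p, double[][] r,
                                          double gamma, double epsilon) {
        int n = p.length;
        double[] v = new double[n];          // V_0 is all zeros
        double change;
        do {
            double[] next = new double[n];   // V_{k+1}, built only from V_k
            change = 0.0;
            for (int s = 0; s < n; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < p[s].length; a++) {
                    double q = r[s][a];
                    for (int t = 0; t < n; t++) {
                        q += gamma * p[s][a][t] * v[t];
                    }
                    best = Math.max(best, q);
                }
                next[s] = best;
                change = Math.max(change, Math.abs(next[s] - v[s]));
            }
            v = next;
        } while (change > epsilon);          // stop when a sweep barely moves V
        return v;
    }

    public static void main(String[] args) {
        // Hypothetical MDP: action 0 stays put, action 1 switches state.
        // Staying in state 1 pays 1; everything else pays 0.
        double[][][] p = {
            {{1, 0}, {0, 1}},
            {{0, 1}, {1, 0}}
        };
        double[][] r = {{0, 0}, {1, 0}};
        double[] v = valueIteration(p, r, 0.9, 1e-9);
        System.out.println("V(0) = " + v[0] + ", V(1) = " + v[1]);
    }
}
```

In this sketch each sweep writes into a fresh array, which is what makes
it synchronous. An asynchronous variant would instead update v[s] in
place, one state at a time, so later updates in the same pass already
see the new values. Modified policy iteration is not shown here.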
In this question, you are to think about the effect of changing the
discount factor on your implementation.
- How does changing the discount factor affect the rate of
convergence?
- How does changing the discount factor affect the policy found?
- What is an appropriate discount factor for this game?
Write full sentences, and justify your answers with specific examples
from running your code.
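
One way to gather such examples is to run the same solver with several
discount factors and record both the number of sweeps and the greedy
policy. The sketch below does this on a made-up two-state MDP, not on
the game itself; the class DiscountFactorSketch, its Result type, and
the reward numbers are hypothetical and chosen only so that the
preferred action in state 0 depends on the discount factor.

```java
public class DiscountFactorSketch {

    /** Number of sweeps until convergence and the greedy policy found. */
    public static class Result {
        public final int sweeps;
        public final int[] policy;
        Result(int sweeps, int[] policy) {
            this.sweeps = sweeps;
            this.policy = policy;
        }
    }

    /**
     * Synchronous value iteration that also reports how many sweeps were
     * needed and which action was greedy in each state on the last sweep.
     * p[s][a][t] and r[s][a] are illustrative tables, not the applet's API.
     */
    public static Result solve(double[][][] p, double[][] r,
                               double gamma, double epsilon) {
        int n = p.length;
        double[] v = new double[n];
        int[] policy = new int[n];
        int sweeps = 0;
        double change;
        do {
            double[] next = new double[n];
            change = 0.0;
            for (int s = 0; s < n; s++) {
                double best = Double.NEGATIVE_INFINITY;
                for (int a = 0; a < p[s].length; a++) {
                    double q = r[s][a];
                    for (int t = 0; t < n; t++) {
                        q += gamma * p[s][a][t] * v[t];
                    }
                    if (q > best) {          // ties keep the lower-numbered action
                        best = q;
                        policy[s] = a;
                    }
                }
                next[s] = best;
                change = Math.max(change, Math.abs(next[s] - v[s]));
            }
            v = next;
            sweeps++;
        } while (change > epsilon);
        return new Result(sweeps, policy);
    }

    public static void main(String[] args) {
        // Hypothetical MDP: action 0 stays, action 1 switches state.
        // Staying in state 0 pays 1 per step; staying in state 1 pays 3.
        double[][][] p = {
            {{1, 0}, {0, 1}},
            {{0, 1}, {1, 0}}
        };
        double[][] r = {{1, 0}, {3, 0}};
        for (double gamma : new double[] {0.2, 0.5, 0.9, 0.99}) {
            Result res = solve(p, r, gamma, 1e-6);
            System.out.println("gamma=" + gamma + " sweeps=" + res.sweeps
                    + " action in state 0: " + res.policy[0]);
        }
    }
}
```

In this toy problem, switching from state 0 gives up an immediate
reward of 1 in exchange for a later stream of 3 per step, so it pays
off only when the discount factor exceeds 1/3. The sweep count also
grows as the discount factor approaches 1, because the change between
successive sweeps shrinks roughly in proportion to powers of the
discount factor. The game may behave differently, so treat this only as
a template for the kind of experiment to run.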
[Note that this question is worth marks, so don't forget to do it.]
- For each question and each part in this assignment, say how long you
spent on it. Was this reasonable? What did you learn?
- If there was more than one person in your group, say what each
person did.
- List every person you discussed this assignment with (including the
TA and fellow students), and every external source (e.g., web site,
research paper, book) you consulted to do this assignment.
David Poole