You may bring in one letter-sized piece of paper with anything written on it. You may not use calculators, PDAs, robotic assistants, or other electronic aids.
Some important points (that students often forget):
In this game the "up" action has the dynamics given by:
That is, the agent goes up with probability 0.8, up-left with probability 0.1, and up-right with probability 0.1.
If there is no treasure, a treasure can appear with probability 0.2. When it appears, it appears at one of the corners, with each corner equally likely. The treasure stays where it is until the agent lands on its square; the agent then gets an immediate reward of +10, and the treasure disappears in the next state transition. The agent and the treasure move simultaneously, so if the agent arrives at a square at the same time as the treasure appears there, it gets the reward.
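The treasure part of the transition, for the case where no treasure is currently present, might be tabulated as below. This sketch assumes a rectangular grid with four corners; the corner names and the function name are hypothetical. It deliberately does not model what happens once a treasure has been collected.

```python
# Assumption: the grid is rectangular, so there are four corners.
CORNERS = ("top-left", "top-right", "bottom-left", "bottom-right")
P_APPEAR = 0.2  # probability that a treasure appears when none is present


def treasure_successors_when_absent():
    """Return {next_treasure: probability}, where None means "still no treasure"."""
    dist = {None: 1.0 - P_APPEAR}
    for corner in CORNERS:
        # Each corner is equally likely, so the 0.2 is split evenly.
        dist[corner] = P_APPEAR / len(CORNERS)
    return dist
```

Under the four-corner assumption, each corner receives probability 0.2 / 4, and the remaining 0.8 stays on "no treasure".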
Suppose we are doing asynchronous value iteration and have the following values for the states:
where the left grid shows the values for the states where there is no treasure and the right grid shows the values of the states when there is a treasure at the top-right.
Consider the next step of asynchronous value iteration. For state s13, which is marked by * in the above figure, and the action a2, which is "up", what value is assigned to Q[s13,a2] on the next iteration of value iteration? You need to show all working, but you don't need to do any arithmetic (i.e., leave it as an expression). Explain each term in your expression.
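For checking an answer of this kind, the general shape of the update is a sum over successor states of probability times (immediate reward plus discounted value). The sketch below shows only that generic form. The function name, the state labels, and every number in the usage note are made up for illustration; the actual values come from the figure, and the discount factor is not stated here.

```python
def q_backup(successors, V, gamma):
    """One Q-value backup: sum over s' of P(s'|s,a) * (R + gamma * V[s']).

    successors: iterable of (probability, immediate_reward, next_state) triples.
    V:          dict mapping state -> current value estimate.
    gamma:      discount factor (an assumption; not given in the text).
    """
    return sum(p * (r + gamma * V[s_next]) for p, r, s_next in successors)
```

With purely hypothetical inputs, `q_backup([(0.8, 0.0, "a"), (0.2, 10.0, "b")], {"a": 1.0, "b": 2.0}, 0.9)` evaluates 0.8 * (0 + 0.9 * 1.0) + 0.2 * (10 + 0.9 * 2.0). In the question itself, each triple would pair one outcome of the agent's move with one outcome for the treasure.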
You should also expect some questions about what you learned from doing your assignment (e.g., about designing features).