Week 11: Mountain-car with linear feature approximators
What you see
The example shows the Sarsa algorithm with linear function approximators applied to the MountainCar environment. The state \(s\) is two-dimensional (position, velocity), and the right-hand pane visualizes the estimate of the value function \(v(s)\) for all states \(s\). The function approximator uses tile coding, which is what gives rise to the grid-like pattern.
How it works
The Sarsa algorithm approximates the Q-values using \(q(s, a, \mathbf{w}) = \mathbf{x}(s, a) \mathbf{w}^\top\). In this case the feature vector \(\mathbf{x}(s,a)\) is a very high-dimensional vector (about 4000 dimensions) constructed using tile coding. You can find the details in [SB18], but to greatly simplify the construction: the state space is divided into a fairly fine grid, and the component of \(\mathbf{x}(s,a)\) that corresponds to the grid cell containing \(s\) (for the given action \(a\)) is set to 1, while all other components are zero. This is why the updates appear to be local: in this simplified picture, an update to \(\mathbf{w}\) only changes the estimate for states that fall in the same cell.
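The simplified grid picture above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code behind the demo: the state bounds, the 36-by-36 grid (which gives 3 × 36 × 36 = 3888 features, in the same range as the "about 4000" mentioned above), the step size, and all function names are hypothetical. Real tile coding uses several overlapping, offset tilings rather than a single grid.

```python
import numpy as np

# Assumed MountainCar state bounds and grid resolution (illustrative values).
POS_RANGE = (-1.2, 0.6)
VEL_RANGE = (-0.07, 0.07)
N_BINS = 36      # hypothetical number of cells per state dimension
N_ACTIONS = 3    # push left, no push, push right


def feature_index(s, a):
    """Index of the single active feature for state s and action a."""
    pos, vel = s
    # Map each coordinate to a cell index and clip it to 0..N_BINS-1.
    i = int((pos - POS_RANGE[0]) / (POS_RANGE[1] - POS_RANGE[0]) * N_BINS)
    j = int((vel - VEL_RANGE[0]) / (VEL_RANGE[1] - VEL_RANGE[0]) * N_BINS)
    i = min(max(i, 0), N_BINS - 1)
    j = min(max(j, 0), N_BINS - 1)
    return (a * N_BINS + i) * N_BINS + j


def x(s, a):
    """One-hot feature vector: 1 at the active cell, 0 everywhere else."""
    features = np.zeros(N_ACTIONS * N_BINS * N_BINS)
    features[feature_index(s, a)] = 1.0
    return features


def q(s, a, w):
    """Linear action-value estimate: inner product of features and weights."""
    return x(s, a) @ w


def sarsa_update(w, s, a, r, s_next, a_next, done, alpha=0.1, gamma=1.0):
    """One semi-gradient Sarsa step; modifies w in place and returns it."""
    target = r if done else r + gamma * q(s_next, a_next, w)
    td_error = target - q(s, a, w)
    # The gradient of q with respect to w is x(s, a), so only the weight of
    # the active cell changes. This is the locality seen in the plot.
    w += alpha * td_error * x(s, a)
    return w
```

A single call such as `sarsa_update(np.zeros(3888), (-0.5, 0.0), 2, -1.0, (-0.49, 0.01), 2, False)` changes exactly one entry of the weight vector, which is the code-level counterpart of the local updates described above.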
For visualization it is not convenient to plot \(q(s, a, \mathbf{w})\) directly because of the dependence on the action, so instead we plot the corresponding estimate of the value function, \(v(s) = \max_a q(s, a, \mathbf{w})\).
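As a rough sketch of how such a plot could be prepared, the snippet below evaluates \(v(s) = \max_a q(s, a, \mathbf{w})\) on a grid of (position, velocity) pairs. The helper `value_grid`, the stand-in `toy_q`, and the grid sizes are assumptions made for illustration. Any function with the signature `q_fn(s, a, w)` could be passed in instead.

```python
import numpy as np


def value_grid(q_fn, w, positions, velocities, n_actions=3):
    """Evaluate v(s) = max_a q(s, a, w) for every (position, velocity) pair."""
    values = np.empty((len(positions), len(velocities)))
    for i, pos in enumerate(positions):
        for j, vel in enumerate(velocities):
            values[i, j] = max(q_fn((pos, vel), a, w) for a in range(n_actions))
    return values


def toy_q(s, a, w):
    """Hypothetical stand-in for a learned q: one weight per action times position."""
    return w[a] * s[0]


positions = np.linspace(-1.2, 0.6, 5)
velocities = np.linspace(-0.07, 0.07, 4)
V = value_grid(toy_q, np.array([-1.0, 0.0, 1.0]), positions, velocities)
```

The resulting array `V` has one row per position and one column per velocity, so it can be handed to a standard heat-map routine to get a picture like the right-hand pane.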