Exercise 7: Linearization and iterative LQR#
Note
The exercises material is divided into general information (found on this page) and the actual exercise instructions. You can download this weeks exercise instructions from here:
You are encouraged to prepare the homework problems 1 (indicated by a hand in the PDF file) at home and present your solution during the exercise session.
To get the newest version of the course material, please see Making sure your files are up to date
This week is concerned with linarization, that is, how to (approximately!) turn a non-linear discete dynamical system of the form:
into a linear dynamical system suitable for LQR:
The procedure described in the notes assume we are interested in states and actions that are close to an expansion point \(\bar{ \mathbf{x} }_k\) and \(\bar{ \mathbf{u} }_k\) and then approximate the dynamics using a Taylor expansion, see (Herlau [Her24], Section 17.1).
Thus we can re-arrange this expression to yield:
To implement this, we need to be able to compute the terms above. As an example, we will use the Pendulum-model we have seen several times during the course. In our example we will assume that \(\bar{\mathbf{x} } = \begin{bmatrix} 0 \\ 1 \\ 0.5\end{bmatrix}\) and \(\bar{\mathbf{u}} = \begin{bmatrix} 0 \end{bmatrix}\)
We can compute \(f_k(\bar{\mathbf x}_{k}, \bar{\mathbf u}_{k})\) as follows:
>>> from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel
>>> import numpy as np
>>> model = DiscreteSinCosPendulumModel()
>>> x_bar = [0, 1, 0.5] # [cos(theta), sin(theta), d theta/dt]
>>> u_bar = [1] # Get an arbitrary action
>>> print("Compute f_k(xbar, ubar):", model.f(x_bar,u_bar))
Compute f_k(xbar, ubar): [0.00999983 0.99995 0.61423912]
Meanwhile the two Jacobians can be computed as follows:
>>> from irlc.ex04.model_pendulum import DiscreteSinCosPendulumModel
>>> import numpy as np
>>> model = DiscreteSinCosPendulumModel()
>>> x_bar = [0, 1, 0.5] # [cos(theta), sin(theta), d theta/dt]
>>> u_bar = [1] # Get an arbitrary action
>>> A, B = model.f_jacobian(x_bar,u_bar)
>>> print(A)
[[ 9.99950000e-01 0.00000000e+00 1.99990000e-02]
[-9.99983333e-03 0.00000000e+00 -1.99996667e-04]
[ 1.96400000e-01 -0.00000000e+00 1.00000000e+00]]
>>> print(B)
[[0. ]
[0. ]
[0.06299615]]
See also Exercise 4: Discretization and PID control.
Iterative LQR#
The iterative LQR exercise closely follow the pseudo code in … In the following is an overview of the various functions that are provided as part of the exercise:
The cost and dynamic approximation#
In iterative LQR the cost-function will be approximated using a second-order Taylor expansion. In other words, we assume a cost-function of the standard form:
Note
See (Herlau [Her24], Subequation 17.8) for a definition of these expressions.
This cost-function can then be taylor expanded around expansion points \(\bar {\mathbf x}_k, \bar {\mathbf u}_k\) to get a quadratic approximation of the form:
This should be compared to the form of the cost-function the LQR()
algorithm assumes which is:
So we can identify these forms by assuming that e.g. \(Q_N \equiv c_{\mathbf x \mathbf x, N}\) and so on. Within the code all of these cost terms will be represented as follows:
c
= \(\begin{bmatrix} c_0, c_1, \dots, c_{N-1} c_N\end{bmatrix} = \begin{bmatrix} q_0 & q_1 & \dots & q_N \end{bmatrix}\)c_x
= \(\begin{bmatrix} c_{\mathbf x, 0} & c_{\mathbf x, 1} & \dots & c_{\mathbb x, N-1} & c_{\mathbf x, N} \end{bmatrix}\)c_xx
= \(\begin{bmatrix} c_{ {\mathbf xx}, 0} & c_{ {\mathbf xx}, 1} & \dots & c_{ {\mathbf xx}, N-1} & c_{ {\mathbf xx}, N} \end{bmatrix}\)
Meanwhile, the dynamics is of the form \(\mathbf x_{k+1} = A_k \mathbf x_k + B_k \mathbf u_k\) where:
A
= \(\begin{bmatrix} A_0 & A_1 & \dots & A_{N-1} \end{bmatrix}\)B
= \(\begin{bmatrix} B_0 & B_1 & \dots & B_{N-1} \end{bmatrix}\)
Classes and functions#
- irlc.ex07.ilqr.backward_pass(A, B, c_x, c_u, c_xx, c_ux, c_uu, mu=1)[source]#
Given all derivatives, apply the LQR algorithm to get the control law.
The input arguments are described in the online documentation and the lecture notes. You should use them to call your implementation of the
LQR()
method. Note that you should give a value of all inputs except for thed
-term.- Parameters:
A (
list
) – linearization of the dynamics matrices \(A_k\).B (
list
) – linearization of the dynamics matrices \(B_k\).c_x (
list
) – Cost terms corresponding to \(\mathbf{q}_k\)c_u (
list
) – Cost terms corresponding to \(\mathbf{r}_k\)c_xx (
list
) – Cost terms corresponding to \(Q_k\)c_ux (
list
) – Cost terms corresponding to \(H_k\)c_uu (
list
) – Cost terms corresponding to \(R_k\)mu – Regularization parameter for the LQR method
- Returns:
The control law \(L_k, \mathbf{l}_k\) as two lists.
- irlc.ex07.ilqr.cost_of_trajectory(model, xs, us)[source]#
Helper function which computes the cost of the trajectory.
The cost is defined as:
\[c_N( \bar {\mathbf x}_N, \bar {\mathbf u}_) + \sum_{k=0}^{N-1} c_k(\bar {\mathbf x}_k, \bar {\mathbf u}_k)\]and to compute it, you should use the two helper methods
model.cost.c
andmodel.cost.cN
(seec()
andcN()
).- Parameters:
model (
DiscreteControlModel
) – The control model used to compute the cost.xs (
list
) – A list of length \(N+1\) of the form \(\begin{bmatrix}\bar {\mathbf x}_0 & \dots & \bar {\mathbf x}_N \end{bmatrix}\)us (
list
) – A list of length \(N\) of the form \(\begin{bmatrix}\bar {\mathbf x}_0 & \dots & \bar {\mathbf x}_{N-1} \end{bmatrix}\)
- Return type:
- Returns:
The cost as a number.
- irlc.ex07.ilqr.get_derivatives(model, x_bar, u_bar)[source]#
Compute all the derivatives used in the model.
The return type should match the meaning in (Her24, Subequation 17.8) and in the online documentation.
c
should be a list of length \(N+1\)c_x
should be a list of length \(N+1\)c_xx
should be a list of length \(N+1\)c_u
should be a list of length \(N\)c_uu
should be a list of length \(N\)c_ux
should be a list of length \(N\)A
should be a list of length \(N\)B
should be a list of length \(N\)
Use the model to compute these terms. For instance, this will compute the first terms
A[0]
andB[0]
:A0, B0 = model.f_jacobian(x_bar[0], u_bar[0], 0)
Meanwhile, to compute the first terms of the cost-functions you should use:
c[0], c_x[0], c_u[0], c_xx[0], c_ux[0], c_uu[0] = model.cost.c(x_bar[0], u_bar[0], k=0, compute_gradients=True)
- Parameters:
model (
DiscreteControlModel
) – The model to use when computing the derivatives of the costx_bar (
list
) – The nominal state-trajectoryu_bar (
list
) – The nominal action-trajectory
- Returns:
Lists of all derivatives computed around the nominal trajectory (see the lecture notes).
- irlc.ex07.ilqr.forward_pass(model, x_bar, u_bar, L, l, alpha=1.0)[source]#
Simulates the effect of the controller on the model
We assume the system starts in \(\mathbf{x}_0 = \bar {\mathbf x}_0\), and then simulate the effect of generating actions using the closed-loop policy
\[\mathbf{u}_k = \bar {\mathbf u}_k + \alpha \mathbf{l}_k + L_k (\mathbf{x}_k - \bar { \mathbf x}_k)\](see (Her24, eq. (17.16))).
- Parameters:
model (
DiscreteControlModel
) – The model used to compute the dynamics.x_bar (
list
) – A nominal list of statesu_bar (
list
) – A nominal list of actions (not used by the method)L (
list
) – A list of control matrices \(L_k\)l (
list
) – A list of control vectors \(\mathbf{l}_k\)alpha – The linesearch parameter.
- Returns:
A list of length \(N+1\) of simulated states and a list of length \(N\) of simulated actions.
Solutions to selected exercises#
Problem 7.1