Exercise 6: Linear-quadratic problems in control#
Note
The exercises material is divided into general information (found on this page) and the actual exercise instructions. You can download this weeks exercise instructions from here:
You are encouraged to prepare the homework problems 1 (indicated by a hand in the PDF file) at home and present your solution during the exercise session.
To get the newest version of the course material, please see Making sure your files are up to date
This weeks exercise will be about the LQR (Linear-quadratic regulator) algorithm. The starting point of the LQR algorithm is a problem of the form (\(k=0,\dots,N-1\)):
This means there are \(N\) matrices \(A_k\) and so on. To practically implement this,
we will assume the matrices are stored as lists: \(A_0, A_1, \dots A_{N-1}\) = A = [A0, A1, ...]
of
length \(N\) so that A[k]
corresponds to \(A_k\). The same goes for all the other terms.
Hint
Only the terms A
and B
are required. The rest of the terms will default to 0-matrices.
The LQR algorithm takes a problem of this form and then ultimately compute a control law:
And a cost-to-go function as:
Again there are \(N-1\) terms. The function we implement will therefore end with:
return (L, l), (V, v, vc)
so that L[k]
corresponds to \(L_k\) and vc[k]
corresponds to \(v_k\).
Example: Calling the LQR function#
Consider a discrete system with dynamics:
and cost:
The following code will run LQR on this problem using a horizon of \(N=20\) and then compute the first action:
Using \(\mathbf x_0 = \begin{bmatrix} 1 \\ 0\end{bmatrix}\):
>>> import numpy as np
>>> from irlc.ex06.dlqr import LQR
>>> N = 20 # Planning horizon.
>>> A = np.asarray([[1, 1], [0, 1]])
>>> B = np.asarray([[0], [1]])
>>> Q = np.asarray([[1, 0], [0, 0]])
>>> R = np.asarray([[1]])
>>> (L,l), (V,v,vc) = LQR(A=[A]*N, B=[B]*N, Q=[Q]*N, R=[R]*N, QN=Q)
>>> x = np.asarray([1, 0])
>>> u = L[0] @ x + l[0]
>>> print(f"Action taken at time k=0 in state {x=} is {u=}")
Action taken at time k=0 in state x=array([1, 0]) is u=array([-0.48053382])
Note that the code uses the syntax [A]*N
to create a list containing A
repeated \(N\) times. An example:
>>> [42]*3
[42, 42, 42]
Classes and functions#
- irlc.ex06.dlqr.LQR(A, B, d=None, Q=None, R=None, H=None, q=None, r=None, qc=None, QN=None, qN=None, qcN=None, mu=0)[source]#
Implement the LQR as defined in (Her24, Algorithm 22). I recommend viewing this documentation online (documentation for week 6).
When you solve this exercise, look at the algorithm in the book. Since the LQR problem is on the form:
\[x_{k+1} = A_k x_k + B_k u_k + d_k\]For \(k=0,\dots,N-1\) this means there are \(N\) matrices \(A_k\). This is implemented by assuming that
A
(i.e., the input argument) is alist
of length \(N\) so thatA[k]
corresponds to \(A_k\).Similar conventions are used for the cost term (please see the lecture notes or the online documentation for their meaning). Recall it has the form:
\[c(x_k, u_k) = \frac{1}{2} \mathbf x_k^\top Q_k \mathbf x_k + \frac{1}{2} \mathbf q_k^\top \mathbf x_k + q_k + \cdots\]When the function is called, the vector \(\textbf{q}_k\) corresponds to
q
and the constant \(q_k\) correspond toqc
(q-constant).Note
Only the terms
A
andB
are required. The rest of the terms will default to 0-matrices.The LQR algorithm will ultimately compute a control law of the form:
\[\mathbf u_k = L_k \mathbf x_k + \mathbf l_k\]And a cost-to-go function as:
\[J_k(x_k) = \frac{1}{2} \mathbf x_k^\top V_k \mathbf x_k + v_k^\top \mathbf x_k + v_k\]Again there are \(N-1\) terms. The function then return
return (L, l), (V, v, vc)
so thatL[k]
corresponds to \(L_k\).- Parameters:
A (
list
) – A list ofnp.ndarray
containing all terms \(A_k\)B (
list
) – A list ofnp.ndarray
containing all terms \(B_k\)d (
list
) – A list ofnp.ndarray
containing all terms \(\mathbf d_k\) (optional)Q (
list
) – A list ofnp.ndarray
containing all terms \(Q_k\) (optional)R (
list
) – A list ofnp.ndarray
containing all terms \(R_k\) (optional)H (
list
) – A list ofnp.ndarray
containing all terms \(H_k\) (optional)q (
list
) – A list ofnp.ndarray
containing all terms \(\mathbf q_k\) (optional)r (
list
) – A list ofnp.ndarray
containing all terms \(\mathbf r_k\) (optional)qc (
list
) – A list offloat
containing all terms \(q_k\) (i.e., constant terms) (optional)QN (
ndarray
) – Anp.ndarray
containing the terminal cost term \(Q_N\) (optional)qN (
ndarray
) – Anp.ndarray
containing the terminal cost term \(\mathbf q_N\) (optional)qcN (
ndarray
) – Anp.ndarray
containing the terminal cost term \(q_N\)mu (
float
) – A regularization term which is useful for iterative-LQR (next week). Default to 0.
- Returns:
A tuple of the form
(L, l), (V, v, vc)
corresponding to the control and cost-matrices.
Solutions to selected exercises#
Problem 6.1-6.3
Problem 6.4