Pontryagin's maximum principle is a fundamental tool in optimal control, alongside approaches such as dynamic programming and game theory.
Its author, Lev Pontryagin, was a blind mathematician.
Lagrange Multiplier
The dynamics are governed by the differential equation $$\begin{cases} \dot{\mathbf{x}}(t)=\mathbf{f}(\mathbf{x}(t),\mathbf{\alpha}(t)) & (t>0) \\ \mathbf{x}(0) = x^0\end{cases}$$
Unconstrained Optimization: To find a maximum point of a given smooth function $f:\mathbb{R}^n\rightarrow \mathbb{R}$, look for a critical point of $f$: $$\nabla f (x^*)=0$$ where $f(x^*)=\max_{x\in \mathbb{R}^n} f(x)$.
Constrained Optimization: Consider the region $R:= \{x\in \mathbb{R}^n \mid g(x)\le 0\}$ determined by a given function $g:\mathbb{R}^n\rightarrow \mathbb{R}$. Suppose $x^*\in R$ and $f(x^*)=\max_{x\in R} f(x)$.
$\bullet$ Case 1: $x^*$ lies in the interior of $R$: the condition is the same as in unconstrained optimization.
$\bullet$ Case 2: $x^*$ lies on $\partial R$: $$\nabla f (x^*) =\lambda \nabla g (x^*)$$ where $\lambda$ is the Lagrange multiplier.
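As a small illustration of Case 2, here is a numerical check with an assumed example (not from the notes): maximize $f(x,y)=x+y$ over the unit disk $g(x,y)=x^2+y^2-1\le 0$. The maximizer sits on the boundary circle, and the gradients are parallel there.

```python
import numpy as np

# Hypothetical example: maximize f(x, y) = x + y over the disk
# R = {g <= 0} with g(x, y) = x^2 + y^2 - 1.
# The maximizer lies on the boundary, so grad f = lambda * grad g there.

theta = np.linspace(0.0, 2.0 * np.pi, 200001)
boundary = np.stack([np.cos(theta), np.sin(theta)], axis=1)  # points with g = 0
values = boundary[:, 0] + boundary[:, 1]                     # f on the boundary
x_star = boundary[np.argmax(values)]                         # numerical maximizer

grad_f = np.array([1.0, 1.0])
grad_g = 2.0 * x_star
lam = grad_f[0] / grad_g[0]   # ratio of the (parallel) gradients

print(x_star)                  # approx (1/sqrt(2), 1/sqrt(2))
print(lam)                     # approx 1/sqrt(2)
print(grad_f - lam * grad_g)   # approx (0, 0): gradients are parallel
```

The multiplier $\lambda=1/\sqrt{2}$ is exactly the proportionality factor between $\nabla f$ and $\nabla g$ at the maximizer.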
Pontryagin Maximum Principle
The generalized momentum $\mathbf{p}$ of the Hamiltonian system plays the role of a Lagrange multiplier.
Fixed time, Free endpoint
Given $A\subseteq \mathbb{R}^m$, $\mathbf{f}: \mathbb{R}^n \times A \rightarrow \mathbb{R}^n$, and $x^0\in \mathbb{R}^n$, define the set of admissible controls:
$$\mathcal{A} = \{\mathbf{\alpha}(\cdot): [0,\infty)\rightarrow A\ |\ \mathbf{\alpha} (\cdot) \mbox{ is measurable}\}$$
Then, given $\mathbf{\alpha}(\cdot)\in \mathcal{A}$, the state evolves according to the ODE:
$$\begin{cases} \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t),\mathbf{\alpha}(t))& (t \ge 0) \\ \mathbf{x}(0) = x^0\end{cases}$$
Finally, the payoff is:
$$P[\mathbf{\alpha}(\cdot)] = \int^T_0 r(\mathbf{x}(t), \mathbf{\alpha}(t))dt+g(\mathbf{x}(T))$$
where the terminal time $T>0$, running payoff $r:\mathbb{R}^n\times A \rightarrow \mathbb{R}$ and terminal payoff $g:\mathbb{R}^n\rightarrow \mathbb{R}$ are given.
The goal is to find a control $\mathbf{\alpha}^*(\cdot)$ such that
$$P[\mathbf{\alpha}^*(\cdot)] = \max_{\mathbf{\alpha}(\cdot)\in\mathcal{A}} P[\mathbf{\alpha}(\cdot)]$$
The Pontryagin Maximum Principle asserts that there exists a costate $\mathbf{p}(\cdot)$ which, together with the optimal trajectory $\mathbf{x}^*(\cdot)$, satisfies Hamilton's ODE.
Definition: The control theory Hamiltonian is the function
$$H(x,p,a) := \mathbf{f}(x, a) \cdot p + r(x, a)\quad (x,p \in \mathbb{R}^n, a\in A)$$
Theorem (Pontryagin Maximum Principle): Assume $\mathbf{\alpha}^*(\cdot)$ is optimal for (ODE), $(P)$ and $\mathbf{x}^*(\cdot)$ is the corresponding trajectory.
Then there exists a function $\mathbf{p}^*:[0,T]\rightarrow \mathbb{R}^n$ such that $$\dot{\mathbf{x}}^*(t)=\nabla_p H(\mathbf{x}^*(t),\mathbf{p}^*(t),\mathbf{\alpha}^*(t))=f$$ $$\dot{\mathbf{p}}^*(t)=-\nabla_x H(\mathbf{x}^*(t),\mathbf{p}^*(t),\mathbf{\alpha}^*(t))=-(p\cdot \nabla)f - \nabla r$$ and $$H(\mathbf{x}^*(t),\mathbf{p}^*(t),\mathbf{\alpha}^*(t)) = \max_{a\in A} H(\mathbf{x}^*(t),\mathbf{p}^*(t),a)\quad (0\le t\le T)$$ In addition, $$\mbox{the mapping}\quad t\longmapsto H(\mathbf{x}^*(t),\mathbf{p}^*(t),\mathbf{\alpha}^*(t))\quad \mbox{is constant}$$ Finally, we have the terminal condition (see later in Transversality condition) $$\mathbf{p}^*(T)=\nabla g(\mathbf{x}^*(T))$$.
Free time, Fixed endpoint
Given are a control $\mathbf{\alpha}(\cdot)\in \mathcal{A}$, the ODE, the payoff, and a target point (endpoint) $x^1\in \mathbb{R}^n$.
Theorem (Pontryagin Maximum Principle): Assume $\mathbf{\alpha}^*(\cdot)$ is optimal for (ODE), $(P)$ and $\mathbf{x}^*(\cdot)$ is the corresponding trajectory.
Then there exists a function $\mathbf{p}^*:[0,\tau^*]\rightarrow \mathbb{R}^n$ such that $$\dot{\mathbf{x}}^*(t)=\nabla_p H(\mathbf{x}^*(t),\mathbf{p}^*(t),\mathbf{\alpha}^*(t))$$ $$\dot{\mathbf{p}}^*(t)=-\nabla_x H(\mathbf{x}^*(t),\mathbf{p}^*(t),\mathbf{\alpha}^*(t))$$ and $$H(\mathbf{x}^*(t),\mathbf{p}^*(t),\mathbf{\alpha}^*(t)) = \max_{a\in A} H(\mathbf{x}^*(t),\mathbf{p}^*(t),a)\quad (0\le t\le \tau^*)$$ Also, $$H(\mathbf{x}^*(t),\mathbf{p}^*(t),\mathbf{\alpha}^*(t))\equiv 0 \quad (0\le t \le \tau^*)$$ Here $\tau^*$ denotes the first time the trajectory $\mathbf{x}^*(\cdot)$ hits the target point $x^1$. We call $\mathbf{x}^*(\cdot)$ the state of the optimally controlled system and $\mathbf{p}^*(\cdot)$ the costate.
Maximum Principle with Transversality Conditions
Consider different ODE: $$\begin{cases} \dot{\mathbf{x}}(t) = \mathbf{f}(\mathbf{x}(t),\mathbf{\alpha}(t)) & (t>0) \\ \mathbf{x}(0) = x_0 & (x_0\in X_0 \subset \mathbb{R}^n) \\ \mathbf{x}(\tau) = x_1 & (x_1 \in X_1 \subset \mathbb{R}^n)\end{cases}$$ and payoff is $$P[\mathbf{\alpha}(\cdot)]=\int_0^\tau r(\mathbf{x}(t),\mathbf{\alpha}(t))dt$$ where $\tau=\tau[\mathbf{\alpha}(\cdot)]$ is the first time we hit $X_1$.
Theorem (Transversality Conditions): Let $\mathbf{\alpha}^*(\cdot)$ and $\mathbf{x}^*(\cdot)$ solve the problem above. Then there exists a function $\mathbf{p}^*(\cdot):[0,\tau^*]\rightarrow \mathbb{R}^n$ satisfying Hamilton's equations for $H$ on $0\le t \le \tau^*$. In addition, $$\begin{cases}\mathbf{p}^*(\tau^*) \perp \mbox{(tangent plane of }X_1\mbox{ at }x_1) \\ \mathbf{p}^*(0) \perp \mbox{(tangent plane of }X_0\mbox{ at }x_0)\end{cases}$$ these are the transversality conditions.
If $X_1=\{ x\ |\ g_k(x)=0,\ k=1,\cdots,l\}$, then $$\mathbf{p}^*(\tau^*)=\sum_{k=1}^l\lambda_k\nabla g_k(x^1)$$ for some constants $\lambda_1,\cdots,\lambda_l$.
Maximum Principle with State Constraints
In the fixed-endpoint problem, we can additionally require that $\mathbf{x}(\cdot)$ always remain within a given region $R\subset \mathbb{R}^n$: $$R=\{x\in \mathbb{R}^n\ |\ g(x)\le 0\}$$ for a given function $g:\mathbb{R}^n\rightarrow \mathbb{R}$.
Definition: It will be convenient to introduce the quantity $$ c(x,a):= \nabla g(x)\cdot \mathbf{f}(x, a)$$ Note that if $\mathbf{x}(t)\in \partial R$ for times $s_0\le t \le s_1$, then $c(\mathbf{x}(t),\mathbf{\alpha}(t))\equiv 0$ ($s_0\le t \le s_1$), since $\mathbf{f}$ is then tangent to $\partial R$ whereas $\nabla g$ is perpendicular to it.
Theorem (Maximum Principle for State Constraints): Let $\mathbf{\alpha}^*(\cdot),\mathbf{x}^*(\cdot)$ solve the control theory problem above. Suppose also that $\mathbf{x}^*(t)\in \partial R$ for $s_0\le t\le s_1$. Then there exists a costate function $\mathbf{p}^*(\cdot):[s_0,s_1]\rightarrow \mathbb{R}^n$ such that the ODE holds. There also exists $\lambda^*(\cdot):[s_0,s_1]\rightarrow \mathbb{R}$ such that for times $s_0\le t \le s_1$ we have $$\dot{\mathbf{p}}^*(t)=-\nabla_xH(\mathbf{x}^*(t),\mathbf{p}^*(t),\mathbf{\alpha}^*(t))+\lambda^*(t)\nabla_x c(\mathbf{x}^*(t),\mathbf{\alpha}^*(t))$$ and $$H(\mathbf{x}^*(t),\mathbf{p}^*(t),\mathbf{\alpha}^*(t))=\max_{a\in A} \{ H(\mathbf{x}^*(t),\mathbf{p}^*(t),a)\ |\ c(\mathbf{x}^*(t),a)=0\}$$
Proof of Pontryagin's Maximum Principle
An Informal Derivation
An informal derivation shows why the Hamiltonian $H(x,p,a)$ and the costate $\mathbf{p}^*(\cdot)$ arise.
Exercises for Pontryagin's Maximum Principle
1. Consider the optimal control problem.
$$\min \int_0^1(3x(t)^2+u(t)^2)dt\quad\mbox{subj. to}\quad \dot{x}(t)=x(t)+u(t),\ x(0)=x_0$$
($i$) Determine the optimal feedback control.
($ii$) Determine the optimal cost.
Answer: Similar to Example 3
Let $\alpha(t)\equiv u(t)$. Then the payoff (negative cost) function is $$P[\alpha(\cdot)]=-\int_0^1 \left(3x(t)^2+\alpha(t)^2\right)dt$$ and there is no constraint on the control.
Apply the maximum principle with $$f(x,a)=x+a,\ g\equiv0,\ r(x,a)=-3x^2-a^2$$ and hence $$H(x,p,a)=fp+r=(x+a)p-(3x^2+a^2)$$
The maximality condition becomes $$H(x(t),p(t),\alpha(t))=\max_{a\in \mathbb{R}}\{-(3x(t)^2+a^2)+p(t)(x(t)+a)\}$$
Differentiating with respect to $a$, maximization gives $-2a+p=0$, i.e. $a=\frac{p}{2}$, and so $$\alpha(t)=\frac{p(t)}{2}$$
The dynamical equations are therefore $$\dot{x}(t)=x(t)+\frac{p(t)}{2}$$ and $$\dot{p}(t)=-H_x=6x(t)-p(t)$$.
Moreover $x(0)=x_0$, and since $g\equiv 0$ the terminal condition is $p(T)=0$ (here $T=1$).
Then the system of equations becomes $$\begin{pmatrix}\dot{x}\\ \dot{p}\end{pmatrix} = \begin{pmatrix}1 & 1/2\\6 & -1\end{pmatrix}\begin{pmatrix}x\\p\end{pmatrix}$$ Writing the above matrix as $M$, the general solution is $$\begin{pmatrix}x(t)\\ p(t)\end{pmatrix} =e^{tM} \begin{pmatrix} x^0\\ p^0\end{pmatrix}$$
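The two-point boundary problem can then be solved by choosing $p^0$ so that $p(T)=0$. A minimal numerical sketch, assuming the example values $T=1$ and $x_0=1$:

```python
import numpy as np
from scipy.linalg import expm

# Sketch: solve the two-point boundary problem by "shooting" with the
# matrix exponential.  T = 1 and x0 = 1 are assumed example values.
T, x0 = 1.0, 1.0
M = np.array([[1.0, 0.5],
              [6.0, -1.0]])

E = expm(T * M)               # maps (x(0), p(0)) to (x(T), p(T))
p0 = -E[1, 0] / E[1, 1] * x0  # choose p(0) so that p(T) = 0

xT, pT = E @ np.array([x0, p0])
print(p0)  # approx -5.583
print(pT)  # approx 0: terminal condition satisfied
```

The computed $p^0\approx -5.583$ agrees with the feedback gain $d(0)=p(0)/x(0)$ obtained from the Riccati approach below.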
Feedback controls: Suppose an optimal feedback control of the form $\alpha(t)=c(t)x(t)$ exists for some $c(\cdot):[0,T]\rightarrow\mathbb{R}$. Then $$\frac{p(t)}{2}=\alpha(t)=c(t)x(t)$$ so $c(t)=\frac{p(t)}{2x(t)}$. Define $$d(t):=\frac{p(t)}{x(t)}$$ so that $c(t)=\frac{d(t)}{2}$.
The differential equation that $d(\cdot)$ satisfies: $$\dot{d}=\frac{\dot{p}}{x}-\frac{p\dot{x}}{x^2}$$ and recall that $$\begin{cases}\dot{x}=x+\frac{p}{2}\\ \dot{p}=6x-p\end{cases}$$ Therefore $$\dot{d}=\frac{6x-p}{x}-\frac{p}{x^2}(x+\frac{p}{2})=6-d-d(1+\frac{d}{2})=6-2d-\frac{d^2}{2}$$ Since $p(T)=0$, the terminal condition is $d(T)=0$.
The Riccati equation is $$\begin{cases}\dot{d}=6-2d-\frac{1}{2}d^2& (0\le t < T)\\ d(T)=0\end{cases}$$ Solve this and set $\alpha(t)=\frac{1}{2}d(t)x(t)$.
To solve the Riccati equation, convert it to a second-order linear ODE via the substitution $$d(t)=\frac{2\dot{b}(t)}{b(t)}$$ for a function $b(\cdot)$ to be found. Then, since $\dot{d}=\frac{2\ddot{b}}{b}-\frac{2(\dot{b})^2}{b^2}=\frac{2\ddot{b}}{b}-\frac{d^2}{2}$, the equation becomes $\frac{2\ddot{b}}{b}=\dot{d}+\frac{d^2}{2}=6-2d=6-\frac{4\dot{b}}{b}$, and finally $$\begin{cases}\ddot{b}=3b-2\dot{b}&(0\le t < T)\\ \dot{b}(T)=0,\ b(T)=1\end{cases}$$
The solution is $b(t)=\frac{1}{4}e^{3-3t}+\frac{3}{4}e^{t-1}$.
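As a sanity check, one can verify numerically that this $b(t)$ satisfies the linear ODE, and that $d=2\dot{b}/b$ satisfies the Riccati equation with $d(1)=0$:

```python
import numpy as np

# Check the closed-form solution b(t) = (1/4)e^{3-3t} + (3/4)e^{t-1}
# against the linear ODE b'' = 3b - 2b', and check that d = 2b'/b
# satisfies the Riccati equation d' = 6 - 2d - d^2/2 with d(1) = 0.

t = np.linspace(0.0, 1.0, 101)
b = 0.25 * np.exp(3 - 3 * t) + 0.75 * np.exp(t - 1)
db = -0.75 * np.exp(3 - 3 * t) + 0.75 * np.exp(t - 1)
ddb = 2.25 * np.exp(3 - 3 * t) + 0.75 * np.exp(t - 1)

d = 2 * db / b
dd = 2 * ddb / b - 2 * (db / b) ** 2  # derivative of d via the quotient rule

print(np.max(np.abs(ddb - (3 * b - 2 * db))))       # approx 0: linear ODE holds
print(np.max(np.abs(dd - (6 - 2 * d - 0.5 * d**2))))  # approx 0: Riccati holds
print(d[-1])                                         # 0: terminal condition
```

The optimal feedback is then $\alpha(t)=\frac{1}{2}d(t)x(t)$ with this $d$.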
2. We will solve two similar optimal control problems.
(a) Use PMP to solve $$\min \int_0^2 (u_1(t)^2+u_2(t)^2)dt\quad \mbox{subj. to. } \begin{cases} \begin{pmatrix} \dot{x}_1(t)\\ \dot{x}_2(t)\end{pmatrix}=\begin{pmatrix} u_1(t)\\u_2(t)\end{pmatrix}\\ x(0)=0,\ x(2)\in S_2\end{cases}$$
where $S_2 = \{ x \in \mathbb{R}^2:\ x_2^2-x_1 + 1=0\}$.
(b) Use PMP to solve $$\min \int_0^2 (u_1(t)^2+u_2(t)^2)dt\quad \mbox{subj. to. } \begin{cases} \begin{pmatrix} \dot{x}_1(t)\\ \dot{x}_2(t)\end{pmatrix}=\begin{pmatrix} u_1(t)\\u_2(t)\end{pmatrix}\\ x(0)\in S_0,\ x(2)\in S_2\end{cases}$$
where $S_0 = \{ x \in \mathbb{R}^2:\ x_2^2+x_1=0\}$ and $S_2$ is as above.
Answer: Similar to Examples 4.6.1 and 4.6.2.
(a) Let $\vec{\alpha}=\vec{u}$; the payoff is $$P[\vec{\alpha}(\cdot)]:= -\int_0^2 \left| \vec{u}(t) \right|^2 dt$$
Then $f(\vec{x},\vec{a})=\vec{a}$, $g\equiv 0$, $r(\vec{x},\vec{a})=-\left| \vec{a} \right |^2$ and therefore $$H(\vec{x},\vec{p},\vec{a})=f(\vec{x},\vec{a})\cdot\vec{p} + r(\vec{x},\vec{a}) = \vec{a} \cdot \vec{p} - \left| \vec{a} \right|^2$$
The dynamical equation is $$\dot{\vec{x}}(t)=H_{\vec{p}} = (\alpha_1(t), \alpha_2(t))$$
and the adjoint equation is $$\dot{\vec{p}}(t) = -H_{\vec{x}}= 0$$ so $\vec{p}$ is constant.
Since the endpoint is constrained to lie in $S_2$, the free-endpoint terminal condition $\vec{p}(2)=\nabla g(\vec{x}(2))=0$ does not apply; it is replaced by a transversality condition.
Lastly, the maximality principle asserts $$H(\vec{x}(t),\vec{p}(t),\vec{\alpha}(t))=\max_{\vec{a} \in \mathbb{R}^2} \{ \vec{a} \cdot \vec{p} - \left| \vec{a} \right|^2\}$$
At each time $t$ the control value $\vec{a}$ must be selected to maximize $\vec{a}\cdot \vec{p}-\left| \vec{a} \right|^2$ over $\vec{a}\in \mathbb{R}^2$. Then $$\vec{\alpha}(t) = \frac{1}{2}\vec{p}(t)$$
Then the transversality condition says that $$\vec{p}(2)\perp S_2$$
(b) Everything is the same except the last step.
Now the transversality conditions say that $$\vec{p}(0)\perp S_0,\ \vec{p}(2)\perp S_2$$
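For part (a) the structure can be checked numerically. Since $\vec{p}$ is constant, the control $\vec{\alpha}=\vec{p}/2$ is constant, so the optimal trajectory is the straight line $\vec{x}(t)=t\,\vec{p}/2$ and the cost of reaching endpoint $e=\vec{x}(2)\in S_2$ is $|e|^2/2$. A quick sketch minimizing over $S_2$ (assuming this constant-control structure):

```python
import numpy as np

# Sanity check for (a): the cost of reaching e = x(2) with constant control
# u = e/2 over [0, 2] is int_0^2 |u|^2 dt = |e|^2 / 2, so minimize |e|^2
# over S2 = {x1 = x2^2 + 1}.

x2 = np.linspace(-2.0, 2.0, 400001)
e1 = x2 ** 2 + 1.0                  # points (e1, x2) on S2
cost = (e1 ** 2 + x2 ** 2) / 2.0
i = np.argmin(cost)

print(e1[i], x2[i])  # approx (1, 0): closest point of S2 to the origin
print(cost[i])       # approx 0.5: optimal cost
# Transversality check: p = e = (1, 0) is orthogonal to the tangent
# direction (0, 1) of S2 at (1, 0).
```

The optimizer $(1,0)$ is exactly the point of $S_2$ nearest the origin, and $\vec{p}\perp S_2$ there, as the transversality condition requires.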
3. Use PMP to solve the optimal control problem
$$\min \int_0^1 4(2-u)xdt\quad\mbox{subj. to}\quad \begin{cases}\dot{x}=2(2u-1)x&x(0)=2,\ x(1)=4\\ 0\le u \le 2\end{cases}$$
Hint: First prove that $x(t)$ has constant sign on $t\in [0,1]$.
Answer: Similar to Example 2.
Since $u\ge 0$, we have $\dot{x}=2(2u-1)x\ge -2x$ wherever $x>0$; with $x(0)=2>0$, Grönwall's inequality gives $x(t)\ge 2e^{-2t}>0$ for all $t$.
This is a fixed-time, fixed-endpoint problem, so the terminal condition of the fixed-time, free-endpoint case is not needed.
Let $\alpha = u$; the payoff is $$P[\alpha(\cdot)]:=-\int_0^1 4(2-\alpha(t))x(t)dt$$
Then $f(x,a)=2(2a-1)x,\ g\equiv 0,\ r(x,a)=-4(2-a)x$ and therefore $$H(x,p,a)=f(x,a)p+r(x,a)=2(2a-1)xp-4(2-a)x=-2xp-8x + a(4xp+4x)$$
The dynamical equation is $$\dot{x}(t)=H_p=-2x(t)+4\alpha(t)x(t)$$ and the adjoint equation is $$\dot{p}(t)=-H_x=2p+8-4\alpha p-4\alpha$$ Because the endpoint is fixed, there is no terminal condition $p(T)=g_x(x(T))$; the terminal payoff $g$ plays no role.
Lastly, the maximality principle asserts $$H(x(t),p(t),\alpha(t))=\max_{0\le a\le 2}\{-2xp-8x+a(4xp+4x)\}$$.
At each time $t$ the control value $\alpha(t)$ must be selected to maximize $a (4xp+4x)$ over $0\le a\le 2$. Then since $x(t)>0$, $$\alpha(t)=\begin{cases}2 & \mbox{if } p(t) > -1\\ 0&\mbox{if } p(t)\le -1\end{cases}$$ Hence if we know $p(\cdot)$, we can design the optimal control $\alpha(\cdot)$.
To solve for the costate $p(\cdot)$: $$\dot{p}(t)=2p(t)+8-\alpha(t)[4p(t)+4] \quad (0\le t \le T)$$
(The free-endpoint terminal condition $p(T)=0$ is not used here.)
When $p(t)>-1$ (so $\alpha=2$): $\dot{p}(t)=-6p(t)$ and $\dot{x}(t)=6x(t)$.
When $p(t)\le -1$ (so $\alpha=0$): $\dot{p}(t)=2p(t)+8$ and $\dot{x}(t)=-2x(t)$.
Let the transition time be $\tau$. Then $x(t)= \begin{cases}Ae^{-2t} &(t\in[0,\tau))\\Be^{6t}&(t\in[\tau,1])\end{cases}$; the conditions $x(0)=2$, $x(1)=4$, and continuity at $\tau$ give $A=2$, $B=4e^{-6}$, $\tau=\frac{6-\ln2}{8}$.
Finally, the optimal control is $$u(t)=\begin{cases}0&(t\in [0,\frac{6-\ln2}{8}))\\2&(t\in[\frac{6-\ln2}{8},1])\end{cases}$$
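The switching time and boundary condition can be verified in a few lines (a sanity check of the piecewise solution above):

```python
import math

# Numerical check of the bang-bang solution: with u = 0 the state decays as
# x(t) = 2 e^{-2t}, and after the switch at tau = (6 - ln 2)/8 it grows as
# x(t) = x(tau) e^{6(t - tau)}.  Verify the boundary condition x(1) = 4.

tau = (6.0 - math.log(2.0)) / 8.0
x_tau = 2.0 * math.exp(-2.0 * tau)            # state at the switching time
x_final = x_tau * math.exp(6.0 * (1.0 - tau))

print(tau)      # approx 0.663
print(x_final)  # approx 4.0
```

In fact $x(1)=2e^{6-8\tau}=2e^{\ln 2}=4$ exactly, which is what the check confirms.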
4. Consider the optimal control problem
$$\min \int_0^{t_f} u(t)dt\quad\mbox{s.t.}\quad\begin{cases}\dot{x}(t)=-x(t)+u(t),& x(0)=x_0,\ x(t_f)=0\\ u\in [0,m]\end{cases}$$
($a$) Suppose $t_f$ is fixed. For what values of $x_0$ is it possible to find a solution to the above problem, i.e. for what values of $x_0$ can the constraints be satisfied?
($b$) Find the optimal control for $x_0$ as in ($a$).
($c$) Let $t_f$ be free, i.e. consider the optimal control problem
$$\min\int_0^{t_f} u(t)dt\quad\mbox{s.t.}\quad\begin{cases}\dot{x}(t)=-x(t)+u(t),& x(0)=x_0,\ x(t_f)=0\\ u\in [0,m];\ t_f\ge 0\end{cases}$$
Solve this optimal control problem for the case when $x_0<0$.
Answer:
(a) Integrating the ODE gives $$x(t)=e^{-t}x_0+\int_0^t e^{-(t-s)}u(s)ds$$
The integral term is nonnegative (since $u\ge 0$), so if $x_0>0$ then $x(t_f)>0$ and the constraint $x(t_f)=0$ cannot be met. So assume $x_0<0$.
The best we can do is apply $u\equiv m$, so reachability requires $$e^{-t_f}x_0+(1-e^{-t_f})m\ge 0$$ Finally, $x_0\in[-(e^{t_f}-1)m,0]$.
(b) Let $\alpha=u$; the payoff is $P[\alpha(\cdot)]=-\int_0^{t_f} \alpha(t)dt$.
Here $f(x,a)=-x+a$ and $r(x,a)=-a$, so $$H(x,p,a)=f(x,a)p+r=(-x+a)p-a=-xp+a(p-1)$$
Also $$H(x(t),p(t),\alpha(t))=\max_{a\in [0,m]} \{-xp+a(p-1)\}$$
Then $$\alpha(t)=\begin{cases} m & \mbox{if } p(t)>1\\ 0 & \mbox{if } p(t)\le 1\end{cases}$$
Now solve for $p(\cdot)$: $\dot{p}(t)=-H_x=p(t)$, so $p(t)=Ae^{t}$. The optimal control becomes $$\alpha^*(t)=\begin{cases}m&\mbox{if } Ae^{t}>1\\0&\mbox{if } Ae^{t}<1\end{cases}$$ Since $Ae^{t}$ is increasing (for $A>0$), the control switches at most once, from $0$ to $m$:
$$\alpha^*(t)=\begin{cases}0&t\in [0,\tau)\\m&t\in [\tau,t_f]\end{cases}$$
Imposing the boundary condition, $$0=x(t_f)=e^{-t_f}x_0+\int_\tau^{t_f}e^{-(t_f-s)}\alpha^*(s)ds=e^{-t_f}x_0+(1-e^{-(t_f-\tau)})m$$ Finally, $\tau=t_f+\ln\left(\frac{m+e^{-t_f}x_0}{m}\right)$.
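A quick check of the switching-time formula, with sample values (assumed for illustration) $m=2$, $t_f=1$, $x_0=-1$, which satisfy the reachability condition from (a):

```python
import math

# Check the switching time for sample values (assumed for illustration):
# m = 2, t_f = 1, x0 = -1, which satisfy x0 >= -(e^{t_f} - 1) m.
m, t_f, x0 = 2.0, 1.0, -1.0
tau = t_f + math.log((m + math.exp(-t_f) * x0) / m)

# With u = 0 on [0, tau) and u = m on [tau, t_f]:
#   x(t_f) = e^{-t_f} x0 + m (1 - e^{-(t_f - tau)})
x_tf = math.exp(-t_f) * x0 + m * (1.0 - math.exp(-(t_f - tau)))

print(tau)   # approx 0.797, inside (0, t_f)
print(x_tf)  # approx 0: endpoint constraint satisfied
```

By construction $e^{-(t_f-\tau)}=(m+e^{-t_f}x_0)/m$, so the endpoint value vanishes identically; the check confirms the algebra.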
(c) In the free-time case, $H\equiv 0$ along the optimal trajectory and, as before, $p(t)=Ae^{t}$. Then $$0=H=-xp+a(p-1)=\begin{cases}-xp+m(p-1)& p>1\\-xp&p\le 1\end{cases}$$
If $p>1$, then $-xp+m(p-1)=0$ gives $p=\frac{m}{m-x}<1$ (since $x<0$), a contradiction; so $p$ cannot exceed $1$.
Hence $-xp=0$, and since $x\ne 0$ we get $p\equiv 0$ and $\alpha(t)=0$. The optimal control is $u(t)\equiv 0$. (This means $t_f$ becomes infinite: with $u=0$, $x(t)=e^{-t}x_0\rightarrow 0$ only asymptotically.)
5. Determine the optimal control for the following optimal control problem using PMP
$$\max \int_0^1 (\ln (u) + x)dt\quad\mbox{subj. to}\quad \begin{cases}\dot{x}=x-u,\ x(0)=0\\ 0<u\le 1\end{cases}$$
Answer: Similar to Example 3.
Let $\alpha = u$; the payoff is $$P[\alpha(\cdot)]=\int_0^1 (\ln (\alpha(t)) + x(t))dt$$
Then $$f(x,a)=x-a,\ g\equiv 0,\ r(x,a)=\ln(a)+x$$ and hence $$H(x,p,a)=fp+r=(x-a)p+\ln(a)+x$$ The maximality condition becomes $$H(x(t),p(t),\alpha(t))=\max_{a \in (0,1]}\{(x-a)p+\ln(a)+x\}$$
Differentiating with respect to $a$: $H_a=-p+\frac{1}{a}=0$ gives $a=\frac{1}{p}$, and so $$\alpha(t)=\begin{cases}\frac{1}{p(t)}&(p(t)>1)\\1&(p(t)\le 1)\end{cases}$$
Now solve for $p(t)$: $$\dot{p}(t)=-H_x=-p(t)-1$$ Moreover $x(0)=0$, and since $g\equiv 0$ the terminal condition is $$p(1)=0$$
Then $$p(t)=Ae^{-t}-1=e^{1-t}-1$$
Optimal control is $$u(t)=\begin{cases}\frac{1}{e^{1-t}-1}&(t\in[0,1-\ln2))\\ 1&(t\in [1-\ln2, 1])\end{cases}$$
With some help from WolframAlpha for the first piece, and matching at $t=1-\ln 2$ (where $x(1-\ln2)=-\frac{1}{2}\ln(e-1)$), $$x(t)=\begin{cases}-e^{t-1}(t-\ln(e-e^t)+\ln(e-1))&t\in[0,1-\ln2)\\ 1-(2+\ln(e-1))e^{t-1}&t\in[1-\ln2,1]\end{cases}$$
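As a sanity check, integrating $\dot{x}=x-u$ with the optimal control by RK4 reproduces $x(1-\ln 2)=-\tfrac{1}{2}\ln(e-1)$, the value of the first closed-form piece at the switching time:

```python
import math

# Numerically integrate dx/dt = x - u(t) with the optimal control and compare
# with the closed-form value at the switching time t* = 1 - ln 2, where
# x(t*) = -ln(e - 1)/2.

def u(t):
    t_star = 1.0 - math.log(2.0)
    return 1.0 / (math.exp(1.0 - t) - 1.0) if t < t_star else 1.0

def rhs(t, x):
    return x - u(t)

# classical RK4 from t = 0, x(0) = 0 up to the switching time
t, x, n = 0.0, 0.0, 2000
t_star = 1.0 - math.log(2.0)
h = t_star / n
for _ in range(n):
    k1 = rhs(t, x)
    k2 = rhs(t + h / 2, x + h * k1 / 2)
    k3 = rhs(t + h / 2, x + h * k2 / 2)
    k4 = rhs(t + h, x + h * k3)
    x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    t += h

print(x)                          # numerical x(t*), approx -0.2707
print(-math.log(math.e - 1) / 2)  # closed-form value, approx -0.2707
```

Note that $u$ is continuous at $t^*$ (the first branch tends to $1$ there), so RK4 needs no special treatment at the switch.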
Reference
Berkeley optimal control course (lecture notes and exercises)