MAS473 Optimization
Preliminaries on Differentiation
Taylor's Theorem via Integration by Parts
Taylor's expansion of $f$ about $a$ is $$f(t)=\sum^n_{k=0}\frac{f^{(k)}(a)}{k!}(t-a)^k+\int^t_a\frac{1}{n!}(t-x)^nf^{(n+1)}(x)\,dx,$$ and bounding the integral remainder gives $$\Big|f(t)-\sum^n_{k=0}\frac{f^{(k)}(a)}{k!}(t-a)^k\Big|\le \max_{|u-a|\le R}|f^{(n+1)}(u)|\,\frac{R^{n+1}}{(n+1)!}\quad \forall\, |t-a|\le R.$$
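A minimal numerical sketch of the bound (assuming NumPy is installed; the choices $f=\sin$, $a=0$, $n=3$, $R=1$ are illustrative):

```python
import math
import numpy as np

# Check the Taylor remainder bound for f = sin, a = 0, n = 3.
# The derivatives of sin cycle through sin, cos, -sin, -cos, so
# |f^{(n+1)}| <= 1 everywhere and the bound is R^{n+1}/(n+1)!.
n, R, a = 3, 1.0, 0.0
derivs = [math.sin, math.cos,
          lambda x: -math.sin(x), lambda x: -math.cos(x)]

def taylor(t):
    """Degree-n Taylor polynomial of sin about a."""
    return sum(derivs[k % 4](a) / math.factorial(k) * (t - a) ** k
               for k in range(n + 1))

ts = np.linspace(a - R, a + R, 1001)
max_error = max(abs(math.sin(t) - taylor(t)) for t in ts)
bound = R ** (n + 1) / math.factorial(n + 1)
print(max_error, "<=", bound)   # the error never exceeds the bound
assert max_error <= bound
```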
Taylor Expansion of Order Two in Two Variables
$$f(\mathbf{x})=f(\mathbf{a})+\nabla f(\mathbf{a})^T (\mathbf{x}-\mathbf{a})+\frac{1}{2}(\mathbf{x}-\mathbf{a})^T\nabla^2 f(\mathbf{p})(\mathbf{x}-\mathbf{a})$$
for some $\mathbf{p}$ on the line segment joining $\mathbf{a}$ and $\mathbf{x}$. Here $$\nabla^2 f(\mathbf{p})=\nabla^2 f(p_1,p_2)=\begin{bmatrix} f_{x_1x_1}(p_1,p_2)& f_{x_1x_2}(p_1,p_2)\\ f_{x_1x_2}(p_1,p_2)& f_{x_2x_2}(p_1,p_2) \end{bmatrix}$$
- $\operatorname{tr}\nabla^2f(\mathbf{p})>0,\ \det \nabla^2f(\mathbf{p})>0$: $\nabla^2 f(\mathbf{p})$ is positive definite, so $\mathbf{x}^T\nabla^2 f(\mathbf{p})\mathbf{x}> 0$ for all $\mathbf{x}\ne\mathbf{0}$;
- $\operatorname{tr}\nabla^2f(\mathbf{p})<0,\ \det \nabla^2f(\mathbf{p})>0$: $\nabla^2 f(\mathbf{p})$ is negative definite, so $\mathbf{x}^T\nabla^2 f(\mathbf{p})\mathbf{x}< 0$ for all $\mathbf{x}\ne\mathbf{0}$.
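A small sketch of the trace/determinant test above (the quadratic $f(x_1,x_2)=x_1^2+x_1x_2+2x_2^2$, whose Hessian is constant, is an illustrative choice):

```python
import numpy as np

# Hessian of the illustrative f(x1, x2) = x1^2 + x1*x2 + 2*x2^2:
# [[f_x1x1, f_x1x2], [f_x1x2, f_x2x2]].
H = np.array([[2.0, 1.0],
              [1.0, 4.0]])

tr, det = np.trace(H), np.linalg.det(H)
if tr > 0 and det > 0:
    print("positive definite: x^T H x > 0 for all x != 0")
elif tr < 0 and det > 0:
    print("negative definite: x^T H x < 0 for all x != 0")
elif det < 0:
    print("indefinite: saddle point")
else:
    print("test inconclusive")
```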
Preliminaries on Convexity
Convexity of Sets
Definition. A subset $C$ of $\mathbb{R}^n$ is convex if $$\lambda\mathbf{v}_1+(1-\lambda)\mathbf{v}_2\in C\quad \forall \mathbf{v}_1,\mathbf{v}_2\in C, 0<\lambda<1.$$
Fact. Let $C_1$ and $C_2$ be convex sets and $\alpha>0$. Then, both $\alpha C_1$ and $C_1+C_2$ are convex.
Fact. Let $C_1$ and $C_2$ be convex sets. Then, $C_1\cap C_2$ is convex.
Convexity of Functions
Definition. A real-valued function $f$ defined on a convex set $C$ is convex if $$f(\lambda\mathbf{v}_1+(1-\lambda)\mathbf{v}_2)\le \lambda f(\mathbf{v}_1)+(1-\lambda)f(\mathbf{v}_2)\quad \forall \mathbf{v}_1,\mathbf{v}_2\in C,\ 0<\lambda<1$$
Fact. Let $f_1$ and $f_2$ be convex functions and $\alpha>0$. Then both $\alpha f_1$ and $f_1+f_2$ are convex.
Fact. Let $f_1$ and $f_2$ be convex functions. Then, $g(\mathbf{v})=\max \{ f_1(\mathbf{v}),f_2(\mathbf{v})\}$ is convex.
Fact. Let $f$ be a convex function on a vector space $\mathbb{V}$ and $l$ be a linear transformation from a vector space $\mathbb{W}$ into $\mathbb{V}$. Then, $f\circ l$ is a convex function on $\mathbb{W}$.
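A random spot-check of the max fact above (the convex examples $f_1(\mathbf{v})=\mathbf{v}^T\mathbf{v}$ and $f_2(\mathbf{v})=e^{v_1}$ are illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two convex functions and their pointwise maximum.
f1 = lambda v: np.dot(v, v)      # |v|^2
f2 = lambda v: np.exp(v[0])      # e^{v_1}
g = lambda v: max(f1(v), f2(v))

# Check the convexity inequality on random pairs and weights.
for _ in range(10_000):
    v1, v2 = rng.normal(size=2), rng.normal(size=2)
    lam = rng.uniform()
    lhs = g(lam * v1 + (1 - lam) * v2)
    rhs = lam * g(v1) + (1 - lam) * g(v2)
    assert lhs <= rhs + 1e-12
print("convexity inequality held on all samples")
```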
Convexity of Quadratic Forms
Definition. For an $n\times n$ matrix $A$, if $\mathbf{v}^TA\mathbf{v}\ge 0$ for all $\mathbf{v}\in\mathbb{R}^n$, then $A$ is positive semi-definite. If $\mathbf{v}^TA\mathbf{v}>0$ for all $\mathbf{0}\ne \mathbf{v}\in\mathbb{R}^n$, then $A$ is positive definite.
Theorem. If $A$ is a positive semi-definite matrix, then $\mathbf{x}^TA\mathbf{x}$ is a convex function in $\mathbf{x}$.
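A sketch checking the theorem numerically (the random construction $A=B^TB$, which is positive semi-definite, is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

# A = B^T B is positive semi-definite by construction.
B = rng.normal(size=(4, 4))
A = B.T @ B
q = lambda x: x @ A @ x          # the quadratic form x^T A x

# Spot-check the convexity inequality on random pairs and weights.
for _ in range(10_000):
    x, y = rng.normal(size=4), rng.normal(size=4)
    lam = rng.uniform()
    lhs = q(lam * x + (1 - lam) * y)
    rhs = lam * q(x) + (1 - lam) * q(y)
    assert lhs <= rhs + 1e-9
print("quadratic form behaved convexly on all samples")
```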
Continuity and Differentiability of Convex Functions
Fact. A convex function is continuous on the interior of its domain, and its one-sided directional derivatives exist.
Lemma. Let $f$ be a differentiable convex function on a convex set $C$. Then, for any $\mathbf{x},\mathbf{y}\in C$, $$f(\mathbf{y})-f(\mathbf{x})\ge \nabla f(\mathbf{x})^T(\mathbf{y}-\mathbf{x})$$
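A numerical spot-check of the lemma (the convex log-sum-exp function, whose gradient is the softmax, is an illustrative choice):

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    """Log-sum-exp, a standard smooth convex function."""
    return np.log(np.sum(np.exp(x)))

def grad_f(x):
    e = np.exp(x)
    return e / e.sum()           # softmax = gradient of log-sum-exp

# Check f(y) - f(x) >= grad f(x)^T (y - x) on random pairs.
for _ in range(10_000):
    x, y = rng.normal(size=3), rng.normal(size=3)
    assert f(y) - f(x) >= grad_f(x) @ (y - x) - 1e-9
print("first-order lower bound held on all samples")
```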
Jensen's Inequality for Convex Functions
For a convex function $g$ and a random variable $X$ with finite mean, $$g(\mathbb{E}[X])\le \mathbb{E}[g(X)].$$
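A Monte Carlo illustration with $g=\exp$ and $X\sim N(0,1)$: here $g(\mathbb{E}[X])=e^0=1$ while $\mathbb{E}[g(X)]=e^{1/2}\approx 1.65$ (the lognormal mean), so the inequality is strict:

```python
import numpy as np

rng = np.random.default_rng(0)

# Jensen's inequality for g = exp and X ~ N(0, 1).
X = rng.normal(size=1_000_000)
print("g(E[X]) =", np.exp(X.mean()))   # ~ 1.0
print("E[g(X)] =", np.exp(X).mean())   # ~ 1.65, the larger side
```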
Principle of Optimality
Let $$D_f(\mathbf{x})=\{\mathbf{v}:\mathbf{v}^T\nabla f(\mathbf{x})>0\}$$ be the set of ascent directions of $f$ at $\mathbf{x}$.
Lemma. If $D_f(\mathbf{x})\ne \varnothing$, then for any $\mathbf{v}\in D_f(\mathbf{x})$, $f$ increases (for small $t$) along any path starting at $\mathbf{x}$ with initial velocity $\mathbf{v}$.
So for an unconstrained maximum, optimality of $\mathbf{x}$ means $D_f(\mathbf{x})=\varnothing$.
Let $$D_S(\mathbf{x})=\{ \mathbf{r}'(0):\ \mathbf{r}(0)=\mathbf{x},\ \mathbf{r}(t)\in S\ \text{for}\ 0\le t\le \epsilon_\mathbf{r},\ \text{for some}\ \epsilon_\mathbf{r}>0\}$$ be the set of feasible directions at $\mathbf{x}$ within the region $S$.
Lemma. If $\mathbf{x}$ is an optimal solution to the problem of maximizing $f$ over $S$, then $D_f(\mathbf{x})\cap D_S(\mathbf{x})=\varnothing$.
That is, optimality is required only relative to the region $S$: at an optimal $\mathbf{x}$, no feasible direction is an ascent direction.
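A small sketch of ascent versus feasible directions for an illustrative problem: maximize $f(\mathbf{x})=-(x_1^2+x_2^2)$ over $S=\{\mathbf{x}:x_1\ge 0\}$, examined at the boundary point $(0,1)$:

```python
import numpy as np

# f(x) = -(x1^2 + x2^2), so grad f(x) = -2x.
grad_f = lambda x: -2 * x

x = np.array([0.0, 1.0])          # feasible point with x1 on the boundary
g = grad_f(x)                     # gradient at x: (0, -2)

for v in [np.array([1.0, 0.0]),   # feasible but not an ascent direction
          np.array([0.0, -1.0]),  # feasible ascent direction
          np.array([-1.0, 0.0])]: # infeasible: immediately leaves S
    feasible = v[0] >= 0          # valid test here only because x1 = 0
    ascent = v @ g > 0
    print(v, "feasible:", feasible, "ascent:", ascent)

# v = (0, -1) is both feasible and an ascent direction, so
# D_f(x) and D_S(x) intersect and x is not optimal.
```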
First Order Optimality Conditions
Unconstrained Optimization
If $\mathbf{x}_0$ is a local optimum of a differentiable $f$ over all of $\mathbb{R}^n$, then $$\nabla f(\mathbf{x}_0)=\mathbf{0}.$$
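A minimal gradient descent sketch that drives the gradient to zero (the quadratic $f(\mathbf{x})=(x_1-1)^2+2(x_2+3)^2$ and the step size are illustrative choices):

```python
import numpy as np

# Gradient of the illustrative f(x) = (x1 - 1)^2 + 2*(x2 + 3)^2,
# whose unique minimizer is (1, -3).
grad_f = lambda x: np.array([2 * (x[0] - 1), 4 * (x[1] + 3)])

x = np.zeros(2)
for _ in range(200):
    x -= 0.1 * grad_f(x)          # fixed step size

print(x)                          # ~ (1, -3)
print(grad_f(x))                  # ~ (0, 0): first order condition holds
```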
Method of Lagrange Multipliers
When $$S=\{\mathbf{x}:g_1(\mathbf{x})=0,\cdots, g_k(\mathbf{x})=0\},$$ an optimal $\mathbf{x}_0$ satisfies $$g_1(\mathbf{x}_0)=0,\cdots,g_k(\mathbf{x}_0)=0,\quad \nabla f(\mathbf{x}_0)=\lambda_1 \nabla g_1(\mathbf{x}_0)+\cdots + \lambda_k \nabla g_k(\mathbf{x}_0)$$ for some Lagrange multipliers $\lambda_1,\cdots,\lambda_k$.
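A sketch that solves the Lagrange system numerically (assuming SciPy is installed; the problem, maximizing $x+y$ on the unit circle, is an illustrative choice):

```python
import numpy as np
from scipy.optimize import fsolve

# Lagrange system for: maximize f(x, y) = x + y
# subject to g(x, y) = x^2 + y^2 - 1 = 0.
# Unknowns (x, y, lam); equations: grad f = lam * grad g, and g = 0.
def system(z):
    x, y, lam = z
    return [1 - lam * 2 * x,      # df/dx = lam * dg/dx
            1 - lam * 2 * y,      # df/dy = lam * dg/dy
            x**2 + y**2 - 1]      # the constraint itself

x, y, lam = fsolve(system, [1.0, 1.0, 1.0])
print(x, y, lam)                  # ~ (0.707, 0.707, 0.707)
```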
Convexity in Optimization
For a convex objective minimized over a convex feasible set, every local minimum is a global minimum.
A Second Order Optimality Condition
Supplement the first order condition with a condition on the Hessian: if $\nabla f(\mathbf{x}_0)=\mathbf{0}$ and $\nabla^2 f$ is positive definite near $\mathbf{x}_0$, then $\mathbf{x}_0$ is a strict local minimum, since
$$f(\mathbf{x}_0+t\mathbf{v})=f(\mathbf{x}_0)+t\nabla f(\mathbf{x}_0)^T\mathbf{v}+\frac{t^2}{2}\mathbf{v}^T\nabla^2 f(\mathbf{x}_0+\tau_t\mathbf{v})\mathbf{v}\ge f(\mathbf{x}_0)+\frac{a_tt^2|\mathbf{v}|^2}{2}$$
for some $\tau_t\in(0,t)$, where the gradient term vanishes and $a_t>0$ is a lower bound on the eigenvalues of $\nabla^2 f(\mathbf{x}_0+\tau_t\mathbf{v})$.
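A numerical version of this check at a critical point (the quadratic $f(x_1,x_2)=x_1^2-x_1x_2+x_2^2$, with critical point at the origin, is an illustrative choice):

```python
import numpy as np

# f(x1, x2) = x1^2 - x1*x2 + x2^2: gradient and (constant) Hessian.
grad_f = lambda x: np.array([2 * x[0] - x[1], -x[0] + 2 * x[1]])
hess_f = lambda x: np.array([[2.0, -1.0],
                             [-1.0, 2.0]])

x0 = np.zeros(2)
assert np.allclose(grad_f(x0), 0)          # first order condition

eigvals = np.linalg.eigvalsh(hess_f(x0))   # eigenvalues: 1 and 3
print(eigvals)
if eigvals.min() > 0:
    print("Hessian positive definite: x0 is a strict local minimum")
```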
Karush-Kuhn-Tucker (KKT) Condition
For $$\max\{f(\mathbf{x}):g_1(\mathbf{x})=0,\cdots,g_m(\mathbf{x})=0,\ h_1(\mathbf{x})\le 0,\cdots,h_n(\mathbf{x})\le 0\},$$ an optimal $\mathbf{x}$ satisfies
$$\nabla f(\mathbf{x})=\sum^m_{i=1}\lambda_i\nabla g_i(\mathbf{x})+\sum^n_{j=1}\mu_j\nabla h_j(\mathbf{x})\quad\text{(stationarity)}$$
$$\lambda_1,\cdots,\lambda_m\in\mathbb{R},\ \mu_1\ge 0,\cdots, \mu_n\ge 0\quad\text{(dual feasibility)}\\ \mu_1h_1(\mathbf{x})=0,\cdots,\mu_nh_n(\mathbf{x})=0\quad\text{(complementary slackness)}\\ g_1(\mathbf{x})=0,\cdots, g_m(\mathbf{x})=0,\ h_1(\mathbf{x})\le 0,\cdots, h_n(\mathbf{x})\le 0\quad\text{(primal feasibility)}$$
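A sketch verifying all four KKT conditions at a candidate point (the problem, maximizing $x+y$ over the unit disk with candidate $(1/\sqrt{2},1/\sqrt{2})$, is an illustrative choice):

```python
import numpy as np

# maximize f(x, y) = x + y  subject to  h(x, y) = x^2 + y^2 - 1 <= 0.
# Candidate maximizer: (1/sqrt(2), 1/sqrt(2)), where h is active.
x = np.array([1.0, 1.0]) / np.sqrt(2)

grad_f = np.array([1.0, 1.0])
grad_h = 2 * x
h = x @ x - 1

mu = grad_f[0] / grad_h[0]        # solve the stationarity equation for mu
print("stationarity:", np.allclose(grad_f, mu * grad_h))
print("dual feasibility (mu >= 0):", mu >= 0)
print("complementary slackness (mu * h = 0):", np.isclose(mu * h, 0))
print("primal feasibility (h <= 0):", h <= 1e-12)
```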