MAS473 Concentration Inequalities
The Law of Large Numbers
Markov's inequality: Upper Bound of Prob.
Let $X$ be a non-negative random variable. Then for any $a>0$, $$\mathbb{P}(X\ge a)\le \frac{\mathbb{E}[X]}{a}.$$
Proof) $\mathbb{E}[X]\ge \mathbb{E}[X\mathbf{1}_{\{X\ge a\}}]\ge \mathbb{E}[a\mathbf{1}_{\{X\ge a\}}]=a\mathbb{P}(X\ge a)$.
Corollary 1
Let $X$ be a non-negative random variable. Then for any $b>0$, $$\mathbb{P}(X\ge b\mathbb{E}[X])\le \frac{1}{b}.$$
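As a quick sanity check, here is a minimal Monte Carlo sketch of Markov's inequality (assuming NumPy; the exponential distribution and the threshold $a=3$ are arbitrary choices for illustration).

```python
import numpy as np

# Monte Carlo check of Markov's inequality for a non-negative random variable.
# X ~ Exponential(1), so E[X] = 1; distribution and threshold are arbitrary choices.
rng = np.random.default_rng(0)
X = rng.exponential(scale=1.0, size=1_000_000)
a = 3.0

empirical = np.mean(X >= a)   # estimate of P(X >= a)
markov = X.mean() / a         # Markov bound E[X] / a

print(f"P(X >= {a}) ~ {empirical:.4f}  <=  Markov bound {markov:.4f}")
```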
Chebyshev's inequality: Concentrate at Mean
Let $X$ be a random variable. Then for any $a>0$, $$\mathbb{P}(|X-\mathbb{E}[X]|\ge a)=\mathbb{P}((X-\mathbb{E}[X])^2\ge a^2)\le \frac{\mathbb{V}ar(X)}{a^2}.$$
Proof) Set $Y=(X-\mathbb{E}[X])^2$ and apply Markov's inequality to the event $\{Y\ge a^2\}$.
Law of Large Numbers: Concentrate to Mean
Let $X_1,\cdots,X_n$ be $n$ independent samples of a random variable $X$. Denote $S_n=X_1+\cdots+X_n$. Then $\frac{1}{n}S_n$ concentrates on $\mathbb{E}[X]$ in the sense that for any $\epsilon>0$, $$\mathbb{P}\left(\left| \frac{1}{n}S_n - \mathbb{E}[X]\right| \ge \epsilon\right) \le \frac{1}{n}\cdot \frac{\mathbb{V}ar(X)}{\epsilon^2}.$$
Proof) We know $\mathbb{E}[S_n]=n\mathbb{E}[X]$ and $\mathbb{V}ar(S_n)=n\mathbb{V}ar(X)$, so $\mathbb{E}[\frac{1}{n}S_n]=\mathbb{E}[X]$ and $\mathbb{V}ar(\frac{1}{n}S_n)=\frac{1}{n}\mathbb{V}ar(X)$; now apply Chebyshev's inequality.
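A minimal simulation of this bound is sketched below (assuming NumPy; the uniform distribution, $n$, $\epsilon$, and the trial count are arbitrary choices for illustration).

```python
import numpy as np

# Law of large numbers with the Chebyshev-type bound above.
# X ~ Uniform(0, 1): E[X] = 1/2, Var(X) = 1/12.
rng = np.random.default_rng(0)
n, eps, trials = 400, 0.05, 20_000
mean, var = 0.5, 1.0 / 12.0

sample_means = rng.random((trials, n)).mean(axis=1)      # (1/n) S_n per trial
empirical = np.mean(np.abs(sample_means - mean) >= eps)  # P(|S_n/n - E[X]| >= eps)
bound = var / (n * eps**2)                               # (1/n) Var(X) / eps^2

print(f"P(|S_n/n - E[X]| >= {eps}) ~ {empirical:.4f}  <=  bound {bound:.4f}")
```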
Master Tail Bounds Theorem
Let $S_n=X_1+\cdots+X_n$ where $X_1,\cdots,X_n$ are mutually independent random variables (possibly not identically distributed) with zero mean and variance at most $\sigma^2$. Let $0<a\le \sqrt{2}n\sigma^2$. Assume that $|\mathbb{E}[X^r_i]|\le r! \sigma^2$ for $r=3,4,\cdots,\frac{a^2}{4n\sigma^2}$. Then $$\mathbb{P}(|S_n|\ge a)\le 3e^{-\frac{a^2}{12n\sigma^2}}.$$
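Below is a small numerical illustration of this theorem (assuming NumPy), using Rademacher ($\pm 1$) variables: they have zero mean, $\sigma^2=1$, and $|\mathbb{E}[X^r_i]|\le 1\le r!\sigma^2$, so the moment condition holds; $n$, $a$, and the trial count are arbitrary choices.

```python
import numpy as np

# Master tail bound check for S_n = sum of n Rademacher (+-1) variables.
# A sum of n Rademachers has the same law as 2*Binomial(n, 1/2) - n.
rng = np.random.default_rng(0)
n, a, trials = 1_000, 150.0, 1_000_000

S = 2.0 * rng.binomial(n, 0.5, size=trials) - n
empirical = np.mean(np.abs(S) >= a)           # P(|S_n| >= a)
bound = 3.0 * np.exp(-a**2 / (12.0 * n))      # 3 * exp(-a^2 / (12 n sigma^2)), sigma^2 = 1

print(f"P(|S_n| >= {a}) ~ {empirical:.6f}  <=  bound {bound:.4f}")
```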
Concentration under Independence
Let $\phi$ be a strictly increasing positive function. Then for any $X$ and $t$, $$\{\omega:X(\omega)\ge t\}=\{\omega:\phi(X(\omega))\ge\phi(t)\},$$ which together with Markov's inequality gives $$\mathbb{P}(X\ge t)\le \phi(t)^{-1}\mathbb{E}[\phi(X)].$$
Chernoff's inequalities: Upper bound of Prob. (Strong)
For $\theta>0$, we set $\phi(x)=e^{\theta x}$.
In this case, $\mathbb{E}[\phi(X)]=\mathbb{E}[e^{\theta X}]$ is called the Moment Generating Function (MGF) of $X$.
$$\mathbb{P}(X\ge t)\le e^{-\theta t}\mathbb{E}[e^{\theta X}]$$
We also define the cumulant function $$\psi_X(\theta):=\ln \mathbb{E}[e^{\theta X}].$$
Properties of the cumulant function:
1. For $0<\lambda <1$, by Hölder's inequality, $$\mathbb{E}[e^{\lambda\theta_1 X+(1-\lambda)\theta_2X}]\le \mathbb{E}[e^{\theta_1X}]^\lambda \mathbb{E}[e^{\theta_2X}]^{1-\lambda},$$ and taking logarithms shows that $\psi(\theta)$ is a convex function.
2. Since $\psi(0)=\ln\mathbb{E}[e^{0\cdot X}]=0$ and $\psi'(\theta)=\frac{\mathbb{E}[Xe^{\theta X}]}{\mathbb{E}[e^{\theta X}]}$, we have $\psi'(0)=\mathbb{E}[X]$.
Plugging the cumulant function into the Chernoff inequality, $$\mathbb{P}(X\ge t)\le e^{-\theta t}e^{\psi(\theta)}=e^{-\theta t+\psi(\theta)},$$ and minimizing over $\theta>0$, $$\mathbb{P}(X\ge t)\le e^{-\sup_{\theta>0}\{\theta t-\psi(\theta)\} }.$$
Then we can set $$\psi^*(t)=\sup_{\theta>0}\{ t\theta - \psi(\theta)\}$$ and obtain $$\mathbb{P}(X\ge t)\le e^{-\psi^*(t)}.$$
Also, when $t>\mathbb{E}[X]$ we can extend the supremum from $\theta>0$ to $\theta \in \mathbb{R}$: since $\frac{d}{d\theta}\big(t\theta-\psi(\theta)\big)\big|_{\theta=0}=t-\psi'(0)=t-\mathbb{E}[X]>0$, the supremum over all $\theta$ is attained at some $\theta>0$ anyway.
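The optimization over $\theta$ can also be done numerically when $\psi$ has no convenient closed-form maximizer. The sketch below (assuming NumPy and SciPy) does this for $X\sim \text{Poisson}(\lambda)$, whose cumulant function $\psi(\theta)=\lambda(e^\theta-1)$ is a standard closed form; $\lambda$ and $t$ are arbitrary choices.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import poisson

# Numerically optimized Chernoff bound: P(X >= t) <= exp(-sup_{theta>0}{t*theta - psi(theta)}).
lam, t = 1.0, 4.0
psi = lambda theta: lam * np.expm1(theta)     # cumulant function of Poisson(lam)

# Maximize t*theta - psi(theta) over theta > 0 (minimize its negative).
res = minimize_scalar(lambda theta: -(t * theta - psi(theta)),
                      bounds=(1e-8, 50.0), method="bounded")
psi_star = -res.fun                           # psi*(t)
bound = np.exp(-psi_star)

exact = poisson.sf(t - 1, lam)                # P(X >= t) for integer t
print(f"Chernoff bound {bound:.4f}  vs  exact P(X >= {t:g}) = {exact:.4f}")
```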
Fenchel Duality
For any $f:\mathbb{R}^n\rightarrow \mathbb{R}$, the Fenchel-Legendre dual $f^*:\mathbb{R}^n\rightarrow \mathbb{R}$ is defined as $$f^*(\mathbf{y})=\sup_{\mathbf{x}\in\mathbb{R}^n}\{\mathbf{y}^T\mathbf{x}-f(\mathbf{x})\}$$
When $f$ is convex, $$f^{**}=f$$
In general, $$f^{**}(\mathbf{x})\le f(\mathbf{x})$$
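A grid-based sketch (assuming NumPy; the double-well function is an arbitrary choice) can make the inequality $f^{**}\le f$ concrete in one dimension: for a non-convex $f$, the biconjugate $f^{**}$ is its convex envelope.

```python
import numpy as np

# One-dimensional Fenchel-Legendre dual on a grid: f*(y) = max_x (x*y - f(x)),
# then f**(x) = max_y (x*y - f*(y)). For the non-convex double well below,
# f** is the convex envelope, so f**(x) <= f(x) (up to grid error).
x = np.linspace(-3.0, 3.0, 1201)
y = np.linspace(-6.0, 6.0, 1201)
f = np.minimum((x - 1.0)**2, (x + 1.0)**2)

f_star = np.max(y[:, None] * x[None, :] - f[None, :], axis=1)          # f*(y)
f_bistar = np.max(x[:, None] * y[None, :] - f_star[None, :], axis=1)   # f**(x)

print("max of f**(x) - f(x):", np.max(f_bistar - f))            # should be <= ~0
print("f(0) =", f[600], " f**(0) =", round(f_bistar[600], 4))   # strict gap at x = 0
```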
Gaussian random variables
Let $X\sim N(0,\sigma^2)$. Then $$\psi(\theta)=\ln \mathbb{E}[e^{\theta X}]=\frac{1}{2}\theta^2\sigma^2$$ by computing the Gaussian integral.
For $t>0$ $$\theta_t = \text{argmax}_\theta\left\{ t\theta - \frac{1}{2}\sigma^2\theta^2\right\}=\frac{t}{\sigma^2}$$ then Fenchel-Legendre dual is $$\psi^*(t)=t\theta_t-\psi_X(\theta_t)=t\cdot \frac{t}{\sigma^2}-\frac{1}{2}\sigma^2\cdot\frac{t^2}{\sigma^4}=\frac{t^2}{2\sigma^2}$$
Then Chernoff bound of Gaussian is $$\mathbb{P}(X\ge t)\le e^{-\frac{t^2}{2\sigma^2}}$$
For $X\sim N(\mu,\sigma^2)$, $$\mathbb{P}(X-\mathbb{E}[X]\ge t)\le e^{-\frac{t^2}{2\sigma^2}}$$
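For a concrete feel for the Gaussian bound, the sketch below (assuming NumPy and SciPy; $\sigma$ and the values of $t$ are arbitrary choices) compares the exact tail with $e^{-t^2/(2\sigma^2)}$.

```python
import numpy as np
from scipy.stats import norm

# Exact Gaussian tail vs. the Chernoff bound exp(-t^2 / (2 sigma^2)).
sigma = 2.0
for t in (1.0, 2.0, 4.0, 8.0):
    exact = norm.sf(t, scale=sigma)             # P(X >= t), X ~ N(0, sigma^2)
    bound = np.exp(-t**2 / (2.0 * sigma**2))
    print(f"t = {t:>4}:  P(X >= t) = {exact:.2e}  <=  bound {bound:.2e}")
```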
Sums of independent random variables
Let $X_1,\cdots, X_n$ be independent mean-zero random variables and $S=X_1+\cdots+X_n$. Then the MGF of $S$ is $$e^{\psi_S(\theta)}=\mathbb{E}[e^{\theta S}]=\prod^n_{i=1}\mathbb{E}[e^{\theta X_i}]=\prod^n_{i=1}e^{\psi_{X_i}(\theta)}=e^{\sum^n_{i=1}\psi_{X_i}(\theta)}$$ and the cumulant function is $$\psi_S(\theta)=\sum^n_{i=1}\psi_{X_i}(\theta).$$ If the $X_i$ are moreover identically distributed with cumulant function $\psi_X$, the Fenchel-Legendre dual is $$\psi^*_S(t)=\sup_\theta \{ t\theta-n\psi_X(\theta)\}\\ =n\sup_\theta \left\{\frac{t}{n}\cdot \theta-\psi_X(\theta)\right\}\\ =n\psi^*_X\left(\frac{t}{n}\right),$$ which is expressed through the dual of a single variable.
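For instance, in the i.i.d. Gaussian case $X_i\sim N(0,\sigma^2)$ we have $\psi_X(\theta)=\frac{1}{2}\sigma^2\theta^2$, so $$\psi^*_S(t)=n\psi^*_X\left(\frac{t}{n}\right)=n\cdot\frac{(t/n)^2}{2\sigma^2}=\frac{t^2}{2n\sigma^2},$$ which is exactly the dual of $N(0,n\sigma^2)$, as expected since $S\sim N(0,n\sigma^2)$.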
Sub-Gaussian random variables
A random variable $X$ is called sub-Gaussian with parameter $\sigma$ if, writing $X'=X-\mathbb{E}[X]$, $$e^{\psi_{X'}(\theta)}=\mathbb{E}[e^{\theta(X-\mathbb{E}[X])}]\le e^{\frac{\theta^2\sigma^2}{2}}\quad \text{ or equivalently }\quad \psi_{X'}(\theta)\le \frac{1}{2}\sigma^2\theta^2\quad \forall\theta \in \mathbb{R}.$$
The Fenchel-Legendre dual then satisfies $$\psi^*_{X'}(t)=\sup_\theta \{t\theta-\psi_{X'}(\theta)\}\ge \sup_\theta \{t\theta -\frac{1}{2}\sigma^2\theta^2\}=\frac{t^2}{2\sigma^2},$$ so the Chernoff bound gives $$\mathbb{P}(X-\mathbb{E}[X]> t)=\mathbb{P}(X'>t)\le e^{-\psi^*_{X'}(t)}\le e^{-\frac{t^2}{2\sigma^2}}.$$
Let $X_1,\cdots,X_n$ be independent sub-Gaussian variables with parameters $\sigma_i$. Then for $S=X_1+\cdots+X_n$, $$\mathbb{E}[e^{\theta(S-\mathbb{E}[S])}]=\prod^n_{i=1}\mathbb{E}[e^{\theta(X_i-\mathbb{E}[X_i])}] \le \prod^n_{i=1}e^{\frac{\theta^2\sigma^2_i}{2}}=e^{\frac{\theta^2\sum^n_{i=1}\sigma^2_i}{2}},$$ so $S=X_1+\cdots+X_n$ is sub-Gaussian with parameter $\sqrt{\sigma^2_1+\cdots+\sigma^2_n}$, and the resulting tail bound is the following.
Hoeffding Bound
Let $X_1,\cdots,X_n$ be independent and sub-Gaussian with parameters $\sigma_i$. Then, for the sum $S=X_1+\cdots+X_n$ and any positive $t$, $$\mathbb{P}(S-\mathbb{E}[S]>t) \vee \mathbb{P}(S-\mathbb{E}[S]<-t)\le \exp \left( -\frac{t^2}{2\sum^n_{i=1}\sigma^2_i} \right)$$
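A Monte Carlo check of the Hoeffding bound is sketched below (assuming NumPy), using $X_i\sim \text{Bernoulli}(1/2)$ together with the standard fact, not proved in these notes (Hoeffding's lemma), that a variable with values in $[0,1]$ is sub-Gaussian with parameter $\sigma_i=1/2$; $n$, $t$, and the trial count are arbitrary choices.

```python
import numpy as np

# Hoeffding bound for S = X_1 + ... + X_n with X_i ~ Bernoulli(1/2).
# Each X_i takes values in [0, 1], so sigma_i = 1/2 and sum sigma_i^2 = n/4.
rng = np.random.default_rng(0)
n, t, trials = 400, 30.0, 200_000

S = rng.binomial(n, 0.5, size=trials).astype(float)   # sums of n Bernoulli(1/2)
empirical = np.mean(S - n / 2.0 > t)                  # P(S - E[S] > t)
bound = np.exp(-t**2 / (2.0 * n / 4.0))               # exp(-t^2 / (2 * sum sigma_i^2))

print(f"P(S - E[S] > {t}) ~ {empirical:.5f}  <=  bound {bound:.5f}")
```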
Maximal inequality
Let $X_1,\cdots,X_n$ be mean-zero sub-Gaussian random variables with parameter $\sigma$ (independence is not required). Then $$\mathbb{E}[\max \{X_1,\cdots,X_n\}]\le \sigma \sqrt{2\ln n}$$
Proof: By Jensen's inequality, for any $\theta>0$, $e^{\theta\mathbb{E}[\max_{i\le n}X_i]}\le \mathbb{E}[e^{\theta\max_{i\le n} X_i}]=\mathbb{E}[\max_{i\le n}e^{\theta X_i}]\le \mathbb{E}[\sum_{i\le n}e^{\theta X_i}]=\sum_{i\le n}\mathbb{E}[e^{\theta X_i}]\le ne^{\frac{1}{2}\theta^2\sigma^2}$. Taking logarithms gives $\mathbb{E}[\max_{i\le n}X_i]\le \frac{\ln n}{\theta}+\frac{1}{2}\theta\sigma^2$, and choosing $\theta=\frac{\sqrt{2\ln n}}{\sigma}$ yields the bound $\sigma\sqrt{2\ln n}$.
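An empirical check of the bound is sketched below (assuming NumPy; standard Gaussians are used as the sub-Gaussian variables with $\sigma=1$, and i.i.d. draws are used only for convenience since the bound does not need independence).

```python
import numpy as np

# E[max_{i<=n} X_i] vs. sigma * sqrt(2 ln n) for standard Gaussians (sigma = 1).
rng = np.random.default_rng(0)
sigma, trials = 1.0, 10_000

for n in (10, 100, 1000):
    X = rng.normal(0.0, sigma, size=(trials, n))
    empirical = X.max(axis=1).mean()               # estimate of E[max_{i<=n} X_i]
    bound = sigma * np.sqrt(2.0 * np.log(n))
    print(f"n = {n:>4}:  E[max] ~ {empirical:.3f}  <=  bound {bound:.3f}")
```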