MAS473 summary
HDS
Concentration near the equator
For c\ge 1 and d\ge 3, at least a 1-\frac{2}{c}e^{-\frac{1}{2}c^2} fraction of the volume of the d-dimensional unit ball has |x_1|\le \frac{c}{\sqrt{d-1}}, where \mathbf{x}=(x_1,\cdots,x_d)^T\in B_1(\mathbf{0}).
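A quick Monte Carlo sanity check (a sketch; the dimension d, constant c, sample size, and seed are arbitrary illustrative choices):

```python
import numpy as np

rng = np.random.default_rng(0)
d, c, n = 100, 2.0, 100_000

# Sample uniformly from the unit ball: normalize Gaussians onto the sphere,
# then scale by a radius drawn as U^(1/d).
x = rng.standard_normal((n, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)
x *= rng.uniform(size=(n, 1)) ** (1 / d)

frac = np.mean(np.abs(x[:, 0]) <= c / np.sqrt(d - 1))
bound = 1 - (2 / c) * np.exp(-c**2 / 2)
print(f"empirical fraction = {frac:.4f} >= lower bound = {bound:.4f}")
```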
Near orthogonality
Consider drawing n points \mathbf{x}_1,\cdots,\mathbf{x}_n in \mathbb{R}^d at random from the unit ball B_1(\mathbf{0}). With probability 1-O(\frac{1}{n}) (a numerical check follows the list),
1. |\mathbf{x}_i|\ge 1-\frac{2\ln n}{d} for all i;
2. |\mathbf{x}_i\cdot \mathbf{x}_j|\le \frac{\sqrt{6\ln n}}{\sqrt{d-1}} for all i\ne j.
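A sketch checking both bounds empirically (d, n, and the seed are illustrative choices; the ball sampler is the same as above):

```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 1000, 100

# n random points from the unit ball in R^d
x = rng.standard_normal((n, d))
x /= np.linalg.norm(x, axis=1, keepdims=True)
x *= rng.uniform(size=(n, 1)) ** (1 / d)

norms = np.linalg.norm(x, axis=1)
gram = x @ x.T
off_diag = gram[~np.eye(n, dtype=bool)]   # all pairwise dot products, i != j

print("min |x_i|       :", norms.min(), ">=", 1 - 2 * np.log(n) / d)
print("max |x_i . x_j| :", np.abs(off_diag).max(),
      "<=", np.sqrt(6 * np.log(n)) / np.sqrt(d - 1))
```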
Gaussian Integral / Distribution
\int^\infty_{-\infty}e^{-a(x+b)^2}dx=\sqrt{\frac{\pi}{a}}
\int_{\mathbb{R}^n}\exp\left( -\frac{1}{2}x^TAx+b^Tx+c\right) d^nx=\sqrt{\det(2\pi A^{-1})}e^{\frac{1}{2}b^TA^{-1}b+c} for symmetric positive-definite A
The spherical Gaussian density in \mathbb{R}^d with variance \sigma^2 in each coordinate is f(\mathbf{x})=\frac{1}{(2\pi)^{d/2}\sigma^d}\exp \left(-\frac{|\mathbf{x}|^2}{2\sigma^2}\right)
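Both integral formulas can be verified numerically; a minimal sketch with arbitrary choices of a, b, A, \mathbf{b}, c (the multivariate check uses a 2-D grid sum):

```python
import numpy as np

# 1-D check: integral of exp(-a (x+b)^2) dx = sqrt(pi/a)
a, b = 1.5, 0.7
xs, dx = np.linspace(-20, 20, 400_001, retstep=True)
print(np.exp(-a * (xs + b) ** 2).sum() * dx, np.sqrt(np.pi / a))

# 2-D check of the multivariate formula (A symmetric positive definite)
A = np.array([[2.0, 0.5], [0.5, 1.0]])
bvec = np.array([0.3, -0.2])
c = 0.1
g, dg = np.linspace(-15, 15, 1501, retstep=True)
X, Y = np.meshgrid(g, g, indexing="ij")
quad = A[0, 0] * X**2 + 2 * A[0, 1] * X * Y + A[1, 1] * Y**2
vals = np.exp(-0.5 * quad + bvec[0] * X + bvec[1] * Y + c)
lhs = vals.sum() * dg * dg
Ainv = np.linalg.inv(A)
rhs = np.sqrt(np.linalg.det(2 * np.pi * Ainv)) * np.exp(0.5 * bvec @ Ainv @ bvec + c)
print(lhs, rhs)
```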
Concentration of Gaussian Distribution
For a d-dimensional spherical Gaussian with unit variance in each direction and any \beta\le \sqrt{d}, at least a 1-3e^{-c\beta^2} fraction of the probability mass lies within the annulus \{ \mathbf{x}\in \mathbb{R}^d: \sqrt{d}-\beta\le |\mathbf{x}|\le \sqrt{d}+\beta\} = \{ \mathbf{x}\in\mathbb{R}^d: ||\mathbf{x}|-\sqrt{d}|\le \beta\}, where c is a fixed positive constant which is at least \frac{1}{24}.
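A sketch of the annulus concentration with illustrative d and \beta (using the worst-case constant c=\frac{1}{24}, so the bound is loose and the empirical mass should exceed it comfortably):

```python
import numpy as np

rng = np.random.default_rng(2)
d, beta, n = 500, 10.0, 20_000          # beta <= sqrt(d) as required
x = rng.standard_normal((n, d))         # unit-variance spherical Gaussian
r = np.linalg.norm(x, axis=1)
mass = np.mean(np.abs(r - np.sqrt(d)) <= beta)
print(mass, ">=", 1 - 3 * np.exp(-beta**2 / 24))
```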
Random Projection Theorem
Let \mathbf{v} be a fixed vector in \mathbb{R}^d and let f(\mathbf{v})=(\mathbf{u}_1\cdot\mathbf{v},\cdots,\mathbf{u}_m\cdot\mathbf{v}), where \mathbf{u}_1,\cdots,\mathbf{u}_m are independent spherical Gaussian vectors with unit variance in each coordinate. There exists a constant c>0 such that for \epsilon\in(0,1), \mathbb{P}\left( \left|\left| \frac{1}{\sqrt{m}}f(\mathbf{v})\right| - |\mathbf{v}|\right| \ge \epsilon |\mathbf{v}|\right)\le 3e^{-cm\epsilon^2}, where the probability is taken over the random draws of the vectors \mathbf{u}_i used to construct f.
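A minimal sketch of the projection f (d, m, and the test vector are arbitrary choices); the scaled projection length should be close to |\mathbf{v}|:

```python
import numpy as np

rng = np.random.default_rng(3)
d, m = 10_000, 200
v = rng.standard_normal(d)          # an arbitrary fixed vector, |v| ~ 100
U = rng.standard_normal((m, d))     # rows u_i: spherical Gaussian vectors
fv = U @ v                          # f(v) = (u_1 . v, ..., u_m . v)
print(np.linalg.norm(fv) / np.sqrt(m), "~", np.linalg.norm(v))
```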
CI
Markov Inequality
Let X be a non-negative random variable. Then for any a>0, \mathbb{P}(X\ge a)\le \frac{\mathbb{E}[X]}{a}.
Chebyshev's Inequality
Let X be a random variable. Then for any a>0, \mathbb{P}(|X-\mathbb{E}[X]|\ge a)=\mathbb{P}((X-\mathbb{E}[X])^2\ge a^2)\le \frac{\mathrm{Var}(X)}{a^2}.
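Both inequalities checked on an illustrative example, an Exponential(1) variable with \mathbb{E}[X]=\mathrm{Var}(X)=1:

```python
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(scale=1.0, size=1_000_000)   # E[X] = 1, Var(X) = 1

a = 3.0
print("Markov   :", np.mean(x >= a), "<=", 1 / a)                 # E[X]/a
print("Chebyshev:", np.mean(np.abs(x - 1) >= a), "<=", 1 / a**2)  # Var(X)/a^2
```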
Master Tail Bound Theorem
Let S_n=X_1+\cdots+X_n where X_1,\cdots,X_n are mutually independent random variables (possibly not identically distributed) with zero mean and variance at most \sigma^2. Let 0<a\le \sqrt{2}n\sigma^2, and assume that |\mathbb{E}[X^r_i]|\le r!\,\sigma^2 for r=3,4,\cdots,\lfloor a^2/(4n\sigma^2)\rfloor. Then \mathbb{P}(|S_n|\ge a)\le 3e^{-\frac{a^2}{12n\sigma^2}}.
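A sketch with Rademacher (\pm 1) summands, which satisfy the moment condition since |\mathbb{E}[X^r_i]|\le 1\le r!\,\sigma^2; n, a, and the number of trials are illustrative choices:

```python
import numpy as np

rng = np.random.default_rng(5)
n, sigma2, trials = 100, 1.0, 100_000
# Rademacher summands: zero mean, variance 1
x = rng.integers(0, 2, size=(trials, n)) * 2 - 1
s = x.sum(axis=1)

a = 60.0   # satisfies 0 < a <= sqrt(2) n sigma^2; moments needed up to r = a^2/(4n) = 9
print(np.mean(np.abs(s) >= a), "<=", 3 * np.exp(-a**2 / (12 * n * sigma2)))
```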
Chernoff's Inequality
The cumulant generating function is \psi_X(\theta):=\ln \mathbb{E}[e^{\theta X}].
Setting \psi^*(t)=\sup_{\theta>0}\{ t\theta - \psi_X(\theta)\} (the Fenchel-Legendre dual) gives the Chernoff bound \mathbb{P}(X\ge t)\le e^{-\psi^*(t)}.
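A sketch for an illustrative case, X\sim\mathrm{Poisson}(\lambda), whose cumulant function is \psi(\theta)=\lambda(e^\theta-1); the supremum is taken numerically over a grid of \theta:

```python
import numpy as np

rng = np.random.default_rng(6)
lam, t = 2.0, 6.0
# psi(theta) = lam * (e^theta - 1); numeric sup over theta > 0
thetas = np.linspace(1e-6, 10, 100_000)
psi_star = np.max(t * thetas - lam * (np.exp(thetas) - 1))

x = rng.poisson(lam, size=1_000_000)
print(np.mean(x >= t), "<=", np.exp(-psi_star))   # empirical tail vs Chernoff bound
```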
Gaussian Random Variables
Let X\sim N(0,\sigma^2). Then \psi(\theta)=\ln \mathbb{E}[e^{\theta X}]=\frac{1}{2}\theta^2\sigma^2, which follows by computing the Gaussian integral.
The Chernoff bound for a Gaussian is therefore \mathbb{P}(X\ge t)\le e^{-\frac{t^2}{2\sigma^2}}.
Sub-Gaussian Random Variables
A random variable X is sub-Gaussian with parameter \sigma if, writing X'=X-\mathbb{E}[X], e^{\psi_{X'}(\theta)}=\mathbb{E}[e^{\theta(X-\mathbb{E}[X])}]\le e^{\frac{\theta^2\sigma^2}{2}}, equivalently \psi_{X'}(\theta)\le \frac{1}{2}\sigma^2\theta^2, for all \theta \in \mathbb{R}.
Its Fenchel-Legendre dual satisfies \psi^*_{X'}(t)=\sup_\theta \{t\theta-\psi_{X'}(\theta)\}\ge \sup_\theta \{t\theta -\frac{1}{2}\sigma^2\theta^2\}=\frac{t^2}{2\sigma^2}, so the Chernoff bound gives \mathbb{P}(X-\mathbb{E}[X]> t)=\mathbb{P}(X'>t)\le e^{-\psi^*_{X'}(t)}\le e^{-\frac{t^2}{2\sigma^2}}.
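A sketch using a Rademacher variable, which is sub-Gaussian with \sigma=1 since \mathbb{E}[e^{\theta X}]=\cosh\theta\le e^{\theta^2/2}; a sum of n such variables is then sub-Gaussian with \sigma^2=n:

```python
import numpy as np

rng = np.random.default_rng(7)
# Rademacher: E[e^{theta X}] = cosh(theta) <= e^{theta^2 / 2}
theta = np.linspace(-5, 5, 11)
print(np.all(np.cosh(theta) <= np.exp(theta**2 / 2)))

# S_n / sqrt(n) is sub-Gaussian with sigma = 1, so P(S_n/sqrt(n) > t) <= e^{-t^2/2}
n, t, trials = 100, 2.5, 100_000
s = (rng.integers(0, 2, size=(trials, n)) * 2 - 1).sum(axis=1) / np.sqrt(n)
print(np.mean(s > t), "<=", np.exp(-t**2 / 2))
```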
Karush-Kuhn-Tucker (KKT) Condition
For \max \{ f(\mathbf{x}):g_1(\mathbf{x})=0,\cdots,g_m(\mathbf{x})=0,\ h_1(\mathbf{x})\le 0,\cdots,h_n(\mathbf{x})\le 0\}, the KKT conditions are:
Stationarity: \nabla f(\mathbf{x})=\sum^m_{i=1}\lambda_i\nabla g_i(\mathbf{x})+\sum^n_{j=1}\mu_j\nabla h_j(\mathbf{x})
Dual feasibility: \lambda_1,\cdots,\lambda_m\in\mathbb{R},\ \mu_1\ge 0,\cdots, \mu_n\ge 0
Complementary slackness: \mu_1h_1(\mathbf{x})=0,\cdots,\mu_nh_n(\mathbf{x})=0
Primal feasibility: g_1(\mathbf{x})=0,\cdots, g_m(\mathbf{x})=0 and h_1(\mathbf{x})\le 0,\cdots, h_n(\mathbf{x})\le 0
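A worked toy example (an illustrative choice, not from the notes): maximize f(\mathbf{x})=x_1+x_2 subject to h(\mathbf{x})=x_1^2+x_2^2-1\le 0, whose KKT point is \mathbf{x}=(\frac{1}{\sqrt{2}},\frac{1}{\sqrt{2}}) with \mu=\frac{1}{\sqrt{2}}:

```python
import numpy as np

x = np.array([1.0, 1.0]) / np.sqrt(2)   # candidate maximizer
mu = 1 / np.sqrt(2)                     # candidate multiplier for h

grad_f = np.array([1.0, 1.0])           # gradient of f(x) = x1 + x2
grad_h = 2 * x                          # gradient of h(x) = x1^2 + x2^2 - 1
h = x @ x - 1

print("stationarity :", np.allclose(grad_f, mu * grad_h))
print("dual feas.   :", mu >= 0)
print("comp. slack. :", np.isclose(mu * h, 0))
print("primal feas. :", h <= 1e-12)
```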
IE
Cross Entropy and Kullback-Leibler Divergence (Relative Entropy)
H(\mathbb{P},\mathbb{Q})=\sum_i p_i\ln\frac{1}{q_i}
D(\mathbb{P}||\mathbb{Q})=H(\mathbb{P},\mathbb{Q})-H(\mathbb{P})=\sum_i p_i\ln\frac{1}{q_i}-\sum_i p_i\ln\frac{1}{p_i}=\sum_i p_i\ln\frac{p_i}{q_i}
Gaussian case, with \mathbb{P}=N(\mu_1,\sigma^2_1) and \mathbb{Q}=N(\mu_2,\sigma^2_2): D(\mathbb{P}||\mathbb{Q})=\frac{(\mu_1-\mu_2)^2+\sigma^2_1-\sigma^2_2}{2\sigma^2_2}+\ln\frac{\sigma_2}{\sigma_1}
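A sketch checking both identities numerically (the discrete distributions and Gaussian parameters are arbitrary choices):

```python
import numpy as np

# Discrete case: D(P||Q) = H(P,Q) - H(P)
p = np.array([0.5, 0.3, 0.2])
q = np.array([0.4, 0.4, 0.2])
cross = np.sum(p * np.log(1 / q))
ent = np.sum(p * np.log(1 / p))
kl = np.sum(p * np.log(p / q))
print(np.isclose(kl, cross - ent))

# Gaussian case: closed form vs numerical integration
mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0
xs, dx = np.linspace(-30, 30, 400_001, retstep=True)
pdf = lambda z, m, s: np.exp(-(z - m)**2 / (2 * s**2)) / (s * np.sqrt(2 * np.pi))
num = np.sum(pdf(xs, mu1, s1) * np.log(pdf(xs, mu1, s1) / pdf(xs, mu2, s2))) * dx
closed = ((mu1 - mu2)**2 + s1**2 - s2**2) / (2 * s2**2) + np.log(s2 / s1)
print(num, closed)
```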
Distance Between Probability Measures
\frac{1}{2}||\mathbb{P}-\mathbb{Q}||_1=\sup_{A\in \mathcal{F}}(\mathbb{P}(A)-\mathbb{Q}(A))=\int_{\{\frac{d\mathbb{P}}{d\mathbb{Q}}\ge 1\}} \left(\frac{d\mathbb{P}}{d\mathbb{Q}}-1\right)d\mathbb{Q}=\frac{1}{2}\int\left| \frac{d\mathbb{P}}{d\mathbb{Q}}-1\right| d\mathbb{Q}
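A sketch verifying the equalities on a small discrete example, where \frac{d\mathbb{P}}{d\mathbb{Q}} is the ratio p_i/q_i:

```python
import numpy as np

p = np.array([0.5, 0.3, 0.2])
q = np.array([0.2, 0.5, 0.3])

tv_l1 = 0.5 * np.abs(p - q).sum()                   # (1/2) ||P - Q||_1
tv_sup = (p - q)[p > q].sum()                       # sup_A (P(A) - Q(A)), A = {p > q}
ratio = p / q - 1                                   # dP/dQ - 1
tv_set = (ratio[ratio >= 0] * q[ratio >= 0]).sum()  # integral over {dP/dQ >= 1}
tv_abs = 0.5 * (np.abs(ratio) * q).sum()
print(tv_l1, tv_sup, tv_set, tv_abs)                # all four agree
```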