MAS473 High Dimensional Spaces
Volume of the Unit Ball
$$B_r(\mathbf{x})=\{\mathbf{y}\in \mathbb{R}^d:\ |\mathbf{x}-\mathbf{y}|\le r\}$$
In Euclidean distance, $|\mathbf{x}-\mathbf{y}|=\sqrt{\sum^d_{i=1}(x_i-y_i)^2}$.
Let $V(d)=\text{volume}(B_1(\mathbf{0}))$ denote the volume of the unit ball. Then $$\text{volume}(B_r(\mathbf{x}))=V(d)\cdot r^d$$
$$V(d)=\int_{S^d}\int^1_0 r^{d-1}drd\Omega=\frac{1}{d}\int_{S^d}d\Omega=\frac{A(d)}{d}$$
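Sanity check of $V(d)=\frac{A(d)}{d}$: for $d=2$, $A(2)=2\pi$ gives $V(2)=\pi$, the area of the unit disk; for $d=3$, $A(3)=4\pi$ gives $V(3)=\frac{4\pi}{3}$, the familiar volume of the unit sphere.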
Volume near the Equator
Theorem (Concentration near the Equator)
Assume $c\ge 1$ and $d\ge 3$. Then at least a $1-\frac{2}{c}e^{-\frac{1}{2}c^2}$ fraction of the volume of the $d$-dimensional unit ball has $|x_1|\le \frac{c}{\sqrt{d-1}}$, where $\mathbf{x}=(x_1,\cdots,x_d)^T\in B_1(\mathbf{0})$.
Proof)
WLOG, fix the $x_1$-axis and consider the region $x_1\ge \frac{c}{\sqrt{d-1}}$ inside the half $x_1>0$; by symmetry the half $x_1<0$ is identical. Define the cap and the hemisphere
$$A=B_1(\mathbf{0})\cap \left\{ \mathbf{x}\in \mathbb{R}^d:x_1\ge \frac{c}{\sqrt{d-1}}\right\},\quad H=B_1(\mathbf{0})\cap \{\mathbf{x}\in \mathbb{R}^d:x_1\ge 0\}$$
Each slice at height $x_1$ is a $(d-1)$-dimensional ball of radius $\sqrt{1-x_1^2}$. Using $1-t\le e^{-t}$ and, on the domain of integration, $\frac{x_1\sqrt{d-1}}{c}\ge 1$: $$\text{volume}(A)=\int^1_{\frac{c}{\sqrt{d-1}}}(1-x^2_1)^{\frac{d-1}{2}}V(d-1)dx_1\\ \le \int^1_{\frac{c}{\sqrt{d-1}}}e^{-\frac{d-1}{2}x^2_1}V(d-1)dx_1\\ \le \int^1_{\frac{c}{\sqrt{d-1}}}\frac{x_1\sqrt{d-1}}{c}e^{-\frac{d-1}{2}x_1^2}V(d-1)dx_1\\ \le V(d-1)\frac{\sqrt{d-1}}{c}\int^\infty_{\frac{c}{\sqrt{d-1}}}e^{-\frac{d-1}{2}x_1^2}x_1dx_1\\ =V(d-1)\frac{\sqrt{d-1}}{2c}\int^\infty_{\frac{c^2}{d-1}}e^{-\frac{d-1}{2}z}dz\quad (z=x_1^2)\\ =V(d-1)\frac{\sqrt{d-1}}{2c}\left( -\frac{2}{d-1}e^{-\frac{d-1}{2}z}\right)\Big|^\infty_{\frac{c^2}{d-1}}\\ =\frac{V(d-1)}{c\sqrt{d-1}}e^{-\frac{c^2}{2}}$$
$H$ contains a thin cylinder near the equator: $$H\supset \left\{ \mathbf{x}\in\mathbb{R}^d:0\le x_1\le \frac{1}{\sqrt{d-1}},\ x^2_2+\cdots+x^2_d\le 1-\frac{1}{d-1} \right\}$$
So $$\text{volume}(H)\ge \text{volume}\left(\left\{\mathbf{x}\in\mathbb{R}^d:0\le x_1\le \frac{1}{\sqrt{d-1}}, x^2_2+\cdots+x^2_d\le 1-\frac{1}{d-1}\right\}\right)\\ =V(d-1)\left(1-\frac{1}{d-1}\right)^{\frac{d-1}{2}}\frac{1}{\sqrt{d-1}}\\ \ge \frac{V(d-1)}{2\sqrt{d-1}}$$ since $\left(1-\frac{1}{d-1}\right)^{\frac{d-1}{2}}\ge \frac{1}{2}$ for $d\ge 3$.
Finally, $$\frac{\text{volume}(A)}{\text{volume}(H)}\le \frac{V(d-1)e^{-c^2/2}/(c\sqrt{d-1})}{V(d-1)/(2\sqrt{d-1})}=\frac{2}{c}e^{-\frac{c^2}{2}}$$ By symmetry the same bound holds for the cap $x_1\le -\frac{c}{\sqrt{d-1}}$, so at most a $\frac{2}{c}e^{-\frac{c^2}{2}}$ fraction of the ball has $|x_1|\ge \frac{c}{\sqrt{d-1}}$.
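A minimal Monte Carlo sanity check of this bound (my own sketch, not from the lecture; assumes NumPy). Points are drawn uniformly from $B_1(\mathbf{0})$ via a Gaussian direction and a $U^{1/d}$-distributed radius:

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_unit_ball(n, d):
    """Uniform samples from B_1(0): Gaussian direction, radius ~ U^(1/d)."""
    g = rng.standard_normal((n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)  # uniform direction on the sphere
    r = rng.random(n) ** (1.0 / d)                 # radial law of the uniform ball
    return g * r[:, None]

d, c, n = 500, 2.0, 200_000
x = sample_unit_ball(n, d)
frac = np.mean(np.abs(x[:, 0]) <= c / np.sqrt(d - 1))
print(f"empirical: {frac:.4f}   theorem lower bound: {1 - (2/c)*np.exp(-c**2/2):.4f}")
```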
By the theorem, each $|x_i|$ exceeds $\frac{c}{\sqrt{d-1}}$ with probability at most $\frac{2}{c}e^{-\frac{1}{2}c^2}$.
So by the union bound, the probability that at least one $|x_i|$ ($1\le i\le k$) exceeds $\frac{c}{\sqrt{d-1}}$ is at most $\frac{2k}{c}e^{-\frac{1}{2}c^2}$.
Theorem (Concentration near the Intersection of Equators)
Assume $c\ge 1$, $d\ge 3$, and $k<d$. Then at least a $1-\frac{2k}{c}e^{-\frac{1}{2}c^2}$ fraction of the volume of the $d$-dimensional unit ball has $\max \{ |x_1|,\cdots,|x_k|\} \le \frac{c}{\sqrt{d-1}}$, where $\mathbf{x}=(x_1,\cdots,x_d)^T\in B_1(\mathbf{0})$.
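Continuing the snippet above (same `sample_unit_ball` and `d`), the corresponding check for $k$ equators; $k=10$ and $c=3$ are illustrative values (with $c=2$ and $k=10$ the stated bound is vacuous):

```python
k, c = 10, 3.0
x = sample_unit_ball(200_000, d)
frac = np.mean(np.max(np.abs(x[:, :k]), axis=1) <= c / np.sqrt(d - 1))
print(f"empirical: {frac:.4f}   theorem lower bound: {1 - (2*k/c)*np.exp(-c**2/2):.4f}")
```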
Near Orthogonality of Two Arbitrary Vectors
Most of the volume lies near the surface of the ball, and two points drawn at random are nearly orthogonal to each other.
Theorem (Near Orthogonality)
Consider drawing $n$ points $\mathbf{x}_1,\cdots,\mathbf{x}_n$ in $\mathbb{R}^d$ at random from the unit ball $B_1(\mathbf{0})$. With probability $1-O(\frac{1}{n})$,
1. $|\mathbf{x}_i|\ge 1-\frac{2\ln n}{d}$ for all $i$;
2. $|\mathbf{x}_i\cdot \mathbf{x}_j|\le \frac{\sqrt{6\ln n}}{\sqrt{d-1}}$ for all $i\ne j$.
Proof 1)
The ratio of the volume of the shrunken ball $(1-\epsilon)A$ to that of a ball $A$ is $$\frac{\text{volume}((1-\epsilon)A)}{\text{volume}(A)}=(1-\epsilon)^d\le e^{-\epsilon d}$$
So $$\mathbb{P}(|\mathbf{x}_i|< 1-\epsilon)\le e^{-\epsilon d}$$.
When $\epsilon=\frac{2\ln n}{d}$, $$\mathbb{P}\left( |\mathbf{x}_i|< 1-\frac{2\ln n}{d}\right)\le e^{-(\frac{2\ln n}{d})d}=\frac{1}{n^2}$$
By union bound, $$\mathbb{P}\left( |\mathbf{x}_i|< 1-\frac{2\ln n}{d}\text{ for some }i\right) \le \sum^n_{j=1}\mathbb{P}\left( |\mathbf{x}_j| < 1-\frac{2\ln n}{d}\right) \le n\times \frac{1}{n^2}=\frac{1}{n}$$
Proof 2)
Similarly: fix $\mathbf{x}_i$ and take its direction as the $x_1$-axis. Applying the concentration-near-the-equator theorem to $\mathbf{x}_j$ with $c=\sqrt{6\ln n}$ (and using $|\mathbf{x}_i|\le 1$), $$\mathbb{P}\left(|\mathbf{x}_i\cdot \mathbf{x}_j|> \frac{\sqrt{6\ln n}}{\sqrt{d-1}}\right)\le \frac{2}{\sqrt{6\ln n}}e^{-3\ln n}\le \frac{2}{n^3}$$ A union bound over the $\binom{n}{2}\le \frac{n^2}{2}$ pairs gives failure probability at most $\frac{1}{n}$.
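A Monte Carlo check of both claims (my own sketch, assuming NumPy; the values of $n$ and $d$ are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_unit_ball(n, d):
    """Uniform samples from B_1(0): Gaussian direction, radius ~ U^(1/d)."""
    g = rng.standard_normal((n, d))
    g /= np.linalg.norm(g, axis=1, keepdims=True)
    r = rng.random(n) ** (1.0 / d)
    return g * r[:, None]

n, d = 100, 10_000
X = sample_unit_ball(n, d)
norms = np.linalg.norm(X, axis=1)
dots = X @ X.T
off = np.abs(dots[~np.eye(n, dtype=bool)])   # |x_i . x_j| for i != j
print("min |x_i|     :", norms.min(), ">= ?", 1 - 2 * np.log(n) / d)
print("max |x_i.x_j| :", off.max(), "<= ?", np.sqrt(6 * np.log(n) / (d - 1)))
```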
Proof that the Volume of the Unit Ball Goes to Zero
In closed form, $V(d)=\frac{\pi^{d/2}}{\Gamma(\frac{d}{2}+1)}$. The Gamma function grows faster than any exponential in $d$, so $V(d)\rightarrow 0$ as $d\rightarrow\infty$; by Stirling's approximation, $V(d)\approx \frac{1}{\sqrt{\pi d}}\left(\frac{2\pi e}{d}\right)^{d/2}$, which decays rapidly once $d>2\pi e\approx 17$.
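A quick numeric look (my own sketch, not from the lecture; plain Python):

```python
import math

# V(d) = pi^(d/2) / Gamma(d/2 + 1): grows up to d = 5, then tends to zero
for d in [1, 2, 3, 5, 10, 20, 50, 100]:
    V = math.pi ** (d / 2) / math.gamma(d / 2 + 1)
    print(f"d = {d:3d}   V(d) = {V:.3e}")
```

The maximum is at $d=5$ ($V(5)\approx 5.26$); by $d=100$, $V(d)\approx 2.4\times 10^{-40}$.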
Will this be on the exam?
Gaussians in High Dimension: Also Concentrate Near the Surface
$d$-dimensional spherical Gaussian with zero mean and variance $\sigma^2$ in each coordinate $$f(\mathbf{x})=\frac{1}{(2\pi)^{d/2}\sigma^d}\exp \left(-\frac{|\mathbf{x}|^2}{2\sigma^2}\right)$$
The probability mass also concentrates near a thin spherical shell, of radius about $\sigma\sqrt{d}$ (note $\mathbb{E}[|\mathbf{x}|^2]=d\sigma^2$).
Theorem (Gaussian Annulus Theorem)
For a $d$-dimensional spherical Gaussian with unit variance in each direction, for any $\beta\le \sqrt{d}$, at least $1-3e^{-c\beta^2}$ of the probability mass lies within the annulus $$\{ \mathbf{x}\in \mathbb{R}^d: \sqrt{d}-\beta\le |\mathbf{x}|\le \sqrt{d}+\beta\} = \{ \mathbf{x}\in\mathbb{R}^d: ||\mathbf{x}|-\sqrt{d}|\le \beta\}$$ where $c$ is a fixed positive constant, which the proof below shows can be taken to be $\frac{1}{24}$.
Proof)
Let $\mathbf{X}=(X_1,\cdots, X_d)$ with i.i.d. $X_i \sim N(0,1)$.
Consider the complement event $||\mathbf{X}|-\sqrt{d}|\ge \beta$. On this event, $$||\mathbf{X}|^2-d|=||\mathbf{X}|-\sqrt{d}|\,(|\mathbf{X}|+\sqrt{d})\ge \beta(|\mathbf{X}|+\sqrt{d})\ge \beta\sqrt{d},$$ so it suffices to bound $\mathbb{P}(||\mathbf{X}|^2-d|\ge \beta\sqrt{d})$.
Let $$W_i=\frac{1}{2}(X^2_i-1)$$, then $\mathbb{E}[W_i]=0$ and $$\frac{1}{2}(|\mathbf{X}|^2-d)=\frac{1}{2}\sum^d_{i=1}(X^2_i-1)=\sum^d_{i=1}W_i$$
Since $2|W_i| \le \max \{1, X^2_i\}$, for $r\in \mathbb{N}$, $$2^r |W_i|^r \le (\max \{ 1,X^2_i\})^r=\max \{ 1, X^{2r}_i\} \le 1 + X^{2r}_i$$
By the formula $\int_0^\infty x^{2n}e^{-ax^2}dx=\frac{(2n-1)!!}{2^{n+1}a^n}\sqrt{\frac{\pi}{a}}$ (together with $\int^\infty_{-\infty}e^{-ax^2}dx = \sqrt{\frac{\pi}{a}}$), $$\mathbb{E}[X^{2r}_i]=(2r-1)!!=2^r\times \left(r-\frac{1}{2}\right)\times \cdots\times \frac{3}{2}\times \frac{1}{2},$$ so $$|\mathbb{E}[W^r_i]|\le \mathbb{E}[|W_i|^r]\le 2^{-r} + 2^{-r}\mathbb{E}[X^{2r}_i]\\ =2^{-r} + \left(r-\frac{1}{2}\right)\times \cdots\times \frac{3}{2}\times \frac{1}{2}\\ \le 2^{-r} + r! \le 2\cdot r!$$
Also $\mathbb{V}ar(W_i)=\mathbb{E}[W_i^2]=\frac{1}{4}\mathbb{E}[X^4_i]-\frac{1}{2}\mathbb{E}[X^2_i]+\frac{1}{4}=\frac{3}{4}-\frac{1}{2}+\frac{1}{4}=\frac{1}{2}=:\sigma^2_W$. Writing $Y_i=2W_i=X^2_i-1$, the moment computation above gives $|\mathbb{E}[Y^r_i]|\le 1+\mathbb{E}[X^{2r}_i]\le 2^{r-1}r!$ for $r\ge 2$, hence $$|\mathbb{E}[W^r_i]|=\frac{1}{2^r}|\mathbb{E}[Y^r_i]|\le \frac{1}{2^r}2^{r-1}r!=\frac{1}{2}r!=\sigma^2_W\times r!$$
These moment bounds satisfy the requirements of the master tail bound theorem ($|\mathbb{E}[W^r_i]|\le \sigma^2_W\, r!$). Applying it with total variance $d\sigma^2_W=\frac{d}{2}$ and $a=\frac{\beta\sqrt{d}}{2}$ (its range condition holds because $\beta\le\sqrt{d}$), $$\mathbb{P}\left(|W_1+\cdots+W_d|\ge \frac{\beta\sqrt{d}}{2}\right) \le 3e^{-\frac{a^2}{12\cdot d/2}}=3e^{-\frac{\beta^2}{24}}$$
Finally, $$\left\{ |W_1+\cdots+W_d|\ge \frac{\beta\sqrt{d}}{2}\right\}=\{|Y_1+\cdots+Y_d|\ge \beta\sqrt{d}\} = \{ ||\mathbf{X}|^2-d|\ge \beta\sqrt{d}\}\supseteq \{||\mathbf{X}|-\sqrt{d}|\ge \beta\},$$ so $\mathbb{P}\left(||\mathbf{X}|-\sqrt{d}|\ge \beta\right)\le 3e^{-\frac{\beta^2}{24}}$, which is the claimed bound.
Is this magic?
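A quick empirical check of the annulus theorem (my own sketch, assuming NumPy). The stated bound only becomes nontrivial for $\beta>\sqrt{24\ln 3}\approx 5.1$, so $\beta=6$ is used here:

```python
import numpy as np

rng = np.random.default_rng(0)
d, beta, n = 500, 6.0, 20_000
X = rng.standard_normal((n, d))              # rows: spherical Gaussians, sigma = 1
norms = np.linalg.norm(X, axis=1)            # concentrate near sqrt(d) ~ 22.4
frac = np.mean(np.abs(norms - np.sqrt(d)) <= beta)
print(f"mass in annulus: {frac:.5f}   bound: {1 - 3 * np.exp(-beta**2 / 24):.4f}")
```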
Random Projection and Johnson-Lindenstrauss Lemma
For i.i.d. $\mathbf{u}_i\sim \mathcal{N}(\mathbf{0},I_d)$, define $$f(\mathbf{v})=(\mathbf{v}\cdot \mathbf{u}_1,\cdots, \mathbf{v}\cdot \mathbf{u}_m)^T$$ Is this just a random Gaussian matrix? Yes: $f(\mathbf{v})=U\mathbf{v}$, where $U$ is the $m\times d$ matrix whose rows are $\mathbf{u}_i^T$, i.e., a matrix with i.i.d. $N(0,1)$ entries.
Random Projection Theorem
Let $\mathbf{v}$ be a fixed vector in $\mathbb{R}^d$ and let $f(\cdot)$ be defined as above. There exists a constant $c>0$ such that for $\epsilon\in(0,1)$, $$\mathbb{P}\left( \left|\left| \frac{1}{\sqrt{m}}f(\mathbf{v})\right| - |\mathbf{v}|\right| \ge \epsilon |\mathbf{v}|\right)\le 3e^{-cm\epsilon^2}$$ where the probability is taken over the random draws of vectors $\mathbf{u}_i$ used to construct $f$.
Proof)
The event $\{||f(\mathbf{v})|-\sqrt{m}|\mathbf{v}||\ge \epsilon \sqrt{m}|\mathbf{v}|\}$, after dividing through by $|\mathbf{v}|$ and using linearity of $f$, becomes $\{||f(\frac{1}{|\mathbf{v}|}\mathbf{v})|-\sqrt{m}|\ge \epsilon\sqrt{m}\}$, so WLOG we can take $|\mathbf{v}|=1$.
Then the coordinates $\mathbf{u}_i\cdot\mathbf{v}=\sum^d_{j=1}u_{ij}v_j$ are independent Gaussians with mean $0$, and $$\mathbb{V}ar(\mathbf{u}_i\cdot \mathbf{v})=\mathbb{V}ar\left( \sum^d_{j=1}u_{ij}v_j\right) = \sum^d_{j=1}v^2_j \mathbb{V}ar(u_{ij})=\sum^d_{j=1}v^2_j=1$$ Apply the Gaussian annulus theorem above with $d=m$, $\beta=\epsilon\sqrt{m}$ (note $\beta\le\sqrt{m}$ since $\epsilon<1$), and $\mathbf{X}=f(\mathbf{v})=(\mathbf{u}_1\cdot \mathbf{v},\cdots,\mathbf{u}_m\cdot\mathbf{v})^T$.
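An empirical check of the random projection theorem (my own sketch, assuming NumPy; $d$, $m$, $\epsilon$, and the trial count are arbitrary illustrative values):

```python
import numpy as np

rng = np.random.default_rng(0)
d, m, eps, trials = 1_000, 200, 0.2, 1_000
v = rng.standard_normal(d)
v /= np.linalg.norm(v)                       # WLOG |v| = 1, as in the proof

fails = 0
for _ in range(trials):
    U = rng.standard_normal((m, d))          # fresh u_1, ..., u_m each trial
    y = U @ v / np.sqrt(m)                   # (1/sqrt(m)) f(v)
    fails += abs(np.linalg.norm(y) - 1.0) >= eps
print(f"empirical failure rate: {fails / trials:.4f}  (theorem: <= 3e^(-c m eps^2))")
```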
Johnson-Lindenstrauss Lemma
For any $0<\epsilon <1$ and any positive integer $n$, let $m\ge \frac{3}{c\epsilon^2}\ln n$ with $c$ as in above theorem. For any set of $n$ points $\{\mathbf{v}_1,\cdots,\mathbf{v}_n\} \subset \mathbb{R}^d$, the random projection $f: \mathbb{R}^d\rightarrow \mathbb{R}^m$ defined above has the property that with probability at least $1-\frac{3}{2n}$, $$(1-\epsilon)|\mathbf{v}_i-\mathbf{v}_j|\le \left| \frac{1}{\sqrt{m}}f(\mathbf{v}_i)-\frac{1}{\sqrt{m}}f(\mathbf{v}_j)\right| \le (1+\epsilon)|\mathbf{v}_i-\mathbf{v}_j|\quad \text{for all } \mathbf{v}_i,\mathbf{v}_j.$$
Proof)
For each pair $(i,j)$, apply the Random Projection Theorem to $\mathbf{v}_i-\mathbf{v}_j$. Since $m\ge \frac{3}{c\epsilon^2}\ln n$, the failure probability for one pair is at most $3e^{-cm\epsilon^2}\le 3e^{-3\ln n}=\frac{3}{n^3}$.
By linearity of $f(\cdot)$, $$\mathbb{P}(||f(\mathbf{v}_i)-f(\mathbf{v}_j)|-\sqrt{m}|\mathbf{v}_i-\mathbf{v}_j||\ge \epsilon\sqrt{m}|\mathbf{v}_i-\mathbf{v}_j|)\le \frac{3}{n^3}$$
The number of pairs is $\binom{n}{2}\le \frac{n^2}{2}$, so by the union bound the probability that some pair fails is at most $\frac{3}{n^3}\cdot \frac{n^2}{2}=\frac{3}{2n}$.
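A small end-to-end demo of the lemma (my own sketch, assuming NumPy; since the constant $c$ is unspecified, the choice $m=\lceil 8\ln n/\epsilon^2\rceil$ is purely illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
n, d, eps = 40, 2_000, 0.2
m = int(np.ceil(8 * np.log(n) / eps**2))     # illustrative; true constant unknown
V = rng.standard_normal((n, d))              # n arbitrary points in R^d
U = rng.standard_normal((m, d))              # one shared random projection
P = V @ U.T / np.sqrt(m)                     # rows: (1/sqrt(m)) f(v_i)

iu = np.triu_indices(n, k=1)                 # all pairs i < j
dV = np.linalg.norm(V[:, None] - V[None, :], axis=-1)[iu]
dP = np.linalg.norm(P[:, None] - P[None, :], axis=-1)[iu]
r = dP / dV
print(f"distance ratios in [{r.min():.3f}, {r.max():.3f}]; target [1-eps, 1+eps]")
```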