MAS473 Information and Entropy
Entropy and Optimal Coding
Kraft's inequality
A code for $N$ words, $c:\{1,2,\cdots,N\}\rightarrow \{0,1\}^*$, is uniquely decodable if, for every $n\ge 1$, the map $(i_1,\cdots,i_n)\mapsto c(i_1)c(i_2)\cdots c(i_n)$ is injective, where on the right-hand side the codewords are simply concatenated. Then, for any uniquely decodable code $c$, with $l(c(i))$ the length of the codeword $c(i)$, $$\sum^N_{i=1}2^{-l(c(i))}\le 1$$
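For example, the prefix code $c(1)=0$, $c(2)=10$, $c(3)=110$, $c(4)=111$ is uniquely decodable with lengths $1,2,3,3$, and $$\sum^4_{i=1}2^{-l(c(i))}=\frac{1}{2}+\frac{1}{4}+\frac{1}{8}+\frac{1}{8}=1,$$ so Kraft's inequality holds here with equality.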
Entropy
For a probability distribution $\mathbb{P}=(p_1,\cdots,p_N)$ on $\{1,\cdots,N\}$,
$$H(\mathbb{P})=\sum^N_{i=1}p_i\ln \frac{1}{p_i}$$
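For example, the uniform distribution has $H(\mathbb{P})=\sum^N_{i=1}\frac{1}{N}\ln N=\ln N$, while a point mass ($p_j=1$ for a single $j$) has $H(\mathbb{P})=0$, with the convention $0\ln\frac{1}{0}=0$.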
Cross Entropy and Kullback-Leibler Divergence (Relative Entropy)
$$H(\mathbb{P},\mathbb{Q})=\sum_i p_i\ln\frac{1}{q_i}$$
$$D(\mathbb{P}||\mathbb{Q})=H(\mathbb{P},\mathbb{Q})-H(\mathbb{P})=\sum_i p_i\ln\frac{1}{q_i}-\sum_i p_i\ln\frac{1}{p_i}=\sum_i p_i\ln\frac{p_i}{q_i}$$
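The three quantities are linked by $H(\mathbb{P},\mathbb{Q})=H(\mathbb{P})+D(\mathbb{P}||\mathbb{Q})$, and $D(\mathbb{P}||\mathbb{Q})\ge 0$ with equality iff $\mathbb{P}=\mathbb{Q}$ (Gibbs' inequality). For example, with $\mathbb{P}=(\frac{1}{2},\frac{1}{2})$ and $\mathbb{Q}=(\frac{1}{4},\frac{3}{4})$, $$D(\mathbb{P}||\mathbb{Q})=\frac{1}{2}\ln\frac{1/2}{1/4}+\frac{1}{2}\ln\frac{1/2}{3/4}=\frac{1}{2}\ln\frac{4}{3}\approx 0.144.$$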
Gaussian case: for $\mathbb{P}=N(\mu_1,\sigma^2_1)$ and $\mathbb{Q}=N(\mu_2,\sigma^2_2)$, $D(\mathbb{P}||\mathbb{Q})=\frac{(\mu_1-\mu_2)^2+\sigma^2_1-\sigma^2_2}{2\sigma^2_2}+\ln\frac{\sigma_2}{\sigma_1}$
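Two sanity checks on this formula: if $\sigma_1=\sigma_2=\sigma$ it reduces to $D(\mathbb{P}||\mathbb{Q})=\frac{(\mu_1-\mu_2)^2}{2\sigma^2}$, and if additionally $\mu_1=\mu_2$ it gives $0$, as it must when $\mathbb{P}=\mathbb{Q}$.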
Data Processing Inequality
Theorem
For a measurable map $g$, with $\mathbb{P}_g$ and $\mathbb{Q}_g$ the push-forward (image) measures under $g$, $$D_F(\mathbb{P}_g||\mathbb{Q}_g)\le D_F(\mathbb{P}||\mathbb{Q})$$
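Assuming $D_F$ denotes the $F$-divergence $D_F(\mathbb{P}||\mathbb{Q})=\int F\big(\frac{d\mathbb{P}}{d\mathbb{Q}}\big)d\mathbb{Q}$ (Kullback-Leibler being the case $F(x)=x\ln x$), a small example with $g$ merging the first two outcomes: take $\mathbb{P}=(\frac{1}{2},\frac{1}{4},\frac{1}{4})$ and $\mathbb{Q}=(\frac{1}{4},\frac{1}{4},\frac{1}{2})$, so $\mathbb{P}_g=(\frac{3}{4},\frac{1}{4})$ and $\mathbb{Q}_g=(\frac{1}{2},\frac{1}{2})$; then for Kullback-Leibler divergence $$D(\mathbb{P}_g||\mathbb{Q}_g)=\frac{3}{4}\ln\frac{3}{2}-\frac{1}{4}\ln 2\approx 0.131,\qquad D(\mathbb{P}||\mathbb{Q})=\frac{1}{2}\ln 2-\frac{1}{4}\ln 2=\frac{1}{4}\ln 2\approx 0.173,$$ consistent with the inequality.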
Entropy Maximizing Distribution from Cross Entropy
The uniform distribution maximizes entropy: for any $\mathbb{P}$ on $\{1,\cdots,N\}$, $H(\mathbb{P})\le \ln N$, with equality if and only if $\mathbb{P}$ is uniform.
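This follows from the cross entropy against the uniform distribution $\mathbb{U}=(\frac{1}{N},\cdots,\frac{1}{N})$: $$H(\mathbb{P},\mathbb{U})=\sum^N_{i=1}p_i\ln N=\ln N,\qquad\text{so}\qquad \ln N=H(\mathbb{P})+D(\mathbb{P}||\mathbb{U})\ge H(\mathbb{P}),$$ with equality iff $D(\mathbb{P}||\mathbb{U})=0$, i.e. $\mathbb{P}=\mathbb{U}$.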
Distance Between Probability Measures
$$\frac{1}{2}||\mathbb{P}-\mathbb{Q}||_1=\sup_{A\in \mathcal{F}}(\mathbb{P}(A)-\mathbb{Q}(A))=\int_{\{\frac{d\mathbb{P}}{d\mathbb{Q}}\ge 1\}} \left(\frac{d\mathbb{P}}{d\mathbb{Q}}-1\right)d\mathbb{Q}=\frac{1}{2}\int\left| \frac{d\mathbb{P}}{d\mathbb{Q}}-1\right| d\mathbb{Q}$$
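For example, for $\mathbb{P}=\mathrm{Bernoulli}(p)$ and $\mathbb{Q}=\mathrm{Bernoulli}(q)$ with $p\ge q$, the supremum is attained at $A=\{1\}$ and $$\frac{1}{2}||\mathbb{P}-\mathbb{Q}||_1=\frac{1}{2}\big(|p-q|+|(1-p)-(1-q)|\big)=p-q.$$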
Entropic Inequalities
Pinsker's inequality
$$||\mathbb{P}-\mathbb{Q}||_1\le \sqrt{2D(\mathbb{P}||\mathbb{Q})}$$
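For example, for $\mathbb{P}=\mathrm{Bernoulli}(p)$ and $\mathbb{Q}=\mathrm{Bernoulli}(q)$ this reads $2|p-q|\le\sqrt{2D(\mathbb{P}||\mathbb{Q})}$, i.e. $D(\mathbb{P}||\mathbb{Q})\ge 2(p-q)^2$.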
Here, if $f$ and $g$ are densities of $\mathbb{P}$ and $\mathbb{Q}$ with respect to a common dominating measure $\nu$, $$||\mathbb{P}-\mathbb{Q}||_1=\int |f-g|\,d\nu$$
Relatedly, writing $H(\mathbb{P},\mathbb{Q})$ here for the Hellinger distance, $H(\mathbb{P},\mathbb{Q})^2=\int(\sqrt{f}-\sqrt{g})^2\,d\nu$ (not the cross entropy above), $$\frac{1}{2}||\mathbb{P}-\mathbb{Q}||_1\le H(\mathbb{P},\mathbb{Q})\le ||\mathbb{P}-\mathbb{Q}||^{1/2}_1$$
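With this reading, both bounds follow from $|f-g|=|\sqrt{f}-\sqrt{g}|(\sqrt{f}+\sqrt{g})$: Cauchy-Schwarz together with $\int(\sqrt{f}+\sqrt{g})^2\,d\nu\le 2\int(f+g)\,d\nu=4$ gives $||\mathbb{P}-\mathbb{Q}||_1\le 2H(\mathbb{P},\mathbb{Q})$, while $|\sqrt{f}-\sqrt{g}|\le\sqrt{f}+\sqrt{g}$ gives $H(\mathbb{P},\mathbb{Q})^2\le\int|f-g|\,d\nu=||\mathbb{P}-\mathbb{Q}||_1$.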