MAS473 Information and Entropy

Entropy and Optimal Coding

Kraft's inequality

A code for $N$ words is a map $c:\{1,2,\cdots,N\}\rightarrow \{0,1\}^*$; it is uniquely decodable if the map $(i_1,\cdots,i_n)\mapsto c(i_1)c(i_2)\cdots c(i_n)$ is injective, where on the right-hand side the codewords are simply concatenated. Kraft's inequality: for any uniquely decodable code $c$, $$\sum^N_{i=1}2^{-l(c(i))}\le 1,$$ where $l(c(i))$ denotes the length of the codeword $c(i)$.
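
A quick numerical sanity check can help here. The following minimal sketch (the example codewords are illustrative, not from the notes) evaluates the Kraft sum for a prefix code, which is uniquely decodable, and for a code whose Kraft sum exceeds 1 and which therefore cannot be uniquely decodable.

```python
def kraft_sum(code):
    """Return sum_i 2^{-l(c(i))} for a binary code given as a list of codeword strings."""
    return sum(2.0 ** -len(word) for word in code)

# A prefix code (hence uniquely decodable): Kraft sum <= 1.
prefix_code = ["0", "10", "110", "111"]
print(kraft_sum(prefix_code))   # 0.5 + 0.25 + 0.125 + 0.125 = 1.0

# A code whose Kraft sum exceeds 1 cannot be uniquely decodable.
bad_code = ["0", "1", "00", "11"]
print(kraft_sum(bad_code))      # 1.5 > 1
```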


Entropy

The entropy of a probability distribution $\mathbb{P}=(p_1,\cdots,p_N)$ is
$$H(\mathbb{P})=\sum^N_{i=1}p_i\ln \frac{1}{p_i}$$
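
A short sketch computing $H(\mathbb{P})$ for a small distribution (natural log, as in the formula above; the probabilities are illustrative):

```python
import math

def entropy(p):
    """H(P) = sum_i p_i ln(1/p_i), with the convention 0 ln(1/0) = 0."""
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0)

print(entropy([0.5, 0.25, 0.25]))          # ~1.0397 nats
print(entropy([1/4] * 4), math.log(4))     # the uniform distribution on 4 points attains ln 4
```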


Cross Entropy and Kullback-Leibler Divergence (Relative Entropy)

$$H(\mathbb{P},\mathbb{Q})=\sum_i p_i\ln\frac{1}{q_i}$$

$$D(\mathbb{P}||\mathbb{Q})=H(\mathbb{P},\mathbb{Q})-H(\mathbb{P})=\sum_i p_i\ln\frac{1}{q_i}-\sum_i p_i\ln\frac{1}{p_i}=\sum_i p_i\ln\frac{p_i}{q_i}$$
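
A sketch computing both quantities for two small distributions (illustrative values; natural log throughout), checking the identity $D(\mathbb{P}||\mathbb{Q})=H(\mathbb{P},\mathbb{Q})-H(\mathbb{P})$:

```python
import math

def cross_entropy(p, q):
    """H(P, Q) = sum_i p_i ln(1/q_i)."""
    return sum(pi * math.log(1.0 / qi) for pi, qi in zip(p, q) if pi > 0)

def kl_divergence(p, q):
    """D(P || Q) = sum_i p_i ln(p_i / q_i)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(cross_entropy(p, q))                        # >= H(P)
print(kl_divergence(p, q))                        # >= 0, and 0 iff P = Q
print(cross_entropy(p, q) - cross_entropy(p, p))  # H(P, Q) - H(P) = D(P || Q)
```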


Gaussian case: for $\mathbb{P}=N(\mu_1,\sigma_1^2)$ and $\mathbb{Q}=N(\mu_2,\sigma_2^2)$, $$D(\mathbb{P}||\mathbb{Q})=\frac{(\mu_1-\mu_2)^2+\sigma^2_1-\sigma^2_2}{2\sigma^2_2}+\ln\frac{\sigma_2}{\sigma_1}$$
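
The closed form can be checked by Monte Carlo: sample from $\mathbb{P}$ and average $\ln\frac{p(x)}{q(x)}$. A minimal sketch (the particular parameter values and sample size are my own choices):

```python
import math
import random

def gaussian_kl(mu1, s1, mu2, s2):
    """Closed-form D(N(mu1, s1^2) || N(mu2, s2^2)) from the formula above."""
    return ((mu1 - mu2) ** 2 + s1 ** 2 - s2 ** 2) / (2 * s2 ** 2) + math.log(s2 / s1)

def log_pdf(x, mu, s):
    return -0.5 * math.log(2 * math.pi * s ** 2) - (x - mu) ** 2 / (2 * s ** 2)

mu1, s1, mu2, s2 = 0.0, 1.0, 1.0, 2.0
random.seed(0)
samples = [random.gauss(mu1, s1) for _ in range(200_000)]
mc = sum(log_pdf(x, mu1, s1) - log_pdf(x, mu2, s2) for x in samples) / len(samples)

print(gaussian_kl(mu1, s1, mu2, s2))  # exact: 1/8 - 3/8 + ln 2 ≈ 0.4431
print(mc)                             # Monte Carlo estimate, close to the exact value
```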


Data Processing Inequality


Theorem

For any $F$-divergence $D_F$ and any measurable map $g$, writing $\mathbb{P}_g$ and $\mathbb{Q}_g$ for the image (pushforward) measures of $\mathbb{P}$ and $\mathbb{Q}$ under $g$,
$$D_F(\mathbb{P}_g||\mathbb{Q}_g)\le D_F(\mathbb{P}||\mathbb{Q})$$
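
For the KL divergence on a finite space, the inequality can be illustrated by pushing two distributions forward through a map $g$ and comparing divergences. A minimal sketch (the map and the distributions are illustrative):

```python
import math
from collections import defaultdict

def kl(p, q):
    """D(P || Q) for distributions given as dicts over the same finite support."""
    return sum(px * math.log(px / q[x]) for x, px in p.items() if px > 0)

def pushforward(p, g):
    """Image measure P_g: P_g(y) = sum of P(x) over x with g(x) = y."""
    out = defaultdict(float)
    for x, px in p.items():
        out[g(x)] += px
    return dict(out)

p = {0: 0.5, 1: 0.3, 2: 0.2}
q = {0: 0.25, 1: 0.25, 2: 0.5}
g = lambda x: x % 2          # merges the states 0 and 2

print(kl(p, q))                                   # D(P || Q) ≈ 0.218
print(kl(pushforward(p, g), pushforward(q, g)))   # smaller: data processing loses divergence
```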



Entropy Maximizing Distribution from Cross Entropy

On a finite set of size $N$, the uniform distribution $\mathbb{U}$ maximizes entropy. Indeed, the cross entropy against $\mathbb{U}$ is $H(\mathbb{P},\mathbb{U})=\sum_i p_i\ln N=\ln N$ for every $\mathbb{P}$, and $H(\mathbb{P})\le H(\mathbb{P},\mathbb{Q})$ for all $\mathbb{Q}$ since $D(\mathbb{P}||\mathbb{Q})\ge 0$; hence $H(\mathbb{P})\le \ln N=H(\mathbb{U})$, with equality iff $\mathbb{P}=\mathbb{U}$. A numerical check follows.
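
A sketch verifying the identity $H(\mathbb{P})+D(\mathbb{P}||\mathbb{U})=\ln N$ behind the argument above (the distribution is illustrative):

```python
import math

def entropy(p):
    return sum(pi * math.log(1.0 / pi) for pi in p if pi > 0)

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

p = [0.5, 0.2, 0.2, 0.1]
u = [0.25] * 4
# H(P) + D(P || U) = ln N, so H(P) <= ln N with equality iff P is uniform.
print(entropy(p) + kl(p, u), math.log(4))
```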


Distance Between Probability Measures

The total variation distance admits several equivalent expressions (here $\mathbb{P}\ll\mathbb{Q}$, so that the density $\frac{d\mathbb{P}}{d\mathbb{Q}}$ exists):
$$\frac{1}{2}||\mathbb{P}-\mathbb{Q}||_1=\sup_{A\in \mathcal{F}}(\mathbb{P}(A)-\mathbb{Q}(A))=\int_{\{\frac{d\mathbb{P}}{d\mathbb{Q}}\ge 1\}} \left(\frac{d\mathbb{P}}{d\mathbb{Q}}-1\right)d\mathbb{Q}=\frac{1}{2}\int\left| \frac{d\mathbb{P}}{d\mathbb{Q}}-1\right| d\mathbb{Q}$$
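
For discrete distributions the three expressions can be compared directly; the supremum over events is attained at $A=\{x: p(x)\ge q(x)\}$. A minimal sketch (illustrative distributions, brute-forcing all events on a three-point space):

```python
from itertools import chain, combinations

p = {0: 0.5, 1: 0.3, 2: 0.2}
q = {0: 0.2, 1: 0.3, 2: 0.5}
support = list(p)

half_l1 = 0.5 * sum(abs(p[x] - q[x]) for x in support)

# sup over all events A of P(A) - Q(A)
events = chain.from_iterable(combinations(support, r) for r in range(len(support) + 1))
sup_over_events = max(sum(p[x] - q[x] for x in A) for A in events)

# integral of (dP/dQ - 1) dQ over {dP/dQ >= 1}, i.e. the sum of (p - q)^+
positive_part = sum(p[x] - q[x] for x in support if p[x] >= q[x])

print(half_l1, sup_over_events, positive_part)  # all three agree (0.3 here)
```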


Entropic Inequalities

Pinsker's inequality

$$||\mathbb{P}-\mathbb{Q}||_1\le \sqrt{2D(\mathbb{P}||\mathbb{Q})}$$
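
A quick numerical check of Pinsker's inequality on a few pairs of Bernoulli distributions (illustrative parameters):

```python
import math

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def l1(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

for a, b in [(0.5, 0.4), (0.9, 0.5), (0.99, 0.01)]:
    p, q = [a, 1 - a], [b, 1 - b]
    print(l1(p, q), math.sqrt(2 * kl(p, q)))  # left-hand side <= right-hand side
```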


If $\mathbb{P}$ and $\mathbb{Q}$ have densities $f$ and $g$ with respect to a common dominating measure $\nu$, then
$$||\mathbb{P}-\mathbb{Q}||_1=\int |f-g|\,d\nu$$


Writing $H(\mathbb{P},\mathbb{Q})=\left(\int(\sqrt{f}-\sqrt{g})^2\,d\nu\right)^{1/2}$ for the (unnormalised) Hellinger distance, not the cross entropy above,
$$\frac{1}{2}||\mathbb{P}-\mathbb{Q}||_1\le H(\mathbb{P},\mathbb{Q})\le ||\mathbb{P}-\mathbb{Q}||^{1/2}_1$$
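
A quick check of both bounds on pairs of Bernoulli distributions, reading $H(\mathbb{P},\mathbb{Q})$ as the Hellinger distance as above (a sketch with illustrative parameters):

```python
import math
from itertools import product

def hellinger(p, q):
    """Unnormalised Hellinger distance: ( sum_i (sqrt(p_i) - sqrt(q_i))^2 )^{1/2}."""
    return math.sqrt(sum((math.sqrt(pi) - math.sqrt(qi)) ** 2 for pi, qi in zip(p, q)))

def l1(p, q):
    return sum(abs(pi - qi) for pi, qi in zip(p, q))

for a, b in product([0.1, 0.3, 0.5, 0.7, 0.9], repeat=2):
    p, q = [a, 1 - a], [b, 1 - b]
    lhs, mid, rhs = 0.5 * l1(p, q), hellinger(p, q), math.sqrt(l1(p, q))
    assert lhs <= mid + 1e-12 and mid <= rhs + 1e-12

print("both inequalities hold on all tested Bernoulli pairs")
```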