Concentration Inequalities Homework

Problem 1: Prove Lemma 1

Lemma 1: Let $Y$ be a random variable with $E[Y] = 0$ and $Y \in [a, b]$ almost surely. Define the cumulant generating function $\psi(\theta) = \log E[e^{\theta Y}]$. Then, for all $\theta \in \mathbb{R}$:


1. The second derivative satisfies:

   $$\psi''(\theta) \leq \frac{(b - a)^2}{4}.$$

2. Consequently, the function $\psi(\theta)$ is bounded by:

   $$\psi(\theta) \leq \frac{\theta^2 (b - a)^2}{8}.$$

3. Therefore, $Y$ is sub-Gaussian with parameter $\sigma = \frac{b - a}{2}$.


Proof:


1. Bounding $\psi''(\theta)$:

   The second derivative of $\psi(\theta)$ is the variance of $Y$ under an exponentially tilted measure. Writing $E_\theta$ for expectation under the tilted measure $dP_\theta = \frac{e^{\theta y}}{E[e^{\theta Y}]} \, dP$, direct differentiation gives

   $$\psi'(\theta) = \frac{E[Y e^{\theta Y}]}{E[e^{\theta Y}]} = E_\theta[Y], \qquad \psi''(\theta) = \frac{E[Y^2 e^{\theta Y}]}{E[e^{\theta Y}]} - \left( \frac{E[Y e^{\theta Y}]}{E[e^{\theta Y}]} \right)^2 = \operatorname{Var}_\theta(Y).$$

   Since the tilted measure is still supported on $[a, b]$, it suffices to bound the variance of an arbitrary $[a, b]$-valued random variable. This variance is maximized when the mass sits at the two endpoints $a$ and $b$ with equal probability, giving

   $$\operatorname{Var}_\theta(Y) \leq \left( \frac{b - a}{2} \right)^2.$$

   Therefore,

   $$\psi''(\theta) \leq \frac{(b - a)^2}{4}.$$


2. Bounding $\psi(\theta)$:

   Since $\psi(0) = 0$ and $\psi'(0) = E[Y] = 0$, we can integrate $\psi''(\theta)$ twice to obtain an upper bound for $\psi(\theta)$:

   $$\psi'(\theta) = \int_0^\theta \psi''(s) \, ds \leq \int_0^\theta \frac{(b - a)^2}{4} \, ds = \frac{(b - a)^2 \theta}{4}.$$

   Integrating again:

   $$\psi(\theta) = \int_0^\theta \psi'(s) \, ds \leq \int_0^\theta \frac{(b - a)^2 s}{4} \, ds = \frac{(b - a)^2 \theta^2}{8}. $$

   (The displayed chain assumes $\theta \geq 0$; for $\theta < 0$ the same bound follows by applying the argument to $-\theta$, since $|\psi'(s)| \leq (b - a)^2 |s| / 4$.)


3. Concluding Sub-Gaussianity:

   A random variable $Y$ is sub-Gaussian with parameter $\sigma$ if, for all $\theta \in \mathbb{R}$,

   $$\psi(\theta) \leq \frac{\theta^2 \sigma^2}{2}.$$

   From the bound above, we have:

   $$\psi(\theta) \leq \frac{\theta^2 (b - a)^2}{8} = \frac{\theta^2}{2} \left( \frac{b - a}{2} \right)^2.$$

   Thus, $Y$ is sub-Gaussian with parameter $\sigma = \frac{b - a}{2}$.
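As a numerical sanity check of part 2 (a minimal sketch: the two-point distribution, the endpoints $a, b$, and the grid of $\theta$ values are illustrative choices, not part of the lemma):

```python
import numpy as np

# Two-point distribution on {a, b} with P(Y = b) = q, where q is chosen
# so that E[Y] = 0 (requires a < 0 < b); any centered law on [a, b] works.
a, b = -1.0, 2.0
q = -a / (b - a)                        # q*b + (1 - q)*a = 0

thetas = np.linspace(-5.0, 5.0, 1001)
psi = np.log(q * np.exp(thetas * b) + (1 - q) * np.exp(thetas * a))
bound = thetas**2 * (b - a) ** 2 / 8    # Lemma 1, part 2

# The bound should hold on the whole grid (with equality at theta = 0).
assert np.all(psi <= bound + 1e-12)
print(f"min slack: {np.min(bound - psi):.4f}")
```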



Problem 2: Analyzing Tail Probabilities for a Binomial Distribution

Let $S \sim \text{Binomial}(n, p)$ with $p = 0.2$ and $n \in \{10, 100, 1000, 10000\}$. We aim to compute:

$$P\left(|S - np| \geq 0.1 \, np\right).$$


Hoeffding Bound (Theorem 2)

For Bernoulli trials, each $X_i \in \{0, 1\}$, so $a_i = 0$ and $b_i = 1$. The Hoeffding bound is:

$$P\left(|S - np| \geq t\right) \leq 2 \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$

Here, $t = 0.1 \, np$ and $\sum_{i=1}^n (b_i - a_i)^2 = n$. Substituting:

$$P\left(|S - np| \geq 0.1 \, np\right) \leq 2 \exp\left( -\frac{2 (0.1 \, np)^2}{n} \right) = 2 \exp\left( -0.02 \, n p^2 \right).$$

With $p = 0.2$, so that $p^2 = 0.04$:

$$P\left(|S - np| \geq 0.1 \, np\right) \leq 2 \exp\left( -0.0008 \, n \right).$$
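For example, $n = 1000$ gives $2 e^{-0.8} \approx 0.899$, and $n = 10000$ gives $2 e^{-8} \approx 0.00067$; these values appear in the comparison table below.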


Cramer-Chernoff Bound (Theorem 3)

For $0 < \delta < 1$, the Cramer-Chernoff (multiplicative Chernoff) bounds on the two tails of $S$ are:

$$P\left( S \geq (1 + \delta) \, np \right) \leq \exp\left( -\frac{\delta^2 \, np}{3} \right),$$

$$P\left( S \leq (1 - \delta) \, np \right) \leq \exp\left( -\frac{\delta^2 \, np}{2} \right).$$

Calculating the upper tail with $\delta = 0.1$:

$$P\left( S \geq 1.1 \, np \right) \leq \exp\left( -\frac{0.01 \, np}{3} \right) = \exp\left( -\frac{np}{300} \right).$$

Calculating the lower tail:

$$P\left( S \leq 0.9 \, np \right) \leq \exp\left( -\frac{0.01 \, np}{2} \right) = \exp\left( -\frac{np}{200} \right).$$

Total bound:

$$P\left(|S - np| \geq 0.1 \, np\right) \leq \exp\left( -\frac{np}{300} \right) + \exp\left( -\frac{np}{200} \right).$$
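With $p = 0.2$, the two exponents are $np/300 = n/1500$ and $np/200 = n/1000$, so the bound becomes

$$P\left(|S - np| \geq 0.1 \, np\right) \leq e^{-n/1500} + e^{-n/1000};$$

for $n = 10$, for instance, this reads $e^{-0.0067} + e^{-0.01}$, as in the table below.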


Approximating Tail Probabilities Using Python

To approximate the tail probabilities $P\left(|S - np| \geq 0.1 \, np\right)$ for different values of $n$, we can use Python's `scipy.stats` library. 
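A minimal sketch of this computation (it also evaluates the two bounds derived above; the loop values and variable names are illustrative). Note that `binom.sf(k, n, p)` returns $P(S > k)$, so $P(S \geq k)$ is `binom.sf(k - 1, n, p)`:

```python
import numpy as np
from scipy.stats import binom

p, delta = 0.2, 0.1

for n in [10, 100, 1000, 10000]:
    mu = n * p                  # E[S] = np
    t = delta * mu              # deviation threshold 0.1 * np
    # Exact two-sided tail: P(S <= np - t) + P(S >= np + t)
    exact = binom.cdf(np.floor(mu - t), n, p) + binom.sf(np.ceil(mu + t) - 1, n, p)
    # Hoeffding: 2 exp(-2 t^2 / n), since each X_i lies in [0, 1]
    hoeffding = 2 * np.exp(-2 * t**2 / n)
    # Cramer-Chernoff: exp(-delta^2 np / 3) + exp(-delta^2 np / 2)
    chernoff = np.exp(-delta**2 * mu / 3) + np.exp(-delta**2 * mu / 2)
    print(f"n={n:>5}  exact={exact:.6f}  Hoeffding={hoeffding:.5f}  Chernoff={chernoff:.5f}")
```

For $n = 10$ the event $|S - 2| \geq 0.2$ excludes only the single outcome $S = 2$, so the exact probability is $1 - P(S = 2) \approx 0.698010$.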


Comparing Calculated Values

Below is a comparison table of the Hoeffding bound, Cramer-Chernoff bound, and the approximated tail probabilities obtained from Python:


| $n$ | Hoeffding Bound | Cramer-Chernoff Bound | Approximated Probability |
|---|---|---|---|
| $10$ | $2 e^{-0.008} \approx 1.984$ | $e^{-0.0067} + e^{-0.01} \approx 1.983$ | $\approx 0.698010$ |
| $100$ | $2 e^{-0.08} \approx 1.846$ | $e^{-0.0667} + e^{-0.1} \approx 1.840$ | $\approx 0.708054$ |
| $1000$ | $2 e^{-0.8} \approx 0.899$ | $e^{-0.6667} + e^{-1} \approx 0.881$ | $\approx 0.123032$ |
| $10000$ | $2 e^{-8} \approx 0.00067$ | $e^{-6.6667} + e^{-10} \approx 0.0013$ | $\approx 0.000001$ |


- Hoeffding and Cramer-Chernoff Bounds:

  - For small $n$ (e.g., $n = 10$), both bounds exceed $1$ and are therefore vacuous, since a probability can never exceed $1$.

  - As $n$ increases, both bounds decrease exponentially, becoming more meaningful and tighter.


- Approximated Probabilities:

  - For small $n$, the approximated probabilities are significantly lower than the bounds, indicating that the bounds are conservative.

  - As $n$ increases, the approximated probabilities also decrease exponentially, though they remain well below both bounds.

  - For very large $n$ (e.g., $n = 10000$), the approximated probability is negligible, demonstrating the effectiveness of the concentration inequalities.


- Conclusion:

  - The Cramer-Chernoff bound is slightly tighter than the Hoeffding bound for moderate $n$ (e.g., $n = 1000$: $0.881$ vs. $0.899$), but for large $n$ the Hoeffding bound wins in this scenario ($n = 10000$: $0.00067$ vs. $0.0013$), since its exponent $0.0008 n$ exceeds the dominant Chernoff exponent $n/1500 \approx 0.00067 n$.

  - Both bounds become more useful and accurate as the number of trials $n$ increases.

  - Exact or approximated probabilities (especially for large $n$) are significantly lower than the bounds, highlighting the conservativeness of Hoeffding and Cramer-Chernoff bounds.




Problem 3: Prove Hoeffding's Inequality

Hoeffding's Inequality: Let $X_1, X_2, \dots, X_n$ be independent random variables with $a_i \leq X_i \leq b_i$ almost surely for each $i = 1, 2, \dots, n$. Define the sum $S = X_1 + X_2 + \dots + X_n$. Then, for any $t > 0$,

$$P(S - E[S] \geq t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right),$$

and similarly,

$$P(S - E[S] \leq -t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$

This inequality provides an exponential bound on the probability that the sum $S$ deviates from its expected value by at least $t$.


Proof of Hoeffding's Inequality:

To establish Hoeffding's inequality, we use the Cramer-Chernoff method: apply Markov's inequality to the moment generating function (MGF) of $S - E[S]$ and then optimize over the free parameter $\theta$.


1. Exponential Markov's Inequality:

For any $\theta > 0$, apply Markov's inequality to the random variable $e^{\theta (S - E[S])}$:

$$P(S - E[S] \geq t) = P\left(e^{\theta (S - E[S])} \geq e^{\theta t}\right) \leq \frac{E\left[e^{\theta (S - E[S])}\right]}{e^{\theta t}}.$$

   Therefore,

 $$P(S - E[S] \geq t) \leq e^{-\theta t} E\left[e^{\theta (S - E[S])}\right].$$


2. Bounding the Moment Generating Function (MGF):

Since the random variables $X_i$ are independent, the MGF of the sum $S$ can be expressed as the product of the individual MGFs:

$$E\left[e^{\theta (S - E[S])}\right] = \prod_{i=1}^n E\left[e^{\theta (X_i - E[X_i])}\right].$$


   For each $X_i$, which is bounded $a_i \leq X_i \leq b_i$, we can bound its MGF using the following lemma.

Lemma: If a random variable $Y$ satisfies $a \leq Y \leq b$ almost surely and $E[Y] = \mu$, then for any $\theta \in \mathbb{R}$,

   $$E\left[e^{\theta (Y - \mu)}\right] \leq \exp\left( \frac{\theta^2 (b - a)^2}{8} \right).$$

Proof of Lemma:

   This is exactly Lemma 1 from Problem 1, applied to the centered variable $Y - \mu$: it has mean zero and takes values in $[a - \mu, b - \mu]$, an interval of the same width $b - a$. Part 2 of that lemma gives $\log E\left[e^{\theta (Y - \mu)}\right] \leq \frac{\theta^2 (b - a)^2}{8}$, and exponentiating yields the stated bound. (This result is known as Hoeffding's lemma.)

   Applying this lemma to each $X_i$, we get:

   $$E\left[e^{\theta (X_i - E[X_i])}\right] \leq \exp\left( \frac{\theta^2 (b_i - a_i)^2}{8} \right).$$

   Therefore, the MGF of $S - E[S]$ satisfies:

 $$E\left[e^{\theta (S - E[S])}\right] \leq \exp\left( \sum_{i=1}^n \frac{\theta^2 (b_i - a_i)^2}{8} \right) = \exp\left( \frac{\theta^2 \sum_{i=1}^n (b_i - a_i)^2}{8} \right).$$


3. Optimizing the Bound:

Substituting the bound on the MGF back into the probability bound:

$$P(S - E[S] \geq t) \leq \exp\left( -\theta t + \frac{\theta^2 \sum_{i=1}^n (b_i - a_i)^2}{8} \right).$$

   To obtain the tightest bound, we minimize the exponent with respect to $\theta$. Let:

   $$f(\theta) = -\theta t + \frac{\theta^2 \sum_{i=1}^n (b_i - a_i)^2}{8}.$$

   Taking the derivative of $f(\theta)$ with respect to $\theta$ and setting it to zero:

   $$f'(\theta) = -t + \frac{\theta \sum_{i=1}^n (b_i - a_i)^2}{4} = 0 \implies \theta = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2}.$$

   Substituting this optimal $\theta$ back into the exponent:

   $$f(\theta) = -\frac{4t^2}{\sum_{i=1}^n (b_i - a_i)^2} + \frac{16t^2}{8 \sum_{i=1}^n (b_i - a_i)^2} = -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}. $$

   Therefore,

   $$P(S - E[S] \geq t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$

   Similarly, by applying the same argument to $-S$, we obtain:

   $$P(S - E[S] \leq -t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right). $$


4. Conclusion:

   Combining the upper and lower tail bounds via the union bound, $P(|S - E[S]| \geq t) \leq P(S - E[S] \geq t) + P(S - E[S] \leq -t)$, establishes Hoeffding's inequality:

   $$P(|S - E[S]| \geq t) \leq 2 \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
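As a final sanity check, one can compare the bound with a Monte Carlo estimate (a minimal sketch, assuming i.i.d. $\text{Uniform}[0, 1]$ summands so that $a_i = 0$ and $b_i = 1$; the sample sizes and seed are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(0)
n, t, reps = 100, 5.0, 100_000

# S = sum of n i.i.d. Uniform[0, 1] variables, so E[S] = n / 2.
S = rng.uniform(0.0, 1.0, size=(reps, n)).sum(axis=1)

empirical = np.mean(S - n / 2 >= t)    # estimate of P(S - E[S] >= t)
bound = np.exp(-2 * t**2 / n)          # exp(-2 t^2 / sum_i (b_i - a_i)^2)

print(f"empirical = {empirical:.4f} <= bound = {bound:.4f}")
```

The empirical frequency (roughly $0.04$ with these parameters) lies well below the Hoeffding bound $e^{-1/2} \approx 0.61$, consistent with the conservativeness observed in Problem 2.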