Concentration Inequalities Homework
Problem 1: Prove Lemma 1
Lemma 1: Let $Y$ be a random variable with $E[Y] = 0$ and $Y \in [a, b]$. Define the cumulant generating function $\psi(\theta) = \log E[e^{\theta Y}]$. Then, for all $\theta \in \mathbb{R}$,
1. The second derivative satisfies:
$$\psi''(\theta) \leq \frac{(b - a)^2}{4}.$$
2. Consequently, the function $\psi(\theta)$ is bounded by:
$$\psi(\theta) \leq \frac{\theta^2 (b - a)^2}{8}.$$
3. Therefore, $Y$ is sub-Gaussian with parameter $\sigma = \frac{b - a}{2}$.
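For instance, a Rademacher variable ($Y = \pm 1$ with probability $1/2$ each) has $a = -1$, $b = 1$, and $\psi(\theta) = \log \cosh \theta \leq \frac{\theta^2}{2} = \frac{\theta^2 (b - a)^2}{8}$, so it is sub-Gaussian with parameter $\sigma = 1$, consistent with the lemma.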
Proof:
1. Bounding $\psi''(\theta)$:
The second derivative of $\psi(\theta)$ is the variance of $Y$ under the exponentially tilted measure. Writing $E_\theta[g(Y)] := E[g(Y) e^{\theta Y}] / E[e^{\theta Y}]$, direct differentiation gives
$$\psi'(\theta) = \frac{E[Y e^{\theta Y}]}{E[e^{\theta Y}]} = E_\theta[Y], \qquad \psi''(\theta) = \frac{E[Y^2 e^{\theta Y}]}{E[e^{\theta Y}]} - \left( \frac{E[Y e^{\theta Y}]}{E[e^{\theta Y}]} \right)^2 = \operatorname{Var}_\theta(Y).$$
Under the tilted measure, $Y$ still takes values in $[a, b]$. For any random variable $Z$ supported on $[a, b]$,
$$\operatorname{Var}(Z) \leq E\left[\left( Z - \frac{a + b}{2} \right)^2\right] \leq \left( \frac{b - a}{2} \right)^2,$$
since $\left| Z - \frac{a + b}{2} \right| \leq \frac{b - a}{2}$ almost surely (the extreme case is $Z$ taking the values $a$ and $b$ with probability $1/2$ each).
Therefore,
$$\psi''(\theta) \leq \frac{(b - a)^2}{4}.$$
2. Bounding $\psi(\theta)$:
Since $\psi(0) = 0$ and $\psi'(0) = E[Y] = 0$, we can integrate $\psi''$ twice to obtain an upper bound for $\psi(\theta)$. We show the case $\theta \geq 0$; for $\theta < 0$, the same bound follows from Taylor's theorem with the Lagrange remainder, $\psi(\theta) = \frac{\theta^2}{2} \psi''(\xi)$ for some $\xi$ between $0$ and $\theta$.
$$\psi'(\theta) = \int_0^\theta \psi''(s) \, ds \leq \int_0^\theta \frac{(b - a)^2}{4} \, ds = \frac{(b - a)^2 \theta}{4}.$$
Integrating again:
$$\psi(\theta) = \int_0^\theta \psi'(s) \, ds \leq \int_0^\theta \frac{(b - a)^2 s}{4} \, ds = \frac{(b - a)^2 \theta^2}{8}. $$
3. Concluding Sub-Gaussianity:
A random variable $Y$ with $E[Y] = 0$ is sub-Gaussian with parameter $\sigma$ if, for all $\theta \in \mathbb{R}$,
$$\psi(\theta) \leq \frac{\theta^2 \sigma^2}{2}.$$
From the bound above, we have:
$$\psi(\theta) \leq \frac{\theta^2 (b - a)^2}{8} = \frac{\theta^2}{2} \left( \frac{b - a}{2} \right)^2.$$
Thus, $Y$ is sub-Gaussian with parameter $\sigma = \frac{b - a}{2}$.
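As an optional numerical sanity check (not part of the proof), the bound $\psi(\theta) \leq \theta^2 (b - a)^2 / 8$ can be evaluated for a concrete mean-zero two-point distribution; the endpoints $a = -1$, $b = 3$ below are an arbitrary choice.

```python
import numpy as np

# Two-point distribution on {a, b} with E[Y] = 0 (an arbitrary example).
a, b = -1.0, 3.0
q = b / (b - a)                      # P(Y = a) = q gives q*a + (1 - q)*b = 0
thetas = np.linspace(-5.0, 5.0, 201)
psi = np.log(q * np.exp(thetas * a) + (1 - q) * np.exp(thetas * b))  # cgf of Y
bound = thetas**2 * (b - a)**2 / 8   # bound from Lemma 1
print(bool(np.all(psi <= bound + 1e-12)))   # expected output: True
```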
Problem 2: Analyzing Tail Probabilities for a Binomial Distribution
Let $S \sim \text{Binomial}(n, p)$ with $p = 0.2$ and $n \in \{10, 100, 1000, 10000\}$. We aim to compute, or bound, the two-sided tail probability
$$P\left(|S - np| \geq 0.1 \, np\right).$$
Hoeffding Bound (Theorem 2)
For Bernoulli trials, each $X_i \in \{0, 1\}$, so $a_i = 0$ and $b_i = 1$. The Hoeffding bound is:
$$P\left(|S - np| \geq t\right) \leq 2 \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
Here, $t = 0.1 \, np$ and $\sum_{i=1}^n (b_i - a_i)^2 = n$. Substituting:
$$P\left(|S - np| \geq 0.1 \, np\right) \leq 2 \exp\left( -\frac{2 (0.1 \, np)^2}{n} \right) = 2 \exp\left( -0.02 \, n p^2 \right).$$
Given $p = 0.2$ and $p^2 = 0.04$:
$$P\left(|S - np| \geq 0.1 \, np\right) \leq 2 \exp\left( -0.0008 \, n \right).$$
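A quick numerical evaluation of $2 e^{-0.0008 n}$ at the four sample sizes (a minimal sketch in plain Python):

```python
import math

# Hoeffding bound 2*exp(-0.0008*n) for p = 0.2 and a relative deviation of 0.1
for n in [10, 100, 1000, 10000]:
    print(f"n = {n:>5}:  Hoeffding bound = {2 * math.exp(-0.0008 * n):.5f}")
```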
Cramer-Chernoff Bound (Theorem 3)
For $0 < \delta < 1$, the Cramer-Chernoff (multiplicative Chernoff) bounds for a binomial variable are:
$$P\left( S \geq (1 + \delta) \, np \right) \leq \exp\left( -\frac{\delta^2 \, np}{3} \right),$$
$$P\left( S \leq (1 - \delta) \, np \right) \leq \exp\left( -\frac{\delta^2 \, np}{2} \right).$$
Calculating the upper tail with $\delta = 0.1$:
$$P\left( S \geq 1.1 \, np \right) \leq \exp\left( -\frac{0.01 \, np}{3} \right) = \exp\left( -\frac{np}{300} \right).$$
Calculating the lower tail:
$$P\left( S \leq 0.9 \, np \right) \leq \exp\left( -\frac{0.01 \, np}{2} \right) = \exp\left( -\frac{np}{200} \right).$$
Total bound:
$$P\left(|S - np| \geq 0.1 \, np\right) \leq \exp\left( -\frac{np}{300} \right) + \exp\left( -\frac{np}{200} \right).$$
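The corresponding numerical values of $e^{-np/300} + e^{-np/200}$ (same style of sketch):

```python
import math

p = 0.2
# Cramer-Chernoff bound exp(-np/300) + exp(-np/200) for delta = 0.1
for n in [10, 100, 1000, 10000]:
    total = math.exp(-n * p / 300) + math.exp(-n * p / 200)
    print(f"n = {n:>5}:  Cramer-Chernoff bound = {total:.5f}")
```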
Approximating Tail Probabilities Using Python
To approximate the tail probabilities $P\left(|S - np| \geq 0.1 \, np\right)$ for different values of $n$, we can use Python's `scipy.stats` library.
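A minimal sketch is given below. The rounding of the (generally non-integer) thresholds $np \pm 0.1\,np$ via floor/ceil is my own convention for the inclusive event $|S - np| \geq 0.1\,np$, and the small-$n$ values are sensitive to that choice.

```python
import math
from scipy.stats import binom

p = 0.2
for n in [10, 100, 1000, 10000]:
    mu = n * p                                      # E[S] = np
    t = 0.1 * mu                                    # deviation threshold
    lower = binom.cdf(math.floor(mu - t), n, p)     # P(S <= np - t)
    upper = binom.sf(math.ceil(mu + t) - 1, n, p)   # P(S >= np + t)
    print(f"n = {n:>5}:  P(|S - np| >= 0.1 np) = {lower + upper:.6f}")
```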
Comparing Calculated Values
Below is a comparison table of the Hoeffding bound, Cramer-Chernoff bound, and the approximated tail probabilities obtained from Python:
| $n$ | Hoeffding Bound | Cramer-Chernoff Bound | Approximated Probability |
|---|---|---|---|
| $10$ | $2 \times e^{-0.008} \approx 1.984$ | $e^{-0.0067} + e^{-0.01} \approx 1.984$ | $\approx 1.000000$ |
| $100$ | $2 \times e^{-0.08} \approx 1.846$ | $e^{-0.0667} + e^{-0.1} \approx 1.840$ | $\approx 0.708054$ |
| $1000$ | $2 \times e^{-0.8} \approx 0.898$ | $e^{-0.6667} + e^{-1} \approx 0.881$ | $\approx 0.123032$ |
| $10000$ | $2 \times e^{-8} \approx 0.00067$ | $e^{-6.6667} + e^{-10} \approx 0.0013$ | $\approx 0.000001$ |
- Hoeffding and Cramer-Chernoff Bounds:
- Neither bound is informative for small $n$ (e.g., $n = 10$ and $n = 100$), where both are close to or exceed $1$; a probability bound above $1$ carries no information.
- As $n$ increases, both bounds decrease exponentially, becoming more meaningful and tighter.
- Approximated Probabilities:
- For small $n$, the approximated probabilities are significantly lower than the bounds, indicating that the bounds are conservative.
- As $n$ increases, the approximated probabilities decrease, aligning more closely with the bounds.
- For very large $n$ (e.g., $n = 10000$), the approximated probability is negligible, demonstrating the effectiveness of the concentration inequalities.
- Conclusion:
- For these parameters, the Cramer-Chernoff bound is slightly tighter than the Hoeffding bound up to $n = 1000$, but the Hoeffding bound is tighter at $n = 10000$ ($0.00067$ vs. $0.0013$), since its exponent $0.0008\,n$ eventually dominates $np/300 \approx 0.00067\,n$.
- Both bounds become more useful and accurate as the number of trials $n$ increases.
- Exact or approximated probabilities (especially for large $n$) are significantly lower than the bounds, highlighting the conservativeness of Hoeffding and Cramer-Chernoff bounds.
Problem 3: Prove Hoeffding's Inequality
Hoeffding's Inequality: Let $X_1, X_2, \dots, X_n$ be independent random variables with $a_i \leq X_i \leq b_i$ almost surely for each $i = 1, 2, \dots, n$. Define the sum $S = X_1 + X_2 + \dots + X_n$. Then, for any $t > 0$,
$$P(S - E[S] \geq t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right),$$
and similarly,
$$P(S - E[S] \leq -t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
This inequality provides an exponential bound on the probability that the sum $S$ deviates from its expected value by at least $t$.
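For example, for $n$ independent Bernoulli trials ($a_i = 0$, $b_i = 1$), the denominator is $\sum_{i=1}^n (b_i - a_i)^2 = n$, and the two-sided bound becomes $P(|S - np| \geq t) \leq 2 e^{-2t^2/n}$, which is exactly the form used in Problem 2.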
Proof of Hoeffding's Inequality:
To establish Hoeffding's inequality, we use the Cramer-Chernoff method: apply Markov's inequality to the exponential moment $e^{\theta (S - E[S])}$, bound the resulting moment generating function, and optimize over $\theta$.
1. Exponential Markov's Inequality:
For any $\theta > 0$, apply Markov's inequality to the random variable $e^{\theta (S - E[S])}$:
$$P(S - E[S] \geq t) = P\left(e^{\theta (S - E[S])} \geq e^{\theta t}\right) \leq \frac{E\left[e^{\theta (S - E[S])}\right]}{e^{\theta t}}.$$
Therefore,
$$P(S - E[S] \geq t) \leq e^{-\theta t} E\left[e^{\theta (S - E[S])}\right].$$
2. Bounding the Moment Generating Function (MGF):
Since the random variables $X_i$ are independent, the MGF of the sum $S$ can be expressed as the product of the individual MGFs:
$$E\left[e^{\theta (S - E[S])}\right] = \prod_{i=1}^n E\left[e^{\theta (X_i - E[X_i])}\right].$$
For each $X_i$, which satisfies $a_i \leq X_i \leq b_i$ almost surely, we bound its MGF using the following lemma (Hoeffding's lemma).
Lemma: If a random variable $Y$ satisfies $a \leq Y \leq b$ almost surely and $E[Y] = \mu$, then for any $\theta \in \mathbb{R}$,
$$E\left[e^{\theta (Y - \mu)}\right] \leq \exp\left( \frac{\theta^2 (b - a)^2}{8} \right).$$
Proof of Lemma:
- The centered variable $Y - \mu$ has mean zero and takes values in $[a - \mu, b - \mu]$, an interval of length $b - a$.
- By Lemma 1 (Problem 1), its cumulant generating function satisfies $\log E\left[e^{\theta (Y - \mu)}\right] \leq \frac{\theta^2 (b - a)^2}{8}$ for all $\theta \in \mathbb{R}$.
- Exponentiating gives the stated bound; this is Hoeffding's lemma.
Applying this lemma to each $X_i$, we get:
$$E\left[e^{\theta (X_i - E[X_i])}\right] \leq \exp\left( \frac{\theta^2 (b_i - a_i)^2}{8} \right).$$
Therefore, the MGF of $S - E[S]$ satisfies:
$$E\left[e^{\theta (S - E[S])}\right] \leq \exp\left( \sum_{i=1}^n \frac{\theta^2 (b_i - a_i)^2}{8} \right) = \exp\left( \frac{\theta^2 \sum_{i=1}^n (b_i - a_i)^2}{8} \right).$$
3. Optimizing the Bound:
Substituting the bound on the MGF back into the probability bound:
$$P(S - E[S] \geq t) \leq \exp\left( -\theta t + \frac{\theta^2 \sum_{i=1}^n (b_i - a_i)^2}{8} \right).$$
To obtain the tightest bound, we minimize the exponent with respect to $\theta$. Let:
$$f(\theta) = -\theta t + \frac{\theta^2 \sum_{i=1}^n (b_i - a_i)^2}{8}.$$
Taking the derivative of $f(\theta)$ with respect to $\theta$ and setting it to zero:
$$f'(\theta) = -t + \frac{\theta \sum_{i=1}^n (b_i - a_i)^2}{4} = 0 \implies \theta = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2}.$$
Substituting this optimal $\theta$ back into the exponent:
$$f(\theta) = -\frac{4t^2}{\sum_{i=1}^n (b_i - a_i)^2} + \frac{16t^2}{8 \sum_{i=1}^n (b_i - a_i)^2} = -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}. $$
Therefore,
$$P(S - E[S] \geq t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
Similarly, applying the same argument to the variables $-X_1, \dots, -X_n$ (bounded in $[-b_i, -a_i]$, so the range lengths are unchanged), we obtain:
$$P(S - E[S] \leq -t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right). $$
4. Conclusion:
Combining both the upper and lower tail bounds, Hoeffding's inequality is established:
$$P(|S - E[S]| \geq t) \leq 2 \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
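Two optional checks, neither of which is part of the proof. First, the optimization in step 3 can be verified symbolically; here `sympy` is used and the symbol $V$ stands for $\sum_{i=1}^n (b_i - a_i)^2$:

```python
import sympy as sp

theta, t, V = sp.symbols('theta t V', positive=True)
f = -theta * t + theta**2 * V / 8                        # exponent to minimize
theta_star = sp.solve(sp.Eq(sp.diff(f, theta), 0), theta)[0]
print(theta_star)                                        # 4*t/V
print(sp.simplify(f.subs(theta, theta_star)))            # -2*t**2/V
```

Second, a small Monte Carlo experiment can compare the empirical deviation probability with the bound; the choice of Uniform$[0, 1]$ summands, $n = 50$, and $t = 5$ is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# S = sum of n iid Uniform[0, 1] variables, so a_i = 0, b_i = 1, E[S] = n/2.
n, trials, t = 50, 200_000, 5.0
S = rng.uniform(0.0, 1.0, size=(trials, n)).sum(axis=1)
empirical = np.mean(np.abs(S - n / 2) >= t)
bound = 2 * np.exp(-2 * t**2 / n)          # two-sided Hoeffding bound
print(f"empirical: {empirical:.5f}   Hoeffding bound: {bound:.5f}")
```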