Concentration Inequalities Homework
Problem 1: Prove Lemma 1
Lemma 1: Let $Y$ be a random variable with $E[Y] = 0$ and $Y \in [a, b]$. Define the cumulant generating function $\psi(\theta) = \log E[e^{\theta Y}]$. Then, for all $\theta \in \mathbb{R}$,
1. The second derivative satisfies:
$$\psi''(\theta) \leq \frac{(b - a)^2}{4}.$$
2. Consequently, the function $\psi(\theta)$ is bounded by:
$$\psi(\theta) \leq \frac{\theta^2 (b - a)^2}{8}.$$
3. Therefore, $Y$ is sub-Gaussian with parameter $\sigma = \frac{b - a}{2}$.
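For instance, a Rademacher variable ($Y = \pm 1$ with probability $1/2$ each) has $a = -1$, $b = 1$, and $\psi(\theta) = \log \cosh \theta \leq \frac{\theta^2}{2} = \frac{\theta^2 (b - a)^2}{8}$, so it is sub-Gaussian with parameter $\sigma = 1$, consistent with the lemma.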
Proof:
1. Bounding $\psi''(\theta)$:
The second derivative of $\psi(\theta)$ is the variance of $Y$ under the exponentially tilted measure. Writing $E_\theta[g(Y)] := E[g(Y) e^{\theta Y}] / E[e^{\theta Y}]$, direct differentiation gives
$$\psi'(\theta) = \frac{E[Y e^{\theta Y}]}{E[e^{\theta Y}]} = E_\theta[Y], \qquad \psi''(\theta) = \frac{E[Y^2 e^{\theta Y}]}{E[e^{\theta Y}]} - \left( \frac{E[Y e^{\theta Y}]}{E[e^{\theta Y}]} \right)^2 = \operatorname{Var}_\theta(Y).$$
Under the tilted measure, $Y$ still takes values in $[a, b]$. For any random variable $Z$ supported on $[a, b]$,
$$\operatorname{Var}(Z) \leq E\left[\left( Z - \frac{a + b}{2} \right)^2\right] \leq \left( \frac{b - a}{2} \right)^2,$$
since $\left| Z - \frac{a + b}{2} \right| \leq \frac{b - a}{2}$ almost surely (the extreme case is $Z$ taking the values $a$ and $b$ with probability $1/2$ each).
Therefore,
$$\psi''(\theta) \leq \frac{(b - a)^2}{4}.$$
2. Bounding $\psi(\theta)$:
Since $\psi(0) = 0$ and $\psi'(0) = E[Y] = 0$, we can integrate $\psi''$ twice to obtain an upper bound for $\psi(\theta)$. We show the case $\theta \geq 0$; for $\theta < 0$, the same bound follows from Taylor's theorem with the Lagrange remainder, $\psi(\theta) = \frac{\theta^2}{2} \psi''(\xi)$ for some $\xi$ between $0$ and $\theta$.
$$\psi'(\theta) = \int_0^\theta \psi''(s) \, ds \leq \int_0^\theta \frac{(b - a)^2}{4} \, ds = \frac{(b - a)^2 \theta}{4}.$$
Integrating again:
$$\psi(\theta) = \int_0^\theta \psi'(s) \, ds \leq \int_0^\theta \frac{(b - a)^2 s}{4} \, ds = \frac{(b - a)^2 \theta^2}{8}. $$
3. Concluding Sub-Gaussianity:
A random variable $Y$ with $E[Y] = 0$ is sub-Gaussian with parameter $\sigma$ if, for all $\theta \in \mathbb{R}$,
$$\psi(\theta) \leq \frac{\theta^2 \sigma^2}{2}.$$
From the bound above, we have:
$$\psi(\theta) \leq \frac{\theta^2 (b - a)^2}{8} = \frac{\theta^2}{2} \left( \frac{b - a}{2} \right)^2.$$
Thus, $Y$ is sub-Gaussian with parameter $\sigma = \frac{b - a}{2}$.
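As an optional numerical sanity check (not part of the proof), the bound $\psi(\theta) \leq \theta^2 (b - a)^2 / 8$ can be evaluated for a concrete mean-zero two-point distribution; the endpoints $a = -1$, $b = 3$ below are an arbitrary choice.

```python
import numpy as np

# Two-point distribution on {a, b} with E[Y] = 0 (an arbitrary example).
a, b = -1.0, 3.0
q = b / (b - a)                      # P(Y = a) = q gives q*a + (1 - q)*b = 0
thetas = np.linspace(-5.0, 5.0, 201)
psi = np.log(q * np.exp(thetas * a) + (1 - q) * np.exp(thetas * b))  # cgf of Y
bound = thetas**2 * (b - a)**2 / 8   # bound from Lemma 1
print(bool(np.all(psi <= bound + 1e-12)))   # expected output: True
```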
Problem 2: Analyzing Tail Probabilities for a Binomial Distribution
Let $S \sim \text{Binomial}(n, p)$ with $p = 0.2$ and $n \in \{10, 100, 1000, 10000\}$. We aim to compute, or bound, the two-sided tail probability
$$P\left(|S - np| \geq 0.1 \, np\right).$$
Hoeffding Bound (Theorem 2)
For Bernoulli trials, each $X_i \in \{0, 1\}$, so $a_i = 0$ and $b_i = 1$. The Hoeffding bound is:
$$P\left(|S - np| \geq t\right) \leq 2 \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
Here, $t = 0.1 \, np$ and $\sum_{i=1}^n (b_i - a_i)^2 = n$. Substituting:
$$P\left(|S - np| \geq 0.1 \, np\right) \leq 2 \exp\left( -\frac{2 (0.1 \, np)^2}{n} \right) = 2 \exp\left( -0.02 \, n p^2 \right).$$
Given $p = 0.2$ and $p^2 = 0.04$:
$$P\left(|S - np| \geq 0.1 \, np\right) \leq 2 \exp\left( -0.0008 \, n \right).$$
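A quick numerical evaluation of $2 e^{-0.0008 n}$ at the four sample sizes (a minimal sketch in plain Python):

```python
import math

# Hoeffding bound 2*exp(-0.0008*n) for p = 0.2 and a relative deviation of 0.1
for n in [10, 100, 1000, 10000]:
    print(f"n = {n:>5}:  Hoeffding bound = {2 * math.exp(-0.0008 * n):.5f}")
```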
Cramer-Chernoff Bound (Theorem 3)
For $0 < \delta < 1$, the Cramer-Chernoff (multiplicative Chernoff) bounds for a binomial variable are:
$$P\left( S \geq (1 + \delta) \, np \right) \leq \exp\left( -\frac{\delta^2 \, np}{3} \right),$$
$$P\left( S \leq (1 - \delta) \, np \right) \leq \exp\left( -\frac{\delta^2 \, np}{2} \right).$$
Calculating the upper tail with $\delta = 0.1$:
$$P\left( S \geq 1.1 \, np \right) \leq \exp\left( -\frac{0.01 \, np}{3} \right) = \exp\left( -\frac{np}{300} \right).$$
Calculating the lower tail:
$$P\left( S \leq 0.9 \, np \right) \leq \exp\left( -\frac{0.01 \, np}{2} \right) = \exp\left( -\frac{np}{200} \right).$$
Total bound:
$$P\left(|S - np| \geq 0.1 \, np\right) \leq \exp\left( -\frac{np}{300} \right) + \exp\left( -\frac{np}{200} \right).$$
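The corresponding numerical values of $e^{-np/300} + e^{-np/200}$ (same style of sketch):

```python
import math

p = 0.2
# Cramer-Chernoff bound exp(-np/300) + exp(-np/200) for delta = 0.1
for n in [10, 100, 1000, 10000]:
    total = math.exp(-n * p / 300) + math.exp(-n * p / 200)
    print(f"n = {n:>5}:  Cramer-Chernoff bound = {total:.5f}")
```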
Approximating Tail Probabilities Using Python
To approximate the tail probabilities $P\left(|S - np| \geq 0.1 \, np\right)$ for different values of $n$, we can use Python's `scipy.stats` library.
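A minimal sketch is given below. The rounding of the (generally non-integer) thresholds $np \pm 0.1\,np$ via floor/ceil is my own convention for the inclusive event $|S - np| \geq 0.1\,np$, and the small-$n$ values are sensitive to that choice.

```python
import math
from scipy.stats import binom

p = 0.2
for n in [10, 100, 1000, 10000]:
    mu = n * p                                      # E[S] = np
    t = 0.1 * mu                                    # deviation threshold
    lower = binom.cdf(math.floor(mu - t), n, p)     # P(S <= np - t)
    upper = binom.sf(math.ceil(mu + t) - 1, n, p)   # P(S >= np + t)
    print(f"n = {n:>5}:  P(|S - np| >= 0.1 np) = {lower + upper:.6f}")
```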
Comparing Calculated Values
Below is a comparison table of the Hoeffding bound, Cramer-Chernoff bound, and the approximated tail probabilities obtained from Python:
| $n$ | Hoeffding Bound | Cramer-Chernoff Bound | Approximated Probability |
|---|---|---|---|
| $10$ | $2 \times e^{-0.008} \approx 1.984$ | $e^{-0.0067} + e^{-0.01} \approx 1.984$ | $\approx 1.000000$ |
| $100$ | $2 \times e^{-0.08} \approx 1.846$ | $e^{-0.0667} + e^{-0.1} \approx 1.840$ | $\approx 0.708054$ |
| $1000$ | $2 \times e^{-0.8} \approx 0.898$ | $e^{-0.6667} + e^{-1} \approx 0.881$ | $\approx 0.123032$ |
| $10000$ | $2 \times e^{-8} \approx 0.00067$ | $e^{-6.6667} + e^{-10} \approx 0.0013$ | $\approx 0.000001$ |
- Hoeffding and Cramer-Chernoff Bounds:
- Neither bound is informative for small $n$ (e.g., $n = 10$ and $n = 100$), where both are close to or exceed $1$; a probability bound above $1$ carries no information.
- As $n$ increases, both bounds decrease exponentially, becoming more meaningful and tighter.
- Approximated Probabilities:
- For small $n$, the approximated probabilities are significantly lower than the bounds, indicating that the bounds are conservative.
- As $n$ increases, the approximated probabilities decrease, aligning more closely with the bounds.
- For very large $n$ (e.g., $n = 10000$), the approximated probability is negligible, demonstrating the effectiveness of the concentration inequalities.
- Conclusion:
- For these parameters, the Cramer-Chernoff bound is slightly tighter than the Hoeffding bound up to $n = 1000$, but the Hoeffding bound is tighter at $n = 10000$ ($0.00067$ vs. $0.0013$), since its exponent $0.0008\,n$ eventually dominates $np/300 \approx 0.00067\,n$.
- Both bounds become more useful and accurate as the number of trials $n$ increases.
- Exact or approximated probabilities (especially for large $n$) are significantly lower than the bounds, highlighting the conservativeness of Hoeffding and Cramer-Chernoff bounds.
Problem 3: Prove Hoeffding's Inequality
Hoeffding's Inequality: Let $X_1, X_2, \dots, X_n$ be independent random variables with $a_i \leq X_i \leq b_i$ almost surely for each $i = 1, 2, \dots, n$. Define the sum $S = X_1 + X_2 + \dots + X_n$. Then, for any $t > 0$,
$$P(S - E[S] \geq t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right),$$
and similarly,
$$P(S - E[S] \leq -t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
This inequality provides an exponential bound on the probability that the sum $S$ deviates from its expected value by at least $t$.
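For example, for $n$ independent Bernoulli trials ($a_i = 0$, $b_i = 1$), the denominator is $\sum_{i=1}^n (b_i - a_i)^2 = n$, and the two-sided bound becomes $P(|S - np| \geq t) \leq 2 e^{-2t^2/n}$, which is exactly the form used in Problem 2.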
Proof of Hoeffding's Inequality:
To establish Hoeffding's inequality, we use the Cramer-Chernoff method: apply Markov's inequality to the exponential moment $e^{\theta (S - E[S])}$, bound the resulting moment generating function, and optimize over $\theta$.
1. Exponential Markov's Inequality:
For any $\theta > 0$, apply Markov's inequality to the random variable $e^{\theta (S - E[S])}$:
$$P(S - E[S] \geq t) = P\left(e^{\theta (S - E[S])} \geq e^{\theta t}\right) \leq \frac{E\left[e^{\theta (S - E[S])}\right]}{e^{\theta t}}.$$
Therefore,
$$P(S - E[S] \geq t) \leq e^{-\theta t} E\left[e^{\theta (S - E[S])}\right].$$
2. Bounding the Moment Generating Function (MGF):
Since the random variables $X_i$ are independent, the MGF of the sum $S$ can be expressed as the product of the individual MGFs:
$$E\left[e^{\theta (S - E[S])}\right] = \prod_{i=1}^n E\left[e^{\theta (X_i - E[X_i])}\right].$$
For each $X_i$, which satisfies $a_i \leq X_i \leq b_i$ almost surely, we bound its MGF using the following lemma (Hoeffding's lemma).
Lemma: If a random variable $Y$ satisfies $a \leq Y \leq b$ almost surely and $E[Y] = \mu$, then for any $\theta \in \mathbb{R}$,
$$E\left[e^{\theta (Y - \mu)}\right] \leq \exp\left( \frac{\theta^2 (b - a)^2}{8} \right).$$
Proof of Lemma:
- The centered variable $Y - \mu$ has mean zero and takes values in $[a - \mu, b - \mu]$, an interval of length $b - a$.
- By Lemma 1 (Problem 1), its cumulant generating function satisfies $\log E\left[e^{\theta (Y - \mu)}\right] \leq \frac{\theta^2 (b - a)^2}{8}$ for all $\theta \in \mathbb{R}$.
- Exponentiating gives the stated bound; this is Hoeffding's lemma.
Applying this lemma to each $X_i$, we get:
$$E\left[e^{\theta (X_i - E[X_i])}\right] \leq \exp\left( \frac{\theta^2 (b_i - a_i)^2}{8} \right).$$
Therefore, the MGF of $S - E[S]$ satisfies:
$$E\left[e^{\theta (S - E[S])}\right] \leq \exp\left( \sum_{i=1}^n \frac{\theta^2 (b_i - a_i)^2}{8} \right) = \exp\left( \frac{\theta^2 \sum_{i=1}^n (b_i - a_i)^2}{8} \right).$$
3. Optimizing the Bound:
Substituting the bound on the MGF back into the probability bound:
$$P(S - E[S] \geq t) \leq \exp\left( -\theta t + \frac{\theta^2 \sum_{i=1}^n (b_i - a_i)^2}{8} \right).$$
To obtain the tightest bound, we minimize the exponent with respect to $\theta$. Let:
$$f(\theta) = -\theta t + \frac{\theta^2 \sum_{i=1}^n (b_i - a_i)^2}{8}.$$
Taking the derivative of $f(\theta)$ with respect to $\theta$ and setting it to zero:
$$f'(\theta) = -t + \frac{\theta \sum_{i=1}^n (b_i - a_i)^2}{4} = 0 \implies \theta = \frac{4t}{\sum_{i=1}^n (b_i - a_i)^2}.$$
Substituting this optimal $\theta$ back into the exponent:
$$f(\theta) = -\frac{4t^2}{\sum_{i=1}^n (b_i - a_i)^2} + \frac{16t^2}{8 \sum_{i=1}^n (b_i - a_i)^2} = -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2}. $$
Therefore,
$$P(S - E[S] \geq t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
Similarly, applying the same argument to the variables $-X_1, \dots, -X_n$ (bounded in $[-b_i, -a_i]$, so the range lengths are unchanged), we obtain:
$$P(S - E[S] \leq -t) \leq \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right). $$
4. Conclusion:
Combining both the upper and lower tail bounds, Hoeffding's inequality is established:
$$P(|S - E[S]| \geq t) \leq 2 \exp\left( -\frac{2t^2}{\sum_{i=1}^n (b_i - a_i)^2} \right).$$
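Two optional checks, neither of which is part of the proof. First, the optimization in step 3 can be verified symbolically; here `sympy` is used and the symbol $V$ stands for $\sum_{i=1}^n (b_i - a_i)^2$:

```python
import sympy as sp

theta, t, V = sp.symbols('theta t V', positive=True)
f = -theta * t + theta**2 * V / 8                        # exponent to minimize
theta_star = sp.solve(sp.Eq(sp.diff(f, theta), 0), theta)[0]
print(theta_star)                                        # 4*t/V
print(sp.simplify(f.subs(theta, theta_star)))            # -2*t**2/V
```

Second, a small Monte Carlo experiment can compare the empirical deviation probability with the bound; the choice of Uniform$[0, 1]$ summands, $n = 50$, and $t = 5$ is arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)

# S = sum of n iid Uniform[0, 1] variables, so a_i = 0, b_i = 1, E[S] = n/2.
n, trials, t = 50, 200_000, 5.0
S = rng.uniform(0.0, 1.0, size=(trials, n)).sum(axis=1)
empirical = np.mean(np.abs(S - n / 2) >= t)
bound = 2 * np.exp(-2 * t**2 / n)          # two-sided Hoeffding bound
print(f"empirical: {empirical:.5f}   Hoeffding bound: {bound:.5f}")
```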