On Hoeffding’s Inequality

Hoeffding’s inequality was first proved in 1963 for independent bounded random variables. It captures the cancellation among independent random variables, which makes their linear combinations concentrate. The result extends to sub-Gaussian random variables: a linear combination of sub-Gaussian random variables is again sub-Gaussian, and a careful analysis bounds its $\psi_2$ norm in terms of the $\psi_2$ norms of the summands. We leave this generalization to later notes.

This note focuses on the proof of Hoeffding’s inequality. First, let’s state the theorem. For simplicity, we assume that all the random variables take values in the same interval.

Theorem. Assume $X_1,\ldots,X_n$ are independent random variables taking values in $[m,M]$, with $\mathbb{E}X_i=0$. Then, for every $t>0$,

$\begin{equation} \mathbb{P}\left(\left|\sum_{i=1}^n X_i\right|>t\right)\le 2\exp\left(-\frac{2t^2}{n(M-m)^2}\right)\,. \end{equation}$
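As a quick sanity check of the statement, the following sketch compares a Monte Carlo estimate of the tail probability with the bound in the theorem. The mean-zero two-point distribution on $\{m,M\}$, the parameter values, and the function names are all assumptions made for illustration; they are not part of the original note.

```python
import math
import random

def hoeffding_bound(t, n, m, M):
    """Right-hand side of the theorem: 2 * exp(-2 t^2 / (n (M - m)^2))."""
    return 2.0 * math.exp(-2.0 * t * t / (n * (M - m) ** 2))

def empirical_tail(t, n, m, M, trials=5000, seed=0):
    """Estimate P(|X_1 + ... + X_n| > t) by simulation.

    Assumption: each X_i takes the value M with probability -m / (M - m)
    and the value m otherwise, so that E[X_i] = 0.
    """
    rng = random.Random(seed)
    prob_M = -m / (M - m)
    hits = 0
    for _ in range(trials):
        s = sum(M if rng.random() < prob_M else m for _ in range(n))
        if abs(s) > t:
            hits += 1
    return hits / trials

# Illustrative parameters (hypothetical): n = 50 variables in [-1, 2].
n, m, M = 50, -1.0, 2.0
for t in (10.0, 20.0, 30.0):
    print(t, empirical_tail(t, n, m, M), hoeffding_bound(t, n, m, M))
```

For small $t$ the bound exceeds 1 and says nothing; it becomes informative once $t$ is a few multiples of $\sqrt{n}(M-m)$.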

To prove the theorem, we first remove the absolute value and consider the one-sided inequality. For any $\lambda>0$, we pass to the exponential $\exp(\lambda\sum_{i=1}^n X_i)$ (the Laplace transform), apply Markov’s inequality, and use independence to factor the expectation into a product:

$\begin{equation} \mathbb{P}\left(\sum_{i=1}^n X_i>t\right)\le\frac{\prod_{i=1}^n\mathbb{E}\exp(\lambda X_i)}{\exp(\lambda t)}\,. \end{equation}$

Here comes the key trick in the proof. Since $\exp(\lambda x)$ is convex in $x$, on the interval $[m,M]$ it lies on or below the secant line connecting the end points, that is,

$\begin{equation} e^{\lambda x}\le\frac{e^{\lambda M}-e^{\lambda m}}{M-m}(x-m)+e^{\lambda m}\,. \end{equation}$

Hence, we get

$\begin{equation} \mathbb{E}\exp(\lambda X_i)\le\frac{e^{\lambda M}-e^{\lambda m}}{M-m}(\mathbb{E}X_i-m)+e^{\lambda m}\,, \end{equation}$

and substituting $\mathbb{E}X_i=0$ simplifies the right-hand side. The problem is now a calculus question: find an upper bound for the function

$\begin{equation} f(\lambda)=\frac{Me^{\lambda m}-me^{\lambda M}}{M-m}\,. \end{equation}$
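To see the secant-line step in action, the sketch below compares the moment generating function of a small discrete mean-zero distribution with $f(\lambda)$. The three-point distribution and the helper names are hypothetical choices for illustration only.

```python
import math

def mgf(lam, values, probs):
    """E[exp(lam * X)] for a discrete random variable X."""
    return sum(p * math.exp(lam * v) for v, p in zip(values, probs))

def f(lam, m, M):
    """Secant-line bound (M e^{lam m} - m e^{lam M}) / (M - m)."""
    return (M * math.exp(lam * m) - m * math.exp(lam * M)) / (M - m)

# Hypothetical example: X in {-1, 0, 2} with mean -0.4 + 0 + 0.4 = 0.
values, probs = (-1.0, 0.0, 2.0), (0.4, 0.4, 0.2)
m, M = -1.0, 2.0
for lam in (0.1, 0.5, 1.0):
    print(lam, mgf(lam, values, probs), f(lam, m, M))
```

The bound is attained with equality when all the mass sits on the end points: for the mean-zero two-point law with $\mathbb{P}(X=m)=M/(M-m)$ and $\mathbb{P}(X=M)=-m/(M-m)$, the moment generating function is exactly $f(\lambda)$.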

Since $\mathbb{E}X_i=0$, we have $m\le 0\le M$, and we may assume $m<0<M$ (otherwise $X_i=0$ almost surely and there is nothing to prove). Let’s make a substitution of variables,

$\begin{equation} p=\frac{M}{M-m},\quad q=\frac{-m}{M-m},\quad x=\lambda(M-m)\,. \end{equation}$

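Then $p,q>0$ with $p+q=1$, and the function becomes $g(x)=pe^{-qx}+qe^{px}$. Taking the logarithm and using $p+q=1$, we get $h(x)=-qx+\ln(p+qe^x)$. We compare this function with $x^2/8$. First, $h(0)=0$ and $h'(0)=0$. Moreover, by the inequality $(a+b)^2\ge 4ab$ with $a=p$ and $b=qe^x$,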

$\begin{equation} h''(x)=\frac{pqe^x}{(p+qe^x)^2}\le\frac{1}{4}\,. \end{equation}$

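By Taylor’s theorem, $h(x)=h(0)+h'(0)x+\frac{1}{2}h''(\xi)x^2$ for some $\xi$ between $0$ and $x$, so $h(x)\le x^2/8$ for all $x$. Hence $\mathbb{E}\exp(\lambda X_i)\le\exp\left(\lambda^2(M-m)^2/8\right)$ for each $i$, and thus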

$\begin{equation} \prod_{i=1}^n\mathbb{E}\exp(\lambda X_i)\le\exp\left(\frac{n\lambda^2(M-m)^2}{8}\right)\,. \end{equation}$
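The two calculus facts used above, $h''(x)\le 1/4$ and $h(x)\le x^2/8$, can be spot-checked numerically. The following is only a sketch over an arbitrarily chosen grid of $p$ and $x$ values, not a proof; the function names are illustrative.

```python
import math

def h(x, p):
    """h(x) = -q x + ln(p + q e^x) with q = 1 - p."""
    q = 1.0 - p
    return -q * x + math.log(p + q * math.exp(x))

def h2(x, p):
    """Second derivative h''(x) = p q e^x / (p + q e^x)^2."""
    q = 1.0 - p
    ex = math.exp(x)
    return p * q * ex / (p + q * ex) ** 2

# Largest value of h(x) - x^2 / 8 over a grid of p in (0, 1) and x in [-10, 10].
# It should be non-positive up to floating-point rounding.
worst_gap = max(
    h(k / 10.0, p) - (k / 10.0) ** 2 / 8.0
    for p in (0.1, 0.3, 0.5, 0.7, 0.9)
    for k in range(-100, 101)
)
print(worst_gap)
```

The bound on $h''$ is attained when $p=qe^x$; for example, $h''(0)=1/4$ when $p=q=1/2$.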

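It remains to minimize the resulting upper bound

$\begin{equation} \exp\left(\frac{n\lambda^2(M-m)^2}{8}-\lambda t\right)\,, \end{equation}$

over $\lambda>0$. The minimum is attained at $\lambda=4t/(n(M-m)^2)$, which gives $\mathbb{P}\left(\sum_{i=1}^n X_i>t\right)\le\exp\left(-2t^2/(n(M-m)^2)\right)$. Applying the same argument to $-X_1,\ldots,-X_n$, which take values in $[-M,-m]$, an interval of the same length, bounds $\mathbb{P}\left(\sum_{i=1}^n X_i<-t\right)$ by the same quantity. Adding the two one-sided bounds, we get the desired result.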
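The optimization over $\lambda$ is a one-variable quadratic minimization: the exponent $n\lambda^2(M-m)^2/8-\lambda t$ is smallest at $\lambda=4t/(n(M-m)^2)$, where it equals $-2t^2/(n(M-m)^2)$. The sketch below checks this closed form against a crude grid search. The parameter values and names are illustrative assumptions, with `width` standing for $M-m$.

```python
import math

def exponent(lam, t, n, width):
    """Exponent n * lam^2 * width^2 / 8 - lam * t of the upper bound."""
    return n * lam * lam * width * width / 8.0 - lam * t

# Illustrative parameters (hypothetical): t = 20, n = 50, M - m = 3.
t, n, width = 20.0, 50, 3.0
lam_star = 4.0 * t / (n * width ** 2)          # closed-form minimizer
closed_form = -2.0 * t * t / (n * width ** 2)  # exponent evaluated at lam_star

# Independent check: grid search over lam in (0, 2] with step 1e-4.
grid_min = min(exponent(k / 10000.0, t, n, width) for k in range(1, 20001))
print(lam_star, closed_form, grid_min)
```

The grid minimum can never fall below the closed-form value, and it approaches that value as the grid is refined.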

Yitong Sun
