Capacity of Neural Networks (2): Contraction Inequality


In this section, we discuss the contraction inequality for Rademacher complexity. It is very useful for peeling the loss function off the hypothesis class in sample complexity analysis.

Contraction Inequality of $\mathfrak{R}_m$

The first inequality we consider here is that

$\begin{equation*} \mathfrak{R}_m(\phi\circ\mathcal{H}) \le L\mathfrak{R}_m(\mathcal{H})\,, \end{equation*}$

if $\phi$ is an $L$-Lipschitz function. This inequality is attributed to (Ledoux & Talagrand, 1991) by (Bartlett & Mendelson, 2003; Mohri, Rostamizadeh, & Talwalkar, 2012). The inequality in its current form is due to (Mohri, Rostamizadeh, & Talwalkar, 2012). The current form is more concise than the original one: it does not assume $\phi(0)=0$ and it saves an extra factor of 2 on the right-hand side. The key step of the proof is that

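$\begin{equation} \frac{1}{2}r_1+\frac{1}{2}s_1+\frac{1}{2}\vert r_2 - s_2\vert \le \mathbb{E}\sup_{t\in T} (t_1+\sigma t_2)\,,\tag{1} \end{equation}$

for any $(r_1,r_2), (s_1,s_2)\in T$, where $\sigma$ is a Rademacher variable. Indeed, the right-hand side equals $\frac{1}{2}\sup_{t\in T}(t_1+t_2)+\frac{1}{2}\sup_{t\in T}(t_1-t_2)$; evaluating the first supremum at $r$ and the second at $s$, or the other way around, gives (1). Then we can apply the Lipschitz property $\phi(r_2) - \phi(s_2) \le L\vert r_2 - s_2\vert$. Applying (1) one coordinate at a time, we know that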

$\begin{equation*} \mathbb{E}\sup_{t\in T}\sum_{i=1}^m \sigma_i\phi(t_i) \le \mathbb{E}L\sup_{t\in T}\sum_{i=1}^m\sigma_i t_i\,.\tag{2} \end{equation*}$

Note that (2) implies the contraction inequality by setting $T=\{(h(x_1),\dots,h(x_m))\mid h\in\mathcal{H}\}$, the set of prediction vectors produced by a single hypothesis $h$ on the sample.
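As a sanity check, inequality (2) can be verified exactly on a small finite set $T$ by enumerating all sign vectors. The sketch below is only an illustration under assumptions of our own: the random point set `T`, the constant `L = 2`, and the map `phi(x) = L|x| + 1` (which is $L$-Lipschitz with $\phi(0)\neq 0$) are hypothetical choices, not taken from the references.

```python
import itertools
import numpy as np


def rademacher_sup(points):
    """Exact E_sigma sup_{t in T} sum_i sigma_i * t_i for a finite T.

    `points` has one row per element t of T and one column per coordinate.
    """
    m = points.shape[1]
    total = 0.0
    # Average the supremum over all 2^m equally likely sign vectors.
    for signs in itertools.product((-1.0, 1.0), repeat=m):
        total += np.max(points @ np.array(signs))
    return total / 2 ** m


L = 2.0


def phi(x):
    # Hypothetical L-Lipschitz map with phi(0) = 1, so phi(0) != 0.
    return L * np.abs(x) + 1.0


rng = np.random.default_rng(0)
T = rng.normal(size=(6, 4))  # 6 illustrative points in R^4

lhs = rademacher_sup(phi(T))   # E sup sum_i sigma_i * phi(t_i)
rhs = L * rademacher_sup(T)    # L * E sup sum_i sigma_i * t_i
print(f"lhs = {lhs:.4f}, rhs = {rhs:.4f}, holds: {lhs <= rhs + 1e-12}")
```

The enumeration costs $2^m$ evaluations, so this is only practical for small $m$; it is meant to make the statement concrete, not to replace the proof.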

For Gaussian complexity, the contraction inequality also holds. Indeed, it holds for any symmetric random variables: each $g_i$ has the same distribution as $\sigma_i\vert g_i\vert$ with $\sigma_i$ an independent Rademacher variable, so the Rademacher contraction can be applied conditionally on $g$, which gives

$\begin{align*} \mathbb{E}\sup_{t\in T}\sum_{i=1}^m g_i\phi(t_i) & = \mathbb{E}_g\mathbb{E}_\sigma\sup_{t\in T}\sum_{i=1}^m \sigma_i\vert g_i\vert\phi(t_i) \\ & \le \mathbb{E}_g\mathbb{E}_\sigma\sup_{t\in T}\sum_{i=1}^m \sigma_i\vert g_i\vert Lt_i \\ & = \mathbb{E}_g\sup_{t\in T}\sum_{i=1}^m Lg_it_i\,. \end{align*}$
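The middle step of this chain is a Rademacher contraction with non-negative weights $\vert g_i\vert$, and it should hold for every fixed draw of $g$, not only on average. The following sketch checks this numerically, assuming a hypothetical finite set `T`, the constant `L = 1.5`, and the map `phi(x) = L * tanh(x) + 0.3`; none of these choices come from the references.

```python
import itertools
import numpy as np


def weighted_rademacher_sup(points, weights):
    """Exact E_sigma sup_{t in T} sum_i sigma_i * weights_i * t_i for finite T."""
    m = points.shape[1]
    total = 0.0
    for signs in itertools.product((-1.0, 1.0), repeat=m):
        total += np.max(points @ (np.array(signs) * weights))
    return total / 2 ** m


L = 1.5


def phi(x):
    # tanh is 1-Lipschitz, so this hypothetical map is L-Lipschitz.
    return L * np.tanh(x) + 0.3


rng = np.random.default_rng(1)
T = rng.normal(size=(5, 3))  # 5 illustrative points in R^3

gaps = []
for _ in range(200):
    abs_g = np.abs(rng.normal(size=3))            # one draw of |g_1|, ..., |g_m|
    lhs = weighted_rademacher_sup(phi(T), abs_g)  # E_sigma sup sum sigma_i |g_i| phi(t_i)
    rhs = L * weighted_rademacher_sup(T, abs_g)   # E_sigma sup sum sigma_i |g_i| L t_i
    gaps.append(rhs - lhs)

min_gap = min(gaps)
print(f"smallest rhs - lhs over 200 draws: {min_gap:.4f}")
```

Averaging `lhs` and `rhs` over the draws would give Monte Carlo estimates of the two sides of the Gaussian inequality; the per-draw comparison is used here because it does not depend on sampling error.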

The General Version of Contraction Inequality

In (Ledoux & Talagrand, 1991), the contraction inequality on Page 112 takes the form

$\begin{equation*} \mathbb{E}F\left(\sup_{t\in T}\left|\sum_{i=1}^m \sigma_i\phi(t_i)\right|\right) \le 2\mathbb{E}F\left(L\sup_{t\in T}\left|\sum_{i=1}^m \sigma_i t_i\right|\right)\,, \end{equation*}$

where $F$ is a convex increasing function and $T$ is a bounded subset of $\mathbb{R}^m$. There are three differences between this form and the concise form given above: it takes the absolute value of the sum, it composes the supremum with another function $F$, and it requires $\phi(0)=0$. It is no surprise that we can get rid of the factor 2 if we drop the absolute value, based on the discussion in the last note. The assumption $\phi(0)=0$, however, is essential for a non-linear function $F$, even after the absolute value and the factor 2 are dropped. This can be verified by the counter-example $T=\{0\}$, $F(x)=\exp(x)$, $m=1$, $\phi(x) = x+1$ (so $L=1$): the left-hand side equals $\mathbb{E}\exp(\sigma)=\frac{e+1/e}{2}\approx 1.54$, while the right-hand side equals $\exp(0)=1$. Keeping the absolute value and the factor 2 does not help either, since the left-hand side is then $e$ and the right-hand side is $2$.
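The counter-example is small enough to evaluate directly. The sketch below computes both sides for $T=\{0\}$, $F=\exp$, $m=1$, $\phi(x)=x+1$, once without and once with the absolute value and the factor 2; the helper name `expected_F` is ours and purely illustrative.

```python
import math


def expected_F(values, F):
    """Average of F over a list of equally likely outcomes."""
    return sum(F(v) for v in values) / len(values)


def phi(x):
    # 1-Lipschitz map that violates phi(0) = 0.
    return x + 1.0


L = 1.0
t = 0.0              # the single point of T = {0}
signs = (-1.0, 1.0)  # the two values of the Rademacher variable

# Form without the absolute value and without the factor 2.
lhs_plain = expected_F([s * phi(t) for s in signs], math.exp)
rhs_plain = expected_F([L * s * t for s in signs], math.exp)

# Form with the absolute value and the factor 2.
lhs_abs = expected_F([abs(s * phi(t)) for s in signs], math.exp)
rhs_abs = 2.0 * expected_F([L * abs(s * t) for s in signs], math.exp)

print(lhs_plain, rhs_plain)
print(lhs_abs, rhs_abs)
```

In both forms the left-hand side exceeds the right-hand side, so the inequality fails once $\phi(0)=0$ is dropped and $F$ is not linear.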

Reference

Ledoux, M., & Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Springer Berlin Heidelberg.
Bartlett, P. L., & Mendelson, S. (2003). Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. J. Mach. Learn. Res., 3, 463–482.
Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2012). Foundations of Machine Learning. The MIT Press.


Yitong Sun
