Capacity of Neural Networks (4): VC-Dimension and Rademacher Complexity

In this note, we discuss the relation between VC-dimension and Rademacher/Gaussian complexities. First, let’s look at the definition of VC-dimension.

Definition. (VC-dimension) Let $\mathcal{F}$ be a hypothesis class of functions from $X$ to $\{-1,1\}$. We say that $\mathcal{F}$ shatters a set $\{x_i\}_{i=1}^m \subset X$ if for every labeling $(y_i)_{i=1}^m\in\{-1,1\}^m$ there exists $f\in\mathcal{F}$ such that $f(x_i) = y_i$ for all $i$. The maximum cardinality of a set that can be shattered by $\mathcal{F}$ is called the VC-dimension of $\mathcal{F}$.
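To make the definition concrete, here is a brute-force check. It is a minimal sketch that assumes a finite hypothesis class (a list of Python functions) and a finite domain; the helper names `shatters` and `vc_dimension`, and the threshold and interval classes used in the example, are our own illustrative choices.

```python
from itertools import combinations, product

def shatters(hypotheses, points):
    """True if the finite class realizes every labeling of `points` in {-1, 1}^m."""
    # Distinct label patterns that the class induces on the points.
    patterns = {tuple(h(x) for x in points) for h in hypotheses}
    return all(p in patterns for p in product((-1, 1), repeat=len(points)))

def vc_dimension(hypotheses, domain):
    """Largest size of a subset of the finite `domain` that the class shatters."""
    best = 0
    for k in range(1, len(domain) + 1):
        if any(shatters(hypotheses, s) for s in combinations(domain, k)):
            best = k
        else:
            # Subsets of shattered sets are shattered, so no larger set can be.
            break
    return best

# Example classes on the domain {0, ..., 4}, with cut points on a half-integer grid.
grid = [i - 0.5 for i in range(6)]  # -0.5, 0.5, ..., 4.5
domain = [0, 1, 2, 3, 4]
thresholds = [lambda x, t=t: 1 if x > t else -1 for t in grid]
intervals = [lambda x, a=a, b=b: 1 if a <= x <= b else -1 for a in grid for b in grid]
print(vc_dimension(thresholds, domain), vc_dimension(intervals, domain))
```

Thresholds cannot produce the labeling $(+1,-1)$ on two ordered points, and intervals cannot produce $(+1,-1,+1)$ on three, which is why the search stops at $1$ and $2$ respectively.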

For example, the class of linear classifiers $x\mapsto\operatorname{sign}(\langle w,x\rangle + b)$ over $\mathbb{R}^d$ has VC-dimension $d+1$, which can be proved by induction. Because VC-dimension is combinatorial in nature, it is hard to compute in many cases. Now we show that the Rademacher/Gaussian complexity of a class can be controlled by its VC-dimension.
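The lower bound (VC-dimension at least $d+1$) for linear classifiers can be checked constructively. In this sketch we pick the $d+1$ points $0, e_1, \dots, e_d$ (our choice of configuration); for them the equations $\langle w, x_i\rangle + b = y_i$ can be solved in closed form for every labeling. It only illustrates the lower bound; the matching upper bound (no $d+2$ points are shattered) needs a separate argument. The function names are hypothetical.

```python
from itertools import product

def sign(v):
    return 1 if v > 0 else -1

def simplex_points(d):
    """The origin together with the d standard basis vectors of R^d (d + 1 points)."""
    pts = [[0.0] * d]
    for i in range(d):
        e = [0.0] * d
        e[i] = 1.0
        pts.append(e)
    return pts

def separator_for(labels, d):
    """Solve <w, x_i> + b = y_i exactly on the simplex points."""
    b = float(labels[0])                               # the origin forces b = y_0
    w = [float(labels[i + 1]) - b for i in range(d)]   # e_i forces w_i + b = y_{i+1}
    return w, b

def shattered_by_halfspaces(d):
    """Check that every labeling of the d + 1 simplex points is realized."""
    pts = simplex_points(d)
    for labels in product((-1, 1), repeat=d + 1):
        w, b = separator_for(labels, d)
        preds = [sign(sum(wi * xi for wi, xi in zip(w, x)) + b) for x in pts]
        if preds != list(labels):
            return False
    return True

print([shattered_by_halfspaces(d) for d in range(1, 5)])
```

Since $\langle w, x_i\rangle + b$ equals $y_i \in \{-1,1\}$ exactly, taking the sign reproduces every labeling.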

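First, set $X_f = \frac{1}{m}\sum_{i=1}^m \sigma_i f(x_i)$, where $\sigma_1,\dots,\sigma_m$ are i.i.d. Rademacher signs, so that $\mathbb{E}\sup_{f\in\mathcal{F}} X_f$ is the empirical Rademacher complexity of $\mathcal{F}$ on $\{x_i\}_{i=1}^m$. By Hoeffding's inequality, the increments of this random process satisfy $\Vert X_f - X_g\Vert_{\psi_2} \le K\Vert f-g\Vert_{2,\mathbb{P}_m}$ with $K = C_0/\sqrt{m}$ for an absolute constant $C_0$. Here $\mathbb{P}_m$ denotes the empirical measure on $\{x_i\}_{i=1}^m$. Writing $d(f,g) = \Vert f-g\Vert_{2,\mathbb{P}_m}$, Dudley's integral inequality gives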
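Because $\mathcal{F}$ enters only through its values on the sample, $\mathbb{E}\sup_f X_f$ can be estimated by Monte Carlo. The sketch below assumes the class is given as a finite list of label vectors $(f(x_1),\dots,f(x_m))$; the function name `empirical_rademacher` and its sampling parameters are hypothetical.

```python
import random

def empirical_rademacher(patterns, n_samples=2000, seed=0):
    """Monte Carlo estimate of E_sigma sup_f (1/m) sum_i sigma_i f(x_i).

    `patterns` is a list of label vectors (f(x_1), ..., f(x_m)), one per f in F.
    """
    rng = random.Random(seed)
    m = len(patterns[0])
    total = 0.0
    for _ in range(n_samples):
        # i.i.d. Rademacher signs sigma_1, ..., sigma_m
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        # X_f for every f; over a finite class the supremum is a maximum
        total += max(sum(s * v for s, v in zip(sigma, p)) / m for p in patterns)
    return total / n_samples

# Example: threshold classifiers on m = 20 sorted points induce the m + 1
# patterns (-1, ..., -1, +1, ..., +1).
m = 20
thresholds = [[-1] * k + [1] * (m - k) for k in range(m + 1)]
print(empirical_rademacher(thresholds))
```

As a sanity check, the class of all $2^m$ labelings gives exactly $1$ (pick $f = \sigma$), while a single function gives an estimate close to $0$.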

$\begin{equation} \mathbb{E}\sup_{f\in\mathcal{F}} X_f \le CK\int_0^\infty \sqrt{\log\mathcal{N} (\mathcal{F},d,\epsilon)}\operatorname{d}\epsilon\,. \end{equation}$

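Now we need to control $\mathcal{N}(\mathcal{F},d,\epsilon)$. Under the empirical measure $\mathbb{P}_m$, a function in $\mathcal{F}$ is determined by its values on $x_1,\dots,x_m$, so $\mathcal{F}$ has only finitely many distinct elements (at most $2^m$). Writing $\vert\mathcal{F}\vert$ for the number of these distinct restrictions, we trivially have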

$\begin{equation} \mathcal{N}(\mathcal{F},d,\epsilon) \le \vert\mathcal{F}\vert\,. \end{equation}$
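The trivial bound $\mathcal{N}(\mathcal{F},d,\epsilon)\le\vert\mathcal{F}\vert$ can be seen from a greedy cover: each step picks an uncovered label vector as a new center, so it removes at least one distinct element. In this sketch the centers are taken from the class itself and `greedy_cover_size` is a hypothetical helper, so the result is an upper bound on the covering number rather than its exact value.

```python
import math

def l2_empirical(u, v):
    """d(f, g) = ||f - g||_{2, P_m} for label vectors u, v of length m."""
    m = len(u)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)) / m)

def greedy_cover_size(patterns, eps):
    """Size of a greedy eps-cover of the distinct patterns (centers from the class)."""
    uncovered = sorted(set(map(tuple, patterns)))
    size = 0
    while uncovered:
        center = uncovered[0]  # any uncovered element becomes a new center
        size += 1
        # Drop everything within distance eps of the new center (incl. the center).
        uncovered = [p for p in uncovered if l2_empirical(p, center) > eps]
    return size

m = 20
thresholds = [[-1] * k + [1] * (m - k) for k in range(m + 1)]
for eps in (0.1, 0.5, 1.0, 2.0):
    print(eps, greedy_cover_size(thresholds, eps), len(thresholds))
```

For $\pm 1$-valued functions two distinct label vectors are at distance at least $2/\sqrt{m}$, so for small $\epsilon$ the greedy cover uses every distinct pattern and the bound $\vert\mathcal{F}\vert$ is attained.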

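By Sauer's lemma, $\vert\mathcal{F}\vert \le \sum_{k=0}^{D}\binom{m}{k} \le (em/D)^D$ for $m\ge D$, where $m$ is the number of points and $D$ is the VC-dimension of $\mathcal{F}$. Since every $f\in\mathcal{F}$ takes values in $\{-1,1\}$, we have $\Vert f\Vert_{2,\mathbb{P}_m} = 1$, so $\mathcal{F}$ lies in the unit ball of $L_2(\mathbb{P}_m)$ and a single ball of radius $1$ centered at $0$ covers it (if centers must lie in $\mathcal{F}$, the cutoff becomes the diameter $2$, which only changes constants). The integrand therefore vanishes for $\epsilon\ge 1$, and the integral on the right-hand side of Dudley's inequality becomes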
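Intervals on the line (VC-dimension $D = 2$) make a convenient test case for Sauer's lemma. This sketch, with hypothetical helper names, counts the distinct labelings of $m$ sorted points by intervals and compares the count with $\sum_{k\le D}\binom{m}{k}$ and with the looser $(em/D)^D$.

```python
from math import comb, e

def interval_patterns(m):
    """Distinct labelings of m sorted points by intervals (+1 inside, -1 outside)."""
    pats = {tuple([-1] * m)}  # the empty interval
    for i in range(m):
        for j in range(i, m):
            pats.add(tuple(1 if i <= k <= j else -1 for k in range(m)))
    return pats

def sauer_bound(m, D):
    """Sauer's lemma: a class of VC-dimension D has at most this many patterns on m points."""
    return sum(comb(m, k) for k in range(D + 1))

for m in (5, 10, 50):
    # exact count, Sauer's bound, and the looser (em/D)^D with D = 2
    print(m, len(interval_patterns(m)), sauer_bound(m, 2), (e * m / 2) ** 2)
```

For intervals the count is $1 + m(m+1)/2 = \binom{m}{0}+\binom{m}{1}+\binom{m}{2}$, so Sauer's lemma is tight for this class.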

$\begin{equation} \int_0^1 \sqrt{\log\mathcal{N} (\mathcal{F},d,\epsilon)}\operatorname{d}\epsilon\,. \end{equation}$

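On $[0,1]$ the integrand is at most $\sqrt{\log\vert\mathcal{F}\vert} \le \sqrt{D\log(em/D)}$, so the integral is bounded by $\sqrt{D\log(em/D)}$. Since $K \lesssim 1/\sqrt{m}$ by Hoeffding's inequality, the empirical Rademacher complexity satisfies $\mathbb{E}\sup_{f\in\mathcal{F}} X_f \le C\sqrt{D\log(em/D)/m}$ for an absolute constant $C$. A finer, $\epsilon$-dependent bound on the covering numbers removes the $\log m$ factor; we leave this more subtle analysis to a future note.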
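Dudley's bound above carries an unspecified constant $C$, so for a numerical sanity check we swap in a different, explicit-constant tool: Massart's finite-class lemma, which combined with Sauer's lemma gives $\sqrt{2D\log(em/D)/m}$, the same rate. This sketch (hypothetical helper names) estimates the Rademacher complexity of intervals exactly per sample, using the identity $\sum_i \sigma_i f_I(x_i) = 2\sum_{i\in I}\sigma_i - \sum_i \sigma_i$ and a maximum-subarray scan, and compares it with that bound.

```python
import math
import random

def sup_over_intervals(sigma):
    """max over intervals I (incl. empty) of sum_i sigma_i f_I(x_i), f_I = +1 on I, -1 off I."""
    # sum_{i in I} sigma_i - sum_{i not in I} sigma_i = 2 * sum_{i in I} sigma_i - sum(sigma)
    best = cur = 0  # best = 0 accounts for the empty interval
    for s in sigma:
        cur = max(0, cur + s)  # Kadane's scan for the maximum subarray sum
        best = max(best, cur)
    return 2 * best - sum(sigma)

def rademacher_intervals(m, n_samples=2000, seed=0):
    """Monte Carlo estimate of the empirical Rademacher complexity of intervals."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_samples):
        sigma = [rng.choice((-1, 1)) for _ in range(m)]
        total += sup_over_intervals(sigma) / m
    return total / n_samples

def massart_vc_bound(m, D):
    """sqrt(2 D log(em/D) / m): Massart's lemma combined with Sauer's lemma (m >= D)."""
    return math.sqrt(2 * D * math.log(math.e * m / D) / m)

for m in (20, 100, 500):
    print(m, round(rademacher_intervals(m), 3), round(massart_vc_bound(m, 2), 3))
```

Both columns shrink roughly like $1/\sqrt{m}$ up to the logarithmic factor, consistent with the rate derived above.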

Yitong Sun