Capacity of Neural Networks (4): VC-Dimension and Rademacher Complexity
In this note, we discuss the relation between VC-dimension and Rademacher/Gaussian complexities. First, let’s look at the definition of VC-dimension.
Definition (VC-dimension). For a hypothesis class $\mathcal{F}$ of functions from $\Omega$ to $\{0,1\}$, we say that $\mathcal{F}$ shatters a set $\Lambda = \{x_1, \dots, x_n\} \subseteq \Omega$ if for any labels $(\theta_1, \dots, \theta_n) \in \{0,1\}^n$ assigned to $x_1, \dots, x_n$ there exists $f$ in $\mathcal{F}$ such that $f(x_i) = \theta_i$ for all $i$. The maximum cardinality of a set that can be shattered by $\mathcal{F}$ is called the VC-dimension of $\mathcal{F}$, denoted $\mathrm{vc}(\mathcal{F})$.
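The definition can be checked by brute force on small instances. As a minimal sketch (the class of threshold classifiers and the finite grid of thresholds are my illustrative choices, not from the note), the snippet below verifies that thresholds $f_t(x) = \mathbf{1}\{x \ge t\}$ on $\mathbb{R}$ shatter any single point but no pair of points, so their VC-dimension is $1$:

```python
def shatters(points, hypotheses):
    """Check the shattering definition: every labeling in {0,1}^n
    must be realized by some hypothesis on the given points."""
    realized = {tuple(h(x) for x in points) for h in hypotheses}
    return len(realized) == 2 ** len(points)

# Threshold classifiers f_t(x) = 1{x >= t}, discretized over a grid of t.
thresholds = [t / 10 for t in range(-5, 26)]
H = [lambda x, t=t: int(x >= t) for t in thresholds]

print(shatters([0.5], H))        # True: both labelings of one point occur
print(shatters([0.3, 0.7], H))   # False: the labeling (1, 0) is unrealizable
```

Since $f_t$ is monotone, it can only label a left point $1$ and a right point $0$ by accident of ordering, never by choice, which is why two points already defeat the class.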
For example, the class of linear (affine) classifiers $f_{w,b}(x) = \mathbf{1}\{\langle w, x \rangle + b > 0\}$ over $\mathbb{R}^d$ has VC-dimension $d+1$, which can be proved by induction on $d$. The VC-dimension is combinatorial in nature and is thus hard to compute in many cases. We now show that the Rademacher/Gaussian complexity is essentially equivalent to the VC-dimension; this note proves the upper bound, i.e. that the complexity is controlled by the VC-dimension.
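To make the claim concrete, here is the $d = 1$ case worked by hand (my own illustration, not from the note): affine classifiers $f_{w,b}(x) = \mathbf{1}\{wx + b > 0\}$ on $\mathbb{R}$ realize all four labelings of the two points $x_1 = 0$, $x_2 = 1$:

```latex
% All four labelings of {0, 1} realized by f_{w,b}(x) = 1{wx + b > 0}:
\begin{aligned}
(f(0), f(1)) = (0,0) &: \quad (w,b) = (0,-1), \\
(f(0), f(1)) = (1,1) &: \quad (w,b) = (0,1), \\
(f(0), f(1)) = (0,1) &: \quad (w,b) = (1,-\tfrac{1}{2}), \\
(f(0), f(1)) = (1,0) &: \quad (w,b) = (-1,\tfrac{1}{2}).
\end{aligned}
```

No three points $x_1 < x_2 < x_3$ can be shattered, since $wx + b$ is monotone in $x$ and thus never realizes the labeling $(1, 0, 1)$; hence the VC-dimension is $2 = d + 1$ for $d = 1$.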
First, for $f \in \mathcal{F}$ we set
$$X_f := \frac{1}{\sqrt{n}} \sum_{i=1}^n \varepsilon_i f(x_i),$$
where $\varepsilon_1, \dots, \varepsilon_n$ are i.i.d. Rademacher signs. The increments of this random process satisfy
$$\| X_f - X_g \|_{\psi_2} \lesssim \| f - g \|_{L^2(\mu_n)},$$
which is a consequence of Hoeffding's inequality. Here $\mu_n = \frac{1}{n} \sum_{i=1}^n \delta_{x_i}$ denotes the empirical measure on $\{x_1, \dots, x_n\}$. By Dudley's integral inequality, we have
$$\mathbb{E} \sup_{f \in \mathcal{F}} X_f \lesssim \int_0^\infty \sqrt{\log N(\mathcal{F}, L^2(\mu_n), \epsilon)} \, d\epsilon.$$
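The quantity $\mathbb{E} \sup_f X_f$ can also be estimated numerically. As a hedged sketch (threshold classifiers and all constants here are my illustrative choices), the snippet below Monte-Carlo-estimates $\frac{1}{\sqrt{n}} \mathbb{E} \sup_f X_f$ for thresholds $f_t(x) = \mathbf{1}\{x \ge t\}$: on a sorted sample every $f_t$ selects a suffix, so the supremum is just the largest suffix sum of the signs.

```python
import random

def sup_Xf(eps):
    """sup over thresholds t of sum_i eps_i * 1{x_i >= t} on sorted points:
    each threshold selects a suffix of the sample, so the sup is the largest
    suffix sum (the empty suffix, i.e. f = 0, contributes 0)."""
    best, suffix = 0.0, 0.0
    for e in reversed(eps):
        suffix += e
        best = max(best, suffix)
    return best

random.seed(0)
n, trials = 100, 2000
est = sum(
    sup_Xf([random.choice([-1, 1]) for _ in range(n)]) for _ in range(trials)
) / trials / n ** 0.5   # normalize by sqrt(n), matching the scaling of X_f
print(est)  # stays O(1) as n grows, consistent with vc = 1 (up to log factors)
```

For a class with bounded VC-dimension this normalized quantity stays bounded (up to the $\sqrt{\log n}$ factor derived below), whereas for a class that shatters the whole sample it would grow like $\sqrt{n}$.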
Now we need to control the covering number $N(\mathcal{F}, L^2(\mu_n), \epsilon)$. Since under the empirical measure $\mu_n$ only the values of $f$ on the sample $\{x_1, \dots, x_n\}$ matter, there are at most finitely many distinct functions in $\mathcal{F}$ as elements of $L^2(\mu_n)$, so certainly
$$N(\mathcal{F}, L^2(\mu_n), \epsilon) \le \big| \mathcal{F}_{|\{x_1, \dots, x_n\}} \big|, \qquad \mathcal{F}_{|\{x_1, \dots, x_n\}} := \{ (f(x_1), \dots, f(x_n)) : f \in \mathcal{F} \}.$$
By Sauer's lemma, the cardinality of $\mathcal{F}_{|\{x_1, \dots, x_n\}}$ is at most $\left( \frac{en}{d} \right)^d$, where $n$ is the number of sample points and $d = \mathrm{vc}(\mathcal{F})$. Since the functions take values in $\{0,1\}$, the radius of the hypothesis class in $L^2(\mu_n)$ is at most $1$, so the integral on the right-hand side of Dudley's inequality can be written as
$$\int_0^1 \sqrt{\log N(\mathcal{F}, L^2(\mu_n), \epsilon)} \, d\epsilon.$$
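Sauer's lemma is easy to sanity-check numerically on a small class. As a sketch (again using threshold classifiers, my running example with $d = 1$), the snippet counts the distinct restrictions to $n$ points and compares against both forms of the bound, $\sum_{i=0}^{d} \binom{n}{i}$ and $\left(\frac{en}{d}\right)^d$:

```python
from math import comb, e

def num_dichotomies_thresholds(n):
    """Distinct restrictions of f_t(x) = 1{x >= t} to n sorted points:
    each threshold picks a suffix, giving exactly n + 1 patterns."""
    points = list(range(n))
    patterns = {tuple(int(x >= t) for x in points) for t in range(-1, n + 1)}
    return len(patterns)

d = 1  # VC-dimension of threshold classifiers
for n in range(1, 8):
    sauer = sum(comb(n, i) for i in range(d + 1))   # = n + 1 for d = 1
    assert num_dichotomies_thresholds(n) <= sauer <= (e * n / d) ** d
print("Sauer's bound holds for thresholds, n = 1..7")
```

For thresholds the binomial form of the bound is tight ($n + 1$ patterns exactly), which illustrates that Sauer's lemma cannot be improved in general.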
By the analysis above, this integral is bounded by $\sqrt{d \log(en/d)} \lesssim \sqrt{d \log n}$, so $\mathbb{E} \sup_{f \in \mathcal{F}} X_f \lesssim \sqrt{d \log n}$ and the normalized Rademacher complexity is $O\big(\sqrt{d \log n / n}\big)$. An $\epsilon$-dependent bound on the covering number would get rid of the $\log n$ factor in the result. We leave this more subtle analysis for the future.
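For reference, one standard $\epsilon$-dependent route (a sketch under the assumption that one invokes Haussler's covering bound, which is not derived in this note) replaces the crude cardinality bound by an $n$-independent estimate $N(\mathcal{F}, L^2(\mu_n), \epsilon) \le \left( \frac{C}{\epsilon} \right)^{C d}$ for an absolute constant $C$, after which the Dudley integral converges without any dependence on $n$:

```latex
% With an n-independent, epsilon-dependent covering bound the integral converges:
\int_0^1 \sqrt{\log N(\mathcal{F}, L^2(\mu_n), \epsilon)} \, d\epsilon
  \lesssim \int_0^1 \sqrt{C d \log \frac{C}{\epsilon}} \, d\epsilon
  \lesssim \sqrt{d}.
```

This gives $\mathbb{E} \sup_f X_f \lesssim \sqrt{d}$ and a normalized Rademacher complexity of $O(\sqrt{d/n})$, with no logarithmic factor.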