Capacity of Neural Networks (3): Main Results
In this note, we look at the Rademacher complexity of two-layer neural networks and compare it with the corresponding result for the kernel method. This is the main result of Bartlett & Mendelson (2003), which we rephrase as follows.
Theorem. Suppose that the activation $\sigma$ has Lipschitz constant $L$. Define the class computed by a two-layer neural network with weight norm constraints as
$$ \mathcal{F} \;=\; \Big\{\, x \mapsto \sum_{j} w_j\, \sigma(\langle u_j, x\rangle) \;:\; \|w\|_1 \le B_1,\ \|u_j\|_2 \le B_2 \text{ for all } j \,\Big\}. $$
Then for $x_1, \dots, x_n$,
$$ \hat{\mathfrak{R}}_n(\mathcal{F}) \;=\; \mathbb{E}_\varepsilon \sup_{f \in \mathcal{F}} \frac{1}{n} \sum_{i=1}^n \varepsilon_i f(x_i) \;\le\; \frac{2 L B_1 B_2\, \hat{X}}{n}, \tag{1} $$
where $\hat{X} = \big(\sum_{i=1}^n \|x_i\|_2^2\big)^{1/2}$.
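Before turning to the proof, here is a rough numerical sanity check of (1), written as a sketch in the notation above; it is not part of the argument. It takes $\sigma = \tanh$ (so $L = 1$), uses the fact that the supremum over the top layer with $\|w\|_1 \le B_1$ is attained at a single hidden unit, and approximates the remaining supremum over $\|u\|_2 \le B_2$ by sampling random directions, which only yields a lower estimate of the true complexity.

```python
import numpy as np

# Rough Monte Carlo sanity check of the bound (1); not part of the proof.
# For ||w||_1 <= B1 the sup over the top layer is attained at a single hidden
# unit, so the empirical Rademacher complexity equals
#   (B1 / n) * E_eps sup_{||u||_2 <= B2} | sum_i eps_i * sigma(<u, x_i>) |.
# The inner sup is approximated by random directions u, giving a lower estimate.

rng = np.random.default_rng(0)
n, d = 200, 20
B1, B2, L = 1.0, 1.0, 1.0          # norm bounds; sigma = tanh has L = 1
X = rng.normal(size=(n, d))

def inner_sup(eps, num_dirs=2000):
    """Crude lower estimate of sup_{||u||_2 <= B2} |sum_i eps_i tanh(<u, x_i>)|."""
    U = rng.normal(size=(num_dirs, d))
    U = B2 * U / np.linalg.norm(U, axis=1, keepdims=True)   # points on the sphere
    return np.abs(eps @ np.tanh(X @ U.T)).max()

trials = 200
estimate = np.mean([B1 * inner_sup(rng.choice([-1.0, 1.0], size=n))
                    for _ in range(trials)]) / n
bound = 2 * L * B1 * B2 * np.sqrt((X ** 2).sum()) / n
print(f"Monte Carlo estimate (lower bound): {estimate:.4f}")
print(f"Bound from the theorem:             {bound:.4f}")
```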
Compared with the original statement, we drop the assumption $\sigma(0) = 0$, which is not necessary in our concise form of the contraction inequality, and we change the constraint on the bottom-layer weights from the 1-norm to the 2-norm. This gives a cleaner result and a simpler proof.
With the 2-norm constraint, the proof of this theorem follows easily from the contraction inequality; a sketch is given below.
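Concretely, here is a sketch in the notation above. The note's concise form of the contraction inequality avoids the assumption $\sigma(0) = 0$, but for simplicity this sketch uses it when removing the absolute value produced by the dual of the $\ell_1$-ball; the factor $2$ comes from that step.
$$
\begin{aligned}
\hat{\mathfrak{R}}_n(\mathcal{F})
&= \frac{B_1}{n}\, \mathbb{E}_\varepsilon \sup_{\|u\|_2 \le B_2} \Big| \sum_{i=1}^n \varepsilon_i\, \sigma(\langle u, x_i\rangle) \Big|
&& \text{(dual norm of the $\ell_1$-ball)} \\
&\le \frac{2 B_1}{n}\, \mathbb{E}_\varepsilon \sup_{\|u\|_2 \le B_2} \sum_{i=1}^n \varepsilon_i\, \sigma(\langle u, x_i\rangle)
&& \text{(symmetry of the $\varepsilon_i$, $\sigma(0) = 0$)} \\
&\le \frac{2 L B_1}{n}\, \mathbb{E}_\varepsilon \sup_{\|u\|_2 \le B_2} \sum_{i=1}^n \varepsilon_i\, \langle u, x_i\rangle
&& \text{(contraction inequality)} \\
&= \frac{2 L B_1 B_2}{n}\, \mathbb{E}_\varepsilon \Big\| \sum_{i=1}^n \varepsilon_i x_i \Big\|_2
&& \text{(Cauchy-Schwarz)} \\
&\le \frac{2 L B_1 B_2}{n}\, \Big( \sum_{i=1}^n \|x_i\|_2^2 \Big)^{1/2}
&& \text{(Jensen)} .
\end{aligned}
$$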
This case is actually similar to the Rademacher complexity of the kernel method, where, instead of the product $2 L B_1 B_2$, the norm bound $B$ on the weight vector controls the complexity.
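For comparison, here is the standard kernel bound written in the same style (the symbols $\phi$ for the feature map, $K$ for the kernel, and $B$ for the norm bound are mine, not from the original note): for
$$ \mathcal{F}_K = \big\{ x \mapsto \langle w, \phi(x)\rangle \;:\; \|w\| \le B \big\}, \qquad \hat{\mathfrak{R}}_n(\mathcal{F}_K) \;\le\; \frac{B}{n} \Big( \sum_{i=1}^n K(x_i, x_i) \Big)^{1/2} . $$
So $B$ plays the role of $2 L B_1 B_2$ in (1), and $\sum_i K(x_i, x_i) = \sum_i \|\phi(x_i)\|^2$ replaces $\sum_i \|x_i\|_2^2$.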
In the original theorem, the constraint on the bottom-layer weights is $\|u_j\|_1 \le B_2$, and the argument works with i.i.d. standard Gaussian variables $g_1, \dots, g_n$ in place of the Rademacher variables. The quantity corresponding to the supremum in (1) is then $\mathbb{E}_g \sup_{\|u\|_1 \le B_2} \sum_{i} g_i\, \sigma(\langle u, x_i\rangle)$ (up to constants and the factor $B_1/n$). To control this quantity, Sudakov-Fernique's inequality is used, which is a variant of Slepian's inequality.
Theorem. (Sudakov-Fernique's inequality) Let $(X_t)_{t \in T}$ and $(Y_t)_{t \in T}$ be two centered Gaussian processes. Assume that for all $s, t \in T$, we have
$$ \mathbb{E}(X_s - X_t)^2 \;\le\; \mathbb{E}(Y_s - Y_t)^2 . $$
Then,
$$ \mathbb{E} \sup_{t \in T} X_t \;\le\; \mathbb{E} \sup_{t \in T} Y_t . $$
To use this theorem to prove the 1-norm constraint case, we set
$$ X_u = \sum_{i=1}^n g_i\, \sigma(\langle u, x_i\rangle) \qquad \text{and} \qquad Y_u = L \sum_{i=1}^n g_i\, \langle u, x_i\rangle, \qquad \|u\|_1 \le B_2 . $$
Then, for all $u, v$,
$$ \mathbb{E}(X_u - X_v)^2 = \sum_{i=1}^n \big( \sigma(\langle u, x_i\rangle) - \sigma(\langle v, x_i\rangle) \big)^2 \;\le\; L^2 \sum_{i=1}^n \langle u - v, x_i\rangle^2 = \mathbb{E}(Y_u - Y_v)^2 . $$
And according to Sudakov-Fernique's inequality, we have
$$ \mathbb{E}_g \sup_{\|u\|_1 \le B_2} X_u \;\le\; \mathbb{E}_g \sup_{\|u\|_1 \le B_2} Y_u \;=\; L B_2\, \mathbb{E}_g \Big\| \sum_{i=1}^n g_i x_i \Big\|_\infty . $$
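As a numerical illustration of this comparison (again a sketch, not part of the argument): the script below takes $\sigma = \tanh$ (so $L = 1$), restricts $u$ to a finite set of random points on the $\ell_1$-sphere of radius $B_2$, and checks that the Monte Carlo estimate of $\mathbb{E}_g \sup_u X_u$ stays below that of $\mathbb{E}_g \sup_u Y_u$.

```python
import numpy as np

# Numerical illustration of the Sudakov-Fernique comparison (sigma = tanh, L = 1).
# We restrict u to a finite set of directions on the l1-sphere of radius B2 and
# compare Monte Carlo estimates of E sup_u X_u and E sup_u Y_u.
rng = np.random.default_rng(1)
n, d, B2 = 100, 10, 1.0
X = rng.normal(size=(n, d))

U = rng.normal(size=(500, d))
U = B2 * U / np.abs(U).sum(axis=1, keepdims=True)   # points on the l1-sphere
Z = X @ U.T                                         # <u, x_i>, shape (n, 500)

sup_X, sup_Y = [], []
for _ in range(2000):
    g = rng.normal(size=n)
    sup_X.append((g @ np.tanh(Z)).max())            # sup_u X_u over the finite set
    sup_Y.append((g @ Z).max())                     # sup_u Y_u over the finite set
print(f"E sup X_u ~ {np.mean(sup_X):.3f}  <=  E sup Y_u ~ {np.mean(sup_Y):.3f}")
```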
The Rademacher complexity can then be easily obtained by using the fact that $\hat{\mathfrak{R}}_n(\mathcal{F}) \le \sqrt{\pi/2}\; \hat{\mathfrak{G}}_n(\mathcal{F})$, where $\hat{\mathfrak{G}}_n$ denotes the empirical Gaussian complexity (defined as $\hat{\mathfrak{R}}_n$ with the $\varepsilon_i$ replaced by the $g_i$).
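For completeness, here is a sketch of the remaining step (my own filling-in; $d$ is the input dimension and $x_{ik}$ the $k$-th coordinate of $x_i$). The $\ell_\infty$-norm above is a maximum of $2d$ centered Gaussians, so the standard Gaussian maximal inequality gives
$$ \mathbb{E}_g \Big\| \sum_{i=1}^n g_i x_i \Big\|_\infty \;=\; \mathbb{E}_g \max_{1 \le k \le d} \Big| \sum_{i=1}^n g_i x_{ik} \Big| \;\le\; \sqrt{2 \ln(2d)}\, \max_{1 \le k \le d} \Big( \sum_{i=1}^n x_{ik}^2 \Big)^{1/2} . $$
Compared with the 2-norm case, the resulting bound therefore carries an extra factor of order $\sqrt{\ln d}$.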
Reference
- Bartlett, P. L., & Mendelson, S. (2003). Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. J. Mach. Learn. Res., 3, 463–482.