Capacity of Neural Networks (2): Contraction Inequality
In this section, we discuss the contraction inequality of Rademacher complexity, which is very useful for peeling the loss function off the hypothesis class in sample complexity analysis.
Contraction Inequality of Rademacher Complexity
The first inequality we consider here is that
$$\mathbb{E}_\sigma\left[\sup_{f\in\mathcal{F}}\sum_{i=1}^n \sigma_i\,\phi(f(x_i))\right] \;\le\; L\,\mathbb{E}_\sigma\left[\sup_{f\in\mathcal{F}}\sum_{i=1}^n \sigma_i\,f(x_i)\right]$$
if $\phi$ is an $L$-Lipschitz function. This inequality is attributed to (Ledoux & Talagrand, 1991) by (Bartlett & Mendelson, 2003; Mohri, Rostamizadeh, & Talwalkar, 2012). The inequality in its current form is due to (Mohri, Rostamizadeh, & Talwalkar, 2012). The current form is more concise than the original form: it does not assume $\phi(0)=0$ and saves an extra factor of 2 on the right-hand side. The key step of the proof is that, writing $u(f)=\sum_{i=1}^{n-1}\sigma_i\,\phi(f(x_i))$,
$$\mathbb{E}_{\sigma_n}\left[\sup_{f\in\mathcal{F}}\big(u(f)+\sigma_n\,\phi(f(x_n))\big)\right] = \frac{1}{2}\sup_{f,f'\in\mathcal{F}}\Big[u(f)+u(f')+\phi(f(x_n))-\phi(f'(x_n))\Big] \tag{1}$$
for any fixed $\sigma_1,\dots,\sigma_{n-1}$. Then we can apply the Lipschitz property $\phi(f(x_n))-\phi(f'(x_n))\le L\,|f(x_n)-f'(x_n)|$ and drop the absolute value, since $f$ and $f'$ play symmetric roles in the supremum. By (1), we know that
$$\mathbb{E}_{\sigma_n}\left[\sup_{f\in\mathcal{F}}\big(u(f)+\sigma_n\,\phi(f(x_n))\big)\right] \;\le\; \mathbb{E}_{\sigma_n}\left[\sup_{f\in\mathcal{F}}\big(u(f)+\sigma_n\,L\,f(x_n)\big)\right].$$
Repeating this argument coordinate by coordinate shows more generally that, for $L_i$-Lipschitz functions $\phi_i$,
$$\mathbb{E}_\sigma\left[\sup_{f\in\mathcal{F}}\sum_{i=1}^n \sigma_i\,\phi_i(f(x_i))\right] \;\le\; \mathbb{E}_\sigma\left[\sup_{f\in\mathcal{F}}\sum_{i=1}^n \sigma_i\,L_i\,f(x_i)\right].$$
Note that this one implies the contraction inequality by setting $\phi_i=\phi$ and $L_i=L$ for all $i$.
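As a quick sanity check (not part of the proof), here is a minimal Monte Carlo sketch of the inequality on a random finite class. All names are illustrative; $\phi(t)=|t|+1$ is chosen as a 1-Lipschitz function with $\phi(0)\neq 0$, which the concise form allows.

```python
import numpy as np

rng = np.random.default_rng(0)
n, num_funcs, trials = 8, 20, 200_000

# A finite "hypothesis class": row f holds the values (f(x_1), ..., f(x_n)).
values = rng.normal(size=(num_funcs, n))

def emp_complexity(vals, trials, rng):
    """Monte Carlo estimate of E_sigma[ sup_f sum_i sigma_i * vals[f, i] ]."""
    sigma = rng.choice([-1.0, 1.0], size=(trials, vals.shape[1]))
    return (sigma @ vals.T).max(axis=1).mean()  # sup over the finite class

L = 1.0
phi = lambda t: np.abs(t) + 1.0  # 1-Lipschitz, with phi(0) = 1

lhs = emp_complexity(phi(values), trials, rng)  # complexity of phi ∘ F
rhs = L * emp_complexity(values, trials, rng)   # L times complexity of F

print(f"LHS = {lhs:.4f} <= RHS = {rhs:.4f}: {lhs <= rhs}")
```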
For Gaussian complexity, the contraction inequality also holds. Indeed, it holds for any symmetric random variables, since a symmetric random variable $g$ has the same distribution as $\epsilon\,|g|$, where $\epsilon$ is a Rademacher variable independent of $|g|$.
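Spelled out (a sketch of the conditioning argument, with i.i.d. symmetric $g_i$ and independent Rademacher $\epsilon_i$):
$$\mathbb{E}_g\left[\sup_{f\in\mathcal{F}}\sum_{i=1}^n g_i\,\phi(f(x_i))\right] = \mathbb{E}_{|g|}\,\mathbb{E}_\epsilon\left[\sup_{f\in\mathcal{F}}\sum_{i=1}^n \epsilon_i\,|g_i|\,\phi(f(x_i))\right] \le L\,\mathbb{E}_{|g|}\,\mathbb{E}_\epsilon\left[\sup_{f\in\mathcal{F}}\sum_{i=1}^n \epsilon_i\,|g_i|\,f(x_i)\right] = L\,\mathbb{E}_g\left[\sup_{f\in\mathcal{F}}\sum_{i=1}^n g_i\,f(x_i)\right],$$
where the inequality applies the coordinate-wise version above with $\phi_i(t)=|g_i|\,\phi(t)$, which is $(|g_i|\,L)$-Lipschitz for each fixed $|g_i|$.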
The General Version of the Contraction Inequality
In (Ledoux & Talagrand, 1991), the contraction inequality on Page 112 takes the form
where is a convex increasing function and is a bounded subset of . There are 3 differences between this form and the concise form given above. First, it considers the absolute value of the sum; second, it composes the supremum with another function , and the last, the original form requires . It is not surprise that we can get rid of the factor 2 if we drop the absolute sign based on the discussion in the last note. The assumption is essential for non-linear function . It can be verified by the counter-example: . The LHS equals , and the right hand side equals .
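A quick numerical check of this counter-example (a sketch; the setup mirrors the values above):

```python
import numpy as np

rng = np.random.default_rng(0)
trials = 1_000_000

# Counter-example: n = 1, T = {0}, phi ≡ 1 (a contraction with phi(0) = 1),
# and the convex increasing but non-linear F(x) = max(x, 0).
F = lambda x: np.maximum(x, 0.0)
eps = rng.choice([-1.0, 1.0], size=trials)  # Rademacher draws

lhs = F(eps * 1.0).mean()  # E[F(eps * phi(0))]; the sup over T = {0} is trivial
rhs = F(eps * 0.0).mean()  # E[F(eps * 0)] = F(0)

print(f"LHS ≈ {lhs:.3f}, RHS = {rhs:.3f}")  # ≈ 0.500 vs 0.000: inequality fails
```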
References
- Ledoux, M., & Talagrand, M. (1991). Probability in Banach Spaces: Isoperimetry and Processes. Springer Berlin Heidelberg.
- Bartlett, P. L., & Mendelson, S. (2003). Rademacher and Gaussian Complexities: Risk Bounds and Structural Results. J. Mach. Learn. Res., 3, 463–482.
- Mohri, M., Rostamizadeh, A., & Talwalkar, A. (2012). Foundations of Machine Learning. The MIT Press.