On the Spectrum Decay Rate of Gaussian Kernel Operator

2 minute read

In the study of the random Fourier features method, (Bach, 2017) proposed a way of generating random features so that the randomly generated feature space approximates the original RKHS very accurately with only a small number of features. This idea is an analogue of column sampling used to approximate matrix multiplication.
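As a concrete illustration (not Bach's specific construction, just the standard random-features sketch), the snippet below assumes the Gaussian kernel $k(x, y) = \exp(-\gamma \|x - y\|^2)$ and samples frequencies from its spectral density; all variable names here are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def rff_features(X, num_features, gamma=1.0):
    """Random Fourier features for the Gaussian kernel
    k(x, y) = exp(-gamma * ||x - y||^2)."""
    d = X.shape[1]
    # Frequencies drawn from the kernel's spectral density N(0, 2*gamma*I).
    W = rng.normal(scale=np.sqrt(2 * gamma), size=(d, num_features))
    b = rng.uniform(0, 2 * np.pi, size=num_features)
    return np.sqrt(2.0 / num_features) * np.cos(X @ W + b)

# The Gram matrix of the random features approximates the true kernel matrix.
X = rng.normal(size=(50, 3))
Z = rff_features(X, num_features=2000)
K_approx = Z @ Z.T
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
K_true = np.exp(-sq_dists)
err = np.abs(K_approx - K_true).max()
```

With 2000 features the entrywise error of the approximate Gram matrix is already small, which is the sense in which the random feature space approximates the RKHS.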

The kernel $k(x, y)$ is just like $A_{xy}$ for some Hermitian operator $A$, where $k$ is like a matrix indexed by the "coordinates" $x$ and $y$. If one samples $m$ columns of $A$ in an appropriate way and gets a matrix $B$, we can expect the low-rank approximation built from $B$ to approximate the best rank-$m$ approximation of $A$. Similarly, we want to conclude that the span of a small number of randomly sampled features approximates the RKHS well.
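The matrix side of this analogy can be checked numerically. Below is a minimal Nyström-style column-sampling sketch on a synthetic Gaussian kernel matrix standing in for $A$: the sampled-column approximation $C W^{+} C^{\top}$ is compared against the best rank-$m$ approximation from the eigendecomposition (the setup and names are illustrative, not from the paper):

```python
import numpy as np

rng = np.random.default_rng(1)

# A PSD "kernel matrix" with fast-decaying spectrum, standing in for A.
n = 200
X = rng.normal(size=(n, 2))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
A = np.exp(-sq)

# Column sampling (Nystrom): pick m columns, form A ~ C W^+ C^T,
# where C = A[:, idx] and W = A[idx][:, idx].
m = 30
idx = rng.choice(n, size=m, replace=False)
C = A[:, idx]
W = A[np.ix_(idx, idx)]
A_nys = C @ np.linalg.pinv(W) @ C.T

# Best rank-m approximation via truncated eigendecomposition, for comparison.
vals, vecs = np.linalg.eigh(A)
best = (vecs[:, -m:] * vals[-m:]) @ vecs[:, -m:].T
err_nys = np.linalg.norm(A - A_nys)
err_best = np.linalg.norm(A - best)
```

Since $A_{\mathrm{nys}}$ has rank at most $m$, its error can never beat the truncated eigendecomposition, but for a fast-decaying spectrum it comes close while only touching $m$ columns of $A$.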

The operator case is more complicated because the measure $\rho$ over the space for $x$ also impacts the operator, as follows:

$$ (Tf)(x) = \int k(x, y) f(y) \, d\rho(y). $$

If the spectrum of $T$ decays geometrically, then to achieve an $\epsilon$ error we only need to approximate the operator with its first $O(\log(1/\epsilon))$ eigenvalues. On the other hand, if the spectrum of $T$ decays slowly, then the approximation must have a much higher rank.
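This rank/accuracy trade-off can be seen empirically. A minimal sketch, assuming (as is standard) that the eigenvalues of the scaled kernel matrix $K/n$ on samples from $\rho$ approximate the spectrum of $T$:

```python
import numpy as np

rng = np.random.default_rng(2)

# Empirical approximation of the kernel integral operator: the eigenvalues
# of K/n approximate the operator's eigenvalues under the sampling measure rho.
n = 500
x = rng.normal(size=(n, 1))            # rho = standard Gaussian (sub-Gaussian)
K = np.exp(-(x - x.T) ** 2)            # Gaussian kernel
eigvals = np.linalg.eigvalsh(K / n)[::-1]   # descending order

def rank_needed(eps):
    # Number of eigenvalues strictly above eps.
    return int(np.searchsorted(-eigvals, -eps))

# Geometric decay means rank grows like log(1/eps): shrinking eps by a
# constant factor adds roughly a constant number of eigenvalues.
ranks = [rank_needed(10.0 ** (-p)) for p in range(1, 6)]
```

For this Gaussian-kernel/Gaussian-measure pair the required rank stays small even at tolerance $10^{-5}$, exactly the regime where a small number of random features suffices.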

So the decay rate of the kernel integral operator $T$ determined by $k$ and $\rho$ is important for this method to work. (Bach, 2017) claims that when $k$ is Gaussian and $\rho$ is sub-Gaussian, the spectrum of $T$ decays geometrically. However, only a reference to (Widom, 1963) is given, with no proof. In fact, the results of (Widom, 1963) do not apply to the Gaussian kernel, because its Fourier transform decays too fast; (Widom, 1963) itself remarked that the assumptions on the decay rate may be too restrictive. In a later work by the same author, (Widom, 1964), the Gaussian kernel with the uniform measure over an interval was discussed, but only in the one-dimensional case. (Bach & Jordan, 2003) also cited both references and claimed that the results in these two papers can be generalized to the higher-dimensional case, again without details.

An outline of the proof for the high-dimensional, arbitrary-measure case could combine the methods used in (Widom, 1963; Widom, 1964). First, the result for the Gaussian kernel with the uniform measure over an interval generalizes to the uniform measure over a box by taking tensor products. Then a standard real-analysis argument passes from boxes to simple functions and then to arbitrary functions.
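The tensor-product step rests on a simple fact: the Gaussian kernel on a box factorizes coordinate-wise, so the higher-dimensional operator is a tensor (Kronecker) product of one-dimensional ones and its eigenvalues are products of the one-dimensional eigenvalues. A minimal numerical check, using a midpoint-rule discretization on $[0,1]$ (all parameters are illustrative):

```python
import numpy as np

# 1-D Gaussian kernel operator on [0, 1] with uniform measure,
# discretized by the midpoint rule (the 1/n factor is the measure weight).
n = 30
t = (np.arange(n) + 0.5) / n
K1 = np.exp(-(t[:, None] - t[None, :]) ** 2) / n

# The 2-D Gaussian kernel on the box [0,1]^2 factorizes:
# exp(-||x - y||^2) = exp(-(x1 - y1)^2) * exp(-(x2 - y2)^2),
# so the discretized 2-D operator is the Kronecker product K1 (x) K1.
K2 = np.kron(K1, K1)

lam1 = np.linalg.eigvalsh(K1)
lam2 = np.sort(np.linalg.eigvalsh(K2))
lam_prod = np.sort(np.outer(lam1, lam1).ravel())

# Eigenvalues of the 2-D operator are exactly the pairwise products of the
# 1-D eigenvalues, so geometric decay in 1-D carries over to higher dimensions.
max_dev = np.abs(lam2 - lam_prod).max()
```

Since geometric decay of the one-dimensional eigenvalues forces fast decay of their pairwise products, this is precisely why the box case reduces to the interval case.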

References

  1. Bach, F. (2017). Breaking the Curse of Dimensionality with Convex Neural Networks. Journal of Machine Learning Research, 18, 1–53.
  2. Widom, H. (1963). Asymptotic Behavior of the Eigenvalues of Certain Integral Equations. Transactions of the American Mathematical Society, 109, 278–295.
  3. Widom, H. (1964). Asymptotic Behavior of the Eigenvalues of Certain Integral Equations. II. Archive for Rational Mechanics and Analysis, 17, 215–229.
  4. Bach, F. R., & Jordan, M. I. (2003). Kernel Independent Component Analysis. Journal of Machine Learning Research, 3, 1–48.