Universal Approximation Property of RKHS and Random Features (1)


On a subset $\mathcal{X}$ of $\mathbb{R}^d$, a symmetric function $k(x,x')$ of two variables is called positive definite if the matrix $[k(x_i,x_j)]_{ij}$ is positive semi-definite for any finite list of elements $\{ x_i \}\subset\mathcal{X}$. For example, $k(x,x'):= x^{\intercal}x'$ is clearly symmetric and positive definite. It is no surprise, then, that such a function $k$ is used as a kind of inner product. Such a function is called a kernel function, and it determines a class of functions in the following way,

$\begin{equation} H_k = \left\{ f(x)=\sum_{i=1}^N c_i k(x,x_i) : N\in\mathbb{N}, \{x_i\}\subset\mathcal{X}, \{ c_i \}\subset\mathbb{R} \right\} \end{equation}\,.$
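The positive-definiteness condition and the finite expansions that make up $H_k$ can be checked numerically. The sketch below is only an illustration: the Gaussian kernel, its bandwidth `sigma`, the sample points, and the helper names are assumptions introduced here, not part of the note.

```python
import numpy as np

def linear_kernel(X, Y):
    """k(x, x') = x^T x' for the rows of X and Y."""
    return X @ Y.T

def gaussian_kernel(X, Y, sigma=1.0):
    """k(x, x') = exp(-|x - x'|^2 / (2 sigma^2)); sigma is an illustrative choice."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))

def min_gram_eigenvalue(kernel, X):
    """Smallest eigenvalue of the Gram matrix [k(x_i, x_j)]_ij."""
    return np.linalg.eigvalsh(kernel(X, X)).min()

def evaluate(kernel, centers, coef, X):
    """Evaluate f(x) = sum_i c_i k(x, x_i) at the rows of X."""
    return kernel(X, centers) @ coef

rng = np.random.default_rng(0)
points = rng.normal(size=(20, 3))   # 20 sample points in R^3
coef = rng.normal(size=20)          # coefficients c_i of one element of H_k

# Both Gram matrices should be positive semi-definite up to rounding error.
min_eigs = {kern.__name__: min_gram_eigenvalue(kern, points)
            for kern in (linear_kernel, gaussian_kernel)}

# Values of f = sum_i c_i k(., x_i) at four new points.
f_values = evaluate(gaussian_kernel, points, coef, rng.normal(size=(4, 3)))
```

For the linear kernel the Gram matrix has rank at most $d$, so its smallest eigenvalues are zero up to floating-point noise rather than strictly positive.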

We can further define a symmetric bilinear form on $H_k$,

$\begin{equation} \langle f,g \rangle_k = \sum_{i,j} a_i b_j k(x_i,x'_j)\,, \end{equation}$

for $f(x)=\sum_i a_i k(x,x_i)$ and $g(x) = \sum_j b_j k(x,x'_j)$. We can verify that this form is an inner product since $k(x,x')$ is positive definite. So $(H_k,\langle\cdot,\cdot\rangle_k)$ is a linear space equipped with an inner product. Taking the completion of $H_k$ under this inner product, we get a Hilbert space $\mathcal{H}_k = \overline{H_k}$, called the reproducing kernel Hilbert space (RKHS) of $k$. The word ‘reproducing’ refers to the identity $f(x)=\langle f,k(\cdot,x)\rangle_k$, which in particular implies that the evaluation functional is bounded. If $k(x,x)$ is bounded over $\mathcal{X}$ and $k(\cdot,x)$ is continuous for all $x$, then $\mathcal{H}_k$ consists of continuous functions, because $\vert f(x)\vert \le \Vert f\Vert_k \sqrt{k(x,x)}$, so convergence under $\Vert\cdot\Vert_k$ implies convergence under $\Vert\cdot\Vert_\infty$.
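The reproducing identity and the pointwise bound that follows from Cauchy-Schwarz can be illustrated on a finite expansion. This is a minimal sketch, assuming a Gaussian kernel and randomly chosen centers and coefficients; none of these choices come from the note.

```python
import numpy as np

def gaussian_kernel(X, Y, sigma=1.0):
    """k(x, x') = exp(-|x - x'|^2 / (2 sigma^2)); an illustrative kernel."""
    diff = X[:, None, :] - Y[None, :, :]
    return np.exp(-np.sum(diff**2, axis=-1) / (2.0 * sigma**2))

def inner(a, Xa, b, Xb, kernel=gaussian_kernel):
    """<f, g>_k = sum_{i,j} a_i b_j k(x_i, x'_j) for two finite expansions."""
    return a @ kernel(Xa, Xb) @ b

rng = np.random.default_rng(1)
centers = rng.uniform(-1.0, 1.0, size=(5, 2))   # the points x_i
a = rng.normal(size=5)                          # the coefficients a_i
x = rng.uniform(-1.0, 1.0, size=(1, 2))         # an evaluation point

f_at_x = (gaussian_kernel(x, centers) @ a)[0]   # f(x) evaluated directly
reproduced = inner(a, centers, np.ones(1), x)   # <f, k(., x)>_k

# Cauchy-Schwarz: |f(x)| <= ||f||_k * sqrt(k(x, x)).
norm_f = np.sqrt(inner(a, centers, a, centers))
pointwise_bound = norm_f * np.sqrt(gaussian_kernel(x, x)[0, 0])
```

Here `f_at_x` and `reproduced` agree up to rounding, and `abs(f_at_x)` stays below `pointwise_bound`, which is the inequality behind the sup-norm convergence claim.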

The definition of the RKHS does not involve any measure on $\mathcal{X}$. But if $\mathcal{X}$ is compact, $k(x,x)$ is bounded on $\mathcal{X}$ as above, and there is a measure $\mu$ such that $\mu(\mathcal{X}) < \infty$, then all the functions in $\mathcal{H}_k$ belong to $L^2(\mathcal{X},\mu)$, since for any $f\in\mathcal{H}_k$,

$\begin{align} \int_\mathcal{X} f^2(x)\,\mathrm{d}\mu(x) & = \int_\mathcal{X} \langle f,k(\cdot,x)\rangle^2\,\mathrm{d}\mu(x) \\ & \le \int_\mathcal{X} \Vert f\Vert_k^2 \Vert k(\cdot,x)\Vert_k^2\,\mathrm{d}\mu(x) \\ & = \Vert f\Vert_k^2 \int_\mathcal{X} k(x,x)\,\mathrm{d}\mu(x)\,. \end{align}$
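The bound above can be checked with a simple quadrature. The sketch assumes $\mathcal{X}=[0,1]$ with $\mu$ the Lebesgue measure, a Gaussian kernel with bandwidth `sigma`, and a midpoint rule; all of these are illustrative choices rather than anything fixed by the note.

```python
import numpy as np

def k(x, y, sigma=0.5):
    """Gaussian kernel on 1-D arrays, returned as a len(x)-by-len(y) matrix."""
    return np.exp(-(x[:, None] - y[None, :])**2 / (2.0 * sigma**2))

n = 400
grid = (np.arange(n) + 0.5) / n        # midpoint-rule nodes on [0, 1]
w = 1.0 / n                            # equal quadrature weights

centers = np.array([0.1, 0.4, 0.8])    # the points x_i of f
c = np.array([1.0, -2.0, 0.5])         # the coefficients c_i of f

f = k(grid, centers) @ c               # f on the grid
lhs = w * np.sum(f**2)                 # approximates the integral of f^2

norm_sq = c @ k(centers, centers) @ c  # ||f||_k^2
rhs = norm_sq * w * np.sum(np.diag(k(grid, grid)))  # ||f||_k^2 times integral of k(x, x)
```

Because the inequality already holds pointwise, `lhs <= rhs` holds for the quadrature sums as well, not only in the limit of a fine grid.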

$k$ also defines a Hermitian integral operator from $L^2(\mathcal{X},\mu)$ into itself,

$\begin{align} \Sigma: L^2(\mathcal{X},\mu) & \longrightarrow L^2(\mathcal{X},\mu) \\ f(x) & \longmapsto \int_\mathcal{X} k(y,x)f(y)\,\mathrm{d}\mu(y)\,. \end{align}$

It can be shown that $\Sigma$ is well-defined, compact and positive semi-definite. Compactness is proved by showing that the image of the unit ball of $L^2(\mathcal{X},\mu)$ under $\Sigma$ is equicontinuous, so that the Arzelà-Ascoli theorem can be applied. Positivity can be shown via a Riemann sum approximation and the positivity of $k$. See 3.1 in (Cucker & Smale, 2002).
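A discretized version of $\Sigma$ makes these properties concrete. The sketch below assumes $\mathcal{X}=[0,1]$ with the Lebesgue measure, a Gaussian kernel, and a midpoint-rule discretization; it illustrates the statement and does not replace the proof.

```python
import numpy as np

def k(x, y, sigma=0.5):
    """Gaussian kernel on 1-D arrays, returned as a len(x)-by-len(y) matrix."""
    return np.exp(-(x[:, None] - y[None, :])**2 / (2.0 * sigma**2))

n = 200
grid = (np.arange(n) + 0.5) / n       # midpoint-rule nodes on [0, 1]
w = 1.0 / n                           # equal quadrature weights

# Sigma_mat[j, i] = k(y_i, x_j) * w, so that
# (Sigma f)(x_j) is approximated by sum_i k(y_i, x_j) f(y_i) w.
Sigma_mat = w * k(grid, grid)

f = np.sin(2.0 * np.pi * grid)        # a test function in L^2
Sigma_f = Sigma_mat @ f

eigvals = np.linalg.eigvalsh(Sigma_mat)[::-1]   # eigenvalues, largest first
quad_form = w * (f @ Sigma_f)                   # approximates <f, Sigma f> in L^2
```

The matrix is symmetric, its eigenvalues are non-negative up to rounding, and they decay quickly toward zero, which is the finite-dimensional shadow of compactness.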

Since $\Sigma$ is compact, we can apply the spectral theorem and show that $\sum_{i=1}^\infty \lambda_i e_i(x)e_i(x')=k(x,x')$, with the series converging absolutely and uniformly, where $\lambda_i > 0$ and $\{ e_i \}$ form an orthonormal basis of $L^2(\mathcal{X},\mu)$ (see (Lax, 2002)). This is called Mercer’s theorem. By Mercer’s theorem, we can further show that

$\begin{align} \mathrm{tr}(\Sigma) & = \sum_i \lambda_i \\ & = \int_\mathcal{X} \sum_i \lambda_i e_i^2(x) \,\mathrm{d}\mu(x) \\ & = \int_\mathcal{X} k(x,x)\,\mathrm{d}\mu(x) \\ & \le \sup_x \vert k(x,x)\vert \mu(\mathcal{X})\,. \end{align}$
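Mercer's expansion and the trace identity can be seen numerically by diagonalizing a discretized $\Sigma$. The sketch assumes $\mathcal{X}=[0,1]$ with the Lebesgue measure, a Gaussian kernel, a midpoint-rule grid, and a truncation level `m`; these are illustrative assumptions, and the eigenpairs are only approximations of the true $\lambda_i$ and $e_i$.

```python
import numpy as np

def k(x, y, sigma=0.5):
    """Gaussian kernel on 1-D arrays, returned as a len(x)-by-len(y) matrix."""
    return np.exp(-(x[:, None] - y[None, :])**2 / (2.0 * sigma**2))

n = 200
grid = (np.arange(n) + 0.5) / n       # midpoint-rule nodes on [0, 1]
w = 1.0 / n                           # equal quadrature weights
K = k(grid, grid)

# Eigenpairs of the discretized operator w * K, sorted largest first.
lam, U = np.linalg.eigh(w * K)
lam, U = lam[::-1], U[:, ::-1]

# E[j, i] approximates e_i(x_j); the scaling gives w * sum_j e_i(x_j)^2 = 1.
E = U / np.sqrt(w)

# Truncated Mercer sum: sum_{i < m} lam_i e_i(x) e_i(x').
m = 15
K_m = (E[:, :m] * lam[:m]) @ E[:, :m].T
mercer_error = np.max(np.abs(K - K_m))

trace_from_eigs = lam.sum()           # sum_i lam_i
trace_from_diag = w * np.trace(K)     # approximates the integral of k(x, x)
```

For this kernel $k(x,x)=1$, so both trace values equal $\mu(\mathcal{X})=1$ up to rounding, and a handful of terms already reproduces the kernel matrix closely because the eigenvalues decay so fast.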

Therefore we can define a Hilbert space $H_k$ (from here on, $H_k$ denotes this new space rather than the linear span introduced at the start) by

$\begin{equation} H_k:=\left\{ \sum_i a_i e_i \in L^2 : \sum_i \left(\frac{a_i}{\sqrt{\lambda_i}}\right)^2 < \infty \right\}\,, \end{equation}$

with the inner product

$\begin{equation} \left\langle \sum_i a_i e_i, \sum_i b_i e_i \right\rangle_{H_k} = \sum_i \frac{a_i b_i}{\lambda_i}\,. \end{equation}$

Note that $H_k$ is still a linear subspace of $L^2$, but it is not a Hilbert subspace, because its inner product differs from that of $L^2$. The $e_i$ are still orthogonal in $H_k$ but no longer orthonormal, since $\Vert e_i\Vert_{H_k}^2 = 1/\lambda_i$. We can further define an operator $\Sigma^{1/2}$ by

$\begin{align} \Sigma^{1/2}:L^2(\mathcal{X},\mu) & \longrightarrow H_k \\ \sum_i a_i e_i & \longmapsto \sum_i a_i \sqrt{\lambda_i}e_i\,. \end{align}$

It is easy to see that this is an isometric isomorphism between $L^2$ and the ‘smaller’ $H_k$. To finish this note, we show that $H_k = \mathcal{H}_k$. First, $k(\cdot,s) = \sum_i \lambda_i e_i(\cdot)e_i(s)$ and $\sum_i \lambda_i e_i^2(s) = k(s,s) < \infty$ imply that $k(\cdot,s)\in H_k$. Second, it is easy to verify that the two inner products coincide on the functions $k(\cdot,s)$. Third, for any $f=\sum_i a_i e_i\in H_k$,

$\begin{align} \left\langle f,k(\cdot,s) \right\rangle_{H_k} & = \sum_i a_i e_i(s) \\ & = f(s)\,, \end{align}$

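so if $\langle f,k(\cdot,s)\rangle_{H_k} = 0$ for all $s\in \mathcal{X}$, then $f$ is identically $0$. Hence the linear span of $\{ k(\cdot,s) : s\in\mathcal{X} \}$ is dense in $H_k$, and since the two inner products agree on this span, its completion $\mathcal{H}_k$ coincides with $H_k$.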
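The second step of the argument, that the two inner products coincide, can be spelled out on a pair of kernel sections. The coefficients of $k(\cdot,s)$ in the basis $\{ e_i \}$ are $\lambda_i e_i(s)$, so for any $s,t\in\mathcal{X}$,

$\begin{align} \left\langle k(\cdot,s),k(\cdot,t) \right\rangle_{H_k} & = \sum_i \frac{\lambda_i e_i(s)\,\lambda_i e_i(t)}{\lambda_i} \\ & = \sum_i \lambda_i e_i(s)e_i(t) \\ & = k(s,t) \\ & = \left\langle k(\cdot,s),k(\cdot,t) \right\rangle_k\,. \end{align}$

By bilinearity the same equality extends to all finite linear combinations of kernel sections.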

References

Cucker, F., & Smale, S. (2002). On the Mathematical Foundations of Learning. Bulletin of the American Mathematical Society, 39, 1–49.
Lax, P. D. (2002). Functional Analysis. Wiley.

Yitong Sun
