What Is the Random Fourier Features Method?
The random Fourier features method, or more generally the random features method, is a technique that transforms data which are not linearly separable into linearly separa...
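As a concrete illustration of the construction (a minimal sketch, not the note's own code), the snippet below approximates the Gaussian kernel $k(x, y) = \exp(-\|x - y\|^2 / (2\sigma^2))$ with random Fourier features: frequencies are drawn from the kernel's spectral density, and inner products of the feature maps concentrate around the kernel values. The function name and parameter choices are illustrative.

```python
import numpy as np

def random_fourier_features(X, D=500, sigma=1.0, seed=0):
    """Map X of shape (n, d) to D random Fourier features whose inner
    products approximate the Gaussian kernel exp(-||x - y||^2 / (2 sigma^2))."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Frequencies sampled from the kernel's spectral density N(0, I / sigma^2)
    W = rng.normal(scale=1.0 / sigma, size=(d, D))
    # Random phases make each cosine feature an unbiased estimator of the kernel
    b = rng.uniform(0.0, 2.0 * np.pi, size=D)
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

# Sanity check: Z Z^T converges to the exact Gram matrix as D grows.
X = np.random.default_rng(1).normal(size=(5, 3))
Z = random_fourier_features(X, D=20000)
approx = Z @ Z.T
exact = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1) / 2.0)
print(np.abs(approx - exact).max())  # small for large D
```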
We have seen the universal approximation property of RKHSs generated by radial kernels and of one-hidden-layer neural networks with sigmoidal activation funct...
Universal Approximation Property of RKHS. In this note, we discuss the universal approximation property of RKHS and compare it with that of neural networ...
On a subset $\mathcal{X}$ of $\mathbb{R}^d$, a binary symmetric function $k: \mathcal{X} \times \mathcal{X} \to \mathbb{R}$ is called positive definite if the matrix $[k(x_i, x_j)]_{i,j=1}^{n}$ is positive semi-definite for any list of elements $x_1, \dots, x_n \in \mathcal{X}$. It is clear th...
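As a quick numerical companion to this definition (a sketch assuming the Gaussian kernel; the helper name is ours), one can check that a Gram matrix built from a positive definite kernel has no negative eigenvalues:

```python
import numpy as np

def gram_matrix(kernel, xs):
    """Gram matrix [k(x_i, x_j)]_{i,j} for a list of points."""
    return np.array([[kernel(x, y) for y in xs] for x in xs])

rbf = lambda x, y: np.exp(-np.sum((x - y) ** 2) / 2.0)  # Gaussian kernel
xs = np.random.default_rng(0).normal(size=(6, 2))
G = gram_matrix(rbf, xs)
# k positive definite <=> every Gram matrix is positive semi-definite,
# i.e., all eigenvalues are nonnegative (up to floating-point roundoff).
print(np.linalg.eigvalsh(G).min() >= -1e-10)  # True
```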
Hinge loss functions are mainly used in support vector machines for classification problems, while cross-entropy loss functions are ubiquitous in neural networ...
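For reference, here are the two losses in their common binary forms, with labels $y \in \{-1, +1\}$ and a real-valued score $f(x)$ (a minimal sketch; the note may use a different parameterization):

```python
import numpy as np

def hinge_loss(y, score):
    """SVM hinge loss max(0, 1 - y * f(x)), labels y in {-1, +1}."""
    return np.maximum(0.0, 1.0 - y * score)

def cross_entropy_loss(y, score):
    """Binary cross-entropy (logistic) loss log(1 + exp(-y * f(x)))."""
    return np.log1p(np.exp(-y * score))

scores = np.linspace(-2.0, 2.0, 5)
print(hinge_loss(1, scores))          # exactly zero once the margin y*f(x) >= 1
print(cross_entropy_loss(1, scores))  # strictly positive for every finite score
```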
In this note we prove Sauer's lemma, which plays the key role in establishing the connection between VC-dimension and Rademacher complexity. We use the pr...
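For orientation, here is the standard statement of the lemma (the note's exact formulation and proof technique may differ):

```latex
% Sauer's lemma: if the hypothesis class H has VC-dimension d, then its
% growth function is polynomial rather than exponential in the sample size m:
\Pi_H(m) \;\le\; \sum_{i=0}^{d} \binom{m}{i} \;\le\; \left(\frac{em}{d}\right)^{d}
\quad \text{for all } m \ge d.
```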
In this note, we discuss the relation between VC-dimension and Rademacher/Gaussian complexities. First, let’s look at the definition of VC-dimension.
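For completeness, the standard definition (the note's notation may differ): a class $H$ of binary classifiers shatters a finite set $S$ if it realizes all $2^{|S|}$ labelings of $S$, and

```latex
% VC-dimension: the size of the largest set that H can shatter
% (infinite if H shatters arbitrarily large sets).
\mathrm{VCdim}(H) \;=\; \max\left\{\, m \;:\; \exists\, S \text{ with } |S| = m
\text{ such that } H \text{ shatters } S \,\right\}
```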
In this note, we will look at the Rademacher complexity of 2-layer neural networks and compare it with the corresponding result for kernel methods. This is the main result ...
In this section, we discuss the contraction inequality of Rademacher complexity. It is very useful for peeling off the loss function from the hypothesis clas...
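The inequality in question is usually stated as follows (a standard form of the Ledoux–Talagrand contraction; the note's constants and normalization may differ): if each $\phi_i$ is $L$-Lipschitz, then

```latex
% Contraction: composing with L-Lipschitz functions inflates the (empirical)
% Rademacher complexity by at most a factor of L, which lets one replace the
% loss-composed class by the hypothesis class itself.
\mathbb{E}_{\sigma} \sup_{f \in F} \sum_{i=1}^{n} \sigma_i\, \phi_i\!\left(f(x_i)\right)
\;\le\; L\; \mathbb{E}_{\sigma} \sup_{f \in F} \sum_{i=1}^{n} \sigma_i\, f(x_i)
```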
The current sample complexity analysis of supervised learning depends heavily on the capacity analysis of hypothesis classes. There are many different quanti...
In the note on No-Free-Lunch, we concluded that there is no learning algorithm that solves all problems at a fixed learning rate. This is because of the difficu...
Proof of Problem 6.3 of (Devroye, Györfi, & Lugosi, 1997).
This note discusses the implications of the celebrated No-Free-Lunch Theorem for kernel SVMs, RF-SVMs, and neural networks.