Quicksort and the Quickselect Algorithm
The reason I suddenly became interested in this problem is that, about a year ago, I walked into an interview at an internet company completely unprepared and was asked the following question: "Given a sequence of length n, how do you quickly find its m-th largest element? Assume m is much smaller than n." Being completely unfamiliar with sorting and selection algorithms, I only knew that quicksort's time complexity should be $O(n \log n)$, and that finding the maximum of a sequence costs $O(n)$. So I could only give the most naive...
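The excerpt breaks off before the answer, but the question is the textbook quickselect problem. Below is a minimal sketch of quickselect (my illustration, not the post's code; the function name `quickselect_largest` is made up) that finds the m-th largest element in expected $O(n)$ time:

```python
import random

def quickselect_largest(a, m):
    """Return the m-th largest element of a (m = 1 gives the maximum).

    Random pivoting gives expected O(n) time. A sketch, not the post's code.
    """
    assert 1 <= m <= len(a)
    pivot = random.choice(a)
    larger = [x for x in a if x > pivot]   # strictly greater than the pivot
    equal = [x for x in a if x == pivot]   # ties with the pivot
    if m <= len(larger):                   # answer lies among the larger elements
        return quickselect_largest(larger, m)
    if m <= len(larger) + len(equal):      # the pivot itself is the answer
        return pivot
    smaller = [x for x in a if x < pivot]  # otherwise recurse on the small side
    return quickselect_largest(smaller, m - len(larger) - len(equal))

print(quickselect_largest([3, 1, 4, 1, 5, 9, 2, 6], 2))  # -> 6
```

For m much smaller than n, an alternative is to keep a min-heap of the m largest elements seen so far (`heapq.nlargest(m, a)` in the standard library), which runs in worst-case $O(n \log m)$.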
This note is a continuation of the last one. By the definition of …, for …, we can rewrite it in the following form, …
This note is based on Appendices B.7 and B.8 of (Grafakos, 2008). First, we show that the Bessel function can be written in the following form.
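For reference, here is the standard power-series definition of the Bessel function of the first kind; that this matches the "following form" of the post is my assumption:

```latex
% Standard series definition of the Bessel function of order \nu (assumed,
% not necessarily the exact form derived in the post):
J_\nu(z) \;=\; \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,\Gamma(k+\nu+1)} \Bigl(\frac{z}{2}\Bigr)^{2k+\nu}.
```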
This note is based on Appendix A.7 of (Grafakos, 2008). The gamma function is defined by the integral $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt$ for $\operatorname{Re} z > 0$. For a positive integer $n$ this recovers the factorial up to a shift: $\Gamma(n) = (n-1)!$. ...
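The factorial claim follows from a one-line integration by parts, spelled out here since the excerpt's formulas were lost:

```latex
% Integration by parts gives the recursion; \Gamma(1) = 1 anchors the induction:
\Gamma(z+1) \;=\; \int_0^\infty t^{z} e^{-t}\,dt \;=\; z\,\Gamma(z), \qquad \Gamma(1) = 1
\quad\Longrightarrow\quad \Gamma(n+1) = n! \quad (n = 0, 1, 2, \dots).
```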
Classical Case
A quick proof of the general Hölder's inequality is as follows. First, by Jensen's inequality, we have for positive … and … satisfying …,
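In the standard proof, the Jensen step invoked here is the weighted AM-GM inequality; the variable names below are mine, not necessarily the post's:

```latex
% Weighted AM--GM via Jensen: for positive a_i and weights \lambda_i with
% \sum_i \lambda_i = 1,
\prod_{i=1}^{k} a_i^{\lambda_i} \;\le\; \sum_{i=1}^{k} \lambda_i\, a_i ,
% which, applied pointwise with a_i = |f_i|^{p_i} / \|f_i\|_{p_i}^{p_i} and
% \lambda_i = 1/p_i (where \sum_i 1/p_i = 1), integrates to the generalized
% H\"older inequality:
\Bigl\| \prod_{i=1}^{k} f_i \Bigr\|_{L^1} \;\le\; \prod_{i=1}^{k} \| f_i \|_{L^{p_i}}.
```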
Definition of Bessel Functions
We have seen the universal approximation property of RKHS generated by radial kernels and of one-hidden-layer neural networks with sigmoidal activation funct...
Universal Approximation Property of RKHS
In this note, we discuss the universal approximation property of RKHS and compare it with the property of neural networ...
On a subset $X$ of $\mathbb{R}^d$, a binary symmetric function $K$ is called positive definite if the matrix $[K(x_i, x_j)]_{i,j=1}^{n}$ is positive semi-definite for any list of elements $x_1, \dots, x_n \in X$. It is clear th...
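Unpacking "the matrix is positive semi-definite" into the equivalent quadratic-form condition (standard, stated for completeness):

```latex
% For every n, every x_1, \dots, x_n \in X, and every c \in \mathbb{R}^n:
\sum_{i=1}^{n} \sum_{j=1}^{n} c_i\, c_j\, K(x_i, x_j) \;\ge\; 0.
```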
Hinge loss functions are mainly used in support vector machines for classification problems, while cross-entropy loss functions are ubiquitous in neural netwo...
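For concreteness, the two losses in their standard binary-classification forms (the label conventions $y \in \{-1,+1\}$ and $y \in \{0,1\}$ below are mine, not necessarily the post's):

```latex
% Hinge loss on a margin score f(x), with labels y \in \{-1, +1\}:
\ell_{\mathrm{hinge}}\bigl(y, f(x)\bigr) = \max\bigl\{0,\; 1 - y f(x)\bigr\},
% binary cross-entropy on a predicted probability p(x), with labels y \in \{0, 1\}:
\ell_{\mathrm{ce}}\bigl(y, p(x)\bigr) = -\,y \log p(x) - (1 - y) \log\bigl(1 - p(x)\bigr).
```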
In this note we prove Sauer's lemma, which plays the key role in establishing the connection between VC-dimension and Rademacher complexity. We use the pr...
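The statement of the lemma, for reference:

```latex
% Sauer's lemma: if H has VC-dimension d, its growth function is polynomial in n:
\Pi_H(n) \;=\; \max_{x_1, \dots, x_n} \bigl| \{ (h(x_1), \dots, h(x_n)) : h \in H \} \bigr|
\;\le\; \sum_{i=0}^{d} \binom{n}{i} \;=\; O(n^d).
```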
Stirling’s formula is very useful in all kinds of asymptotic analysis. Here we present one of many proofs.
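The formula in question:

```latex
% Stirling's formula:
n! \;\sim\; \sqrt{2\pi n}\,\Bigl(\frac{n}{e}\Bigr)^{n},
\qquad\text{meaning}\qquad
\lim_{n \to \infty} \frac{n!}{\sqrt{2\pi n}\,(n/e)^{n}} = 1.
```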
In this note, we discuss the relation between VC-dimension and Rademacher/Gaussian complexities. First, let’s look at the definition of VC-dimension.
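Since the excerpt stops before the definition, here is the standard one:

```latex
% A set \{x_1, \dots, x_n\} is shattered by H if H realizes all 2^n labelings on it;
% the VC-dimension is the size of the largest shattered set:
\mathrm{VCdim}(H) \;=\; \max\bigl\{\, n \;:\; \exists\, x_1, \dots, x_n \text{ shattered by } H \,\bigr\}.
```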
In this note, we will look at the Rademacher complexity of 2-layer neural networks and compare it with the result of the kernel method. This is the main result ...
In this section, we discuss the contraction inequality for Rademacher complexity. It is very useful for peeling off the loss function from the hypothesis clas...
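The inequality in question is usually stated as follows (Ledoux-Talagrand contraction, in its standard one-sided form):

```latex
% If \phi is L-Lipschitz, then for the empirical Rademacher complexity
% \widehat{R}_n(F) = E_\sigma \sup_{f \in F} \tfrac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i),
\widehat{R}_n(\phi \circ F) \;\le\; L\, \widehat{R}_n(F).
```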
Current sample complexity analysis of supervised learning heavily depends on the capacity analysis of the hypothesis classes. There are many different quanti...
In the study of the random Fourier features method, (Bach, 2017) proposed a way of generating random features so that the randomly generated feature space approx...
By the note on No-Free-Lunch, we concluded that there is no learning algorithm that solves all problems at a fixed learning rate. This is because of the difficu...
Hoeffding's inequality was first proved in 1963 for independent bounded random variables. It captures the effect of cancellation among independent random varia...
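The statement, for reference:

```latex
% Hoeffding (1963): for independent X_i with a_i \le X_i \le b_i and S_n = \sum_{i=1}^{n} X_i,
\Pr\bigl(\, |S_n - \mathbb{E}\,S_n| \ge t \,\bigr)
\;\le\; 2 \exp\Bigl( -\frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \Bigr).
```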
Proof of Problem 6.3 of (Devroye, Györfi, & Lugosi, 1997)
This note discusses the implications of the celebrated No-Free-Lunch Theorem for kernel SVM, RF-SVM, and neural networks.
This is a note on the paper "SVM optimization: inverse dependence on training set size" (Shalev-Shwartz & Srebro, 2008).