Quicksort and the Quickselect Algorithm
The reason I suddenly became interested in this problem is that, about a year ago, I walked into an interview at an internet company completely unprepared and was asked the following question: "Given a sequence of length n, how do you quickly find its m-th largest element? Assume m is much smaller than n." Being completely unfamiliar with sorting and selection algorithms, I only knew that quicksort's time complexity should be $O(n \log n)$, and that finding the maximum of a sequence costs $O(n)$. So I could only give the most naive...
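The excerpt breaks off before the answer, but the question is the textbook quickselect problem. Below is a minimal sketch of quickselect (my illustration, not the post's code; the function name `quickselect_largest` is made up) that finds the m-th largest element in expected $O(n)$ time:

```python
import random

def quickselect_largest(a, m):
    """Return the m-th largest element of a (m = 1 gives the maximum).

    Random pivoting gives expected O(n) time. A sketch, not the post's code.
    """
    assert 1 <= m <= len(a)
    pivot = random.choice(a)
    larger = [x for x in a if x > pivot]   # strictly greater than the pivot
    equal = [x for x in a if x == pivot]   # ties with the pivot
    if m <= len(larger):                   # answer lies among the larger elements
        return quickselect_largest(larger, m)
    if m <= len(larger) + len(equal):      # the pivot itself is the answer
        return pivot
    smaller = [x for x in a if x < pivot]  # otherwise recurse on the small side
    return quickselect_largest(smaller, m - len(larger) - len(equal))

print(quickselect_largest([3, 1, 4, 1, 5, 9, 2, 6], 2))  # -> 6
```

For m much smaller than n, an alternative is to keep a min-heap of the m largest elements seen so far (`heapq.nlargest(m, a)` in the standard library), which runs in worst-case $O(n \log m)$.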
This note is a continuation of the last one. By the definition of …, for …, we can rewrite it in the following form, …
This note is based on Appendices B.7 and B.8 of (Grafakos, 2008). First, we show that the Bessel function can be written in the following form.
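For reference, here is the standard power-series definition of the Bessel function of the first kind; that this matches the "following form" of the post is my assumption:

```latex
% Standard series definition of the Bessel function of order \nu (assumed,
% not necessarily the exact form derived in the post):
J_\nu(z) \;=\; \sum_{k=0}^{\infty} \frac{(-1)^k}{k!\,\Gamma(k+\nu+1)} \Bigl(\frac{z}{2}\Bigr)^{2k+\nu}.
```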
This note is based on Appendix A.7 of (Grafakos, 2008). The gamma function is defined by the integral $\Gamma(z) = \int_0^\infty t^{z-1} e^{-t}\,dt$ for $\operatorname{Re} z > 0$. For a positive integer $n$ this recovers the factorial up to a shift: $\Gamma(n) = (n-1)!$. ...
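The factorial claim follows from a one-line integration by parts, spelled out here since the excerpt's formulas were lost:

```latex
% Integration by parts gives the recursion; \Gamma(1) = 1 anchors the induction:
\Gamma(z+1) \;=\; \int_0^\infty t^{z} e^{-t}\,dt \;=\; z\,\Gamma(z), \qquad \Gamma(1) = 1
\quad\Longrightarrow\quad \Gamma(n+1) = n! \quad (n = 0, 1, 2, \dots).
```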
Classical Case
A quick proof of the general Hölder's inequality is as follows. First, by Jensen's inequality, we have for positive … and … satisfying …,
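In the standard proof, the Jensen step invoked here is the weighted AM-GM inequality; the variable names below are mine, not necessarily the post's:

```latex
% Weighted AM--GM via Jensen: for positive a_i and weights \lambda_i with
% \sum_i \lambda_i = 1,
\prod_{i=1}^{k} a_i^{\lambda_i} \;\le\; \sum_{i=1}^{k} \lambda_i\, a_i ,
% which, applied pointwise with a_i = |f_i|^{p_i} / \|f_i\|_{p_i}^{p_i} and
% \lambda_i = 1/p_i (where \sum_i 1/p_i = 1), integrates to the generalized
% H\"older inequality:
\Bigl\| \prod_{i=1}^{k} f_i \Bigr\|_{L^1} \;\le\; \prod_{i=1}^{k} \| f_i \|_{L^{p_i}}.
```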
Definition of Bessel Functions
We have seen the universal approximation property of RKHS generated by radial kernels and of one-hidden-layer neural networks with sigmoidal activation funct...
Universal Approximation Property of RKHS
In this note, we discuss the universal approximation property of RKHS and compare it with the property of neural networ...
On a subset $X$ of $\mathbb{R}^d$, a binary symmetric function $K$ is called positive definite if the matrix $[K(x_i, x_j)]_{i,j=1}^{n}$ is positive semi-definite for any list of elements $x_1, \dots, x_n \in X$. It is clear th...
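Unpacking "the matrix is positive semi-definite" into the equivalent quadratic-form condition (standard, stated for completeness):

```latex
% For every n, every x_1, \dots, x_n \in X, and every c \in \mathbb{R}^n:
\sum_{i=1}^{n} \sum_{j=1}^{n} c_i\, c_j\, K(x_i, x_j) \;\ge\; 0.
```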
Hinge loss functions are mainly used in support vector machines for classification problems, while cross-entropy loss functions are ubiquitous in neural netwo...
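For concreteness, the two losses in their standard binary-classification forms (the label conventions $y \in \{-1,+1\}$ and $y \in \{0,1\}$ below are mine, not necessarily the post's):

```latex
% Hinge loss on a margin score f(x), with labels y \in \{-1, +1\}:
\ell_{\mathrm{hinge}}\bigl(y, f(x)\bigr) = \max\bigl\{0,\; 1 - y f(x)\bigr\},
% binary cross-entropy on a predicted probability p(x), with labels y \in \{0, 1\}:
\ell_{\mathrm{ce}}\bigl(y, p(x)\bigr) = -\,y \log p(x) - (1 - y) \log\bigl(1 - p(x)\bigr).
```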
In this note we prove Sauer's lemma, which plays the key role in establishing the connection between VC-dimension and Rademacher complexity. We use the pr...
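The statement of the lemma, for reference:

```latex
% Sauer's lemma: if H has VC-dimension d, its growth function is polynomial in n:
\Pi_H(n) \;=\; \max_{x_1, \dots, x_n} \bigl| \{ (h(x_1), \dots, h(x_n)) : h \in H \} \bigr|
\;\le\; \sum_{i=0}^{d} \binom{n}{i} \;=\; O(n^d).
```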
Stirling’s formula is very useful in all kinds of asymptotic analysis. Here we present one of many proofs.
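The formula in question:

```latex
% Stirling's formula:
n! \;\sim\; \sqrt{2\pi n}\,\Bigl(\frac{n}{e}\Bigr)^{n},
\qquad\text{meaning}\qquad
\lim_{n \to \infty} \frac{n!}{\sqrt{2\pi n}\,(n/e)^{n}} = 1.
```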
In this note, we discuss the relation between VC-dimension and Rademacher/Gaussian complexities. First, let’s look at the definition of VC-dimension.
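Since the excerpt stops before the definition, here is the standard one:

```latex
% A set \{x_1, \dots, x_n\} is shattered by H if H realizes all 2^n labelings on it;
% the VC-dimension is the size of the largest shattered set:
\mathrm{VCdim}(H) \;=\; \max\bigl\{\, n \;:\; \exists\, x_1, \dots, x_n \text{ shattered by } H \,\bigr\}.
```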
In this note, we will look at the Rademacher complexity of 2-layer neural networks and compare it with the result of the kernel method. This is the main result ...
In this section, we discuss the contraction inequality for Rademacher complexity. It is very useful for peeling off the loss function from the hypothesis clas...
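The inequality in question is usually stated as follows (Ledoux-Talagrand contraction, in its standard one-sided form):

```latex
% If \phi is L-Lipschitz, then for the empirical Rademacher complexity
% \widehat{R}_n(F) = E_\sigma \sup_{f \in F} \tfrac{1}{n} \sum_{i=1}^{n} \sigma_i f(x_i),
\widehat{R}_n(\phi \circ F) \;\le\; L\, \widehat{R}_n(F).
```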
Current sample complexity analysis of supervised learning heavily depends on the capacity analysis of the hypothesis classes. There are many different quanti...
In the study of the random Fourier features method, (Bach, 2017) proposed a way of generating random features so that the randomly generated feature space approx...
By the note on No-Free-Lunch, we concluded that there is no learning algorithm that solves all problems at a fixed learning rate. This is because of the difficu...
Hoeffding's inequality was first proved in 1963 for independent bounded random variables. It captures the effect of cancellation among independent random varia...
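The statement, for reference:

```latex
% Hoeffding (1963): for independent X_i with a_i \le X_i \le b_i and S_n = \sum_{i=1}^{n} X_i,
\Pr\bigl(\, |S_n - \mathbb{E}\,S_n| \ge t \,\bigr)
\;\le\; 2 \exp\Bigl( -\frac{2 t^2}{\sum_{i=1}^{n} (b_i - a_i)^2} \Bigr).
```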
Proof of Problem 6.3 of (Devroye, Györfi, & Lugosi, 1997)
This note discusses the implications of the celebrated No-Free-Lunch Theorem for kernel SVM, RF-SVM, and neural networks.
This is a note on the paper "SVM optimization: inverse dependence on training set size" (Shalev-Shwartz & Srebro, 2008).