To explain Mercer's theorem, we first consider an important special case; see below for a more general formulation.
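A kernel, in this context, is a symmetric continuous function
$$K : [a,b] \times [a,b] \to \mathbb{R},$$
where K(x, y) = K(y, x) for all x, y in [a, b].
K is said to be a positive-definite kernel if and only if
$$\sum_{i=1}^{n} \sum_{j=1}^{n} K(x_i, x_j)\, c_i c_j \ge 0$$
for all finite sequences of points x1, ..., xn of [a, b] and all choices of real numbers c1, ..., cn. Note that the term "positive-definite" is well established in the literature despite the weak inequality in the definition.[2][3]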
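As a minimal numerical sketch of this definition, the following assumes the example kernel K(x, y) = min(x, y) on [0, 1], an illustrative choice that does not come from the text above. It builds the matrix of values K(xi, xj) for one random set of points and checks that the quadratic form is non-negative. Passing this check for a single point set is only a necessary condition and does not prove positive-definiteness.

```python
import numpy as np

def min_kernel(x, y):
    """Assumed example kernel K(x, y) = min(x, y), symmetric and continuous on [0, 1]."""
    return np.minimum(x, y)

rng = np.random.default_rng(0)
points = rng.uniform(0.0, 1.0, size=50)               # x_1, ..., x_n in [a, b] = [0, 1]
gram = min_kernel(points[:, None], points[None, :])   # gram[i, j] = K(x_i, x_j)

# sum_ij K(x_i, x_j) c_i c_j equals c^T G c; it is >= 0 for every c
# exactly when the symmetric matrix G has no negative eigenvalues.
smallest_eigenvalue = np.linalg.eigvalsh(gram).min()

c = rng.normal(size=50)                               # one arbitrary choice of c_1, ..., c_n
quadratic_form = c @ gram @ c
```

A small negative tolerance is appropriate when inspecting `smallest_eigenvalue`, since floating-point rounding can push a true zero slightly below zero.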
The fundamental characterization of stationary positive-definite kernels (where K(x, y) = K(x − y)) is given by Bochner's theorem. It states that a continuous function K(x − y) is positive-definite if and only if it can be expressed as the Fourier transform of a finite non-negative measure μ:
$$K(x-y) = \int_{-\infty}^{\infty} e^{i(x-y)\omega}\, d\mu(\omega)$$
This spectral representation reveals the connection between positive definiteness and harmonic analysis. When the kernel is stationary, i.e., when it can be expressed as a one-variable function of the difference between points rather than a two-variable function of the positions of pairs of points, it provides a more direct characterization of positive definiteness than the abstract definition in terms of inequalities.
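The spectral representation can be checked numerically in a simple case. The sketch below assumes the Gaussian kernel K(x − y) = exp(−(x − y)²/2), whose spectral measure is the standard normal distribution; both the kernel and the truncation of the frequency axis to [−12, 12] are illustrative assumptions. The integral is approximated by a plain Riemann sum.

```python
import numpy as np

def gaussian_kernel(tau):
    """Assumed stationary kernel K(tau) = exp(-tau^2 / 2), with tau = x - y."""
    return np.exp(-0.5 * tau**2)

# Density of the (assumed) spectral measure: the standard normal density, which is >= 0.
omega, d_omega = np.linspace(-12.0, 12.0, 4801, retstep=True)
density = np.exp(-0.5 * omega**2) / np.sqrt(2.0 * np.pi)

def kernel_from_measure(tau):
    """Riemann-sum approximation of the integral of exp(i * tau * omega) d mu(omega)."""
    return np.sum(np.exp(1j * tau * omega) * density) * d_omega

taus = np.linspace(-3.0, 3.0, 13)
reconstructed = np.array([kernel_from_measure(t) for t in taus])
max_error = np.max(np.abs(reconstructed - gaussian_kernel(taus)))

# The measure is finite: its total mass equals K(0) = 1.
total_mass = np.sum(density) * d_omega
```

The imaginary part of `reconstructed` is negligible here because the density is symmetric in ω, which matches the kernel being real-valued.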
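Associated to K is a linear operator TK on functions, defined by the integral
$$[T_K \varphi](x) = \int_a^b K(x,s)\, \varphi(s)\, ds.$$
We assume φ can range through the space of real-valued square-integrable functions L2[a, b]; however, in many cases the associated RKHS can be strictly larger than L2[a, b]. Since TK is a linear operator, we can talk about eigenvalues and eigenfunctions of TK.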
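To make the eigenvalues of the integral operator with kernel K concrete, one can replace the integral by a quadrature rule and compute matrix eigenvalues. The sketch below assumes the example kernel K(s, t) = min(s, t) on [0, 1] and the midpoint rule; both are illustrative choices. For this kernel the eigenvalues are known in closed form, 1/((j − 1/2)²π²), which gives something to compare against.

```python
import numpy as np

n = 400
h = 1.0 / n
nodes = (np.arange(n) + 0.5) * h                      # midpoints of n equal subintervals of [0, 1]
kernel_matrix = np.minimum(nodes[:, None], nodes[None, :])

# Midpoint rule: [T_K phi](x_i) ~ sum_j K(x_i, x_j) phi(x_j) h,
# so the symmetric matrix h * K stands in for the operator T_K.
operator_matrix = h * kernel_matrix
eigenvalues = np.linalg.eigvalsh(operator_matrix)[::-1]   # sorted largest first

# Closed-form eigenvalues for this particular kernel, j = 1, ..., 5.
j = np.arange(1, 6)
exact = 1.0 / ((j - 0.5) ** 2 * np.pi**2)
relative_error = np.abs(eigenvalues[:5] - exact) / exact
```

The discretization only approximates the leading eigenvalues well; eigenvalues far down the spectrum are dominated by quadrature error.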
Theorem. Suppose K is a continuous symmetric positive-definite kernel. Then there is an orthonormal basis {ei}i of L2[a, b] consisting of eigenfunctions of TK such that the corresponding sequence of eigenvalues {λi}i is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on [a, b] and K has the representation
$$K(s,t) = \sum_{j=1}^{\infty} \lambda_j\, e_j(s)\, e_j(t),$$
where the convergence is absolute and uniform.
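The uniform convergence of the expansion can be observed numerically. The sketch below assumes the example kernel K(s, t) = min(s, t) on [0, 1], for which the eigenvalues 1/((j − 1/2)²π²) and eigenfunctions √2 sin((j − 1/2)πt) are known in closed form; the kernel and the function name `mercer_partial_sum` are illustrative, not part of the theorem. It measures the largest error of the partial sums over a grid.

```python
import numpy as np

def mercer_partial_sum(s, t, num_terms):
    """Partial sum of sum_j lambda_j e_j(s) e_j(t) for K(s, t) = min(s, t) on [0, 1]."""
    j = np.arange(1, num_terms + 1)
    freq = (j - 0.5) * np.pi
    eigenvalues = 1.0 / freq**2                        # lambda_j
    e_s = np.sqrt(2.0) * np.sin(np.outer(s, freq))     # e_j(s), shape (len(s), num_terms)
    e_t = np.sqrt(2.0) * np.sin(np.outer(t, freq))     # e_j(t), shape (len(t), num_terms)
    return (e_s * eigenvalues) @ e_t.T

grid = np.linspace(0.0, 1.0, 51)
exact = np.minimum(grid[:, None], grid[None, :])

# Largest pointwise error over the grid for increasing truncation levels.
errors = {
    n: np.max(np.abs(mercer_partial_sum(grid, grid, n) - exact))
    for n in (10, 100, 1000)
}
```

For this kernel the error shrinks roughly in proportion to one over the number of terms, which reflects the slow 1/j² decay of the eigenvalues.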
Details
We now explain in greater detail the structure of the proof of Mercer's theorem, particularly how it relates to the spectral theory of compact operators.
TK is a non-negative symmetric compact operator on L2[a,b]; moreover K(x, x) ≥ 0.
To show compactness, show that the image of the unit ball of L2[a,b] under TK is equicontinuous and apply Ascoli's theorem to conclude that the image of the unit ball is relatively compact in C([a,b]) with the uniform norm and, a fortiori, in L2[a,b].
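Now apply the spectral theorem for compact operators on Hilbert spaces to TK to show the existence of the orthonormal basis {ei}i of L2[a,b] satisfying
$$\lambda_i e_i(t) = [T_K e_i](t) = \int_a^b K(t,s)\, e_i(s)\, ds.$$
If λi ≠ 0, the eigenvector (eigenfunction) ei is seen to be continuous on [a,b]. Now
$$\sum_{i=1}^{\infty} \lambda_i\, |e_i(t)\, e_i(s)| \le \sup_{x \in [a,b]} |K(x,x)|,$$
which shows that the series
$$\sum_{i=1}^{\infty} \lambda_i\, e_i(t)\, e_i(s)$$
converges absolutely and uniformly to a kernel K0 which is easily seen to define the same operator as the kernel K. Hence K = K0, from which Mercer's theorem follows.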
Finally, to show non-negativity of the eigenvalues, one can write λ⟨f, f⟩ = ⟨f, TK f⟩ for an eigenfunction f with eigenvalue λ and express the right-hand side as an integral that is well approximated by its Riemann sums, which are non-negative by positive-definiteness of K. This implies λ⟨f, f⟩ ≥ 0 and hence λ ≥ 0.
Trace
The following is immediate:
Theorem. Suppose K is a continuous symmetric positive-definite kernel; TK has a sequence of nonnegative eigenvalues {λi}i. Then
$$\int_a^b K(t,t)\, dt = \sum_i \lambda_i.$$
This shows that the operator TK is a trace class operator and
$$\operatorname{trace}(T_K) = \int_a^b K(t,t)\, dt.$$
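The trace identity, which equates the integral of K(t, t) over [a, b] with the sum of the eigenvalues, can be checked on a worked case. The sketch below assumes the example kernel K(s, t) = min(s, t) on [0, 1] with the closed-form eigenvalues 1/((j − 1/2)²π²); this kernel is an illustrative choice. The diagonal integral is 1/2, and the partial sums of the eigenvalues approach it from below.

```python
import numpy as np

# Left-hand side: integral of K(t, t) = min(t, t) = t over [0, 1], via the midpoint rule
# (which is exact for a linear integrand, up to rounding).
t = (np.arange(1000) + 0.5) / 1000
diagonal_integral = np.mean(np.minimum(t, t))

# Right-hand side: partial sum of lambda_j = 1 / ((j - 1/2)^2 pi^2), j = 1, ..., 10^6.
j = np.arange(1, 1_000_001)
eigenvalue_sum = np.sum(1.0 / ((j - 0.5) ** 2 * np.pi**2))

# The gap is the omitted tail of the series, so it is small and positive.
gap = diagonal_integral - eigenvalue_sum
```

In exact arithmetic the full series sums to 1/2, since the sum of 1/(2j − 1)² over j ≥ 1 equals π²/8.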
The first generalization replaces the interval [a, b] with any compact Hausdorff space X, and Lebesgue measure on [a, b] is replaced by a finite countably additive measure μ on the Borel algebra of X whose support is X. This means that μ(U) > 0 for any nonempty open subset U of X.
A recent generalization replaces these conditions by the following: the set X is a first-countable topological space endowed with a Borel (complete) measure μ. X is the support of μ and, for all x in X, there is an open set U containing x and having finite measure. Then essentially the same result holds:
Theorem. Suppose K is a continuous symmetric positive-definite kernel on X. If the function κ is in L1μ(X), where κ(x) = K(x, x) for all x in X, then there is an orthonormal set {ei}i of L2μ(X) consisting of eigenfunctions of TK such that the corresponding sequence of eigenvalues {λi}i is nonnegative. The eigenfunctions corresponding to non-zero eigenvalues are continuous on X and K has the representation
$$K(x,y) = \sum_{i} \lambda_i\, e_i(x)\, e_i(y),$$
where the convergence is absolute and uniform on compact subsets of X.
The next generalization deals with representations of measurable kernels.
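Let (X, M, μ) be a σ-finite measure space. An L2 (or square-integrable) kernel on X is a function
$$K \in L^2_{\mu \otimes \mu}(X \times X).$$
L2 kernels define a bounded operator TK by the formula
$$\langle T_K \varphi, \psi \rangle = \int_{X \times X} K(y,x)\, \varphi(y)\, \psi(x)\, d[\mu \otimes \mu](y,x).$$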
TK is a compact operator (actually it is even a Hilbert–Schmidt operator). If the kernel K is symmetric, by the spectral theorem, TK has an orthonormal basis of eigenvectors. Those eigenvectors that correspond to non-zero eigenvalues can be arranged in a sequence {ei}i (regardless of separability).
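For a symmetric L2 kernel, the Hilbert–Schmidt property means that the squared eigenvalues sum to the squared L2 norm of the kernel. The sketch below checks this numerically under illustrative assumptions: X = [0, 1] with Lebesgue measure, the example kernel K(y, x) = min(y, x), and its closed-form eigenvalues 1/((j − 1/2)²π²). Both sides should be close to 1/6.

```python
import numpy as np

# Squared L2 norm of the kernel: double integral of K(y, x)^2 over [0, 1]^2,
# approximated by the midpoint rule on an n-by-n grid.
n = 1000
nodes = (np.arange(n) + 0.5) / n
kernel_values = np.minimum(nodes[:, None], nodes[None, :])
kernel_norm_squared = np.mean(kernel_values**2)

# Sum of squared eigenvalues, lambda_j = 1 / ((j - 1/2)^2 pi^2), truncated at 10^4 terms.
j = np.arange(1, 10_001)
eigenvalues = 1.0 / ((j - 0.5) ** 2 * np.pi**2)
sum_of_squares = np.sum(eigenvalues**2)

difference = abs(kernel_norm_squared - sum_of_squares)
```

Because the squared eigenvalues decay like 1/j⁴, the truncation error in `sum_of_squares` is negligible compared with the quadrature error in `kernel_norm_squared`.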
Theorem. If K is a symmetric positive-definite kernel on (X, M, μ), then
$$K(y,x) = \sum_{i \in \mathbb{N}} \lambda_i\, e_i(y)\, e_i(x),$$
where the convergence is in the L2 norm. Note that when continuity of the kernel is not assumed, the expansion no longer converges uniformly.
Bartlett, Peter (2008). "Reproducing Kernel Hilbert Spaces" (PDF). Lecture notes of CS281B/Stat241B Statistical Learning Theory. University of California at Berkeley.
Adriaan Zaanen, Linear Analysis, North Holland Publishing Co., 1960.
Ferreira, J. C., Menegatto, V. A., Eigenvalues of integral operators defined by smooth positive definite kernels, Integral Equations and Operator Theory, 64 (2009), no. 1, 61–81. (Gives the generalization of Mercer's theorem for metric spaces. The result is easily adapted to first countable topological spaces.)
Konrad Jörgens, Linear integral operators, Pitman, Boston, 1982.
Robert Ash, Information Theory, Dover Publications, 1990.
Mercer, J. (1909), "Functions of positive and negative type and their connection with the theory of integral equations", Philosophical Transactions of the Royal Society A, 209 (441–458): 415–446, Bibcode:1909RSPTA.209..415M, doi:10.1098/rsta.1909.0016.